“Data” is the new buzzword in the open source world, according to Tuesday’s keynote speakers at the Open Source Business Conference, being held in San Francisco.
“The kind of data we’re collecting today is way harder to store and process than it used to be,” said Mike Olson, president and CEO of Cloudera. The problem is compounded by the explosion in the volume of data being generated, he added.
New tools such as Apache Hadoop, which Cloudera distributes commercially, are needed to handle data today, Olson said.
The value of data also was the theme of the keynote by Stephen O’Grady, a cofounder of and analyst at Redmonk.
“The age of data is upon us,” O’Grady said, adding that open source companies should focus on leveraging data for growth instead of trying to develop and sell software.
The Three Vs of Data
Corporations are now getting more data of different types faster than they used to, Olson said. “Volume, velocity and variety are the three big Vs of data,” he added.
For example, analyst firm Gartner says the amount of complex data, which doesn’t fit into enterprise databases, will grow by 1.86 percent a year up to 2016, Olson said.
“Data used to be generated by human beings doing human things at a human scale,” Olson remarked. “These days, machines are talking to machines, and the variety of data they’re exchanging is absolutely exploding.”
The enterprise data warehouses and business intelligence tools built up over the past 30 years cannot handle the types of data we have today, Olson suggested. “They were built up along business problems that were critical in the ’80s and the ’90s,” he explained.
One of the “immutable laws” of data warehousing is that if data doesn’t fit a data warehouse’s schema, it must be rejected, Olson said. Data warehouse administrators have “lots of good reasons” to enforce that rule, but it can lead to data loss.
“Sometimes data is dirty or the representation has changed. Maybe your business partner has changed his data schema and sends you a different field,” Olson pointed out. “So variety is a critical property of the data that we need to deal with today.”
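To make the rejection rule concrete, here is a minimal Python sketch. It assumes a hypothetical three-field order schema and a partner feed in which one field has been renamed; the field names are invented for illustration, and this is not how any particular data warehouse implements schema enforcement.

```python
# Hypothetical warehouse schema: the exact field names the loader expects.
EXPECTED_FIELDS = {"order_id", "customer_id", "amount"}


def load_strict(records):
    """Enforce the schema on write: keep only records whose fields match exactly.

    Everything else lands in the rejected list; a strict loader would discard it.
    """
    accepted, rejected = [], []
    for record in records:
        if set(record) == EXPECTED_FIELDS:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


# The partner has renamed "amount" to "total" partway through the feed.
feed = [
    {"order_id": 1, "customer_id": "c1", "amount": 9.99},
    {"order_id": 2, "customer_id": "c2", "total": 24.50},
]

accepted, rejected = load_strict(feed)
print(len(accepted), "loaded;", len(rejected), "rejected")  # 1 loaded; 1 rejected
```

The second order is perfectly good business data, but because its representation changed, a loader that follows the rule drops it. That is the kind of data loss Olson describes.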
The speed at which corporations get data is also important.
“If you want to watch the Arab Spring unfold or want to know what the Internet thinks, you need to deal with variety — the ebbs and flows of data,” Olson stated.
The Arab Spring is the term given to the popular uprisings that broke out across the Middle East and North Africa earlier this year.
Gather Ye Data While Ye May
Enterprises can analyze data better when they can capture more data of different types coming in at any velocity.
“A Google person — I always forget his name when I get on stage — said that you can see patterns in vast amounts of data that were simply impossible to see in smaller amounts of data,” Olson stated.
For example, companies doing business on the Internet collect and log as much data as they can about customers visiting their websites. “They can see how quickly you scrolled through, what pages you ignored, what pages you clicked on,” Olson said.
The companies then figure out what visitors to their sites like and don’t like and group them with other visitors who share those likes and dislikes, creating what’s termed a “cohort,” Olson said. Based on that, they can work out which recommendations will best suit members of that cohort.
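The cohort idea can be sketched in a few lines of Python. Several assumptions are made here: each visitor is summarized as the set of pages they clicked on, "sharing likes" is measured with Jaccard similarity against an arbitrary 0.5 threshold, and the visitor names and pages are made up. Production recommendation systems are far more elaborate; this only illustrates grouping visitors and then recommending from the group.

```python
from collections import Counter

# Hypothetical clickstream summary: the pages each visitor clicked on.
visits = {
    "alice": {"hiking", "tents", "stoves"},
    "bob": {"hiking", "tents", "maps"},
    "carol": {"hiking", "tents", "boots"},
    "dave": {"novels", "poetry"},
}


def jaccard(a, b):
    """Overlap between two sets of likes, from 0.0 (none) to 1.0 (identical)."""
    return len(a & b) / len(a | b)


def cohort(visitor, threshold=0.5):
    """Other visitors whose likes overlap this visitor's by at least `threshold`."""
    mine = visits[visitor]
    return [
        other
        for other, likes in visits.items()
        if other != visitor and jaccard(mine, likes) >= threshold
    ]


def recommend(visitor, k=2):
    """Pages most popular within the visitor's cohort that the visitor hasn't clicked."""
    counts = Counter()
    for peer in cohort(visitor):
        counts.update(visits[peer] - visits[visitor])
    return [page for page, _ in counts.most_common(k)]


print(cohort("alice"))             # ['bob', 'carol']
print(sorted(recommend("alice")))  # ['boots', 'maps']
```

Alice shares two of four distinct pages with both Bob and Carol, so they form her cohort, and the pages they clicked that she hasn't become her recommendations. Dave shares nothing with anyone, so he gets no cohort and no recommendations.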
“CEOs should care deeply about this stuff,” Olson suggested, pointing to an IBM study showing that companies that analyzed big data achieved a 20-fold improvement in earnings between 2004 and 2008.
Large corporations such as Orbitz are increasingly using Cloudera’s Hadoop distribution, Olson said.
However, that doesn’t necessarily mean open source tools are the answer to analyzing today’s plethora of data, warned Rob Enderle, principal analyst at the Enderle Group.
“The fact of the matter is that data’s growing faster but analytics tools are also behind,” Enderle told LinuxInsider. “They’re always going to be behind the information they’re analyzing.”
Data analytics tools have a broad future, Enderle said, but whether enterprises choose open source or proprietary ones will depend on their requirements.
“If your systems are proprietary, then probably your best tools should be proprietary as well,” Enderle explained.
Focus on Data, Not Software
Open source vendors seeking to make money from software are not going to be a happy lot, according to the assessment by Redmonk’s O’Grady.
“Growth from software sales is slowing, but growth through data is not,” O’Grady said.
That’s because we’re in the fourth stage of software production, O’Grady suggested.
The first stage, epitomized by IBM, held that the money was in the hardware and that software was just an adjunct, O’Grady said. Stage two, kicked off by Microsoft, held that the money was in the software.
Google epitomizes the third stage, where the money is not in the software, but software is a differentiator. “Google came up at a time when a lot of folks were building the Internet on the backs of some very expensive hardware and software,” O’Grady said. “Google uses commodity hardware, free — meaning no-cost — software, and focuses on what it can do better than its competitors with that software.”
Examples of the fourth stage are Facebook and Twitter. “Now, software is not even differentiating; it’s the value of the data,” O’Grady remarked. “Facebook and Twitter monetize their data in different ways.”
Entering the Age of Data
Software is now a means, not an end in and of itself, O’Grady stated.
While open source software vendors are “terrible” at customer conversion, they’re “very good” at distribution because they’re giving away their products, O’Grady said.
“If you’re good at distribution, you’re good at generating data,” O’Grady pointed out. “You’re generating the data, so use the data,” he urged his listeners. “Put it to work converting customers and generating revenue if you’re a vendor.”
Open source “is not growth from the revenue perspective, but it’s phenomenal for growth from the point of view of sharing data,” O’Grady said. “Open source enables data which enables growth.”
For example, Google “took free software and created a product that generates enormous amounts of data,” O’Grady said. “You may not be Google but the data that’s being generated can be used,” he added.
The small vendor has an advantage in terms of focus, Enderle said.
“The bigger you are the harder it is to work out a specific customer need,” Enderle elaborated. “Small vendors need to drill down on the customer set because they’re not going to get the scale to make it up in volume. They need to look at the needs of a distinct customer set.”