MIT Machine Does Big Data Analysis the Human Way

Researchers at the Massachusetts Institute of Technology last week announced they had developed an algorithmic system to analyze big data that eventually might replace humans.

The system, called the “Data Science Machine,” designs the feature set and searches for patterns in big data.

The DSM’s first prototype was 96 percent as accurate as the winning submission by a human team in one competition to find predictive patterns in unfamiliar data sets, MIT said. In two others, it was 94 percent and 87 percent as accurate as the winning submissions.

“In effect, you’re replacing a person — a data scientist in this case — and they’re hard to get and spread very thinly,” said Rob Enderle, principal analyst at the Enderle Group.

“Even 87 percent is better than you’d likely get with someone untrained, and it could be close enough for a data scientist to refine the results, substantially cutting the time required for a project,” he told TechNewsWorld.

How DSM Works

Big data analysis searches for buried patterns and extrapolates from them to make predictions, but researchers first have to decide which features of a database to look at.

DSM aims to automate the selection of the feature set by conducting what’s known as “feature engineering.”

The researchers — graduate student Max Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory — used various techniques in feature engineering.

One is to exploit structural relationships inherent in database design by tracking correlations between data in different tables.

The DSM imports data from one table into a second, looks at their associations, and executes operations to generate feature candidates. As the number of correlations increases, it layers operations on top of each other to discover things such as the minima of averages and the average of sums.
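The stacking of operations across related tables can be sketched in a few lines of pandas. The tables, column names and values below are hypothetical, invented purely for illustration — the actual DSM operates on arbitrary relational schemas — but they show how a first-level aggregate (sum of item prices per order) feeds a second-level one (average of sums per customer):

```python
import pandas as pd

# Hypothetical three-table schema: items belong to orders,
# orders belong to customers.
customers = pd.DataFrame({"customer_id": [1, 2]})
orders = pd.DataFrame({
    "order_id":    [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 2],
})
items = pd.DataFrame({
    "order_id": [10, 10, 11, 12, 13, 13],
    "price":    [5.0, 7.0, 3.0, 8.0, 2.0, 4.0],
})

# First-level feature: sum of item prices per order.
order_totals = items.groupby("order_id")["price"].sum().rename("order_total")
orders = orders.join(order_totals, on="order_id")

# Stacked feature, the "average of sums": mean order total per customer.
avg_of_sums = (orders.groupby("customer_id")["order_total"]
                     .mean().rename("avg_order_total"))
features = customers.join(avg_of_sums, on="customer_id")
print(features)
```

Each extra join in the schema permits another layer of aggregation, which is how compound candidates such as the minimum of averages arise.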

The DSM also looks for categorical data, which appears to be restricted to a limited range of values, such as brand names. It generates further candidates by dividing existing features across categories.
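Splitting an existing feature across the values of a categorical column can be illustrated with a pivot. Again, the table and its "brand" column are invented for the example; the point is that one base feature (mean price per customer) becomes one new candidate per category:

```python
import pandas as pd

# Hypothetical purchases table with a categorical "brand" column.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "brand":       ["acme", "zen", "acme", "acme", "zen"],
    "price":       [4.0, 6.0, 2.0, 8.0, 10.0],
})

# Base feature: mean price per customer, across all brands.
base = purchases.groupby("customer_id")["price"].mean()

# Categorical split: the same feature computed within each brand,
# pivoted into one new candidate column per category.
split = (purchases.pivot_table(index="customer_id", columns="brand",
                               values="price", aggfunc="mean")
                  .add_prefix("mean_price_"))
print(split)
```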

After a number of candidates have been produced, the DSM searches for correlations among them and winnows out those without correlations.

It then tests its reduced set of features on sample data, recombining them in various ways to optimize the accuracy of the resulting predictions.
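The article does not specify how the winnowing is scored, but one plausible reading — assumed here, not taken from the source — is a simple correlation filter: candidates whose correlation with the prediction target is negligible are dropped before the more expensive optimization step. A minimal sketch on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic candidates: f1 carries signal, f2 is a near-copy of f1,
# f3 is pure noise.
f1 = rng.normal(size=n)
X = pd.DataFrame({
    "f1": f1,
    "f2": f1 + 0.01 * rng.normal(size=n),
    "f3": rng.normal(size=n),
})
y = pd.Series(2 * f1 + rng.normal(scale=0.5, size=n))

# Winnow: keep only candidates showing meaningful correlation
# with the target (threshold chosen arbitrarily for the sketch).
scores = X.corrwith(y).abs()
kept = scores[scores > 0.3].index.tolist()
print(kept)
```

The surviving candidates would then be recombined and evaluated on held-out sample data, as the article describes, to tune the final feature set for predictive accuracy.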

Learning Deep

“This is really about deep learning — the ability of server platforms to analyze data and develop intelligent algorithms,” remarked Jim McGregor, a principal analyst at Tirias Research.

The DSM research “proves the value of the research being done by companies like Google, Baidu, Alibaba and Microsoft, and indicates some of the challenges,” he told TechNewsWorld.

Developing intelligent algorithms “is a science of learning,” McGregor said. “You don’t always get the right answer the first time, but accuracy improves over time and with additional feedback and more data.”

The potential for machine learning and deep learning is unlimited and “will change our industry and society by allowing both machines and humans to be more productive,” he predicted.

Whizzing Through Problems

Teams of humans typically take months to create prediction algorithms, whereas the DSM took two to 12 hours to produce each of its entries, MIT said.

Even though the DSM didn’t do as well as humans in the competitions, its conclusions can still be valuable.

“Think about what it would take to develop a drug for a supervirus. You don’t have months, you have days before a pandemic breaks out,” McGregor pointed out. In such a case, “it’s not about finding the right answers, but finding the potential answers while eliminating many or most of the wrong ones.”

Within a decade, such a system “should be able to compete with humans, matching or exceeding their accuracy if progress continues,” Enderle said.

The risk is that, as we increasingly rely on such automated systems, we may lose the skill sets needed to do the work ourselves and be unable to see the mistakes the systems make, he cautioned.

“A critical flaw in a future system could go unnoticed as a result and lead to catastrophic consequences,” Enderle posited.

Kanter is presenting his paper at the IEEE International Conference on Data Science and Advanced Analytics in Paris this week.

Richard Adhikari has written about high-tech for leading industry publications since the 1990s and wonders where it's all leading to. Will implanted RFID chips in humans be the Mark of the Beast? Will nanotech solve our coming food crisis? Does Sturgeon's Law still hold true? You can connect with Richard on Google+.

