Big Data – Innovation Essence

Researchers at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL) are looking to take human intuition out of big data analysis by letting computers choose the feature set used to identify predictive patterns in the data. The effort is called the “Data Science Machine.”

Big data is a huge, complex ecosystem that brings together innovative processes from across data analysis, storage, networking, curation, search, and many other functions. Much of big data analysis is automated and algorithmic, but in the end data scientists and business users must still decide which features of the analysis and data sets belong in the final visualization, so that it communicates the data and makes it actionable.

When looking at a huge amount of data, experts often disagree over which features of that data are needed to produce results that can lead to action. Within the data is a constellation of price points, locations, ethnographic information, returns, upgrades, and so on, all of which reveal patterns in purchasers and purchases. In the end, though, a human needs to choose which combinations of those data points will tell them what they want to know.

“We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

Picking the features that reveal the patterns behind actionable information is often the purview of the big data scientist writing the analysis code. That code guides the big data engine in an analysis that predicts or reveals what the humans looking at the data need. The essence of the Data Science Machine is an algorithm that doesn’t simply answer a question asked about the data but also suggests questions based on the data set.

Researchers already intend to use this technology as a proof of concept, seeking feature sets for problems such as estimating the power-generating capacity of wind farms or predicting which students will drop out of online courses. Dropout prediction tends to rest on two major data points: how long before a deadline a student starts working on a problem, and how much time a student spends on a course relative to her classmates. MIT’s online learning platform MITx does not record these data points, but the galaxy of other data it holds could potentially contain interactions from which this information can be inferred. A system such as the Data Science Machine could be used to engineer likely feature sets that deliver it.
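To make the two dropout signals concrete, here is a minimal sketch of how they might be computed by hand from raw activity records. It is not the Data Science Machine’s method, which aims to derive such features automatically. The log format, field names, student IDs, dates and function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical activity log: (student_id, problem_id, timestamp, seconds_active).
# All values below are made up for illustration.
EVENTS = [
    ("ana", "p1", datetime(2015, 10, 1, 9, 0), 1800),
    ("ana", "p1", datetime(2015, 10, 2, 9, 0), 1800),
    ("ben", "p1", datetime(2015, 10, 4, 22, 0), 600),
    ("cy", "p1", datetime(2015, 10, 3, 12, 0), 1200),
]

# Hypothetical deadline for each problem.
DEADLINES = {"p1": datetime(2015, 10, 5, 0, 0)}


def hours_before_deadline(events, deadlines):
    """Hours between each student's first event on a problem and its deadline."""
    first_seen = {}
    for student, problem, ts, _ in events:
        key = (student, problem)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts
    return {
        key: (deadlines[key[1]] - ts).total_seconds() / 3600
        for key, ts in first_seen.items()
    }


def relative_time_spent(events):
    """Each student's total active time divided by the class median."""
    totals = defaultdict(int)
    for student, _, _, seconds in events:
        totals[student] += seconds
    class_median = median(totals.values())
    return {student: total / class_median for student, total in totals.items()}


if __name__ == "__main__":
    print(hours_before_deadline(EVENTS, DEADLINES))
    print(relative_time_spent(EVENTS))
```

In this made-up data, a student who starts two hours before the deadline and spends half the class median on the course would stand out on both signals. The point of automated feature engineering is to surface combinations like these without a person having to think of them first.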

Many corporations, institutions, businesses and governments already collect a great deal of data, and often must avoid collecting particular data because of network, storage and sensor constraints. Breakthroughs in machine learning and meta-analysis such as the Data Science Machine would augment the already interesting job of the big data scientist by adding another layer of automation. Big data scientists would still code the analysis for the engines to turn over in their computerized brains; they would simply have another tool in their belt when framing the questions that deliver the answers they need.

For more information please visit: www.mit.edu