For analytics frameworks, Big Data is a broad term referring to data sets for which traditional data processing is inadequate, typically because the data sets are extremely large or complex. Big Data challenges include search, sharing, transfer, visualization, privacy, data curation, analysis, and the like. Three core elements exist in advanced analytics frameworks: data, algorithms, and computation platforms.
In the modern digital world, data volumes have been growing in size and complexity at an exponential rate. In addition, the growing variety, velocity and veracity of the data also contribute to the complexity of a Big Data world.
Algorithms used for analysis of the data and for data mining tasks are, however, more or less the same as those used in past years. As such, apart from custom implementations, the algorithms being employed for Big Data may not significantly differ from what has been traditionally available.
Computational platforms have seen a number of significant innovations in past years from both the hardware and the architectural design perspectives. This has made it possible to harvest large data volumes by parallelizing tasks (e.g., Map-Reduce frameworks and the like). Such frameworks are designed to cope with high dimensional data that are used as the input of analytical platforms and tools for extracting data. At a very high level, these platforms try to address two correlated issues: the feasibility of data processing under hardware limitations, and unreasonable execution time. One major challenge in harvesting the benefits of Map-Reduce-like solutions is to convert the computation task into parallel executable sub-tasks (mapping) and to combine the results later on (reducing). While such solutions are suitable for a family of computations (such as embarrassingly parallel tasks), there are scenarios where transforming the computation task into a format suitable for Map-Reduce either requires significant modification and effort or is impossible (e.g., most holistic algorithms, where a full view of the data is required).
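By way of illustration only, the mapping and reducing steps described above can be sketched in a few lines of Python. This is a minimal single-process sketch, not the API of any particular Map-Reduce framework: the function names (split, map_partial, combine, mapreduce_mean) and the sample data are hypothetical and chosen for the example, and the input is assumed to be non-empty. A mean decomposes cleanly into per-chunk partial results that can be combined, whereas a median, commonly cited as a holistic computation, generally cannot be recovered from per-chunk medians alone.

```python
from functools import reduce
from statistics import median


def split(data, n_chunks):
    """Partition a non-empty list into contiguous chunks of roughly equal size."""
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_partial(chunk):
    """Map step: summarize one chunk independently as (sum, count)."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def mapreduce_mean(data, n_chunks=3):
    """Compute the mean from per-chunk summaries only."""
    # Each map_partial call touches a single chunk, so on a real platform
    # these calls could run on separate workers; here they run sequentially.
    partials = map(map_partial, split(data, n_chunks))
    total, count = reduce(combine, partials)
    return total / count


# Hypothetical sample data, for illustration only.
data = [1, 2, 9, 3, 4, 8, 5, 6, 7]
chunks = split(data, 3)

# Decomposable case: the chunked result matches the direct computation.
mean_mr = mapreduce_mean(data, 3)
mean_direct = sum(data) / len(data)

# Holistic case: combining per-chunk medians does not, in general,
# reproduce the median of the full data set.
median_of_chunk_medians = median(median(c) for c in chunks)
true_median = median(data)

print(mean_mr, mean_direct)
print(median_of_chunk_medians, true_median)
```

In this sketch the mean needs only a small fixed-size summary per chunk, which is what makes it suitable for mapping and reducing. For the sample data shown, the median of the chunk medians differs from the median of the full data set, which illustrates why computations of this kind require a full view of the data or a substantially different algorithm.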
It is desirable to have techniques that handle large amounts of data using existing or emerging data mining algorithms while leveraging advanced computational platforms, for example, through parallel processing, to provide “Big Data Technology Solutions.”