As Big Data application areas have grown, demand for software platforms that perform analytics has increased. Accordingly, multiple vendors provide their own tailored platforms for developers who want to create Big Data analytics products.
In conventional approaches to Big Data processing, developers of multimodal analytics must learn each particular platform and develop specifically for it. These platforms, including the open-source Apache SPARK and Unstructured Information Management Architecture Asynchronous Scaleout (UIMA-AS), each provide their own interface.
More recent Big Data processing platforms, such as SPARK, have a more flexible programming model than earlier platforms, such as Hadoop. This flexibility gives the “application space” new power to create optimized applications: an application in SPARK can rewrite the way data is partitioned, shuffled, or aggregated, which is not necessarily possible in Hadoop MapReduce.
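As a rough, framework-independent illustration of what it means for an application to rewrite how data is partitioned, the following plain-Python sketch lets the application supply its own partition function and then aggregates inside each partition without a further shuffle. It does not use the SPARK API (in SPARK itself, the analogous hook is a custom `Partitioner` implementing `numPartitions` and `getPartition`); the record format and the names `partition`, `by_user`, and `local_counts` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical record format: (key, value), where key looks like "user:session".
Record = Tuple[str, str]


def partition(records: List[Record], num_partitions: int,
              partition_fn: Callable[[str, int], int]) -> List[List[Record]]:
    """Route each record to the partition chosen by an application-supplied function."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        parts[partition_fn(key, num_partitions)].append((key, value))
    return parts


def by_user(key: str, n: int) -> int:
    """Hypothetical custom partitioner: co-locate all records of the same user."""
    user = key.split(":", 1)[0]
    # Deterministic across runs, unlike Python's per-process salted str hash.
    return sum(user.encode()) % n


def local_counts(part: List[Record]) -> Counter:
    """Count records per user within one partition. No cross-partition
    shuffle is needed because by_user already grouped each user's records."""
    return Counter(key.split(":", 1)[0] for key, _ in part)


if __name__ == "__main__":
    records = [("alice:s1", "click"), ("bob:s1", "view"),
               ("alice:s2", "view"), ("carol:s1", "click")]
    for i, part in enumerate(partition(records, 2, by_user)):
        print(i, dict(local_counts(part)))
```

Because the application, not the platform, decides placement, related records can be kept together so that the subsequent aggregation stays local to each partition.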
What is needed is a method that takes advantage of a platform's ability to rewrite the way data is partitioned, shuffled, or aggregated, and that enables the creation of efficient applications that account for data and task partitioning as well as task-to-machine mapping.