Big data processing platforms, operating on a cluster of computing nodes, are becoming increasingly popular as tools for solving analytics-related challenges. However, developing a big data application on one of these platforms generally requires learning that platform's architecture and writing code specific to it. For example, Apache SPARK and Unstructured Information Management Architecture Asynchronous Scaleout (UIMA-AS) are popular platforms for big data analytics with large, growing ecosystems. SPARK, in particular, provides a scalable, fault-tolerant, distributed backend for analyzing large datasets in a scale-out cluster. However, SPARK is oriented towards analyzing text, has no built-in support for MATLAB or legacy code, and requires developers to learn the platform architecture and APIs in order to write code specific to that environment.
In conventional approaches to big data processing, multimodal analytics developers have to learn, and develop specifically for, each particular platform. These platforms, including the open source Apache SPARK and UIMA-AS, each provide their own interface. Recent big data processing platforms, such as SPARK, have a more flexible programming model than earlier platforms, such as Hadoop, and this flexibility provides new power in the “application space” to create optimized applications. For example, a SPARK application can redefine the way its data is partitioned, shuffled, or aggregated, which is not necessarily possible in Hadoop MapReduce. However, significant challenges remain in adapting to these flexible programming models.
What is needed is a system and method that enables multimodal analytics developers and users to seamlessly use big data processing platforms such as SPARK or UIMA-AS without needing to learn their architectures and APIs.