Cloud computing involves many powerful technologies, including map-reduce applications, that allow large online companies to process vast amounts of data in a short period of time. Tasks such as analyzing traffic, extracting knowledge from social media properties or computing new features for a search index are complex by nature and recur on a regular basis. Map-reduce applications are often used to perform these tasks to process large quantities of data. A map-reduce application may be executed in a map-reduce framework of a distributed computer system where input data is divided and loaded for processing by several mappers, each executing on mapper servers, and partial results from processing by mappers are sent for integration to one or more reducers, each executing on reducer servers. In the domain of research and development, a flexible environment is needed to quickly experiment with different configurations for map-reduce applications.
Unfortunately, usage of these technologies requires a technical expertise that, in many cases, constitutes a barrier to entry. For example, Hadoop is an open source Java implementation of a map-reduce framework with an infrastructure that includes a Hadoop core or map-reduce library to support distributing map-reduce applications over multiple machines. Hadoop has quite a steep learning curve, requiring a developer to become familiar with several technologies within the Hadoop framework such as a data serialization system, a data collection system, a distributed file system, a data warehouse infrastructure, and a high-level data-flow language and execution framework for parallel computation. Additionally, a developer must learn to program data analysis applications in the programming model for processing large data sets including specifying map functions that process an input set of key/value pairs to generate a set of intermediate key/value pairs, and reduce functions that merge intermediate values associated with the same intermediate key into an output set of key/value pairs.
What is needed is a way for a developer to focus on programming data analysis applications in a map-reduce programming model without needing to become familiar with the technical details of several technologies within the Hadoop framework. Such a system and method should allow for easily chaining and parallelizing tasks of a map-reduce application in a map-reduce framework.