The present application relates to computing and more specifically to software and associated systems and methods for facilitating selectively retrieving and processing data in a networked computing environment.
Software for facilitating retrieving and processing data is employed in various demanding applications, including big data computing applications, enterprise cloud services, scientific research, and so on. Such applications often demand efficient mechanisms for enabling selective extraction of data from among plural computing resources of a network, and for processing the extracted data.
Efficient mechanisms for selectively extracting and processing data are particularly important in networked enterprise computing environments, which may involve data distributed among thousands of servers, and may further involve running several parallel processes to extract and process the data. Hand-coding software to perform custom data extractions and processing can be prohibitively costly and time-consuming.
To address this issue, MapReduce frameworks installed on servers of a networked enterprise computing environment may facilitate performing data extractions and processing. An example MapReduce framework includes a mapper that extracts data in accordance with an input script, called the MapReduce job configuration. The extracted data may reside on different servers of the network, and the extracted data, or copies thereof, may be shuffled, i.e., selectively redistributed among network servers. Subsequently, a reducer processes the extracted data. The processing may occur in parallel among different servers of the network.
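The map, shuffle, and reduce flow described above can be illustrated with a minimal in-process sketch. It is not Hadoop or any other real framework: the `JOB_CONFIG` dictionary, its keys, and the functions `map_records`, `partition_for`, `shuffle`, `reduce_partition`, and `run_job` are hypothetical names chosen for illustration. Threads stand in for servers, input "splits" are lists of records, and summing values is just one example of a reduce operation.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical job configuration (not a real Hadoop format): which field of each
# record the mapper emits as the key, which field it emits as the value, and how
# many reducer partitions the shuffle should route keys to.
JOB_CONFIG = {"key_field": "region", "value_field": "sales", "num_reducers": 2}


def map_records(records, config):
    """Mapper: extract (key, value) pairs from one input split."""
    return [(r[config["key_field"]], r[config["value_field"]]) for r in records]


def partition_for(key, num_reducers):
    """Pick a reducer for a key using a stable hash, so every pair with the
    same key is sent to the same reducer."""
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers


def shuffle(mapped_splits, num_reducers):
    """Shuffle: group all mapper output by key within its reducer partition."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_splits:
        for key, value in pairs:
            partitions[partition_for(key, num_reducers)][key].append(value)
    return partitions


def reduce_partition(partition):
    """Reducer: aggregate the grouped values for each key (here, a sum)."""
    return {key: sum(values) for key, values in partition.items()}


def run_job(splits, config):
    """Run the map, shuffle, and reduce phases; threads stand in for servers."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(lambda split: map_records(split, config), splits))
        partitions = shuffle(mapped, config["num_reducers"])
        reduced = list(pool.map(reduce_partition, partitions))
    result = {}
    for partial in reduced:
        result.update(partial)  # partitions hold disjoint keys, so no overwrites
    return result


if __name__ == "__main__":
    splits = [
        [{"region": "east", "sales": 10}, {"region": "west", "sales": 5}],
        [{"region": "east", "sales": 7}, {"region": "north", "sales": 3}],
    ]
    print(sorted(run_job(splits, JOB_CONFIG).items()))
    # → [('east', 17), ('north', 3), ('west', 5)]
```

Because `partition_for` routes every pair with a given key to a single reducer, each reducer can aggregate its keys independently of the others, which is what lets the reduce phase run in parallel. Note that `map_records` and `reduce_partition` are exactly the kind of per-job map and reduce functions that a conventional framework requires a developer to write by hand.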
However, MapReduce jobs are conventionally hand-coded in a programming language, such as Java or Python. Such hand coding of MapReduce jobs, which may include writing map functions and reduce functions, remains costly, time-consuming, and error-prone. The jobs must often be written on a case-by-case basis and may not be applicable to different types of payload data retrieved by a mapper, e.g., eXtensible Markup Language (XML) data, JavaScript Object Notation (JSON) data, and so on.