This specification relates to parallel processing.
Cloud computing refers to network-based computing in which collections of servers housed in data centers or “server farms” provide computational resources and data storage as needed to remote end users. Some cloud computing services provide access to software applications such as word processors and other commonly used applications to end users who interface with the applications through web browsers or other client-side software. Users' electronic data files are usually stored in the server farm rather than on the users' computing devices. Maintaining software applications and user data on a server farm simplifies management of end user computing devices. Some cloud computing services allow end users to execute software applications in virtual machines. In a public cloud computing environment, multiple users are able to launch virtual machines (VMs), and each VM launched by a user is included in a cluster of other VMs launched by the user.
MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computers. Processing in MapReduce frameworks consists of a series of steps including map, shuffle, and reduce.
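By way of illustration only, the map, shuffle, and reduce steps can be sketched in a single process as follows. The word-counting task and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions chosen for this example and do not appear in the specification. A real MapReduce framework would distribute each step across many computers rather than run it in one process.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three steps in sequence (hypothetical single-process driver)."""
    return reduce_phase(shuffle_phase(map_phase(documents)))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

In this sketch, the shuffle step is the only point at which pairs emitted from different inputs are brought together. In a distributed setting, this is the step that typically moves data between computers.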