A map-reduce framework defines a computational model that can be used to design and implement parallel processing techniques on computer clusters, such as large-scale distributed systems. While the term “MapReduce” is often used to describe such a framework, the term “map-reduce” (without capital letters) is used herein to clarify that reference is not being made to any particular implementation. A map-reduce operation is an operation performed according to the map-reduce framework. A map-reduce operation includes a mapping operation that includes multiple individual operations that can be distributed across multiple computing environments (mappers) and performed in parallel. Each individual mapping operation takes one or more input label-value pairs and outputs one or more intermediate label-value pairs. The map-reduce operation typically also includes a partition operation that takes one or more intermediate pairs and defines a partition of intermediate values. Additionally, the map-reduce operation includes a reduce operation that can include multiple individual reduce operations that can be distributed across multiple computing environments (reducers). The individual reduce operations each take an input label and a partition of intermediate values and output an output label-value pair. A map-reduce cluster is a computer cluster that is configured to run map-reduce operations according to the map-reduce framework.
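The three stages described above (mapping, partitioning, and reducing over label-value pairs) can be illustrated with a minimal single-machine sketch. It is not any particular framework's implementation. The names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical. A thread pool stands in for the separate computing environments (mappers and reducers). The partition step is assumed to be a simple group-by-label; real frameworks may instead, for example, hash labels to assign them to reducers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(inputs, mapper, reducer, workers=4):
    """Hypothetical sketch of a map-reduce operation over label-value pairs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Mapping: each input (label, value) pair is handled independently,
        # so the individual mapping operations can run in parallel.
        mapped = pool.map(lambda pair: mapper(*pair), inputs)

        # Partition (assumed group-by-label): collect every intermediate
        # value that shares the same intermediate label.
        partitions = defaultdict(list)
        for intermediate_pairs in mapped:
            for label, value in intermediate_pairs:
                partitions[label].append(value)

        # Reduce: each (label, partition of values) yields one output pair;
        # the individual reduce operations are also independent.
        return list(pool.map(lambda item: reducer(*item), partitions.items()))


def word_count_mapper(doc_id, text):
    # Hypothetical mapper: emit an intermediate (word, 1) pair per word.
    return [(word, 1) for word in text.split()]


def sum_reducer(word, counts):
    # Hypothetical reducer: combine one partition into an output pair.
    return (word, sum(counts))


documents = [("doc1", "a b a"), ("doc2", "b c")]
result = map_reduce(documents, word_count_mapper, sum_reducer)
print(result)  # [('a', 2), ('b', 2), ('c', 1)]
```

Because each mapper call sees only its own input pair and each reducer call sees only one label and its partition, the calls share no state. That independence is what lets a map-reduce cluster place them on different machines. In this sketch they are merely run on threads within one process.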
Some computer cluster providers offer access to map-reduce computer clusters “as a service.” This can allow a map-reduce program and associated data pairs to be delegated to a third party's computer cluster by a data provider's cluster client. The computer cluster can run the map-reduce program on the data in the map-reduce cluster and return results to the data provider. Access to other computing environments (e.g., mainframe computers or computer clusters that are not using the map-reduce framework) may also be offered as a service.