1. Field
This application relates generally to distributed database systems, and more specifically to a system and method of MapReduce implementations on indexed datasets in a distributed database environment.
2. Related Art
MapReduce is a programming model and an associated implementation for processing and generating large data sets. In one example, users can specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Programs written in the MapReduce style can be parallelized and executed on a large cluster of distributed database machines. Typically, a run-time system can manage the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
However, current implementations of MapReduce may be performed over large data sets. For example, Hadoop tends to perform MapReduce over the entirety of a data store. The size of these data sets may result in higher latencies such that MapReduce may not be available in substantially real-time operations.