Data, whether software programs or other forms of information, has become a valuable resource and asset for many individuals and businesses. A data grid can be a distributed database system that stores and manages data across multiple nodes, which is particularly useful when the amount of data is relatively large. For example, the data grid can be a collection of nodes (e.g., a node cluster) that together provide more computing power and storage capacity than a single node. Data grids can provide functionalities such as querying or searching, stream processing, and transaction capabilities.
Traditionally, when a functionality such as a query or search is performed, the size of the data grid is limited by the memory capacity of one or more nodes in the data grid. To search data in a data grid, a map/reduce application may be used. A map/reduce application can provide distributed processing for large data sets in a data grid. The map/reduce application can include two phases: a map phase and a reduce phase. In the map phase, a master node can initiate a task, such as a query, and divide the task between multiple nodes, e.g., communicate map tasks to different nodes in the data grid. The nodes that receive the map tasks can execute the map tasks and return results to the master node, e.g., the nodes in the data grid can search for data in the memories of the nodes and communicate data matching the search criteria back to the master node.
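The map phase described above can be sketched in a few lines of Python. This is only an illustrative model, not the implementation of any particular data grid: each node is assumed to be a plain in-memory dictionary, and the names `map_task`, `map_phase`, and `nodes` are hypothetical.

```python
from typing import Callable, Dict, List

# Assumption: each node's memory is modeled as a dict of key -> value.
Node = Dict[str, str]


def map_task(node: Node, matches: Callable[[str], bool]) -> List[str]:
    """Node side: search this node's memory for values matching the criteria."""
    return [value for value in node.values() if matches(value)]


def map_phase(nodes: List[Node], matches: Callable[[str], bool]) -> List[List[str]]:
    """Master side: hand the same map task to every node and collect the results."""
    return [map_task(node, matches) for node in nodes]


# Hypothetical three-node grid.
nodes = [
    {"k1": "apple", "k2": "banana"},
    {"k3": "avocado", "k4": "apple"},
    {"k5": "cherry"},
]

# Query: all values that start with "a".
partial_results = map_phase(nodes, lambda value: value.startswith("a"))
```

Each inner list holds the matches reported by one node. In a real grid the map tasks would run remotely and in parallel; the sequential loop here only shows how the task is divided and where the results end up.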
When the master node receives the data, the master node can then implement the reduce phase. In the reduce phase, the master node can aggregate the received data and remove duplicate data, e.g., reduce the data results. When the master node has completed the reduce phase, the master node can communicate the reduced data to the querying application. While the map/reduce application can be used to query data in a data grid, the scalability of the data grid can be limited. For example, because the nodes execute the map tasks and return the results to the master node, the amount of data stored in the nodes of the data grid cannot exceed the memory capacity of the master node. Otherwise, an out-of-memory (OOM) condition can occur when the memory of the master node is filled and the nodes continue to send result data to the master node.
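The reduce phase and its memory limit can be illustrated with a minimal Python sketch. It assumes the master buffers every result entry it receives before reducing, and it counts entries rather than bytes. The `capacity` parameter and the `MasterOutOfMemory` exception are hypothetical stand-ins for the master node's real memory limit and a real OOM condition.

```python
from typing import List


class MasterOutOfMemory(Exception):
    """Hypothetical stand-in for an out-of-memory (OOM) condition on the master."""


def reduce_phase(partial_results: List[List[str]], capacity: int) -> List[str]:
    """Aggregate per-node results on the master and remove duplicates.

    `capacity` is an assumed limit on how many result entries the master
    can buffer; a real node would be limited by bytes of memory instead.
    """
    buffered = 0
    seen = set()
    reduced = []
    for node_results in partial_results:
        # Every node sends all of its matches to the single master node.
        buffered += len(node_results)
        if buffered > capacity:
            raise MasterOutOfMemory(
                f"master buffered {buffered} entries, capacity is {capacity}"
            )
        for item in node_results:
            if item not in seen:  # drop duplicates reported by other nodes
                seen.add(item)
                reduced.append(item)
    return reduced


# Hypothetical results returned by three nodes after the map phase.
results_from_nodes = [["apple"], ["avocado", "apple"], []]

reduced = reduce_phase(results_from_nodes, capacity=4)

try:
    reduce_phase(results_from_nodes, capacity=2)
    overflowed = False
except MasterOutOfMemory:
    overflowed = True
```

With a capacity of four entries the three received entries fit and the duplicate is removed. With a capacity of two, the master fails while the nodes are still sending results, which is the scalability limit described above.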