Multiple servers may be configured into a system comprising a grouping or “cluster” of servers. Each computer in the cluster may include a processor and a local main memory. The computers in the cluster may be interconnected via a communication network. The cluster of servers may further interface with a distributed data storage facility that stores a set of data accessible to the servers in the cluster. The servers may operate cooperatively to process and execute queries over large datasets (e.g., petabytes of data) such as, for example, databases related to popular social networks. The resources of the machines, including their main memory, may be operated in parallel in a manner that harnesses the power of the multiple machines in the cluster.
One concern for a cluster of machines operating in parallel and sharing memory is deciding when and where to store the data that the cluster will use in executing tasks. Making that determination requires accounting for a number of variables.
In some contexts, such as a cluster of servers, there may exist a desire to determine, in an accurate and efficient manner, a schedule for executing a query execution plan, including when and where to store and replicate data associated with the plan.
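To make the notion of "when and where" concrete, the following is a minimal illustrative sketch, not a method described in this document. It assumes a schedule can be modeled as a list of placements, each naming an execution step (when), a machine (where), and a block of data with a size. The names `Placement`, `fits`, the machine labels, and all sizes and capacities are hypothetical and chosen only for illustration. The sketch checks one constraint such a schedule would plausibly need to respect: that no machine's main memory is oversubscribed at any step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One hypothetical schedule entry: which data sits on which machine, and when."""
    step: int       # when: index of the execution step
    machine: str    # where: machine whose main memory holds the data
    dataset: str    # what: name of the data block being stored or replicated
    size_gb: int    # memory the block occupies, in gigabytes


def fits(schedule, capacity_gb):
    """Return True if no machine exceeds its memory capacity at any step."""
    used = {}
    for p in schedule:
        key = (p.step, p.machine)
        used[key] = used.get(key, 0) + p.size_gb
    return all(total <= capacity_gb[machine]
               for (_, machine), total in used.items())


# Assumed capacities for two hypothetical machines.
capacity_gb = {"m1": 64, "m2": 32}

schedule = [
    Placement(step=0, machine="m1", dataset="users", size_gb=40),
    # Replica of the same block on a machine with less memory.
    Placement(step=0, machine="m2", dataset="users", size_gb=40),
    Placement(step=1, machine="m1", dataset="edges", size_gb=20),
]

feasible = fits(schedule, capacity_gb)
```

In this toy case the replica on `m2` does not fit, so the schedule as a whole is infeasible. A real scheduler would have to weigh further variables, which this sketch deliberately leaves out.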