Within the field of computing, many scenarios involve a set of events to be evaluated through data mining. As a first example, the events may comprise the actions of a set of customers interacting with a commercial store, website, product, or service, and the actions may be evaluated to identify consumer trends. As a second example, the events may comprise the actions of individuals comprising a demographic, a group, or an organization, and the actions may be evaluated to identify patterns of behavior among the individuals. As a third example, the events may comprise the actions of users who wish to receive services and information that may be of interest to the users. As a fourth example, the events may comprise measurements of a system, such as a machine or an environment, that are to be evaluated to monitor the state of the system on behalf of an administrator. As a fifth example, the events may comprise measurements performed in a technical or scientific study, and the evaluation may be performed to identify relevant information.
Many systems that process such events are centered around a large database and the evaluation of data stored therein. For example, many such processing systems are designed as a server farm, comprising a large number of database servers interoperating as a distributed database, and configured to generate various queries to be applied to a very large data set. This information may be stored, e.g., as a large set of tables comprising interrelated records, where such tables and records may be distributed across the database servers comprising the server farm. As an example of such large-scale processing, many such data processing systems utilize a MapReduce-based framework, wherein a central coordinating system may evaluate a query by identifying various query components, distributing each query component to a database server storing information relevant to the query component, and compositing the query results generated by each database server to generate a query response. Such databases are often designed to store a large amount of data gathered over a long period of time, and to apply large and complex queries to large numbers of records (potentially billions of records). The evaluation of such a query may eventually produce a result set comprising the portions of the records that satisfy the criteria of the query.
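The MapReduce-style evaluation described above may be illustrated with a minimal sketch. It is not drawn from any particular framework: the in-memory `SHARDS` list stands in for the records held by each database server, and the names `map_shard`, `reduce_counts`, and `run_query` are hypothetical. The coordinator applies one query component to every shard and then composites the partial results into a single query response.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands in for the records on one database server.
SHARDS = [
    [{"customer": "a", "action": "view"}, {"customer": "b", "action": "purchase"}],
    [{"customer": "a", "action": "purchase"}, {"customer": "c", "action": "view"}],
    [{"customer": "b", "action": "purchase"}],
]


def map_shard(records, action):
    """Query component run on one server: count matching records per customer."""
    return Counter(r["customer"] for r in records if r["action"] == action)


def reduce_counts(left, right):
    """Composite two partial results by summing the per-customer counts."""
    return left + right


def run_query(shards, action):
    """Coordinator: apply the query component to each shard, then composite."""
    partials = [map_shard(records, action) for records in shards]
    return dict(reduce(reduce_counts, partials, Counter()))


result = run_query(SHARDS, "purchase")
```

In this sketch the shards are processed sequentially in one process; in a server farm each call to `map_shard` would instead run on the server holding that shard, and only the small partial results would travel back to the coordinator.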