Data processing may be performed using a distributed processing system that includes multiple nodes (for example, information processing apparatuses such as computers) connected to a network. The data processing may be sped up by dividing data, assigning the divided data to the multiple nodes, and using the nodes in parallel. Such parallelization is used to process a large amount of data, for example, to analyze access log data that indicates accesses made to a server apparatus. To support the creation of programs that perform parallel data processing, frameworks such as MapReduce have been proposed. A data processing method defined in MapReduce includes a Map phase and a Reduce phase. In the Map phase, input data is divided into pieces, which are then processed using multiple nodes. In the Reduce phase, the results of the Map phase are aggregated using one or more nodes according to keys or the like. The results of the Reduce phase may be passed on to the next Map phase. The framework may execute the data division and aggregation automatically.
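The Map and Reduce phases described above can be illustrated with a minimal single-machine sketch. It is not the API of any particular MapReduce framework: the function names (`split`, `map_phase`, `reduce_phase`, `run_job`), the sample access log, and the use of a thread pool to stand in for multiple nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split(records, n_parts):
    """Divide the input data into n_parts pieces (one per hypothetical node)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(piece):
    """Map: emit a (key, value) pair for each access log record in the piece."""
    return [(url, 1) for url in piece]


def reduce_phase(pairs):
    """Reduce: aggregate the mapped values according to their keys."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(records, n_nodes=3):
    """Run one Map phase in parallel, then one Reduce phase over its results."""
    pieces = split(records, n_nodes)
    # Worker threads stand in for nodes here; a real system would use
    # separate machines connected to a network.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        mapped = list(pool.map(map_phase, pieces))
    all_pairs = [pair for part in mapped for pair in part]
    return reduce_phase(all_pairs)


# Hypothetical access log: one requested URL per record.
access_log = ["/index", "/about", "/index", "/cart", "/index", "/about"]
counts = run_job(access_log)
```

Here `counts` maps each URL to the number of accesses recorded for it. The caller supplies only the per-record Map logic and the per-key Reduce logic, while `run_job` plays the role of the framework that handles division and aggregation.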
Reuse of past processing results has been considered in order to further speed up data processing. For example, a technology has been proposed that makes results of a Reduce phase reusable by classifying pieces of data into multiple groups based on the update frequency of each data piece, calculating the data update frequency for each of the groups, and caching the Reduce-phase results obtained from data pieces belonging to groups with low update frequency. In another proposed technology, if input search criteria in a document management system are the same as those of a previous search, a search is made only over documents whose registration time or update time is later than the time of the previous search, and the result of the previous search is then added to the result of the current search.

Japanese Laid-open Patent Publication No. 2010-92222
Japanese Laid-open Patent Publication No. 2002-259443
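The search-reuse technique described above can be sketched as follows. This is a simplified illustration, not the method of the cited publication: the `Document` and `CachedSearch` names, the integer clock, and substring matching as the search criterion are all assumptions. The sketch also does not handle a document that matched previously but no longer matches after an update.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    text: str
    updated_at: int  # registration or update time (hypothetical integer clock)


class CachedSearch:
    """Reuses the previous result when the same search criteria are repeated."""

    def __init__(self):
        # criteria -> (time of the previous search, ids matched so far)
        self._cache = {}
        self.scanned = 0  # documents examined by the most recent search

    def search(self, documents, criteria, now):
        # With no previous search for these criteria, every document is scanned.
        since, previous_hits = self._cache.get(criteria, (-1, set()))
        # Examine only documents registered or updated after the previous search.
        candidates = [d for d in documents if d.updated_at > since]
        self.scanned = len(candidates)
        fresh_hits = {d.doc_id for d in candidates if criteria in d.text}
        # Add the result of the previous search to the result of the current one.
        result = previous_hits | fresh_hits
        self._cache[criteria] = (now, result)
        return result


docs = [Document(1, "map reduce", 1), Document(2, "cache", 2)]
searcher = CachedSearch()
first = searcher.search(docs, "reduce", now=5)
first_scanned = searcher.scanned

docs.append(Document(3, "reduce phase", 7))
second = searcher.search(docs, "reduce", now=10)
second_scanned = searcher.scanned
```

The second call examines only the document added after the first search, yet its result still contains the earlier match because the cached result is merged in.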
Consider a distributed processing system that performs a first process on an input data set using multiple nodes and then aggregates the results of the first process in a second process. When data processing is performed on an input data set in this distributed processing system, and data processing was performed in the past on another input data set that partly overlaps the input data set to be currently processed, it is preferable to be able to reuse results of the past data processing. However, if there is a difference between the current input data set and the past input data set, it may be difficult to reuse the results of the past data processing.