In a very general sense, computer-implemented processes do nothing more than process input data to produce output data. For example, input data I supplied to a process P produce output data O. In many cases, the same, or almost the same, input data are repeatedly processed by the same process. In such cases, information about the input data and the process, as well as the output data, can be cached. Then, if the same input data are subsequently recognized, the output data can be produced directly from the cache. This makes sense if the cost of processing the data is large compared with the cost of caching the data.
For example, a complex software system is typically generated from input data in the form of a large number of source code objects, configuration options, and building instructions. Each source code object may include, perhaps, thousands of lines of source code. Periodically, the source code objects are compiled to generate object code objects as output data. The object code objects are then linked to generate the machine-executable image of the software system. This process is called a system "build." If the system is undergoing active development, a new version of the system may need to be built on a daily or more frequent basis. Building systems can be time-consuming, particularly if the system is generated from many source code objects. Therefore, caching input and output data can reduce the amount of time required to build the system.
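The benefit of caching during a build can be illustrated with a minimal sketch. It assumes that each compilation depends only on the text of one source code object and on the compiler options. The names `fingerprint`, `BuildCache`, and `fake_compile` are hypothetical and stand in for a real compiler and cache; they are not taken from any particular build system.

```python
import hashlib


def fingerprint(source_text, options):
    """Hash the source text and the options into one cache key."""
    h = hashlib.sha256()
    h.update(source_text.encode("utf-8"))
    for opt in options:
        # A NUL byte separates the fields so that adjacent values cannot run together.
        h.update(b"\0" + opt.encode("utf-8"))
    return h.hexdigest()


class BuildCache:
    """Caches compiled objects keyed by the fingerprint of their inputs."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._objects = {}          # fingerprint -> compiled object
        self.compilations = 0       # number of real compiler invocations

    def build(self, sources, options):
        objects = {}
        for name, text in sources.items():
            key = fingerprint(text, options)
            if key not in self._objects:
                self._objects[key] = self._compile(text, options)
                self.compilations += 1
            objects[name] = self._objects[key]
        return objects


def fake_compile(text, options):
    """Hypothetical stand-in for an expensive compilation step."""
    return "OBJ(" + text.upper() + ")"
```

In this sketch, a second build after editing one source code object invokes `fake_compile` only for the edited object; the unchanged objects are served from the cache.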
In one prior art technique known as memoisation, each application of a function stores all of the arguments to which the function is applied, together with the result obtained. Then, if the function is ever applied again with the same arguments, the result does not need to be recomputed, since the previously cached result can be used immediately. Memoisation is an optimization which can replace a potentially expensive computation with a simple look-up. Since there is no analysis of which specific arguments are used to compute the result, memoisation caches all arguments.
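Memoisation as described above can be sketched in a few lines. The decorator `memoise`, the function `scaled_length`, and the list `call_log` are hypothetical names chosen for illustration. The third argument is deliberately unused, to show that a cache keyed on all arguments misses even when only an irrelevant argument changes.

```python
def memoise(fn):
    """Cache results keyed by the full tuple of positional arguments."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)   # compute once per distinct argument tuple
        return cache[args]

    wrapper.cache = cache             # exposed only for inspection
    return wrapper


call_log = []  # records every real evaluation of the function body


@memoise
def scaled_length(text, factor, unused_flag):
    """The result depends on text and factor only; unused_flag is ignored."""
    call_log.append((text, factor, unused_flag))
    return len(text) * factor
```

Calling `scaled_length("abc", 2, True)` twice evaluates the body once. Calling it with `unused_flag=False` evaluates the body again, although the result is necessarily the same.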
Another prior art scheme analyzes program data dependencies to determine whether a result is due to processing a specific set of input data. In this case, only those arguments which contribute to the result are cached. With dependency analysis, the amount of data that need to be cached can be reduced and the cache hit rate can be increased. This is particularly true if the input data are complex.
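The effect of caching only the contributing arguments can be sketched as follows. The sketch assumes that a dependency analysis has already determined which input fields a process reads; here that set is simply written by hand as `dep_keys`. The names `cache_on`, `area`, and `evaluations` are hypothetical.

```python
def cache_on(dep_keys):
    """Cache results keyed only by the input fields named in dep_keys.

    dep_keys is assumed to be the outcome of a dependency analysis;
    this sketch does not perform the analysis itself.
    """
    def decorate(fn):
        cache = {}

        def wrapper(inputs):
            key = tuple(inputs[k] for k in dep_keys)
            if key not in cache:
                cache[key] = fn(inputs)
            return cache[key]

        wrapper.cache = cache
        return wrapper

    return decorate


evaluations = []  # records every real evaluation of the function body


@cache_on(("width", "height"))
def area(inputs):
    """Reads width and height only; any other field is irrelevant."""
    evaluations.append(dict(inputs))
    return inputs["width"] * inputs["height"]
```

Two inputs that differ only in a field outside `dep_keys`, such as a comment string, share one cache entry, so the second call is a hit and less data is stored per entry.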
Most prior art dependency analyses are static and imprecise. Static means that data dependencies of a program are examined prior to executing or interpreting the program. For many data items such as variable arguments or parameters, exact values bound to the arguments are not known until "run" time. Thus, static dependency analysis only allows the recording of imprecise or coarse data dependency information.
If less than the optimal amount of data is cached, the cache miss rate increases. Therefore, it is desired to provide a dynamic, fine-grained dependency analysis so that the caching of data can be improved. In other words, a fine-grained dependency analysis determines, for a particular process, precisely which parts of input I are essential to produce output O.
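One way to picture a dynamic, fine-grained dependency analysis is to record, while a process runs, exactly which parts of the input it reads. The following is a minimal sketch under stated assumptions, not the method of any particular system: the process is deterministic, and it reads its input only through the tracking wrapper. The names `TrackingInput`, `DynamicCache`, and `process` are hypothetical.

```python
class TrackingInput:
    """Wraps the input and records every field read, with the value seen."""

    def __init__(self, data):
        self._data = data
        self.reads = {}

    def __getitem__(self, key):
        value = self._data[key]
        self.reads[key] = value
        return value


class DynamicCache:
    """Caches outputs together with the input fields actually read."""

    def __init__(self, process):
        self._process = process
        self._entries = []   # list of (recorded reads, output) pairs
        self.misses = 0

    def run(self, data):
        # Hit: every field recorded for an earlier run has the same value now.
        for reads, output in self._entries:
            if all(k in data and data[k] == v for k, v in reads.items()):
                return output
        # Miss: run the process and record its dependencies at run time.
        self.misses += 1
        tracked = TrackingInput(data)
        output = self._process(tracked)
        self._entries.append((dict(tracked.reads), output))
        return output


def process(inp):
    """Which field is read depends on the run-time value of 'mode'."""
    if inp["mode"] == "left":
        return inp["left"] * 2
    return inp["right"] * 2
```

Because the dependencies are recorded at run time, a run with `mode` set to `"left"` depends only on `mode` and `left`; a later input that differs only in `right` is a cache hit. A static analysis of `process` would have to assume that all three fields may be read.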