This invention relates to the field of data retrieval, and in particular data retrieval using an intermediate storage mechanism, such as a cache, within a data processing system capable of performing iterative processes.
Modern data processing systems need to be able to access data very quickly if they are to operate at the speeds dictated by many of today's applications. As the amount of data being stored has increased, much research has been carried out in order to find quicker methods of accessing such data. It has been found that in many applications an intermediate store mechanism, called a `cache`, can provide a very effective means for decreasing the amount of time associated with accessing data.
U.S. Pat. No. 4,426,682 describes a data processing system which employs a cache memory to reduce reference to main memory and hence expedite processing execution. The problem which it seeks to overcome is that the data stored in the cache is subject to frequent modification either by the same or by a different user process. Hence the data in the cache for one process may be invalid for another user process. The patent describes an improved technique for flushing the cache whereby all present data in the cache is made invalid without having to reset each memory location in the cache, hence saving processing time. This patent is illustrative of the general desire to reduce processing time expended in retrieving data whenever possible.
A large amount of research into caching techniques has resulted in various different types of caches being developed, one popular type being the `least recently used` (LRU) type of cache. A typical data processing system will include a large main memory in addition to such a cache, the cache typically being much smaller than the main memory. When a particular process requires a piece of data, a search of the cache is initially made to see if that data is already in the cache. Due to the relative sizes, it is much quicker to search through the cache than to search through main memory. Only if the data is not in the cache is the data retrieved from main memory. Since this is a comparatively lengthy procedure, the data is copied into the cache in case it is needed again. If the cache is already full then a particular piece of data will need to be erased from the cache to make room for the new data. In an LRU type of cache the piece of data erased from the cache will be that data which has been used least recently.
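The lookup, copy-in and eviction steps described above can be sketched as follows. This is only a minimal illustration, not the mechanism of any particular system: the class name `LRUCache`, the `read` method, the `misses` counter and the dictionary standing in for main memory are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Small cache placed in front of a larger, slower 'main memory' mapping."""

    def __init__(self, main_memory, capacity):
        self.main_memory = main_memory   # hypothetical stand-in for main memory
        self.capacity = capacity         # maximum number of cached entries
        self.entries = OrderedDict()     # least recently used first, most recent last
        self.misses = 0                  # number of retrievals from main memory

    def read(self, key):
        # Search the cache first.
        if key in self.entries:
            self.entries.move_to_end(key)    # mark as most recently used
            return self.entries[key]
        # Not cached: retrieve from main memory (the comparatively slow path).
        self.misses += 1
        value = self.main_memory[key]
        # If the cache is full, erase the least recently used entry.
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        # Copy the data into the cache in case it is needed again.
        self.entries[key] = value
        return value


main_memory = {n: n * n for n in range(1000)}
cache = LRUCache(main_memory, capacity=2)
results = [cache.read(k) for k in (1, 2, 1, 3, 2)]
```

In this sketch the second read of key 1 is served from the cache, while the read of key 3 finds the cache full and evicts key 2, the entry used least recently at that point.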
It has been found that the same piece of information is often used many times in succession, and indeed that some very lengthy programs often only need to access a small number of data entries. In such situations a cache such as described above can make a large difference in the speed of operation, since it is much quicker for the system to look in a small cache for a piece of information than to search through a large database.
One particular environment where caches have been put to good use is the message based environment employed by Object Orientated Programming (OOP) techniques. OOP is a particular approach to software development which implements required functions by way of `messages` sent to `objects`. Caching can be particularly valuable for small tight loops with a small number of messages cached.
An `object` is a software package that contains a collection of related procedures (hereafter called `methods`) and data (hereafter referred to as `variables`). Further objects can be grouped into `Object Classes`. An Object Class is a template for defining the methods and variables for a particular type of object. All objects of a given class are identical in form and behavior but contain different data in their variables.
A `message` is a signal sent from one object to another to request the receiving object to carry out one of its methods. Hence a message sent to an object will cause a method to be invoked to implement the required function.
There are two particular ways in which messages can be resolved into invoked functions, namely at compile time or at run time. There is a trade-off between performance and flexibility depending on the choice taken. If messages are resolved at compile time they will be turned into a function call by the compiler and will hence perform well. However, in order for this to take place the type of the object and the message must be known at compile time. If the messages are resolved at run time, then the message and object are used to look in some internal tables in order to determine which function to call. This latter approach to function resolution is known in the art as `late` or `dynamic` binding.
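The difference between the two forms of resolution can be illustrated with a small sketch. Python itself resolves calls at run time, so the `compile time` case is only imitated here by a call whose target is fixed in the source text. The names `METHOD_TABLES`, `send`, `square_area` and `rectangle_area`, and the use of dictionaries as objects, are assumptions made for the example.

```python
def square_area(obj):
    return obj["side"] ** 2


def rectangle_area(obj):
    return obj["width"] * obj["height"]


# Hypothetical internal tables: one method table per object class,
# mapping a message name to the function that implements it.
METHOD_TABLES = {
    "Square": {"area": square_area},
    "Rectangle": {"area": rectangle_area},
}


def send(obj, message):
    """Dynamic binding: use the object's class and the message to find the function."""
    table = METHOD_TABLES[obj["class"]]   # lookup by object class
    method = table[message]               # lookup by message
    return method(obj)


square = {"class": "Square", "side": 3}
rectangle = {"class": "Rectangle", "width": 2, "height": 5}

# Imitation of compile-time resolution: the function to call is fixed in the
# source, which requires the object type and message to be known in advance.
static_result = square_area(square)

# Run-time resolution: the same message reaches different functions
# depending on the class of the receiving object.
dynamic_results = [send(obj, "area") for obj in (square, rectangle)]
```

Every call through `send` pays for the table lookups before the function body runs, which is the overhead that the optimization techniques discussed in this section aim to reduce.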
Dynamic binding is a much more flexible approach to function resolution than the `compile time` approach, because it reduces the dependencies between code `modules`. Code modules are components of programs, each of which contains its own procedures and data; generally the different modules of a program are kept as independent as possible from each other. With dynamic binding, if an object type used by one module is changed this will not require all connected modules to be recompiled, since the function resolution does not take place until run time. However, dynamic binding is detrimental to overall performance since the table lookup adds a significant overhead to every function call. It is generally considered to be a very slow form of function resolution, and this becomes particularly apparent when an iterative process is taking place, since the lookup overhead is then incurred on every pass. Programming logic frequently dictates that loops are used where a process is repeated multiple times until a particular condition is met.
It is known for OOP systems to include optimization techniques in an attempt to reduce the table lookup times when a message is sent to an object. One such technique involves the use of a HASHING algorithm, while an alternative approach is to cache the last `n` messages that are sent to objects of a particular object class.
Both of these techniques are valuable but have a cost/benefit trade-off. HASHING is well known in the art and so will not be discussed further here. As an illustration of the caching approach, consider the situation in which the last five messages sent to objects of a particular object class are cached. If a different message is then sent, five incorrect entries in the cache need to be checked before a table lookup is initiated. Obviously, as the number of messages cached increases so does the overhead involved in checking incorrect entries, until a point is reached where the cache is the same size as the original table, at which point no benefit is gained. On the other hand, if the cache size is decreased the chance of achieving a successful match of a particular message with one already stored decreases, which also reduces the benefit. Further, caching could be performed on the last `n` messages regardless of class. Again a judgement on the trade-off needs to be made to decide on a suitable cache size. The problem of trying to find a suitable cache size and method of operating the cache is the subject of numerous papers. In some cases the introduction of a poor cache size and/or operating method can reduce performance below that of an equivalent system having no caching.
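The five-message illustration above can be reproduced with a short sketch. It assumes a cache that is searched entry by entry and that discards its oldest entry when a new message is added; the class name `MessageCache`, its counters, and the message names `m0` to `m5` are hypothetical and chosen only for the example.

```python
from collections import deque


class MessageCache:
    """Caches the last `size` messages resolved for one object class."""

    def __init__(self, method_table, size):
        self.method_table = method_table     # the full (slow) lookup table
        self.recent = deque(maxlen=size)     # (message, method) pairs, oldest first
        self.comparisons = 0                 # cache entries checked so far
        self.table_lookups = 0               # full table lookups performed so far

    def resolve(self, message):
        # Check each cached entry in turn.
        for cached_message, method in self.recent:
            self.comparisons += 1
            if cached_message == message:
                return method
        # No match: fall back to the table lookup and cache the result.
        # With maxlen set, appending to a full deque drops the oldest entry.
        self.table_lookups += 1
        method = self.method_table[message]
        self.recent.append((message, method))
        return method


# Six hypothetical messages; each method simply returns its own index.
table = {f"m{i}": (lambda i=i: i) for i in range(6)}
cache = MessageCache(table, size=5)

for i in range(5):                 # fill the cache with five messages
    cache.resolve(f"m{i}")

before = cache.comparisons
cache.resolve("m5")                # a different, uncached message
wasted = cache.comparisons - before
```

Here `wasted` counts the incorrect entries checked before the table lookup is initiated for the sixth message. Raising `size` increases that cost on every miss, while lowering it reduces the chance of a match, which is the trade-off described above.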