1. Field of the Invention
The present invention relates generally to an improved data processing system, and in particular to a computer implemented method, an apparatus and a computer program product for cache line reservations.
2. Description of the Related Art
It is common in programming models like Java to instantiate an object and invoke one or more methods on the object in order to perform even a relatively simple computational task. Thus, in order to complete complex transactions, modern server and middleware applications typically create a very large number of objects, many of which are used only for a short duration. For example, a short lived object may be an object instantiated to temporarily hold a transaction or operation value, such as an intermediate result of a calculation. After the completion of a calculation step, the value is no longer needed and is discarded. Such objects may be created and discarded many times during a session. In many cases, the majority of objects are short lived objects.
Locality of reference is the principle that computer programs tend to access, repeatedly and within a short time, data that is related either spatially or temporally. In other words, if the program accesses a certain memory location M, it can be expected that the same program will soon access some other memory location close to memory location M (spatial locality). Likewise, the probability of a certain memory location being accessed several times in a relatively short duration increases if the memory location has been accessed before (temporal locality).
A processor cache is used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. When the processor wishes to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all the locations in the cache that might contain that address. If the processor finds that the memory location is in the cache, this is referred to as a cache hit; if the processor does not find the memory location in the cache, this is referred to as a cache miss. In the case of a cache hit, the processor immediately reads or writes the data in the cache line. If a program behaves in accordance with the locality of reference principle, most memory accesses will be to cached memory locations, and so the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.
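By way of illustration only, the effect of the hit rate on average access latency can be sketched as a weighted average. The sketch below assumes a simplified model in which a hit costs the cache latency and a miss costs the main memory latency; the class name, method name, and latency figures are hypothetical and are not taken from any particular processor.

```java
final class AverageLatency {
    // Simplified model (an assumption): a hit costs cacheLatency,
    // a miss costs memoryLatency, and nothing else contributes.
    static double averageLatency(double hitRate, double cacheLatency, double memoryLatency) {
        return hitRate * cacheLatency + (1.0 - hitRate) * memoryLatency;
    }

    public static void main(String[] args) {
        double cacheLatency = 1.0;    // assumed, in cycles
        double memoryLatency = 100.0; // assumed, in cycles
        for (double hitRate : new double[] {0.50, 0.90, 0.99}) {
            System.out.printf("hit rate %.2f -> %.2f cycles%n",
                    hitRate, averageLatency(hitRate, cacheLatency, memoryLatency));
        }
    }
}
```

Under these assumed figures, the average moves from roughly half the main memory latency at a 50 percent hit rate to close to the cache latency at a 99 percent hit rate.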
Addresses in both kinds of memory, main and cache, can be considered to be divided into cache lines. A cache line refers to a contiguous range of addresses where the size of the range varies on different computer architectures, for example from 8 bytes to 512 bytes. The size of the cache line is typically larger than the size of the usual access requested by a central processing unit (CPU) instruction, which usually ranges from 1 to 16 bytes.
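As a minimal sketch of this division, an address can be split into a cache line number and an offset within that line. The 64-byte line size and all names below are assumptions chosen for illustration; as noted above, the actual line size varies by architecture.

```java
final class CacheLineMath {
    // Assumed line size in bytes; real values depend on the architecture.
    static final int LINE_SIZE = 64;

    // Number of the cache line that contains the given address.
    static long lineNumber(long address) {
        return address / LINE_SIZE;
    }

    // Position of the address within its cache line.
    static long lineOffset(long address) {
        return address % LINE_SIZE;
    }

    // Two addresses share a cache line when their line numbers match.
    static boolean sameLine(long a, long b) {
        return lineNumber(a) == lineNumber(b);
    }

    public static void main(String[] args) {
        System.out.println(sameLine(0x1000, 0x1038)); // true: both fall in line 64
        System.out.println(sameLine(0x1000, 0x1040)); // false: 0x1040 starts line 65
    }
}
```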
When a memory access is to a location that is not found in the cache, the entire cache line that the location belongs to is read from main memory and brought to the cache memory. The previous data that was in the cache line in the cache memory is evicted from the cache and so future accesses to that data would have to access main memory.
A cache line replacement policy decides where in the cache a copy of a particular entry of main memory will be placed. If the replacement policy is free to choose any entry location in the cache to hold the copy, the cache is referred to as fully associative. At the other extreme, if each entry in main memory can be mapped into just one respective place in the cache, the cache is referred to as direct mapped. Many caches implement a compromise between these two extremes, which is referred to as set associative.
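For illustration, the direct mapped extreme, together with the miss and eviction behavior described above, can be sketched as a small simulation. The line size, the number of lines, and all identifiers are hypothetical; the sketch tracks only which line occupies each slot and does not model the cached data itself.

```java
final class DirectMappedCache {
    static final int LINE_SIZE = 64; // assumed bytes per cache line
    static final int NUM_LINES = 4;  // assumed; kept tiny for readability

    // For each slot, the main memory line number currently held there.
    private final long[] heldLine = new long[NUM_LINES];
    private final boolean[] valid = new boolean[NUM_LINES];
    int hits;
    int misses;

    // Simulates one access; returns true on a cache hit.
    boolean access(long address) {
        long line = address / LINE_SIZE;
        int slot = (int) (line % NUM_LINES); // exactly one possible slot per line
        if (valid[slot] && heldLine[slot] == line) {
            hits++;
            return true;
        }
        // Miss: the whole line is brought in, evicting the previous occupant.
        valid[slot] = true;
        heldLine[slot] = line;
        misses++;
        return false;
    }

    public static void main(String[] args) {
        DirectMappedCache cache = new DirectMappedCache();
        cache.access(0);   // miss: line 0 loaded into slot 0
        cache.access(8);   // hit: same line as address 0
        cache.access(256); // miss: line 4 also maps to slot 0 and evicts line 0
        cache.access(0);   // miss: line 0 must be fetched again
        System.out.println(cache.hits + " hits, " + cache.misses + " misses"); // 1 hits, 3 misses
    }
}
```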
For example, in an N-way set associative cache, any particular location in main memory can be cached in one of N entries in the cache memory. The simplest and most commonly used scheme to decide the mapping of a memory location to cache location(s) is to use the least significant bits of the memory location's address as the index for the cache memory, and to have N entries for each cache location.
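The index computation described above can be sketched as follows. The sketch assumes that the bits selecting a byte within a cache line are discarded first and that the index is then taken from the least significant bits of the remaining line number; the line size, the number of sets, and the names used are assumptions for illustration only.

```java
final class SetIndex {
    static final int LINE_SIZE = 64; // assumed bytes per cache line
    static final int NUM_SETS = 128; // assumed number of sets; must be a power of two

    // Index of the set in which the line holding this address may be cached.
    // Any of the N entries of that set may hold the line.
    static int setIndex(long address) {
        long line = address / LINE_SIZE;       // drop the offset within the line
        return (int) (line & (NUM_SETS - 1));  // keep the least significant bits
    }

    public static void main(String[] args) {
        System.out.println(setIndex(0));    // 0
        System.out.println(setIndex(64));   // 1: the next line falls in the next set
        System.out.println(setIndex(8192)); // 0: 128 lines later, the index wraps around
    }
}
```

Addresses whose line numbers differ by a multiple of the number of sets compete for the same N entries, which is why the placement of objects in memory can affect how often one line evicts another.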
In programs that create a large number of objects, and thereby a large working set, performance can be highly dependent on the cost of accessing memory. Modern Java Virtual Machines (JVM) employ sophisticated memory allocation and management techniques to increase data locality by laying out objects in memory, such that cache misses are reduced, thereby ensuring data being accessed is available in cache memory most of the time.
Memory allocation is usually performed by native code generated on the fly by just-in-time (JIT) compilers, whereas memory management is handled by the garbage collector (GC). Previous efforts in reducing the overhead of object allocation were directed toward use of a specific thread-local heap (TLH) when allocating objects from a specific thread. Use of the thread-local heap was primarily aimed at eliminating the need for synchronization at every allocation in the presence of multiple threads, as would be the case if there were one heap allocation area for all threads, by assigning a chunk of memory for exclusive use by a thread. Allocation of selected objects from a different portion of the memory has also been tried, for example, by partitioning the entire heap into multiple heaps, where each of the heaps was used for some selected objects.
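A minimal sketch of the thread-local heap idea follows. It simulates addresses as plain numbers rather than managing real memory, and it is not the allocator of any particular Java Virtual Machine; the chunk size and all identifiers are hypothetical. Each thread advances a private pointer within its own chunk and touches shared state only when it needs a new chunk.

```java
import java.util.concurrent.atomic.AtomicLong;

final class TlhSketch {
    static final long CHUNK_SIZE = 4096; // assumed chunk size in bytes

    // Shared cursor into a simulated heap; used only when a thread needs a new chunk.
    static final AtomicLong sharedCursor = new AtomicLong(0);

    // Per-thread allocation area: next free address and end of the chunk.
    static final class Tlh {
        long next;
        long end;
    }

    static final ThreadLocal<Tlh> TLH = ThreadLocal.withInitial(Tlh::new);

    // Returns the simulated address of a newly allocated object of the given size.
    static long allocate(long size) {
        if (size <= 0 || size > CHUNK_SIZE) {
            throw new IllegalArgumentException("size must be in 1.." + CHUNK_SIZE);
        }
        Tlh tlh = TLH.get();
        if (tlh.next + size > tlh.end) {
            // Slow path: a single atomic operation reserves a whole chunk.
            tlh.next = sharedCursor.getAndAdd(CHUNK_SIZE);
            tlh.end = tlh.next + CHUNK_SIZE;
        }
        // Fast path: advance the private pointer; no synchronization is needed.
        long address = tlh.next;
        tlh.next += size;
        return address;
    }

    public static void main(String[] args) {
        long first = allocate(16);
        long second = allocate(24);
        System.out.println(second - first); // 16: consecutive objects are adjacent
    }
}
```

In this sketch, objects allocated consecutively by one thread are adjacent in the simulated address space, while objects allocated by different threads come from different chunks.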
Many efforts are directed towards improving object layout through use of the garbage collector to move related objects closer together to improve object locality. Several schemes have been proposed that ensure that objects accessed within a short duration of each other are laid out as close as possible in memory. The garbage collector changes the layout of objects to improve locality in a separate phase but does not affect how objects are allocated initially. Garbage collectors usually perform work in cycles, and the time between cycles can, and usually does, allow for many allocations.
Some of these allocations might result in short lived objects that are discarded before the next garbage collection cycle. For such objects, it would be too late for garbage collection to do anything meaningful with the expired object apart from reclaiming the memory, which would not affect cache locality. Therefore, there is a need to reduce cache misses by ensuring that data being accessed is available in cache memory most of the time.