1. Field of the Invention
This invention is related to dynamic memory allocation for computer systems.
2. Description of the Related Art
Modern microprocessors are demanding increasing memory bandwidth to support the increased performance achievable by the microprocessors. Increasing clock frequencies (i.e. shortening clock cycles) employed by the microprocessors allow for more data and instructions to be processed per second, thereby increasing bandwidth requirements. Furthermore, modern microprocessor microarchitectures are improving the efficiency at which the microprocessor can process data and instructions. Bandwidth requirements are increased even further due to the improved processing efficiency.
Computer systems typically have a relatively large, relatively slow main memory. Typically, multiple dynamic random access memory (DRAM) modules comprise the main memory system. The large main memory provides storage for a large number of instructions and/or a large amount of data for use by the microprocessor, providing faster access to the instructions and/or data then may be achieved from a disk storage, for example. However, the access times of modern DRAMs are significantly longer than the clock cycle length of modern microprocessors. The memory access time for each set of bytes being transferred to the microprocessor is therefore long. Accordingly, the main memory system is not a low latency system. Microprocessor performance may suffer due to the latency of the memory system.
In order to increase performance, microprocessors may employ prefetching to "guess" which data will be requested in the future by the program being executed. If the guess is correct, the delay of fetching the data from memory has already occurred when the data is requested (i.e. the requested data may be available within the microprocessor). In other words, the effective latency of the data is reduced. The microprocessor may employ a cache, for example, and the data may be prefetched from memory into the cache. The term prefetch, as used herein, refers to transferring data into a microprocessor (or cache memory attached to the microprocessor) prior to a request for the data being generated via execution of an instruction within the microprocessor. Generally, prefetch algorithms are based upon the pattern of accesses which have been performed in response to the program being executed. A popular data prefetch algorithm is the stride-based prefetch algorithm in which the difference between the addresses of consecutive accesses (the "stride") is added to subsequent access addresses to generate a prefetch address.
Stride-based prefetch algorithms often work well with statically allocated data structures. Data structures are statically allocated if they are allocated memory at the initiation of a program and remain allocated in that same memory throughout execution of the program. Because the data structure is statically allocated, it is generally laid out in contiguous memory locations. Stride-based prefetch algorithms work well because the memory storing the data structure is contiguous and the reference patterns are regular. A statically allocated array, for example, may be traversed by reading memory locations which are separated from each other by a regular interval. After just a few memory fetches, the stride-based prefetch algorithm may have learned the regular interval and may correctly predict subsequent memory fetches.
On the other hand, data structures are dynamically allocated if the memory for the data structures is allocated and deallocated as needed during the execution of the program. Dynamically allocated data structures have a variety of advantages in programs in which the amount of memory needed for the data structure varies widely and is difficult or impossible to predict ahead of time. Instead of statically allocating a very large amount of memory, the memory is allocated as needed. Memory space is thereby conserved.
Unfortunately, dynamic memory allocation algorithms are typically not conducive to prefetch algorithms. Dynamic memory allocation algorithms typically employ a "first fit" approach in which the first available memory block which includes at least the number of bytes requested for allocation is selected, or a "best fit" approach in which the available memory is scanned for a memory block which is closest in size to the requested number of bytes or causes the least amount of fragmentation if allocated to the request. These approaches select memory locations which may have no logical relation to other memory locations allocated to the data structure. Therefore, traversing the data structure generally does not involve regular intervals between the elements. A stride-based prefetch algorithm would have a low likelihood of prefetching the correct memory locations for such a dynamically allocated data structure. Other prefetch algorithms have similar difficulties, as the pattern of accesses is ill-defined. As used herein, a "memory block" comprises one or more contiguous bytes of memory allocated in response to a dynamic memory allocation request.