This invention relates generally to memory management in run-time environments, and more specifically to a garbage collection algorithm that uses cache memory features to reduce garbage collection time.
The random access memory (RAM) of a computing system is a fixed-size resource; currently a RAM size of 32 megabytes (MB) is typical. The RAM must be managed properly to maintain system performance. In run-time environments such as Java or Microsoft CLI, memory management is handled by the system. Memory management includes a process known as “garbage collection”, which aims to recycle memory as unobtrusively as possible. When a computer program is running, it allocates and uses portions of memory on an ongoing basis. At some point the program may no longer need a particular portion of memory, e.g., because the memory was allocated for a purpose that is no longer relevant. The portions that are no longer in use (garbage) are identified (collected) so that they can be reclaimed for future allocation. The garbage collection process taxes the central processing unit (CPU) and degrades system performance as perceived by the application. It is, therefore, highly desirable to reduce the time taken to reclaim unused portions of memory.
Typical computing systems have a cache memory between the CPU and main memory. The cache is small, typically 2 MB or less, compared to main memory, which is typically 128 MB. The cache stores, and provides fast access to, data from the most recently used memory locations. Because the cache is also much faster than main memory, data held in the cache can be supplied to the CPU more quickly. Most modern computing systems also provide software mechanisms (CPU instructions) to explicitly manage the cache.
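To make the "most recently used" behaviour concrete, the following is a minimal Python sketch of a fully associative least-recently-used (LRU) cache. The class name `LRUCache`, the line-based addressing and the 64-byte line size are illustrative assumptions, not details taken from this description; real CPU caches are typically set-associative and use more elaborate replacement policies.

```python
from collections import OrderedDict


class LRUCache:
    """Toy model of a cache that retains the most recently used lines.

    Hypothetical parameters: capacity is counted in cache lines and
    line_size in bytes. Real caches are set-associative, not fully LRU.
    """

    def __init__(self, capacity, line_size=64):
        self.capacity = capacity
        self.line_size = line_size
        self.lines = OrderedDict()  # line number -> None, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Touch one byte address; return True on a cache hit."""
        line = address // self.line_size
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1  # data must come from (slower) main memory
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for addr in [0, 8, 64, 0, 128, 64]:
        cache.access(addr)
    print(cache.hits, cache.misses)  # 2 4
```

Address 8 hits because it shares a line with address 0, and the second access to address 64 misses because that line was evicted as the least recently used when line 2 was brought in.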
Garbage collection takes place in main memory, but because the cache acts as a bottleneck to the processing of memory data, the way the cache is managed is important to reducing garbage collection time. Typically, objects being evacuated from old space are in linked data structures, which do not exhibit the locality that cache-enhancing instructions such as “prefetch”, or automatic prefetch hardware, can exploit. This results in poor cache performance, as described below.
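The following Python sketch, under simplified assumptions, illustrates why linked data structures defeat prefetching. It models an unbounded cache with a hypothetical next-line prefetcher (`count_misses`) and compares a walk over consecutively laid-out objects (`array_walk`) with a walk over the same number of objects scattered in memory and chained by pointers (`linked_walk`). All names and the 64-byte line size are assumptions made for illustration.

```python
import random

LINE = 64  # hypothetical cache-line size in bytes


def count_misses(addresses, prefetch_next_line=True):
    """Count demand misses for an address stream under a toy next-line prefetcher.

    The cache is modelled as unbounded (no evictions), so only the effect
    of prefetching is measured.
    """
    present = set()
    misses = 0
    for addr in addresses:
        line = addr // LINE
        if line not in present:
            misses += 1
            present.add(line)
        if prefetch_next_line:
            present.add(line + 1)  # prefetcher guesses the next line will be used
    return misses


def array_walk(n):
    # Objects laid out consecutively: the address of object i is known in advance.
    return [i * LINE for i in range(n)]


def linked_walk(n, seed=0):
    # Objects placed on every other line in shuffled order and chained by pointers.
    if n == 0:
        return []
    lines = [2 * k for k in range(n)]
    random.Random(seed).shuffle(lines)
    next_of = {
        lines[i] * LINE: (lines[i + 1] * LINE if i + 1 < n else None)
        for i in range(n)
    }
    addr, order = lines[0] * LINE, []
    while addr is not None:
        order.append(addr)   # the next address is known only after this load
        addr = next_of[addr]
    return order


if __name__ == "__main__":
    print(count_misses(array_walk(1000)))   # 1
    print(count_misses(linked_walk(1000)))  # 1000
```

With prefetching, the consecutive walk incurs a single demand miss, while every node of the scattered list still misses: the prefetched neighbouring line never holds the next node, and the next node's address is only known once the current node has been loaded.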
A popular garbage collection algorithm for use in run-time environments is the moving garbage collection algorithm (MGCA). The MGCA examines a memory block that may typically be from 1 MB to 4 gigabytes (GB) in size. The MGCA determines which memory data in the block is in use (live data) and which is garbage. As the name implies, MGCAs move all live data to new consecutive memory locations. This compacts the live data into a smaller space than it occupied when it was interspersed with the garbage. Once the live data has been copied to its new locations, the entire block can be reclaimed and reallocated.
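The compaction step can be sketched in Python as follows, assuming liveness has already been determined and is recorded in a hypothetical `live` flag on each object. The `Obj` type and the `compact` function are illustrative names only; the sketch computes consecutive destination addresses with a bump pointer rather than modelling an actual memory copy.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    size: int   # size in bytes
    live: bool  # hypothetical result of the liveness determination


def compact(block):
    """Assign consecutive new addresses to live objects; garbage is dropped.

    Returns the live objects in their new order, a forwarding table
    (name -> new address) and the number of bytes the live data occupies.
    """
    forwarding = {}
    to_space = []
    free = 0  # bump pointer into the destination region
    for obj in block:
        if obj.live:
            forwarding[obj.name] = free
            to_space.append(obj)
            free += obj.size
    return to_space, forwarding, free


if __name__ == "__main__":
    block = [Obj("a", 16, True), Obj("b", 32, False),
             Obj("c", 8, True), Obj("d", 64, False)]
    moved, fwd, used = compact(block)
    print(fwd, used)  # {'a': 0, 'c': 16} 24
```

In this example the 120-byte block holds only 24 bytes of live data, which end up adjacent to one another, after which the whole original block could be reclaimed.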
A typical MGCA has three phases: mark, repoint, and copy. In the mark phase, the live objects (those to be moved to new memory locations) are identified, and their new memory locations are determined. In the repoint phase, the live objects are examined and their references are changed so that they refer to the new memory locations. In the copy phase, the contents of each live object are copied to its new memory location.
Many MGCAs implement a Cheney Scanning algorithm. The Cheney Scanning algorithm identifies a memory area to which live data may be copied (TO space), and then determines a “root set” of live data in FROM space. The root set typically includes references in CPU registers, thread stacks, and globally accessible locations. The root set is then copied to new memory locations in TO space. Once copied, each object is scanned iteratively to determine associated memory data (i.e., reachable objects). When a reachable object is found, a new memory location is allocated for it and the object is copied to that location in TO space. The copied object is, in turn, likewise scanned to determine its associated memory data. This process continues until the transitive closure of the root set is reached (i.e., all reachable objects have been identified and copied). In the Cheney Scanning algorithm, the FROM space copies of live objects pass through the cache at least once, when they are identified and copied to TO space. The TO space copies of live objects pass through the cache twice: once when the object is copied to TO space, and once when the object is scanned to determine associated objects. The design and structure of the Cheney Scanning algorithm precludes maximal use of cache prefetch features to reduce garbage collection time.
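A minimal Python sketch of Cheney Scanning follows, under stated assumptions. The heap is modelled as a hypothetical dictionary from object identifiers to the identifiers they reference, TO space is a list whose indices serve as new addresses, and the `touches` counter simply records each time a FROM or TO copy of an object is read or written, as a stand-in for passes through the cache. The forwarding table is kept separately for simplicity, whereas real collectors commonly store a forwarding pointer in the FROM copy itself.

```python
def cheney_collect(heap, roots):
    """Copy everything reachable from roots into TO space, breadth-first.

    heap:  dict mapping object id -> list of referenced object ids (FROM space).
    roots: list of object ids forming the root set.
    Returns (new_roots, to_space, forwarding, touches).
    """
    to_space = []    # each entry: [original id, references]; index = new address
    forwarding = {}  # FROM id -> TO address (simplified side table)
    touches = {"from": 0, "to": 0}  # modelled passes through the cache

    def copy(obj_id):
        if obj_id not in forwarding:
            touches["from"] += 1           # read the FROM copy
            forwarding[obj_id] = len(to_space)
            to_space.append([obj_id, list(heap[obj_id])])
            touches["to"] += 1             # write the TO copy
        return forwarding[obj_id]

    new_roots = [copy(r) for r in roots]   # copy the root set first
    scan = 0
    while scan < len(to_space):            # scan pointer chases the allocation end
        obj = to_space[scan]
        touches["to"] += 1                 # scan the TO copy
        obj[1] = [copy(ref) for ref in obj[1]]  # copy referents, repoint to TO addresses
        scan += 1
    return new_roots, to_space, forwarding, touches


if __name__ == "__main__":
    heap = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["A"]}
    roots, to_space, fwd, touches = cheney_collect(heap, ["A"])
    print(to_space)  # [['A', [1, 2]], ['B', [2]], ['C', []]]
    print(touches)   # {'from': 3, 'to': 6}
```

In the usage example each of the three reachable objects is touched once in FROM space and twice in TO space (once when copied, once when scanned), and the unreachable object `D` is never copied. Because the next object to copy is discovered only by following a reference from the object being scanned, its address is not known far enough in advance to prefetch.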