1. Field of Invention
The invention relates to a method of computer memory allocation and more particularly to a scalable method of thread-local object allocation.
2. Background
Memory allocation in concurrent object-oriented programming languages (e.g., Java) and in concurrent extensions of sequential programming languages (e.g., C) is traditionally accomplished by either a “shared-heap” allocation scheme or a “thread-local” allocation scheme. Each has its own distinct advantages and disadvantages.
The shared-heap scheme of memory allocation maintains a general area, commonly called the heap, from which memory is assigned. It is a controlled area for the storage of unallocated memory. As the name implies, the heap is shared among the threads. The heap is guarded by mutual exclusion primitives, which are operations that allow only one thread, or logical unit of an application, to access the memory at a time. Mutual exclusion primitives are necessary to ensure stable operation because the heap is a shared resource.
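The shared-heap scheme described above can be illustrated with a minimal sketch. It assumes a heap modelled as a range of byte offsets with a simple bump pointer and a single lock; the class name `SharedHeap` and its methods are hypothetical and are not taken from any prior art implementation.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration: one heap shared by all threads, guarded by one lock.
final class SharedHeap {
    private final ReentrantLock lock = new ReentrantLock(); // mutual exclusion primitive
    private final int capacity;                             // total bytes in the heap
    private int top = 0;                                    // offset of the next free byte

    SharedHeap(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the allocated block, or -1 if the request cannot be met. */
    int allocate(int size) {
        lock.lock(); // every allocating thread is serialized here
        try {
            if (size <= 0 || size > capacity - top) {
                return -1;
            }
            int offset = top;
            top += size;
            return offset;
        } finally {
            lock.unlock();
        }
    }

    /** Returns the number of bytes handed out so far. */
    int used() {
        lock.lock();
        try {
            return top;
        } finally {
            lock.unlock();
        }
    }
}
```

Every call to `allocate` acquires the same lock, so allocations that could otherwise proceed concurrently are executed one after another.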
The shared-heap allocation scheme has several adverse impacts on performance, especially in multi-processor applications such as servers. In particular, multi-processor performance, or scalability, is hurt by needless serialization. Serialization is the requirement that potentially concurrent operations be executed consecutively. Another problem is bus traffic due to cache misses. A cache miss occurs when the hardware cache finds that the data sought is not present, so the data must be fetched from main memory instead. These cache misses are most commonly caused by atomic memory operations and false sharing. A final problem with shared-heap allocation is simply the extra instruction cycles required by mutual exclusion.
Prior art Java environments (e.g., Sun Microsystems' Hotspot Performance Engine) focus on optimizing the mutual exclusion mechanisms. While these efforts improve allocation efficiency by reducing the total instruction count, they ignore multi-processor scalability issues. These issues include reduction in bus traffic and co-location of objects in working sets of individual threads.
The thread-local allocation scheme of memory allocation views each allocated memory space as a private thread resource. Because each thread has its own specified memory allocation space, or “heaplet”, mutual exclusion is guaranteed. As a result, “cache sloshing” is reduced. This sloshing occurs when a cached value rapidly migrates among different caches in a multi-processor environment. Additionally, “locality” is improved: objects created by the same thread at about the same time are located near one another in memory.
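For contrast with the shared heap, the thread-local scheme can be sketched as follows. This is only an illustration under stated assumptions: the class names `Heaplet` and `ThreadLocalAllocator`, the bump-pointer layout, and the 1024-byte heaplet size are all hypothetical.

```java
// Hypothetical illustration: a private allocation space owned by exactly one thread.
final class Heaplet {
    private final int capacity; // total bytes in this heaplet
    private int top = 0;        // offset of the next free byte

    Heaplet(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the allocated block, or -1 if the heaplet is full. */
    int allocate(int size) {
        // No lock is taken: only the owning thread ever calls this method.
        if (size <= 0 || size > capacity - top) {
            return -1;
        }
        int offset = top;
        top += size;
        return offset;
    }

    int used() {
        return top;
    }
}

final class ThreadLocalAllocator {
    static final int HEAPLET_SIZE = 1024; // assumed size, chosen only for the example

    // Each thread lazily receives its own heaplet on first use.
    private static final ThreadLocal<Heaplet> HEAPLET =
            ThreadLocal.withInitial(() -> new Heaplet(HEAPLET_SIZE));

    static int allocate(int size) {
        return HEAPLET.get().allocate(size);
    }

    static int used() {
        return HEAPLET.get().used();
    }
}
```

Successive allocations by one thread land at adjacent offsets of that thread's heaplet, which is the locality property described above.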
However, the thread-local scheme cannot be used in highly concurrent multi-processor applications without significant difficulties. This scheme makes very high demands on memory allocation space size. For example, an application that creates one thousand threads with 1 MB of memory space allocated per thread would need 1 GB of space for the entire memory pool. Also, memory fragmentation and frequent “garbage collections” are substantial problems. Garbage collection, or GC, is the creation and resizing of allocation spaces. It is an automatic function of the processor that frees the programmer from explicitly performing these tasks. Common GC implementations require all threads to be stopped (known as “stopping the world”) before proceeding. An increased number of private heaplets brings a corresponding increase in the time that all threads are stopped.
In some aspects, the invention relates to a method for allocating memory comprising: maintaining a memory pool comprising a plurality of memory spaces; allocating a memory space to a thread when the thread transitions to a runnable state; and allocating the memory space to the memory pool when the thread transitions to a blocked state.
In an alternative embodiment, the invention relates to an apparatus for allocating memory comprising: a microprocessor; and a program executable on the microprocessor that maintains a memory pool comprising a plurality of memory spaces, allocates a memory space to a thread when the thread transitions to a runnable state, and allocates the memory space to the memory pool when the thread transitions to a blocked state.
In an alternative embodiment, the invention relates to an apparatus for allocating memory comprising: means for maintaining a plurality of memory spaces; means for allocating a memory space to a thread when the thread transitions to a runnable state; and means for de-allocating the memory space from the thread when the thread transitions to a blocked state.
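The method summarized above can be sketched in a few lines. This is a minimal illustration and not the claimed implementation: the names `MemorySpace`, `MemorySpacePool`, `onRunnable`, and `onBlocked` are hypothetical, thread state transitions are assumed to be reported to the pool by some scheduler hook, and returning `null` when no space is free is an assumption made only for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: one allocation space in the pool.
final class MemorySpace {
    final int id;

    MemorySpace(int id) {
        this.id = id;
    }
}

// Hypothetical illustration: a pool that lends spaces to runnable threads only.
final class MemorySpacePool {
    private final Deque<MemorySpace> free = new ArrayDeque<>();     // spaces held by the pool
    private final Map<Long, MemorySpace> owned = new HashMap<>();   // thread id -> space

    MemorySpacePool(int spaces) {
        for (int i = 0; i < spaces; i++) {
            free.add(new MemorySpace(i));
        }
    }

    /** Thread became runnable: assign it a space from the pool (null if none is free). */
    synchronized MemorySpace onRunnable(long threadId) {
        MemorySpace held = owned.get(threadId);
        if (held != null) {
            return held; // thread already holds a space
        }
        MemorySpace space = free.poll();
        if (space != null) {
            owned.put(threadId, space);
        }
        return space;
    }

    /** Thread became blocked: hand its space back to the pool. */
    synchronized void onBlocked(long threadId) {
        MemorySpace space = owned.remove(threadId);
        if (space != null) {
            free.push(space); // most recently used space is reused first
        }
    }

    synchronized int freeCount() {
        return free.size();
    }
}
```

In this sketch a pool of two spaces can serve three threads, provided that no more than two of them are runnable at once, because a blocked thread holds no space.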
The advantages of the invention include, at least: reduced demands on memory pool allocation space; decreased memory fragmentation; and decreased frequency of garbage collection operations. These advantages result in faster and more stable operations, especially in highly concurrent operations in multi-processor environments.