1. Field of Invention
The present invention relates generally to memory allocation in computer systems. More particularly, the present invention relates to efficient, low-overhead memory allocation in multi-threaded, object-based computer systems.
2. Description of the Related Art
As the use of virtual machines in computer technology increases, improving the overall efficiency of a virtual machine is becoming more important. The amount of memory associated with a computer system that includes a virtual machine is typically limited. As such, memory must generally be conserved and recycled. Many computer programming languages enable software developers to allocate memory dynamically within a computer system. Some of these languages, including the C and C++ programming languages, require explicit manual deallocation of previously allocated memory, which may be complicated and prone to error. Other programming languages use automatic storage reclamation to reclaim memory that is no longer necessary for the proper operation of computer programs that allocate memory from the reclamation system. Such automatic storage-reclamation systems reclaim memory without explicit instructions or calls from the computer programs that were previously using the memory.
In object-oriented or object-based systems, the typical unit of memory allocation is commonly referred to as an object or a memory object, as will be appreciated by those skilled in the art. Objects that are in use are generally referred to as "live" objects, whereas objects that are no longer needed to correctly execute computer programs are typically referred to as "garbage" objects. The act of reclaiming garbage objects is commonly referred to as garbage collection, and an automatic storage-reclamation system is often referred to as a garbage collector. Computer programs written in languages such as the Java(trademark) programming language (developed by Sun Microsystems, Inc.) and the Smalltalk programming language use garbage collection to automatically manage memory.
The use of a compacting garbage collector generally allows objects to be allocated relatively quickly. Because a compacting collector gathers live objects together and leaves free memory as one contiguous region, objects may be allocated in a contiguous memory area, e.g., an allocation area, such that allocating an object amounts to incrementing an allocation pointer by the desired amount of storage. When the end of the allocation area has been reached, a garbage collection may be performed.
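A minimal sketch of the pointer-bump allocation just described, assuming memory addresses are modeled as plain integer offsets into the allocation area. The function name `bump_allocate` is hypothetical; a `None` result marks the point at which a garbage collection would be performed.

```python
from typing import Optional, Tuple


def bump_allocate(top: int, end: int, size: int) -> Optional[Tuple[int, int]]:
    """Allocate `size` bytes from the free region [top, end) of an allocation area.

    Returns (object_address, new_top), or None when the request does not fit,
    which is where a garbage collection would be triggered.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    new_top = top + size          # move the allocation pointer past the new object
    if new_top > end:
        return None               # end of the allocation area reached
    return top, new_top


# Usage: a 64-byte area holding a 24-byte and a 32-byte object.
top, end = 0, 64
addr1, top = bump_allocate(top, end, 24)   # object at offset 0
addr2, top = bump_allocate(top, end, 32)   # object at offset 24
```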
One garbage collection method is a generational garbage collection method. A generational garbage collection method is a method in which objects are separated based upon their lifetimes as measured from the time the objects were created. Generational garbage collection is described in more detail in Garbage Collection: Algorithms for Automatic Dynamic Memory Management by Richard Jones and Rafael Lins (John Wiley and Sons Ltd., 1996), which is incorporated herein by reference in its entirety. "Younger" objects have been observed as being more likely to become garbage than "older" objects. As such, generational garbage collection may be used to increase the overall efficiency of memory reclamation.
In a system that uses generational garbage collection, a special memory area is designated for the allocation of new objects. Such a memory area is generally considered to be a "nursery," as new objects are allocated within the memory area. As will be appreciated by those skilled in the art, the memory area is often referred to as "Eden."
FIG. 1a is a diagrammatic representation of a single thread and a memory allocation area that is dedicated to the single thread. Such a memory allocation area is suitable for implementation within a single-threaded system that uses generational garbage collection. As shown, a memory allocation area 102, which may be known as Eden, is indexed by an allocation pointer 104. In general, Eden 102 is a block of memory in which new objects may be created. When a thread 106, which is associated with Eden 102, attempts to allocate a new object, allocation pointer 104 is typically incremented by the size of the new object, and a check is made to determine if allocation pointer 104 has reached the end of Eden 102. When it is determined that the end of Eden 102 has been reached, a generational garbage collection may be performed to effectively empty Eden 102, thereby allowing new objects to be created by thread 106 within Eden 102.
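The FIG. 1a flow can be sketched as follows, assuming a single thread and modeling Eden as a range of integer offsets. The class `Eden` and its `collect` method are hypothetical; `collect` only resets the allocation pointer and does not model copying surviving objects out of Eden. Unlike the description, which increments the pointer and then checks for the end of Eden, this sketch checks before moving the pointer so that a failed request leaves the pointer unchanged.

```python
class Eden:
    """Single-threaded nursery indexed by one allocation pointer (cf. FIG. 1a)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.top = 0            # allocation pointer: offset of the next free byte
        self.collections = 0    # how many times Eden has been emptied

    def collect(self) -> None:
        # Stand-in for a generational collection: survivors would be copied
        # out of Eden (not modeled here), after which Eden is empty again.
        self.collections += 1
        self.top = 0

    def allocate(self, size: int) -> int:
        if size > self.capacity:
            raise MemoryError("object larger than Eden")
        if self.top + size > self.capacity:
            self.collect()      # end of Eden reached: empty it, then allocate
        addr = self.top
        self.top += size
        return addr
```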
While the allocation of memory and, hence, new objects, as described with respect to FIG. 1a is effective in a single-threaded system, such an allocation of memory and objects generally may not be used in a multi-threaded system with multiple central processing units (CPUs). By way of example, when two threads concurrently attempt to request space in a single Eden, concurrency problems may arise. As such, in a multi-threaded system, when Eden is a shared resource, access to Eden must generally be synchronized in order to prevent more than one thread from allocating in Eden at any given time. Synchronizing access to Eden may involve associating an allocation lock with Eden that is obtained by a thread when the thread wishes to create a new object, and released by the thread after the new object has been created.
FIG. 1b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads within an overall multi-threaded system. An Eden 112 has an associated allocation pointer 114 which is arranged to indicate the beginning of an unused portion 115 of Eden 112. When threads 116 and 118, which share Eden 112, wish to allocate a new object in Eden 112, they must generally obtain the allocation lock (not shown) associated with Eden 112. Specifically, if thread 116 wishes to access unused portion 115, thread 116 must obtain the allocation lock on Eden 112. Once thread 116 obtains the allocation lock, and it is determined that the end of Eden 112 has not been reached, allocation pointer 114 may be incremented, and a new object may be allocated by thread 116. If the end of Eden 112 has been reached, i.e., when unused portion 115 is null, a garbage collection may be performed to effectively empty Eden 112, thereby allowing new objects to be created by threads 116 and 118.
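To make the cost of the shared-Eden scheme concrete, here is a sketch of FIG. 1b that uses Python's `threading.Lock` as the allocation lock; the class name `SharedEden` is hypothetical. Every call to `allocate` acquires and releases the lock, which is the per-allocation synchronization overhead that makes this scheme slow.

```python
import threading
from typing import Optional


class SharedEden:
    """One Eden shared by several threads; every allocation takes a lock (cf. FIG. 1b)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.top = 0                       # start of the unused portion
        self.lock = threading.Lock()       # the allocation lock associated with Eden

    def allocate(self, size: int) -> Optional[int]:
        with self.lock:                    # acquired and released on every allocation
            if self.top + size > self.capacity:
                return None                # unused portion exhausted: caller triggers a GC
            addr = self.top
            self.top += size
            return addr
```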
When access to Eden is synchronized, the allocation of new objects within Eden is typically slowed considerably due to the overhead associated with the acquisition of and the releasing of the allocation lock associated with Eden. Each time a thread wishes to create a new object in Eden, the thread must acquire exclusive rights to Eden, as for example by acquiring an allocation lock. In general, even so-called "fast" locking primitives which are directly implemented by hardware, e.g., a compare-and-swap primitive, may be relatively slow when compared to the base costs associated with allocation. For instance, on a multiprocessor system, a locking primitive may incur a remote cache miss, as will be appreciated by those skilled in the art. In such a system, adding synchronization features often significantly increases the cost of allocation, e.g., by a factor of two or three. Hence, adding synchronization during allocation greatly affects the performance of the overall system.
In order to improve performance associated with accessing Eden in a multi-threaded system by avoiding synchronization, each thread in the multi-threaded system may be assigned its own Eden. That is, when each thread has its own Eden, concurrency problems that may arise when more than one thread attempts to access a shared Eden may be avoided. FIG. 2a is a diagrammatic representation of two threads with their own associated Edens, or memory allocation areas. Within a multi-threaded system 200, a first Eden 202, which is referenced by an allocation pointer 204, is associated with a first thread 206. Multi-threaded system 200 also includes a second Eden 212 that is referenced by an allocation pointer 214, and is associated with a second thread 216.
When first thread 206 wishes to allocate a new object, first thread 206 accesses first Eden 202. Similarly, when second thread 216 wishes to allocate a new object, second thread 216 accesses second Eden 212. As each thread 206, 216 has its own exclusive Eden, namely Edens 202 and 212, respectively, no allocation locks are needed to safeguard against two threads attempting to access a single Eden in order to create a new object at any given time.
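A sketch of the per-thread Eden scheme of FIG. 2a, assuming `threading.local` storage stands in for each thread's private Eden; `ThreadLocalEdens`, `edens_created`, and `reserved_bytes` are hypothetical names. The allocation path touches only the calling thread's own state and takes no lock. A lock is used only when counting Edens as they are created, which makes the memory cost of one full Eden per thread visible.

```python
import threading
from typing import Optional


class ThreadLocalEdens:
    """Each thread lazily receives its own private Eden (cf. FIG. 2a)."""

    def __init__(self, eden_size: int) -> None:
        self.eden_size = eden_size
        self.edens_created = 0
        self._registry_lock = threading.Lock()   # used only when a new Eden is created
        self._local = threading.local()          # per-thread allocation pointer

    def allocate(self, size: int) -> Optional[int]:
        state = self._local
        if not hasattr(state, "top"):
            # First allocation by this thread: give it a fresh Eden.
            with self._registry_lock:
                self.edens_created += 1
            state.top = 0
        if state.top + size > self.eden_size:
            return None                          # this thread's Eden is full
        offset = state.top                       # lock-free fast path
        state.top += size
        return offset                            # offset within the caller's own Eden

    def reserved_bytes(self) -> int:
        # Memory reserved for Edens, whether or not it is actually used.
        return self.edens_created * self.eden_size
```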
Although allocating a separate Eden to each thread in a multi-threaded system may eliminate the need for allocation locks, it often requires a substantial amount of memory. For example, some applications may contain hundreds or even thousands of threads. In addition, some threads may allocate objects at a higher rate than others and, hence, will generally require more memory. The requirement for more memory may lead to frequent garbage collections performed over all memory, e.g., global garbage collections performed on all Edens, which would require some form of synchronization. As such, the overall overhead associated with performing garbage collections on multiple Edens may increase and adversely affect the performance of the overall system, since a collection may be triggered when some Edens are filled to capacity while others are still relatively empty.
Allocating a separate Eden to each thread in a multi-threaded system may therefore be inefficient and expensive, both because of the substantial amount of memory used and because of the increased overhead associated with garbage collection. Reducing the amount of memory used, as well as the frequency of garbage collection, increases the efficiency and generally decreases the costs associated with a multi-threaded system. Dividing an Eden into chunks, or blocks, typically allows an Eden to be shared without requiring an allocation lock for every allocation. The general division of Eden into chunks is described in "Multilisp: A Language for Concurrent Symbolic Computation" by R. Halstead, Jr. (ACM Transactions on Programming Languages and Systems, 7(4):501-538, October 1985), which is incorporated herein by reference in its entirety. FIG. 2b is a diagrammatic representation of two threads and a memory allocation area shared by the two threads in which the memory allocation area is divided into chunks. A multi-threaded system 230 includes an Eden 232 that is divided into chunks 233 of a consistent size; in other words, all chunks 233 are approximately the same size. Each thread 236, 238 that shares Eden 232 is allocated an initial chunk. By way of example, thread 236 is initially allocated chunk 233a, while thread 238 is initially allocated chunk 233b.
When a thread, e.g., thread 236, fills its chunk 233a, thread 236 is allocated another chunk 233c. Threads continue to be allocated chunks 233 until no chunks 233 are available, at which time a garbage collection may be performed. It should be appreciated that although the requests for chunks 233 are synchronized, the synchronization generally does not occur as frequently as the allocation synchronization that was previously mentioned.
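The chunked scheme of FIG. 2b can be sketched as below, again modeling memory as integer offsets; `ChunkedEden` and `chunk_requests` are hypothetical names. The lock guards only the hand-out of chunks, while allocation within a thread's current chunk is a plain pointer bump, so synchronization happens once per chunk rather than once per object.

```python
import threading
from typing import Optional


class ChunkedEden:
    """Eden divided into equal-size chunks handed out to threads (cf. FIG. 2b)."""

    def __init__(self, num_chunks: int, chunk_size: int) -> None:
        self.num_chunks = num_chunks
        self.chunk_size = chunk_size
        self.next_chunk = 0                  # index of the next unassigned chunk
        self.chunk_lock = threading.Lock()   # synchronizes chunk requests only
        self.chunk_requests = 0              # how often the lock was taken
        self._local = threading.local()      # each thread's current chunk bounds

    def _new_chunk(self) -> bool:
        with self.chunk_lock:
            self.chunk_requests += 1
            if self.next_chunk == self.num_chunks:
                return False                 # no chunks left: a GC would run here
            base = self.next_chunk * self.chunk_size
            self.next_chunk += 1
        self._local.top = base
        self._local.end = base + self.chunk_size
        return True

    def allocate(self, size: int) -> Optional[int]:
        if size > self.chunk_size:
            raise ValueError("object larger than a chunk")
        local = self._local
        if not hasattr(local, "top") or local.top + size > local.end:
            # No chunk yet, or the object does not fit (the chunk's tail is abandoned).
            if not self._new_chunk():
                return None
        addr = local.top                     # unsynchronized pointer bump
        local.top += size
        return addr
```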
Allocating chunks 233 to threads 236, 238 often results in substantial fragmentation, as each chunk 233 must generally be sized to hold a large object. Hence, when a chunk is partially full and a large object created by a thread does not fit in the remaining space, a new chunk is allocated to the thread to accommodate the large object, and the space left in the partially full chunk is effectively wasted. In addition, the allocation of space in the chunks may be inefficient when slow-allocating threads hold virtually empty chunks, thereby reserving memory space that may never be needed.
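The fragmentation described above can be quantified by replaying one thread's allocation sizes into fixed-size chunks. `chunk_waste` is a hypothetical helper, and the 4096-byte chunk size in the example is an arbitrary illustrative value. Only the tails of chunks abandoned because the next object did not fit are counted as waste.

```python
from typing import List, Tuple


def chunk_waste(chunk_size: int, sizes: List[int]) -> Tuple[int, int]:
    """Replay one thread's allocations into fixed-size chunks.

    Returns (chunks_used, wasted_bytes). The unused tail of the last,
    still-active chunk is not counted as waste.
    """
    chunks, used, wasted = 0, 0, 0
    for size in sizes:
        if size > chunk_size:
            raise ValueError("object larger than a chunk")
        if chunks == 0 or used + size > chunk_size:
            if chunks:
                wasted += chunk_size - used   # abandon the partially full chunk
            chunks += 1                       # take a fresh chunk
            used = 0
        used += size
    return chunks, wasted
```

For example, `chunk_waste(4096, [1000, 1000, 1000, 3000, 1000])` returns `(2, 1096)`: the 3000-byte object does not fit in the 1096 bytes left in the first chunk, so that tail is abandoned.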
Therefore, what is desired is a method and an apparatus for efficiently allocating memory in a multi-threaded system such as a multi-threaded virtual machine. Specifically, what is needed is a method and an apparatus for allowing threads to create new objects in a memory allocation area, e.g., an Eden, while minimizing memory space, minimizing allocation costs, and improving the efficiency of garbage collection.
The present invention relates to the efficient allocation of shared memory in a multi-threaded computer system. In accordance with one embodiment of the present invention, a computer-implemented method for allocating memory shared by multiple threads in a multi-threaded computing system includes partitioning the shared memory into a plurality of blocks, and grouping the multiple threads into at least a first group and a second group. A selected block is allocated to a selected thread which may attempt to allocate an object in the selected block. The allocation of the selected block to the selected thread is based at least partially upon whether the selected thread is a part of the first group or the second group. In one embodiment, grouping the multiple threads into the first group and the second group includes identifying a particular thread and determining whether the particular thread is a fast allocating thread. In such an embodiment, when the particular thread is fast allocating, the particular thread is grouped into the first group.
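One way to picture the grouping step is sketched below. It assumes, purely for illustration, that a thread counts as fast allocating when the bytes it allocated during some observation window reach a fixed threshold, and that each fast thread then receives a private block while all slow threads share one block; the text above does not specify either rule. `ThreadStats`, `group_threads`, `assign_blocks`, and the threshold are hypothetical. The slow threads' shared block would still need a policy for handling overflow, such as the overflow-based reassignment described next.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ThreadStats:
    name: str
    bytes_allocated: int   # bytes allocated during the last observation window


def group_threads(stats: List[ThreadStats], threshold: int) -> Dict[str, List[str]]:
    """Split threads into a fast-allocating group and a slow-allocating group."""
    groups: Dict[str, List[str]] = {"fast": [], "slow": []}
    for s in stats:
        groups["fast" if s.bytes_allocated >= threshold else "slow"].append(s.name)
    return groups


def assign_blocks(groups: Dict[str, List[str]], first_block: int) -> Dict[str, int]:
    """Give each fast thread a private block; all slow threads share one block."""
    assignment: Dict[str, int] = {}
    block = first_block
    for name in groups["fast"]:
        assignment[name] = block
        block += 1
    for name in groups["slow"]:
        assignment[name] = block   # one block shared by the whole slow group
    return assignment
```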
According to another aspect of the present invention, a computer-implemented method for allocating shared memory in a multi-threaded computing system which includes at least a first thread and a second thread involves partitioning the shared memory into a plurality of blocks, and assigning a first block that is accessible to both the first thread and the second thread for the creation of new objects. After the system is allowed to run, a determination is made as to whether the first block has overflowed. If it is determined that the first block has overflowed, the method includes determining whether an attempt by the first thread to allocate a first object in the first block caused the first block to overflow. If such is the case, a second block is assigned to the first thread. Assignment of the second block to the first thread is arranged to cause the first thread to effectively relinquish the ability to allocate objects in the first block. In one embodiment, the second thread does not have the ability to allocate objects in the second block.
In another embodiment, the method also includes determining when either the first block or the second block has overflowed, and assigning a third block to the first thread when it is determined that the second block overflowed, or assigning the third block to the second thread when it is determined that the first block overflowed. In such an embodiment, when it is determined that the first block overflowed, a fourth block may replace the first block.
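A single-threaded simulation of the overflow-driven reassignment in the two preceding paragraphs is sketched below. Requests are tagged with a thread name instead of running real threads, and `Block`, `OverflowPromotingAllocator`, and `replace_shared` are hypothetical names. A thread whose request overflows its current block, whether the shared block or its own, receives the next free block. The optional replacement of the shared block by a fresh one is controlled by the `replace_shared` flag, off by default.

```python
from typing import Dict, List, Optional, Tuple


class Block:
    """A fixed-size block with its own bump pointer."""

    def __init__(self, ident: int, size: int) -> None:
        self.ident, self.size, self.top = ident, size, 0

    def try_allocate(self, size: int) -> Optional[int]:
        if self.top + size > self.size:
            return None                       # this request overflows the block
        addr, self.top = self.top, self.top + size
        return addr


class OverflowPromotingAllocator:
    """Threads start in one shared block; a thread whose request overflows
    its current block is moved to a fresh block of its own."""

    def __init__(self, block_size: int, num_blocks: int,
                 replace_shared: bool = False) -> None:
        self.block_size = block_size
        self.free: List[Block] = [Block(i, block_size) for i in range(num_blocks)]
        self.shared = self.free.pop(0)        # first block, open to every thread
        self.private: Dict[str, Block] = {}   # thread name -> its own block
        self.replace_shared = replace_shared

    def _take_block(self) -> Block:
        if not self.free:
            raise MemoryError("no free blocks; a garbage collection would run here")
        return self.free.pop(0)

    def allocate(self, thread: str, size: int) -> Tuple[int, int]:
        """Return (block id, offset within that block) for a new object."""
        if size > self.block_size:
            raise ValueError("object larger than a block")
        block = self.private.get(thread, self.shared)
        addr = block.try_allocate(size)
        if addr is None:
            # The calling thread caused the overflow, so it moves to a new
            # private block; other threads keep their current block unless
            # the shared block is replaced below.
            overflowed_shared = block is self.shared
            block = self._take_block()
            self.private[thread] = block
            if overflowed_shared and self.replace_shared:
                self.shared = self._take_block()  # fresh shared block for later requests
            addr = block.try_allocate(size)
        return block.ident, addr
```

With `block_size=100`, if `t1` and `t2` allocate 60 and 30 bytes in shared block 0 and `t1` then asks for 20 more, `t1` moves to block 1 while `t2` keeps allocating in block 0.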
According to still another aspect of the present invention, a computer-implemented method for allocating memory in a multi-threaded computing system includes partitioning the memory into a plurality of blocks which includes a first block and a second block that is substantially larger than the first block. The first block is assigned to be accessible to a first thread, which is arranged to attempt to allocate a first object in the first block, and the second block is assigned to be accessible to a second thread in order for the second thread to attempt to allocate a second object in the second block. In one embodiment, the first block has a size in the range of approximately 1 kilobyte to approximately 4 kilobytes, and the second block has a size in the range of approximately 16 kilobytes to approximately 32 kilobytes.
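A sketch of a two-size partition follows. `TwoSizeBlockPool` and `block_for` are hypothetical, and the defaults of 2 KB small blocks, 32 KB large blocks, and four large blocks are illustrative values chosen from the ranges above. Handing large blocks to fast-allocating threads and small blocks to the rest is an assumption made for this example.

```python
from typing import List, Tuple

KB = 1024


class TwoSizeBlockPool:
    """Shared memory partitioned into a few large blocks and many small ones.

    Blocks are (start, size) pairs; the default sizes are illustrative only.
    """

    def __init__(self, total: int, small: int = 2 * KB, large: int = 32 * KB,
                 num_large: int = 4) -> None:
        if num_large * large > total:
            raise ValueError("not enough memory for the large blocks")
        # Large blocks occupy the start of memory; small blocks fill the rest.
        self.large_free: List[Tuple[int, int]] = [
            (i * large, large) for i in range(num_large)]
        first_small = num_large * large
        self.small_free: List[Tuple[int, int]] = [
            (start, small) for start in range(first_small, total - small + 1, small)]

    def block_for(self, fast_allocating: bool) -> Tuple[int, int]:
        """Hand a large block to a fast-allocating thread, a small one otherwise."""
        pool = self.large_free if fast_allocating else self.small_free
        if not pool:
            raise MemoryError("no block of the requested size is free")
        return pool.pop(0)
```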
The present invention will be more readily understood upon reading the following detailed descriptions and studying the various figures of the drawings.