Managing available memory is critically important to the performance and reliability of a computer system. Specifically, data used by a computer program is typically stored in a computer system within a memory that has a limited address space. In many computer systems, data is stored in the form of "objects" that are allocated space in a portion of the memory referred to as an "object heap". Objects also often include "references" (also known as pointers) to other objects so that a computer program can access information in one object by following a reference from another object. Typically each computer program has its own object heap, so if multiple computer programs are active in a computer system, multiple object heaps may be maintained in the system.
Whenever new data is to be used by a computer program, a portion of the free memory is reserved for that data using a process known as "allocating" memory. Given that the amount of memory available in a computer system is limited, it is important to free up, or "deallocate", the memory reserved for data that is no longer being used by the computer system. Otherwise, as available memory is used up, the performance of the computer system typically decreases, or a system failure may occur.
A computer program known as a garbage collector is often used to free up unused memory that has been allocated by other computer programs in a computer system. Often, a garbage collector executes concurrently with other computer programs to periodically scan through the object heap(s) and deallocate any memory that is allocated to unused objects (a process also known as "collecting" objects). Different computer programs that operate concurrently in a computer system often include one or more "threads" that execute concurrently with one another. Moreover, when different computer programs use different object heaps, separate garbage collector computer programs, also referred to as collector threads, may be used to manage each object heap.
One specific type of garbage collector is a concurrent mark sweep collector, which sequences repeatedly through individual collection cycles, with each cycle sequentially operating in mark and sweep stages. In the mark stage, the collector scans through an object heap beginning at its "roots", and attempts to "mark" objects that are still reachable from a root (i.e., that are referenced directly by a root or by a chain of objects reachable from a root). In the sweep stage, the collector scans through the objects and deallocates any memory reserved for objects that are unmarked as of completion of the mark stage.
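The mark and sweep stages described above can be illustrated with a minimal, single-threaded sketch. It is not a concurrent collector and does not represent any particular implementation; the class `MarkSweepSketch`, the type `Obj`, and the methods `mark` and `sweep` are hypothetical names introduced only for illustration, and the "heap" is assumed to be a simple list of objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    /** Hypothetical heap object: a name, outgoing references, and a mark bit. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    /** Mark stage: mark every object reachable from a root, directly or through a chain. */
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;   // already visited (also handles reference cycles)
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep stage: remove unmarked objects from the heap and reset marks on survivors. */
    static List<String> sweep(List<Obj> heap) {
        List<String> collected = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext();) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;     // ready for the next collection cycle
            } else {
                collected.add(o.name);
                it.remove();          // stands in for deallocating the object's memory
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        a.refs.add(b);                // a -> b
        b.refs.add(a);                // b -> a (a cycle, still reachable from the root)
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c));

        mark(Collections.singletonList(a));   // "a" is the only root
        System.out.println(sweep(heap));      // prints [c]
    }
}
```

In this sketch, object `c` is collected because no chain of references leads to it from the root, whereas `a` and `b` survive even though they reference each other in a cycle.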
Concurrent mark sweep collectors are often very desirable for collecting unused data as they often have minimal impact on the responsiveness of program threads. However, by allowing program threads to run concurrently with a collector, a problem arises due to the fact that the program threads could interfere with the work of the collector. This interference could confuse the collector and cause the collector to collect an object that is actually reachable. If such an object is collected, a program thread may later follow a reference to memory that has been deallocated, possibly resulting in incorrect behavior and/or in partial or total system failure.
The process of ensuring that data accessed by one computer program in a computer system is not unpredictably affected by the operation of another computer program is generally referred to as "synchronization". Synchronization is typically not a concern for "stop-the-world" garbage collectors, as these types of collectors halt execution of all active program threads during collection, which prevents other program threads from unpredictably modifying data during collection. However, halting all program threads, even for a short time, degrades both overall system performance and the responsiveness of the program threads. Thus, "stop-the-world" collectors are typically not as desirable as concurrent collectors, and may not be suitable for many applications.
One specific type of data that may introduce the aforementioned synchronization problem is interned data, which is typically used to speed comparisons between data and reduce storage requirements. Interned data is typically stored and maintained in a data structure such as an intern table, and is processed using a computer function known as an "intern operation". Intern operations may be used, for example, to simplify the determination of the equality of two data elements when equality is based upon the contents of such data elements.
For example, in the Java programming language from Sun Microsystems, intern operations are used to implement the equality semantics for character string literals, which typically consist of an ordered arrangement, or "array", of alphanumeric characters. The character string literal "grape", for instance, is considered to include a string formed by the characters "g", "r", "a", "p" and "e". When called, intern operations in Java typically receive a reference to a string corresponding to a symbol. The intern operation then determines whether a string with the same contents is already stored in an intern table. If not, the intern operation inserts a table entry for the string into the intern table and returns a reference to the string. If such a string is already stored in the intern table, a reference to the stored string is returned without modification to the table. As a result, two strings that have been interned can be compared for equality by comparing the references returned by the intern operations, rather than having to compare the strings on a character-by-character basis.
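The intern operation described above can be sketched as a lookup-or-insert on a table keyed by string contents. The class `InternTableSketch` and its `intern` and `size` methods are hypothetical and are shown only to illustrate the mechanism; the final comparison uses the standard `String.intern()` method of the Java class library.

```java
import java.util.HashMap;
import java.util.Map;

class InternTableSketch {
    // Hypothetical intern table: maps string contents to the one stored string.
    private final Map<String, String> table = new HashMap<>();

    /** Returns the stored string equal to s, inserting s first if none is stored. */
    String intern(String s) {
        String existing = table.get(s);
        if (existing != null) {
            return existing;          // already interned: table is not modified
        }
        table.put(s, s);              // first occurrence: insert a table entry
        return s;
    }

    int size() { return table.size(); }

    public static void main(String[] args) {
        InternTableSketch t = new InternTableSketch();
        String s1 = new String("grape");
        String s2 = new String("grape");   // same contents, distinct object

        System.out.println(s1 == s2);                      // prints false
        System.out.println(t.intern(s1) == t.intern(s2));  // prints true
        System.out.println(s1.intern() == s2.intern());    // prints true
    }
}
```

Once both strings have been interned, a single reference comparison replaces the character-by-character comparison that `equals` would otherwise perform.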
An interned string typically may be collected, and its entry in the intern table deleted, whenever it is not reachable except through its entry in the intern table. However, collection of interned strings is often problematic due to synchronization concerns with the intern table. An intern table may service multiple computer programs, each of which may include one or more concurrently operating execution threads, or sequences of instructions, that have the ability to independently access the information in the intern table. Accesses to interned strings through an intern table make such strings reachable, and thus unsuitable for collection. As a result, there is a risk that a program thread operating concurrently with a collector will access an interned string that was not marked during the mark stage of a collection cycle, at some point between the beginning of the mark stage and the deallocation of the interned string during the sweep stage. Absent adequate synchronization, the interned string would be collected even though the program thread now holds a reference to it, thereby introducing possibly unpredictable behavior in the computer system.
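The hazard described above can be made concrete with a deterministic, single-threaded simulation in which the steps of a collector and a program thread are interleaved by hand. This is only a sketch under stated assumptions: the class `InternRaceSketch`, the type `Interned`, and the methods `intern`, `mark` and `sweep` are hypothetical, the intern table is assumed not to act as a root, and no real concurrency or synchronization is modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class InternRaceSketch {
    /** Hypothetical interned string with a mark bit set during the mark stage. */
    static final class Interned {
        final String value;
        boolean marked;
        Interned(String value) { this.value = value; }
    }

    final Map<String, Interned> table = new HashMap<>();

    /** Intern operation: return the stored entry, inserting one if absent. */
    Interned intern(String s) {
        return table.computeIfAbsent(s, Interned::new);
    }

    /** Mark stage: the table is not a root, so only strings held elsewhere are marked. */
    void mark(Set<Interned> reachableFromRoots) {
        for (Interned e : reachableFromRoots) e.marked = true;
    }

    /** Sweep stage: delete entries for unmarked strings; returns how many were deleted. */
    int sweep() {
        int before = table.size();
        table.values().removeIf(e -> !e.marked);
        for (Interned e : table.values()) e.marked = false;
        return before - table.size();
    }

    public static void main(String[] args) {
        InternRaceSketch heap = new InternRaceSketch();
        Interned grape = heap.intern("grape");   // program keeps this reference
        heap.intern("plum");                     // reference discarded immediately

        Set<Interned> roots = new HashSet<>();
        roots.add(grape);
        heap.mark(roots);                        // "plum" is left unmarked

        // Simulated program thread: interns "plum" after marking, before sweeping.
        Interned late = heap.intern("plum");

        heap.sweep();                            // collects "plum" anyway
        System.out.println(heap.table.containsKey("plum"));    // prints false
        System.out.println(heap.intern("plum") == late);       // prints false
    }
}
```

After the sweep, the simulated program thread still holds `late`, but the table no longer contains that entry, and a subsequent intern of the same contents returns a different reference. In a real system the memory behind `late` would have been deallocated, which is the unpredictable behavior the paragraph above warns of.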
Another problem associated with collecting interned data is that an intern table can become large. Iterating through an entire intern table during collection may thus be inefficient and may adversely impact system performance. Moreover, if the intern table is spread out in memory, iterating through the table may result in "page faults" that can further decrease system performance.
Therefore, a significant need exists for an improved manner of efficiently collecting interned data from an intern data structure with a minimal impact on system performance.