The present invention relates generally to the management of memory in computer systems, and more particularly to a system and method facilitating code generated by an unmanaged compiler to participate in garbage collection.
Memory available for program execution is one of the most important resources in a computer system. Therefore, much time and energy has been directed to efficient utilization and management of memory. An important aspect of memory management is the manner in which memory is allocated to program(s), deallocated and then reclaimed for use by other program(s). A memory manager dynamically manages a "heap" of memory from which it allocates blocks to program(s). When program(s) need a block of memory to store data, the program(s) send a request to the memory manager for memory. The memory manager then allocates a block of memory in the heap to satisfy the request and sends a reference (e.g., a pointer) to the block of memory to the program(s). The program(s) can then access the block of memory through the reference.
Conventionally, many programming languages have placed the responsibility for dynamic allocation and deallocation of memory on the programmer. These programming language types are referred to as unmanaged or unsafe programming languages, because pointers can be employed anywhere in an object or routine. In the C, C++ and Pascal programming languages, memory is allocated from the heap by a call to an allocation procedure, which passes a pointer to the allocated memory back to the caller. A call to free the memory is then available to deallocate the memory (e.g., return it to the heap). However, if a program overwrites the only pointer to a heap segment before freeing it, the segment becomes inaccessible to both the program and the memory manager. Inaccessible heap segments are known as "memory leaks".
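The leak described above can be illustrated with a minimal C sketch. The counting wrappers tracked_alloc and tracked_free, the counter live_blocks and the function leak_demo are hypothetical names introduced only for this illustration; only malloc and free come from the standard library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: heap blocks allocated but not yet freed. */
int live_blocks = 0;

/* Allocate a block and record it as outstanding. */
void *tracked_alloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        live_blocks++;
    }
    return p;
}

/* Return a block to the heap and update the count. */
void tracked_free(void *p) {
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Overwrites the only pointer to the first block, so that block can
   never be freed. Returns the number of blocks still outstanding. */
int leak_demo(void) {
    int *data = tracked_alloc(sizeof *data);  /* block A */
    assert(data != NULL);
    data = tracked_alloc(sizeof *data);       /* block B; pointer to A is lost */
    assert(data != NULL);
    tracked_free(data);                       /* frees block B only */
    return live_blocks;                       /* block A is allocated but unreachable */
}
```

After leak_demo returns, one block remains allocated although no pointer to it exists anywhere in the program, which is the situation the paragraph above calls a memory leak.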
In many conventional programming languages, heap allocations are required for data structures that survive the procedure that created them. If these data structures are passed to further procedures or functions, it may be difficult or impossible for the programmer or compiler to determine the point at which it is safe to deallocate them. Memory, such as a data structure, that is no longer reachable but has not been freed is called garbage.
An alternative to requiring explicit memory deallocation calls is to place the responsibility for finding unused memory on a component of the runtime environment called the garbage collector. The garbage collector (GC) component has the responsibility for periodically traversing data structure(s) in program(s) to find memory that is still being accessed (directly and/or indirectly) and reclaiming memory that is no longer being used. Additionally, many garbage collectors are "compacting", that is, they move memory block(s) that are currently in use so that the block(s) are contiguous, removing "holes" left by unused (e.g., deallocated) memory. Such garbage collectors require the ability to find pointer(s) to GC memory (e.g., to determine which location(s) are currently in use) and also the ability to update pointer(s) so that memory item(s) on the GC heap can be moved in order to compact the GC heap.
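The two requirements of a compacting collector, finding pointers and updating them, can be sketched in C on a toy heap. Everything named here (Slot, HEAP_SLOTS, compact, compact_demo) is a hypothetical illustration and not the collector of any particular runtime. The sketch assumes fixed-size slots, represents each pointer as a slot index, and takes liveness from an in_use flag instead of tracing.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_SLOTS 8

typedef struct {
    int in_use;   /* nonzero if the slot holds a live block */
    int payload;  /* stand-in for the block's contents */
} Slot;

/* Slide live slots toward the start of the heap, then update every root
   (an index into the heap, passed by address) to follow its block.
   Roots are assumed to refer to live slots. Returns the live slot count. */
size_t compact(Slot heap[HEAP_SLOTS], size_t *roots[], size_t root_count) {
    size_t forward[HEAP_SLOTS] = {0};  /* old index -> new index */
    size_t next_free = 0;
    for (size_t i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].in_use) {
            forward[i] = next_free;
            if (next_free != i) {
                heap[next_free] = heap[i];  /* move the block into the hole */
                heap[i].in_use = 0;
            }
            next_free++;
        }
    }
    for (size_t r = 0; r < root_count; r++) {
        assert(*roots[r] < HEAP_SLOTS);
        *roots[r] = forward[*roots[r]];     /* update the pointer in place */
    }
    return next_free;
}

/* Three live blocks separated by holes; two roots refer to slots 4 and 6. */
int compact_demo(void) {
    Slot heap[HEAP_SLOTS] = {{0}};
    heap[1] = (Slot){1, 10};
    heap[4] = (Slot){1, 40};
    heap[6] = (Slot){1, 60};
    size_t a = 4, b = 6;
    size_t *roots[] = {&a, &b};
    size_t live = compact(heap, roots, 2);
    return live == 3 && a == 1 && b == 2 &&
           heap[a].payload == 40 && heap[b].payload == 60;
}
```

The roots are passed by address because the collector must be able to overwrite each pointer once the block it refers to has moved.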
Accordingly, for a GC to occur, pointer(s) to memory managed by the garbage collector need to be enumerated, including pointer(s) that are in machine register(s) as well as pointer(s) stored on the execution stack. One way of providing this capability is for a compiler that generates the machine instruction(s) to also output extra information for garbage collection. Such code is called "managed code" because it enables management of garbage collector memory by the runtime environment. Conventional compilers that don't provide this extra information for garbage collection generate "unmanaged code". Unmanaged code generated by conventional compilers normally must be: (1) highly constrained (e.g., be short and never call a method that could possibly cause a GC); and, (2) wrapped by a relatively inefficient routine that saves machine state on entry and restores the state on exit. This routine can then participate in the unwinding protocol on the unmanaged code's behalf. This mechanism does not allow the unmanaged code to manipulate GC pointers since the wrapper method is not knowledgeable regarding manipulation(s) that can occur within the wrapped routine.
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. This summary is not an extensive overview of the invention. It is not intended to identify key/critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description that is presented later.
At the time of garbage collection, garbage collectors enumerate and/or update a set of pointer(s) that point at the memory heap managed by the garbage collector. The set of pointer(s) includes pointer(s) stored in machine register(s) and/or pointer(s) stored on the execution stack. Since machine register(s) can be saved (e.g., spilled), and restored repeatedly, determining location(s) of pointer(s) (e.g., stored in machine register(s) and/or on the execution stack) is not trivial.
The present invention relates to a system and method allowing code generated by an unmanaged compiler to participate in garbage collection and, thus, mitigate limitation(s) of the prior art. To simplify garbage collection, machine register(s) that could possibly hold GC pointers are spilled to memory before a GC happens. A machine state data structure is defined which holds: (1) for every register that can hold a GC pointer, the address where it was spilled; and, (2) the value(s) for other relevant machine register(s) (e.g., instruction pointer and/or stack pointer). Method(s) are then required to provide at least the following functionality. First, given a machine state data structure that represents a machine state that existed when last executing within the method, enumerate the address(es) of GC pointer(s) in use for the method at that point in time. The address of each GC pointer, rather than its value, is passed so that the GC can update the pointer if required. Second, given a machine state data structure that represents a machine state that existed when last executing within the method, compute the machine state data structure that represents the machine state that will exist at the time execution resumes in the caller of the method (e.g., unwind out of the method).
The first capability allows a method to report its own GC pointer(s). The second capability allows the GC to find the caller of the method so that the call stack can be enumerated. Note that the second capability is necessary even if the method does not have GC pointer(s) of its own to report.
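The machine state data structure and the two capabilities can be sketched in C as follows. This is an illustrative model under stated assumptions, not the layout used by any actual runtime: the names MachineState, GcPointerVisitor, example_enumerate, example_unwind, relocate and managed_method_demo are hypothetical, the number of registers is fixed at two, and the stack frame is simulated as an array whose first element is the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GC_REGS 2

typedef struct {
    void    **spilled_reg[NUM_GC_REGS]; /* where each GC-capable register was spilled */
    uintptr_t instruction_pointer;      /* other relevant register values */
    uintptr_t stack_pointer;
} MachineState;

/* Called by the GC with the ADDRESS of each pointer so it can be updated. */
typedef void (*GcPointerVisitor)(void **slot, void *context);

/* Capability 1: report the GC pointers in use. This example method is
   assumed to keep a single GC pointer in register 0. */
void example_enumerate(const MachineState *state, GcPointerVisitor visit,
                       void *context) {
    assert(state != NULL);
    if (state->spilled_reg[0] != NULL) {
        visit(state->spilled_reg[0], context);
    }
}

/* Capability 2: compute the state that will exist in the caller.
   Assumes a one-word simulated frame holding only the return address. */
void example_unwind(const MachineState *callee, MachineState *caller) {
    const uintptr_t *frame = (const uintptr_t *)callee->stack_pointer;
    *caller = *callee;                        /* spill locations carry over */
    caller->instruction_pointer = frame[0];   /* resume at the return address */
    caller->stack_pointer = callee->stack_pointer + sizeof(uintptr_t);
}

/* Visitor simulating a moving collector: redirect the slot to a new location. */
void relocate(void **slot, void *context) {
    *slot = context;
}

int managed_method_demo(void) {
    int old_obj = 1, new_obj = 1;
    void *spill_slot = &old_obj;              /* spilled copy of register 0 */
    uintptr_t frame[1] = { 0x1234 };          /* simulated return address */
    MachineState state = { { &spill_slot, NULL }, 0, (uintptr_t)frame };
    MachineState caller;
    example_enumerate(&state, relocate, &new_obj);
    example_unwind(&state, &caller);
    return spill_slot == (void *)&new_obj &&
           caller.instruction_pointer == 0x1234 &&
           caller.stack_pointer == (uintptr_t)frame + sizeof(uintptr_t);
}
```

A method with no GC pointers of its own would supply an enumeration routine that reports nothing, but would still need the unwind routine so that the collector can reach its caller.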
Methods that support these two capabilities are called "managed methods". Usually managed methods require support from the compiler that created the method. The compiler outputs additional information about the method that indicates when and where GC pointers are stored while the method is executing.
An aspect of the present invention provides for an unmanaged component (e.g., generated by an unmanaged compiler) to invoke a machine state capturing component that captures the machine state (e.g., callee saved machine register value(s) and stack pointer) into a machine state data structure and publishes the fact that the unmanaged component desires to participate in the garbage collection pointer enumeration and unwinding protocol.
Another aspect of the present invention provides for an unwind component to be invoked during garbage collection that, if the unmanaged component has published the fact that it desires to participate in garbage collection pointer enumeration, utilizes information stored in the machine state data structure (e.g., unmanaged component saved machine register value(s) and stack pointer) to facilitate participation in garbage collection by the unmanaged routine. As part of its participation in garbage collection, the unwind component can alter contents of the machine state data structure stored by the capture state routine, memory heap pointer(s) and/or information stored on the stack.
Yet another aspect of the present invention provides for the unmanaged component to invoke a machine state restoring component that restores the machine state (e.g., machine register value(s) and stack pointer) saved by the machine state capturing component (which may have been altered by garbage collection) and/or register(s) affected by garbage collection (e.g., memory heap pointer(s)), and publishes the fact that the unmanaged component no longer desires to participate in garbage collection pointer enumeration.
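The capture, unwind and restore aspects described above can be modeled as a small C protocol sketch. It is a simplified illustration only: capturing real machine registers requires assembly or compiler support, so the captured state here is reduced to the address of one spilled GC pointer. All names (CapturedState, published_state, capture_machine_state, unwind_for_gc, restore_machine_state, unmanaged_routine_demo) are hypothetical, and a single global publication point stands in for whatever per-thread mechanism a runtime would use.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void **spilled_gc_pointer;  /* address of the slot holding the GC pointer */
} CapturedState;

/* Non-NULL while an unmanaged routine is participating in garbage collection. */
CapturedState *published_state = NULL;

/* Capturing component: record the state and publish the desire to participate. */
void capture_machine_state(CapturedState *state, void **gc_slot) {
    assert(state != NULL && gc_slot != NULL);
    state->spilled_gc_pointer = gc_slot;
    published_state = state;
}

/* Unwind component, invoked by a simulated moving collector. It may alter
   the captured state's pointer. Returns 1 if a routine was participating. */
int unwind_for_gc(void *moved_from, void *moved_to) {
    if (published_state == NULL) {
        return 0;  /* nothing published: no unmanaged routine to service */
    }
    if (*published_state->spilled_gc_pointer == moved_from) {
        *published_state->spilled_gc_pointer = moved_to;
    }
    return 1;
}

/* Restoring component: reload the possibly updated pointer and withdraw
   the publication. */
void *restore_machine_state(CapturedState *state) {
    void *current = *state->spilled_gc_pointer;
    published_state = NULL;
    return current;
}

/* An unmanaged routine bracketing its work with capture and restore. */
int unmanaged_routine_demo(void) {
    int before = 7, after = 7;
    void *obj = &before;
    CapturedState state;
    capture_machine_state(&state, &obj);
    int participated = unwind_for_gc(&before, &after);  /* GC moves the object */
    void *reloaded = restore_machine_state(&state);
    return participated && reloaded == (void *)&after && published_state == NULL;
}
```

In this model the routine continues with the value returned by restore_machine_state instead of its earlier copy, since the collector may have moved the object while the routine was published.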