The Java programming language has its origins in a project undertaken by Sun Microsystems to develop a robust programming environment that would meet the technical challenges of the consumer device software environment. The original consumer device projects were eventually abandoned, but the Java programming language found itself being used on the World Wide Web to enable cross-platform operation of programs downloaded from the internet. It is simple to use, having similar features to C++, such as the basic object orientated technology, but without some of the more complex features.
Typically, Java applications (source code) are compiled by the javac compiler into Java byte code (intermediary code or pseudo object code) which can be loaded and executed by a Java Virtual Machine (JVM) (see FIG. 1). The JVM provides an instruction set, memory management and program loading capability that is independent of the hardware platform on which it has been implemented. The Java application source code is compiled into architecture-independent byte code and the byte code is interpreted by a JVM on the target platform. Java is designed to be portable and follows some defined portability standards, which intend the source code to be “write once, run anywhere”. The Java byte code may be further compiled into machine code (object code) for the target platform, at which point the architecture-independent nature of Java is lost.
The JVM is a software computing machine; effectively, it simulates a hardware machine that processes Java byte code. The byte code is interpreted and processed by a JVM such as a Windows JVM running on an Intel personal computer platform. The JVM includes components for loading class files, interpreting the byte code, garbage collecting redundant objects, and for managing multiple processing threads. The JVM may also include a Just-In-Time compiler to transform some or all of the byte code into native machine code.
Garbage collection is the term used for describing how program objects are discarded by the system after they have been loaded into memory and after they are no longer useful. Memory space in object oriented environments is at a premium due to the memory intensive nature of object orientated programs. For further information on garbage collection see Chapter 1 of ‘Garbage Collection’ by H Jones & R Lins, Wiley. Chapter 4 deals with Mark & Sweep techniques.
Many current implementations of Java use the classic mark-sweep-compact method of garbage collection as delivered in the base Sun JVM. References to the objects that are being processed at any instant by the system are stored in one or more thread stacks and some global variables. The totality of objects that are needed by the system can be found by tracing through the objects referenced in the stacks looking for references to new objects, tracing the global variables and then tracing through these “root” objects for further references. The objects in use by a system thereby form a graph, and any extraneous objects are not part of this graph. Once all the objects in the graph are found, the remaining objects in the heaps may be discarded (garbage collected).
The traditional mark and sweep garbage collection method is described below in terms of pseudo code with respect to a single heap:
        Stop all threads
        Trace all stacks for object references: the local roots
        Trace all classes for object references: the global roots
        Trace through root set for references until no new object references (the sum of the local and global roots is the root set).
        Delete all objects in the single heap that are not referenced.
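The steps listed above can be sketched in Java as follows. This is only an illustrative model under stated assumptions, not the collector of any particular JVM: the class names HeapObject and MarkSweepSketch, the representation of the heap as a list, and the example objects are all hypothetical, and the "stop all threads" step is not modelled because the sketch is single threaded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus the objects it references.
final class HeapObject {
    final String name;
    final List<HeapObject> refs = new ArrayList<>();

    HeapObject(String name) {
        this.name = name;
    }
}

final class MarkSweepSketch {
    // Mark: start from the root set (local plus global roots) and trace
    // references until no new objects are found.
    static Set<HeapObject> mark(List<HeapObject> localRoots, List<HeapObject> globalRoots) {
        Set<HeapObject> marked = new HashSet<>();
        Deque<HeapObject> pending = new ArrayDeque<>();
        pending.addAll(localRoots);
        pending.addAll(globalRoots);
        while (!pending.isEmpty()) {
            HeapObject current = pending.pop();
            if (marked.add(current)) { // true only on the first visit
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Sweep: remove every unmarked object from the single heap and
    // return the names of the objects that were discarded.
    static List<String> sweep(List<HeapObject> heap, Set<HeapObject> marked) {
        List<String> discarded = new ArrayList<>();
        for (Iterator<HeapObject> it = heap.iterator(); it.hasNext(); ) {
            HeapObject candidate = it.next();
            if (!marked.contains(candidate)) {
                discarded.add(candidate.name);
                it.remove();
            }
        }
        return discarded;
    }

    // Example: a is a local root, c is a global root, b is reachable from a;
    // d and e reference only each other and are therefore garbage.
    static List<String> demo() {
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject c = new HeapObject("c");
        HeapObject d = new HeapObject("d");
        HeapObject e = new HeapObject("e");
        a.refs.add(b);
        d.refs.add(e);
        e.refs.add(d);
        List<HeapObject> heap = new ArrayList<>(Arrays.asList(a, b, c, d, e));
        Set<HeapObject> marked = mark(Arrays.asList(a), Arrays.asList(c));
        return sweep(heap, marked);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this example the objects d and e are discarded even though they reference each other, because neither can be reached from the root set.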
One of the problems in garbage collecting is tracing a stack for object references when the stack is a mixture of variables including pointers to objects, floating point numbers and integer numbers. An accurate scan determines the object pointers exactly, whereas a conservative scan determines which words are not object pointers and which may be. The conservative scan is not exact but it uses fewer resources than those needed for an accurate scan.
A conservative scan (see FIG. 4) retrieves the stack pointer (step 4.1) and then retrieves the word in the stack indicated by the pointer (step 4.2). A first test is applied: if the word is an object pointer, is it pointing into the correct part of the memory? Typically this will be between certain limits (step 4.3). If the tested word points outside the limits then it is not an object pointer (4.4). A further test is applied (step 4.5) to check whether the word points to a normal object boundary in the heap. Typically the boundaries in the heap will be at multiples of a number of bytes such as 8, although there may be several sizes of object grouping, say small (8 byte), medium (64 byte) and large (4096 byte) boundaries. If the word is a pointer it will point to one of these boundaries, and the word is added to the root set (step 4.6). If the word does not so point then it is not an object pointer (4.5). If the scan is finished (step 4.7) then the conservative root set has been acquired (4.9). If the scan is not so finished then the stack pointer is incremented and the cycle is started again by acquiring the next stack word (step 4.2). The root set acquired in this conservative scan therefore contains more words than there are actual object pointers. Furthermore, when stack space is created, previously existing stack pointers are not immediately overwritten and so infiltrate the conservatively scanned root set.
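The two tests of the conservative scan can be illustrated with a short Java sketch. The stack is modelled as an array of 64-bit words, and the heap limits HEAP_START and HEAP_END, the single 8 byte boundary and the class name ConservativeScanSketch are assumptions made for the illustration; a real virtual machine would scan native memory and might test several boundary sizes.

```java
import java.util.ArrayList;
import java.util.List;

final class ConservativeScanSketch {
    // Hypothetical heap limits and object boundary, chosen for the example.
    static final long HEAP_START = 0x10000L; // inclusive
    static final long HEAP_END = 0x20000L;   // exclusive
    static final long BOUNDARY = 8L;         // objects start on 8 byte boundaries

    // A word may be an object pointer only if it passes both tests.
    static boolean mayBeObjectPointer(long word) {
        // First test (step 4.3): does the word point between the heap limits?
        if (word < HEAP_START || word >= HEAP_END) {
            return false;
        }
        // Second test (step 4.5): does the word point to an object boundary?
        return (word - HEAP_START) % BOUNDARY == 0;
    }

    // Visit every stack word in turn and collect the conservative root set.
    static List<Long> scan(long[] stackWords) {
        List<Long> rootSet = new ArrayList<>();
        for (long word : stackWords) {
            if (mayBeObjectPointer(word)) {
                rootSet.add(word); // step 4.6
            }
        }
        return rootSet;
    }

    static List<Long> demo() {
        long[] stackWords = {
            0x10008L, // genuine object pointer
            42L,      // small integer, outside the heap limits
            0x10003L, // inside the limits but not on a boundary
            0x30000L, // beyond the heap limits
            0x1FFF8L, // genuine object pointer
            0x10010L  // an integer that merely looks like a pointer
        };
        return scan(stackWords);
    }

    public static void main(String[] args) {
        for (long root : demo()) {
            System.out.println("0x" + Long.toHexString(root));
        }
    }
}
```

The last word in the example is deliberately an integer whose value happens to pass both tests. The scan cannot tell it apart from a real pointer, which is why the conservative root set is larger than the set of actual object pointers.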
Garbage collection is performed on all the objects in the heap minus the root set, and therefore not all the objects that should be collected are collected. Moreover, since some of the words in the root set are not object pointers, it is fatal to treat them as object pointers for the purposes of updating them when an object is moved. Therefore compaction of objects in a conservatively scanned root set is not desirable.
Accurate scanning has been achieved on the Java stack. Techniques exist to find object references in the Java stack which rely on abstract interpretation of the Java code to discover the current stack map at a given set of designated ‘safe points’. When execution reaches such a safe point, garbage collection can be done in the knowledge that there is a complete map of where the references are and that they can be updated. A map is stored for each safe point. The map identifies each word in the stack at that point in the process. For a large number of safe points there will be a large number of maps and a high memory usage. A ‘safe point’ is a point at which garbage collection may safely be carried out because no object reference is held both in the stack and in a register. At such a point, updating the object reference in the stack during a compaction does not leave a discrepancy between the register pointer and the stack pointer, which could otherwise cause a serious error, possibly a system crash. The Java stack in the JVM holds the variables created and used by the Java application.
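The idea of one stored map per safe point can be sketched in Java as below. The sketch assumes that each safe point is identified by an integer (for example a byte code offset) and that a map is an array of flags, one per stack word, saying whether that word is an object reference; the class name StackMapSketch and its methods are hypothetical and do not describe the format used by any particular JVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StackMapSketch {
    // One map per safe point: flag i is true when stack word i is an
    // object reference at that point (hypothetical representation).
    private final Map<Integer, boolean[]> mapsBySafePoint = new HashMap<>();

    void registerSafePoint(int safePoint, boolean... isReference) {
        mapsBySafePoint.put(safePoint, isReference.clone());
    }

    // Accurate scan: only words flagged in the stored map enter the root set.
    List<Long> accurateRoots(int safePoint, long[] stackWords) {
        boolean[] map = mapsBySafePoint.get(safePoint);
        if (map == null) {
            throw new IllegalStateException("no map stored for point " + safePoint);
        }
        if (map.length != stackWords.length) {
            throw new IllegalArgumentException("map does not match stack size");
        }
        List<Long> roots = new ArrayList<>();
        for (int i = 0; i < stackWords.length; i++) {
            if (map[i]) {
                roots.add(stackWords[i]);
            }
        }
        return roots;
    }

    static List<Long> demo() {
        StackMapSketch maps = new StackMapSketch();
        // At hypothetical safe point 12 words 0 and 2 are references,
        // word 1 is an integer.
        maps.registerSafePoint(12, true, false, true);
        long[] stackWords = {0x10008L, 0x10010L, 0x10020L};
        return maps.accurateRoots(12, stackWords);
    }

    public static void main(String[] args) {
        for (long root : demo()) {
            System.out.println("0x" + Long.toHexString(root));
        }
    }
}
```

The middle word in the example has a value that looks like a heap address, but the map records it as an integer, so it is excluded from the root set. The cost noted in the text is also visible here: every safe point needs its own stored array.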
The C stack holds the variables created and used by the virtual machine when it interprets the Java application. There is a problem with accurate scanning for garbage collection in the C stack, as some object pointers are processed in registers but not placed in the C stack, and hence are missed by a scan even though they should be in the root set. Most JVMs are compiled from C by a compiler which leaves object pointers in registers as long as possible to improve speed. The disadvantage of this speed optimisation is that many object pointers are hidden in registers (up to 32), not on the stack, and cannot be scanned.
An advantage of the present invention is that the new reference structure forces a C compiler to update the C stack, that is, to empty the registers, at safe points.