The Java programming language has its origins in a project undertaken by Sun Microsystems to develop a robust programming environment that would meet the technical challenges of the consumer device software environment. The original consumer device projects were eventually abandoned, but the Java programming language found itself being used on the World Wide Web to enable cross-platform operation of programs downloaded from the internet. It is simple to use, having features similar to those of C++, such as the basic object orientated technology, but without some of the more complex features.
Typically, Java applications (source code) are compiled by the Java compiler into Java byte code (intermediary code or pseudo object code) which can be loaded and executed by a Java Virtual Machine (JVM) (see FIG. 1). The JVM provides an instruction set, memory management and program loading capability that is independent of the hardware platform on which it has been implemented. Because the byte code is architecture-independent, the same compiled application can be interpreted by a JVM on any target platform. Java is designed to be portable and follows defined portability standards, which are intended to make the source code “write once, run anywhere”. The Java byte code may be further compiled into machine code (object code) for the target platform, at which point the architecture-independent nature of Java is lost.
The JVM is a software computing machine; effectively, it simulates a hardware machine that processes Java byte code. The byte code is interpreted and processed by a JVM, such as a Windows JVM running on an Intel personal computer platform. The JVM includes components for loading class files, interpreting the byte code, garbage collecting redundant objects, and managing multiple processing threads. The JVM may also include a Just-In-Time compiler to transform some or all of the byte code into native machine code.
Multithreading is a feature built into the Java language to allow users to improve interactive performance by allowing operations to be performed while continuing to process user actions. Multithreading is similar to multitasking, but whereas multitasking allows many applications to run on the same system in several processes, multithreading allows many routines (threads) in one application to potentially run in parallel within one process.
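As an illustration of a routine running as a separate thread within one process, the following is a minimal sketch only. The class name BackgroundSumDemo and the method sumInBackground are hypothetical names chosen for this example; the summation simply stands in for any long-running operation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example class; the names are illustrative assumptions.
final class BackgroundSumDemo {

    // Sums 1..n on a worker thread and returns the result once the worker finishes.
    static long sumInBackground(long n) {
        AtomicLong result = new AtomicLong();
        Thread worker = new Thread(() -> {
            long total = 0;
            for (long i = 1; i <= n; i++) {
                total += i;
            }
            result.set(total);
        }, "worker");
        worker.start();
        // Between start() and join() the calling thread is free to do other
        // work, for example continuing to process user actions.
        try {
            worker.join(); // wait for the worker before reading its result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(sumInBackground(1_000));
    }
}
```

Both threads share the same process and heap, which is why the objects they allocate are subject to the same garbage collector.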
Garbage collection is the term used for describing how program objects are automatically discarded by the system after they have been loaded into memory and after they are no longer useful.
For further information on garbage collection see Chapter 1 of ‘Garbage Collection’ by R Jones & R Lins, Wiley. Chapter 4 deals with Mark & Sweep techniques.
Many current implementations of Java use the classic mark-sweep-compact method of garbage collection as delivered in the base Sun JVM. References to the objects that are being processed at any instant by the system are stored in the registers, in one or more thread stacks and in some global variables. The totality of objects that may be needed by the system can be found by tracing through the objects directly referenced in the registers, stacks, and global variables and then tracing through these “root” objects for further references. The objects in use by a system thereby form a graph, and any extraneous objects are not part of this graph. Once all the objects in the graph are found, the remaining objects in the heaps may be discarded (garbage collected).
The traditional mark and sweep garbage collection method is described below in terms of pseudo code with respect to a single heap:
    Stop all threads causing the active registers for each thread to be stored in its stack
    Trace all stacks for object references (the local roots)
    Trace all global variables for object references (the global roots)
    Trace through root set for references until no new object references (the sum of the local and global roots is the root set)
    Delete all objects in the single heap that are not referenced
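The tracing and deletion steps above can be sketched in Java on a toy heap. This is an illustration under stated assumptions, not the JVM's implementation: ToyObject and ToyHeap are hypothetical names, the local and global roots are passed in as plain lists rather than discovered from stacks and global variables, and the step of stopping all threads is not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical toy object: a name plus its outgoing references.
final class ToyObject {
    final String name;
    final List<ToyObject> refs = new ArrayList<>();

    ToyObject(String name) {
        this.name = name;
    }
}

// Hypothetical single heap with a trace (mark) phase and a delete (sweep) phase.
final class ToyHeap {
    final List<ToyObject> objects = new ArrayList<>();

    ToyObject allocate(String name) {
        ToyObject obj = new ToyObject(name);
        objects.add(obj);
        return obj;
    }

    // Mark: trace from the root set until no new object references are found.
    Set<ToyObject> mark(List<ToyObject> rootSet) {
        Set<ToyObject> marked =
                Collections.newSetFromMap(new IdentityHashMap<ToyObject, Boolean>());
        Deque<ToyObject> pending = new ArrayDeque<>(rootSet);
        while (!pending.isEmpty()) {
            ToyObject current = pending.pop();
            if (marked.add(current)) {          // true only on the first visit
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Sweep: delete every object in the heap that was not marked.
    // Returns the number of objects deleted.
    int collect(List<ToyObject> localRoots, List<ToyObject> globalRoots) {
        List<ToyObject> rootSet = new ArrayList<>(localRoots);
        rootSet.addAll(globalRoots);            // root set = local roots + global roots
        Set<ToyObject> marked = mark(rootSet);
        int before = objects.size();
        objects.removeIf(obj -> !marked.contains(obj));
        return before - objects.size();
    }
}
```

Because reachability is decided by tracing rather than by counting references, two unreachable objects that refer only to each other are still deleted.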
There are problems with this technique in a multi-threaded and long running environment. The first problem is that, in order to garbage collect, all the threads must be stopped to work out which objects are unreachable (there are no pointers to them in the global or local variables and no pointers to them in other reachable objects). Various authors have attempted to solve this problem. One approach is an on-the-fly collector which does not stop all threads; however, it cannot compact the reachable objects, leading to fragmentation. Another approach is the generational scavenging scheme, which reduces the size of the set of traced objects by concentrating effort on the most recently allocated objects; however, these schemes must stop all of the threads. In an ideal world we would like to achieve a collector which works independently on each thread and compacts the local heap of that thread to maximise the free space available.
Another solution attempts to achieve this in a language (ML) other than Java by taking advantage of immutable objects, which can be placed in thread-local heaps. An immutable object is non-modifiable, and when such an object becomes reachable globally a copy of the object can be made in the global heap. Clearly this technique is only applicable to languages defining immutable objects.
Another approach moves an object into the global heap on first use. The difficulty here is that in order to move the object, references to it from elsewhere must be updated; in an environment where objects are referenced by handles this is made easier, although there are still cases where objects cannot be moved. Unfortunately, handles bring their own problems, and the IBM ports of the JVM have removed handles to improve performance and to remove the need to subdivide the heap into handle and object spaces.
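To illustrate why handles make moving an object easier, the following is a minimal sketch assuming a simple handle table. HandleHeap and its methods are hypothetical names for this example, strings stand in for objects, and the layout is not that of any particular JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle-based heap: callers hold an int handle, never a direct
// reference to the storage slot.
final class HandleHeap {
    private final List<Integer> handleToSlot = new ArrayList<>(); // handle space
    private final List<String> slots = new ArrayList<>();         // object space; null marks a vacated slot

    // Stores the payload in a new slot and returns a handle to it.
    int allocate(String payload) {
        slots.add(payload);
        handleToSlot.add(slots.size() - 1);
        return handleToSlot.size() - 1;
    }

    // Every access pays one extra indirection through the handle table.
    String get(int handle) {
        return slots.get(handleToSlot.get(handle));
    }

    int slotOf(int handle) {
        return handleToSlot.get(handle);
    }

    // Moves the object to a fresh slot. Only the single handle entry is
    // updated; holders of the handle need no change.
    void move(int handle) {
        int oldSlot = handleToSlot.get(handle);
        slots.add(slots.get(oldSlot));
        slots.set(oldSlot, null);
        handleToSlot.set(handle, slots.size() - 1);
    }
}
```

The sketch also shows the two costs mentioned above: each access goes through an extra lookup, and storage is divided into a handle space and an object space.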