The discussion of any work, publications, sales, or activity anywhere in this submission, including in any documents submitted with this application, shall not be taken as an admission that any such work constitutes prior art. The discussion of any activity, work, or publication herein is not an admission that such activity, work, or publication existed or was known in any particular jurisdiction.
Software developers increasingly choose object-oriented programming over traditional data/procedure paradigms. Its success is partly based on attractive features such as encapsulation, dynamic binding, inheritance, and polymorphism. These features facilitate code reuse and code sharing and reduce the dependencies between software modules, allowing developers to prototype software rapidly and iteratively and to produce more reliable and maintainable software products.
Among the variety of object-oriented languages, Java has become popular in recent years because it is widely used for Internet applications. Java was designed to deliver powerful features in addition to its object-oriented programming approach, including security, distributed computing, and platform independence. However, like other object-oriented languages, Java suffers from poor performance: a Java application typically executes more slowly than an application written in, for example, C, sometimes by a factor of 10 or more.
One important cause of this performance deficiency is the heavy use of dynamic memory allocations and de-allocations in object-oriented programming. A memory allocation is invoked when a new object or array is created, while a memory de-allocation is invoked when an object or array is garbage collected. Studies described in [1] and [2] indicate that a C++ program performs an order of magnitude more memory allocations than a comparable program written in C. For Java programs, the situation is even worse. For example, a simple Java Applet can generate about 600K memory allocations during execution of one game [6].
To improve the overall performance of object-oriented program execution, investigations have been conducted to develop faster and more efficient memory allocators/de-allocators using both software and hardware. Generally, software approaches provide better utilization of memory but suffer from slower speed, as the memory management process must execute in parallel with application processes. Hardware approaches can yield better speed but suffer from memory fragmentation, as hardware generally cannot be as intelligent as software. Chang [4] and Cam [5] have discussed hardware methods that attempt to improve the performance of object-oriented memory management.
A method proposed by Chang is a modified buddy system [4]. It uses an or-gate tree combined with an and-gate tree with a structure roughly as shown in FIG. 1 for handling memory allocation and de-allocation requests in hardware. The free/used information of each unit of memory is recorded in a bit-vector. The or-gate tree is used to locate a free block and the and-gate tree is used to generate the address of the free block. It incorporates a “bit-flipper” to mark the bit-vector for used units and returns the unused portions of the block to available storage. This approach is simple and effective, but still suffers from a certain amount of fragmentation, as some free units may never be detected in the or-gate tree for some reallocation requests.
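The following is a minimal software sketch of the bit-vector idea described above, not a reproduction of the hardware design in [4]. It assumes that a bit value of 1 means "used", models one level of the or-gate tree by OR-ing aligned groups of bits, and replaces the and-gate tree's address generation with simple index arithmetic. The class name `OrTreeAllocator` and its methods are hypothetical and chosen only for illustration.

```python
class OrTreeAllocator:
    """Simplified software model of a bit-vector allocator searched via an OR tree.

    Assumptions (not taken from the cited design): bit 1 = used, bit 0 = free,
    and the number of units is a power of two.
    """

    def __init__(self, units):
        if units <= 0 or units & (units - 1):
            raise ValueError("units must be a power of two")
        self.units = units
        self.bits = [0] * units

    def _or_level(self, block):
        """OR together each aligned group of `block` bits (one tree level)."""
        return [int(any(self.bits[i:i + block]))
                for i in range(0, self.units, block)]

    def allocate(self, n):
        """Return the start address of n free units, or None if none is detected."""
        if n <= 0 or n > self.units:
            return None
        block = 1
        while block < n:              # round the request up to a power of two
            block *= 2
        for index, used in enumerate(self._or_level(block)):
            if not used:              # OR output 0: the whole aligned block is free
                start = index * block
                for i in range(start, start + n):
                    self.bits[i] = 1  # "bit-flipper": mark only the n requested units
                return start
        return None

    def free(self, start, n):
        """Mark n units starting at `start` as free again."""
        for i in range(start, start + n):
            self.bits[i] = 0


# Example: two adjacent free units that straddle an aligned boundary.
alloc = OrTreeAllocator(8)
alloc.allocate(3)             # units 0..2
alloc.allocate(1)             # unit 3
alloc.allocate(1)             # unit 4
alloc.allocate(1)             # unit 5
alloc.allocate(2)             # units 6..7
alloc.free(3, 1)
alloc.free(4, 1)              # units 3 and 4 are now free and adjacent
print(alloc.allocate(2))      # None: no aligned 2-unit block is entirely free
```

In this simplified model, units 3 and 4 are free and contiguous, yet a two-unit request fails because the search only examines aligned power-of-two blocks. This is one plausible illustration of how free units can go undetected; the exact conditions in the hardware design may differ.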
Cam builds on the basic idea of Chang's suggestion and proposes another structure to handle the problem, as shown in FIG. 2. This structure produces less fragmentation than Chang's method, but it requires many more logic gates to implement.
Both methods can service memory allocation and de-allocation requests in a single cycle, but they can only detect free blocks whose sizes are powers of 2. In addition, the trees become too complex to implement if the total number of memory units is large. For example, if the basic unit for allocation is 16 bytes and the total memory is 128 MB, the size of the bit-vector is 8M bits. To implement such a system using Chang's design requires a tree with (2 × 8M)/2 nodes. If Cam's design is applied, even more nodes are needed. It is impractical to implement such a design in a chip when the number of nodes is so large.
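The arithmetic behind this example can be checked with a short sketch. The bit-vector size follows directly from the figures in the text. The node counts are those of a generic complete binary tree built over the bit-vector, which is an assumption used here only to convey the order of magnitude and is not the exact gate or node count of either cited design. The function names are hypothetical.

```python
MIB = 1024 * 1024


def bit_vector_bits(total_bytes, unit_bytes):
    """Number of allocation units, tracked with one bit each."""
    return total_bytes // unit_bytes


def binary_tree_nodes(leaves):
    """Internal and total node counts of a complete binary tree over `leaves`.

    Assumption: a plain binary tree, used only as an order-of-magnitude guide.
    """
    internal = leaves - 1
    return internal, internal + leaves


bits = bit_vector_bits(128 * MIB, 16)
internal, total = binary_tree_nodes(bits)
print(bits, bits // MIB)      # 8388608 bits, i.e. 8M
print(internal, total)        # 8388607 internal nodes, 16777215 nodes in total
```

Even under this generic assumption, the tree has millions of nodes, which is consistent with the conclusion that a single-chip implementation is impractical at this memory size.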
To overcome this problem, larger units may be used to reduce the total number of blocks, but this leads to greater internal fragmentation. Another approach is to partition the memory into many regions so that the hardware tree manages only one region at a time and the operating system is responsible for switching the active region from time to time. This method undermines the performance of the hardware approaches, as considerable software overhead is required to augment the hardware.
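The trade-off of using larger allocation units can be illustrated with a small calculation. The request sizes below are hypothetical values chosen for illustration, not measurements from any cited study, and the function name `internal_waste` is likewise an assumption.

```python
def internal_waste(request_sizes, unit_bytes):
    """Bytes lost by rounding each request up to a whole number of units."""
    waste = 0
    for size in request_sizes:
        units = -(-size // unit_bytes)          # ceiling division
        waste += units * unit_bytes - size
    return waste


requests = [24, 40, 100, 17]                    # hypothetical object sizes in bytes
for unit in (16, 64, 256):
    print(unit, internal_waste(requests, unit))
# 16 43
# 64 139
# 256 843
```

For these sample requests, moving from 16-byte to 256-byte units shrinks the bit-vector by a factor of 16 but increases the wasted space from 43 to 843 bytes.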
Other References
Various strategies have been discussed for memory allocation, among them those discussed in the patents and other publications listed below, some of which also provide general background information related to the present discussion.
[1] David Detlefs, Al Dosser, and Benjamin Zorn. "Memory allocation costs in large C and C++ programs". Software: Practice and Experience, pp. 527–542, June 1994.
[2] Brad Calder, Dirk Grunwald, and Benjamin Zorn. "Quantifying Behavioral Differences Between C and C++ Programs". Technical Report CU-CS-698-95, Department of Computer Science, University of Colorado, Boulder, Colo., January 1995.
[3] M. Chang, Woo Hyong Lee, and Y. Hasan. "Measuring dynamic memory invocations in object-oriented programs". IEEE International Performance, Computing and Communications Conference, 1999, pp. 268–274. IEEE Computer Society Press, February 1999.
[4] J. M. Chang and E. F. Gehringer. "A high performance memory allocator for object-oriented systems". IEEE Transactions on Computers, volume 45, issue 3, pp. 357–366. IEEE Computer Society Press, March 1996.
[5] H. Cam, M. Abd-El-Barr, and S. M. Sait. "A high-performance hardware-efficient memory allocation technique and design". International Conference on Computer Design (ICCD '99), pp. 274–276. IEEE Computer Society Press, October 1999.
[6] Richard C. L. Li, Anthony S. Fong, H. W. Chun, and C. H. Tam. "Dynamic Memory Allocation Behavior in Java Programs". Proceedings of the ISCA 16th International Conference on Computers and Their Applications (CATA-2001), pp. 362–365. The International Society for Computers and Their Applications (ISCA), 2001.
U.S. Patent Documents
[7] 5,930,829 July 1999 Frank S. Little
[8] 6,219,772 April 2001 Ashok Kumar Gadangi
[9] 6,295,594 September 2001 Stephan G. Meier