1. Field of the Invention
The present invention relates generally to computer software. More particularly, the present invention relates to improving the placement of data in memory to increase the speed with which the data may be accessed.
2. Discussion of Related Art
Cache memory is often used in conjunction with a main memory to provide an increased memory speed. Typically, portions of data are copied from the main memory so that the cache contains a copy of these portions of main memory. When the CPU attempts to read a word, a check is made to determine if the word is in the cache. If the word is in the cache, the word is read from the cache. Otherwise, a number of words are read from the main memory to be stored in the cache and the word is provided to the CPU.
When a word is not in the cache, it must be read from main memory, which increases memory access time. In order to increase the probability that requested data is present in the cache, caches typically use heuristics to guess which data will be accessed and copy this data into the cache. Thus, computer memory systems often rely on caches to keep the most frequently accessed items close to processors.
Since multiple words are simultaneously stored in the cache, the placement of data in memory can affect the efficiency of caches, thereby affecting the overall speed at which a computer system operates. It is therefore important to efficiently lay out data (e.g., objects) in memory to maximize speed. Previously proposed techniques have large overheads or cannot be used in a dynamic environment such as one that executes Java bytecodes.
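The effect of placement on cache behavior can be illustrated with a toy model. The following is a minimal sketch, assuming a direct-mapped cache with hypothetical parameters (`line_size`, `num_lines`) and hypothetical addresses; it is not drawn from any particular hardware.

```python
def count_misses(addresses, line_size=4, num_lines=2):
    """Count misses in a toy direct-mapped cache (illustrative parameters)."""
    lines = [None] * num_lines      # each slot remembers which memory block it holds
    misses = 0
    for addr in addresses:
        block = addr // line_size   # a whole block of words is fetched at once
        slot = block % num_lines    # direct-mapped: each block has exactly one slot
        if lines[slot] != block:
            misses += 1             # word not cached: read the block from main memory
            lines[slot] = block
    return misses


# Two objects, A and B, are accessed alternately under two hypothetical layouts.
trace = ["A", "B", "A", "B"]
adjacent = {"A": 0, "B": 1}    # both objects fall in the same block
scattered = {"A": 0, "B": 8}   # different blocks that compete for the same slot

print(count_misses([adjacent[o] for o in trace]))    # 1
print(count_misses([scattered[o] for o in trace]))   # 4
```

Under these assumptions, the adjacent layout incurs one miss while the scattered layout misses on every access.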
In order to eliminate objects that are no longer referenced from memory, garbage collection is often performed. Two commonly used methods of performing garbage collection are the "mark and sweep" method and the "copying garbage collection" method. As will be described in further detail below, the relocation of data in memory may be performed to some degree during the garbage collection process.
FIG. 1A is an exemplary block diagram illustrating the placement of objects in memory during a conventional mark and sweep garbage collection process. As shown in FIG. 1A, objects 102, 104, 106, and 108 are illustrated. More particularly, the object 102 references both the objects 104 and 106, and is referenced by a thread of execution. Mark and sweep garbage collection is typically performed in two passes. During the first pass, each object that is not referenced by any other object is marked. For instance, object 108 is not referenced by any of the objects 102, 104, or 106 and is therefore marked for deletion. During the second pass, the memory for each marked object is reclaimed.
FIG. 1B is a block diagram illustrating the memory of FIG. 1A upon completion of a conventional mark and sweep garbage collection process. As shown, the object 108, marked in FIG. 1A, is deleted and therefore not shown in FIG. 1B. The objects 102, 104, and 106 remain after completion of the garbage collection process. It is important to note that objects are not usually relocated during mark and sweep garbage collection.
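The two passes can be sketched as follows. This is a minimal illustration, assuming a heap modeled as a dictionary from object name to the names it references; the function name and heap representation are hypothetical.

```python
def mark_and_sweep(heap, roots):
    """heap maps an object name to the list of object names it references."""
    # Find everything reachable from the roots (e.g., a thread of execution).
    reachable = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in reachable:
            reachable.add(name)
            stack.extend(heap[name])
    # First pass: mark each object that nothing reachable refers to.
    marked = {name for name in heap if name not in reachable}
    # Second pass: reclaim the marked objects. Survivors are not relocated,
    # so they keep their original relative order.
    return {name: refs for name, refs in heap.items() if name not in marked}


heap = {"102": ["104", "106"], "104": [], "106": [], "108": []}
print(list(mark_and_sweep(heap, roots=["102"])))   # ['102', '104', '106']
```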
Another method of garbage collection, copying garbage collection, is also commonly used. FIG. 2A is an exemplary block diagram illustrating objects in memory during a conventional copying garbage collection process. As shown, within memory 200 are multiple objects. For instance, a first object 202, "A", a second object 204, "B", and a third object 206, "D", are stored in memory and all are reachable from a root. In addition, a fourth object 208, "C", is stored in the memory 200 but is not referenced by any other objects in memory.
FIG. 2B is a block diagram illustrating the memory of FIG. 2A upon completion of a conventional copying garbage collection process. During copying garbage collection, all objects that are referenced by one or more objects are copied while those objects that are not referenced by any other objects are not copied. Thus, all objects that are not copied are garbage. For instance, as shown in FIG. 2B, the fourth object 208, "C", is not copied. Once copied, the memory for the original objects shown in FIG. 2A may then be reclaimed.
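A copying collector of this kind can be sketched as follows. This is an illustration only: it assumes that object "A" references "B" and "D" (the figures state only that all three are reachable from a root), uses a breadth-first worklist, and models the destination space as a list; all names are hypothetical.

```python
def copy_collect(from_space, roots):
    """from_space maps an object name to the list of object names it references."""
    to_space = []      # copy destination, in the order objects are copied
    forwarding = {}    # object name -> its new position in to_space
    worklist = list(roots)
    while worklist:
        name = worklist.pop(0)
        if name in forwarding:
            continue                     # already copied
        forwarding[name] = len(to_space)
        to_space.append(name)
        worklist.extend(from_space[name])
    # Anything absent from `forwarding` was never reached and is garbage;
    # the whole of from_space may now be reclaimed.
    return to_space, forwarding


from_space = {"A": ["B", "D"], "B": [], "C": [], "D": []}
to_space, forwarding = copy_collect(from_space, roots=["A"])
print(to_space)            # ['A', 'B', 'D']
print("C" in forwarding)   # False
```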
During copying garbage collection, the objects may be placed in various orders during the copying process. FIG. 3A is a block diagram illustrating an exemplary configuration of objects in memory. As shown, a memory 300 stores a first object 302, "A", a second object 304, "B", and a third object 306, "D". The first object 302 references both the second object 304 and the third object 306. A fourth object 308, "C", is referenced by none of the objects.
Since the first object 302 references both the second and third objects 304 and 306, the objects may be placed in two different orders. FIG. 3B is a block diagram illustrating one possible configuration of the objects of FIG. 3A upon completion of copying garbage collection. As shown, the second object 304 may be placed adjacent to the first object 302 while the third object 306 may be placed adjacent to the second object 304.
FIG. 3C is a block diagram illustrating another possible configuration of the objects of FIG. 3A upon completion of copying garbage collection. As shown, rather than placing the second object 304 adjacent to the first object 302, the third object 306 is placed adjacent to the first object 302. In the simplified examples illustrated in FIG. 3B and FIG. 3C, the objects may be placed in two different orders. It would be beneficial if a mechanism were designed to enable the objects to be ordered while maximizing the speed of access of the objects in memory.
In object-oriented programming, code and data are merged into objects. Each object is defined via its class, which determines the properties of an object. In other words, objects are individual instances of a class. Moreover, each object may include various fields as well as methods.
As disclosed in the article entitled "Using Generational Garbage Collection To Implement Cache-Conscious Data Placement" by Trishul M. Chilimbi and James R. Larus, which appeared in International Symposium on Memory Management (ISMM '98), October, 1998, objects accessed closely together in time may be stored in memory so that they will be fetched in the same cache line. The process disclosed in Chilimbi will be briefly described with reference to FIGS. 4A, 4B, and 4C. FIG. 4A is a block diagram illustrating an exemplary set of objects stored in memory and associated fields. As shown, a first object 400, "A", includes a first field 402, "x", and a second field 404, "y". Similarly, a second object 406, "B", includes a first field 408, "w", and a second field 410, "z". A third object 412, "C", includes a field 414, "e", and a fourth object 416, "D", includes a field 418, "f".
FIG. 4B is a block diagram illustrating an exemplary log of memory accesses that may be produced during the execution of a computer application accessing the objects and fields shown in FIG. 4A. In Chilimbi, a computer application is instrumented such that memory references (e.g., load and store commands) are logged. When the instrumented computer application is executed, a log of memory accesses 420 is produced. More specifically, an object 422 is logged for each memory access. For instance, when a field (e.g., the first field 402) of the first object 400, "A", is fetched from memory, this memory access is logged as shown in entry 426.
The memory access log is then used to create a temporal reference graph modeling the reference locality between objects. FIG. 4C is an exemplary temporal reference graph illustrating the accesses of objects in memory and the temporal relationships of these memory accesses that may be produced from the log of memory accesses shown in FIG. 4B. As shown in temporal reference graph 428, accesses of the objects are placed in the graph according to the temporal relationships of the memory accesses shown in the log of memory accesses 420. The temporal reference graph 428 may then be used to achieve the proper placement of objects so that those objects that are accessed closely in time are placed in close proximity to one another. More particularly, the temporal reference graph 428 is used to guide the order in which the objects are copied during copying garbage collection. However, it is important to note that Chilimbi ignores the specific field accessed within the corresponding memory accesses. Moreover, since every memory reference must be logged, creating the log consumes substantial time and memory resources.
Although Chilimbi discloses reordering objects in memory, the specific fields accessed within these objects are ignored. Nonetheless, the reordering of fields within a single object has been contemplated and will be described with reference to FIG. 5A, FIG. 5B, and FIG. 5C.
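One way to derive such a graph from an object-level log is sketched below. The sliding window, the edge weighting, and the sample log are assumptions made for illustration and are not taken from Chilimbi.

```python
from collections import Counter


def temporal_graph(log, window=2):
    """Weight an edge between two objects each time they are accessed within
    `window` consecutive log entries of one another (hypothetical scheme)."""
    edges = Counter()
    for i, current in enumerate(log):
        # Look back at the preceding (window - 1) accesses.
        for earlier in log[max(0, i - window + 1):i]:
            if earlier != current:
                edges[frozenset((earlier, current))] += 1
    return edges


log = ["A", "B", "A", "C", "D", "C"]    # hypothetical object access log
edges = temporal_graph(log)
print(edges[frozenset(("A", "B"))])     # 2
print(edges[frozenset(("A", "C"))])     # 1
print(edges[frozenset(("C", "D"))])     # 2
```

In this sketch, heavier edges suggest pairs of objects that would benefit most from being placed near one another.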
FIG. 5A is a block diagram illustrating an exemplary set of objects stored in memory and associated fields. As shown, a first object 500, "A", has a first field 502, "y", a second field 504, "x", and a third field 506, "v". In addition, a second object 508, "B", has a first field 510, "z", and a second field 512, "w".
FIG. 5B is a block diagram illustrating an exemplary log of memory accesses that may be produced during the execution of a computer application accessing the objects and fields shown in FIG. 5A. A log of memory accesses 514 is produced in which each object 516 and associated field 518 accessed in memory are identified and logged. For instance, as shown, when the second field 504, "x", is accessed in the first object 500, "A", the object accessed 520, "A", and the field accessed 522, "x", are logged.
Once the log of memory accesses is created, it is used to determine the temporal relationship between the logged memory accesses. A temporal reference graph modeling the reference locality between fields is then created. FIG. 5C is an exemplary temporal reference graph 524 illustrating the temporal relationship between the field references shown in the log of FIG. 5B. Rather than illustrating the relationship between the objects accessed, the graph 524 illustrates the relationship between the fields accessed. As shown, the first and second fields 502 and 504, y and x, of the first object 500, A, are graphed. The temporal reference graph 524 is then used to reorder the fields x and y within the object A. As shown, the temporal reference graph 524 indicates that the order of the fields x and y with respect to one another is irrelevant. However, the third field 506, v, should not be stored between x and y, so that x and y remain in close proximity to one another. In this manner, fields within a particular object may be reordered. Although fields have been reordered within a single object, field accesses have not been analyzed to reorder the objects those fields reference.
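Field reordering within one object can be approximated with a simpler stand-in than the temporal reference graph: ordering fields by access count, so that frequently accessed fields end up adjacent. The sketch below assumes a hypothetical starting layout and log; it is not the graph-based procedure described above.

```python
from collections import Counter


def reorder_fields(layout, log, obj):
    """Order the fields of `obj` by descending access count. Ties keep their
    original relative order because Python's sort is stable."""
    counts = Counter(field for o, field in log if o == obj)
    return sorted(layout, key=lambda field: -counts[field])


# Hypothetical log of (object, field) accesses.
log = [("A", "x"), ("A", "y"), ("B", "z"), ("A", "y"), ("A", "x"), ("A", "v")]
# Hypothetical layout in which v separates x and y.
print(reorder_fields(["x", "v", "y"], log, "A"))   # ['x', 'y', 'v']
```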
Rather than instrumenting each memory reference, it is possible to instrument the paths of control flow encountered by an executing program through path profiling. FIG. 6A is an exemplary block diagram illustrating all possible paths during execution of a computer application. In this example, each block represents one or more computer instructions. Block 600 is executed prior to conditional statement 602. For instance, the conditional statement 602 may be an if-then-else statement. There are two branches that may be executed depending upon the result of the if-then-else statement. The first branch includes blocks 604 and 606. The second branch includes blocks 608 and 610.
When path profiling is performed, the paths of control flow are instrumented. FIG. 6B is a diagram illustrating an exemplary path profile illustrating the possible paths associated with the computer application of FIG. 6A. As shown, rather than instrumenting each load and store command as performed in Chilimbi, code is inserted at decision points as shown at block 612. Since the paths of control flow are instrumented, path profiling can determine how often each branch (i.e., path) is taken. Thus, path profiling is advantageous since it requires less overhead than instrumenting each load and store command within each path.
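The idea of instrumenting decision points rather than individual loads and stores can be sketched as follows. The block numbers mirror FIG. 6A, but the counter keys and function names are hypothetical, and this simple edge counter is only an approximation of path profiling.

```python
from collections import Counter


def run_instrumented(condition, path_counts):
    """Execute the control flow of FIG. 6A, counting which branch is taken."""
    executed = [600]                      # block 600 precedes the conditional 602
    if condition:
        path_counts["602->604"] += 1      # instrumentation inserted at the decision point
        executed += [604, 606]            # first branch
    else:
        path_counts["602->608"] += 1
        executed += [608, 610]            # second branch
    return executed


def profile(inputs):
    """Run the instrumented code once per input and return the branch counts."""
    counts = Counter()
    for condition in inputs:
        run_instrumented(condition, counts)
    return counts


print(profile([True, False, True]))   # Counter({'602->604': 2, '602->608': 1})
```

Only one counter update occurs per execution of the conditional, regardless of how many memory references the blocks on each branch contain.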
In view of the above, it would be desirable if objects in memory could be reordered through a process such as copying garbage collection to maximize the speed with which the fields of the objects may be accessed. Moreover, it would be beneficial if the frequency and proximity of accesses of the fields of the objects could be analyzed with reduced overhead using a process such as path profiling to determine the order in which the objects are to be copied.
The present invention generates a mechanism for reordering objects in memory in accordance with information obtained in relation to accesses of fields of the objects from memory. For instance, this may be accomplished through modifying the order of traversal of fields during copying garbage collection. In this manner, objects may be ordered in memory to minimize the time required to later retrieve the objects from memory.
According to one aspect of the present invention, a mechanism for rearranging a plurality of objects in memory is created. A frequency of accesses in memory of one or more fields associated with the plurality of objects with respect to one another during execution of a computer application is determined. A mechanism for rearranging the plurality of objects in the memory in accordance with the determined frequency of accesses in memory of the one or more fields associated with the plurality of objects with respect to one another is then generated.
According to one aspect of the invention, the mechanism that is generated includes garbage collection code for rearranging the plurality of objects during garbage collection. Garbage collection may be implemented in a variety of ways. For instance, copying garbage collection or some variation of copying garbage collection may be performed.
According to yet another aspect of the invention, a mechanism for reordering objects in memory may be created from information that indicates frequency and proximity of field references across all instances of a single class. According to one embodiment, a mechanism for modifying the order in which fields of objects are visited during garbage collection is created. First, field reference information is obtained for all instances of a class, where the field reference information indicates frequency and proximity of references of objects referenced by fields of the class with respect to one another. A class field order is then determined from the associated field reference information, where the class field order identifies an order in which fields of the class are to be traversed during garbage collection. The process may be repeated for multiple classes such that the class field order is associated with each corresponding class.
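Determining a class field order might be sketched as follows. The sketch assumes the field reference information has already been reduced to a per-class count of references through each field, and it orders fields by descending count only; proximity is not modeled. The class and field names are hypothetical.

```python
def class_field_order(field_ref_counts):
    """field_ref_counts maps a class name to {field name: reference count}.
    Returns, for each class, its fields ordered most frequently referenced first
    (ties broken alphabetically so the result is deterministic)."""
    return {
        cls: sorted(counts, key=lambda field: (-counts[field], field))
        for cls, counts in field_ref_counts.items()
    }


info = {"Node": {"left": 3, "right": 10, "parent": 1}}   # hypothetical profile data
print(class_field_order(info))   # {'Node': ['right', 'left', 'parent']}
```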
According to another aspect of the invention, garbage collection is performed using the reordering mechanism. According to one embodiment, an object is identified and a class associated with the object is ascertained. A class field order associated with the class of the object is determined, where the class field order identifies an order in which fields of the class of the object are to be traversed during garbage collection. The fields of the object are then visited during garbage collection (e.g., copying garbage collection) in accordance with the class field order associated with the class of the object. For instance, for each field visited, each object pointed to directly or indirectly may be copied.
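A traversal of this kind can be sketched as follows. This is a minimal illustration, assuming a depth-first copy, a heap that records each object's class and fields, and hypothetical class and field names ("Pair", "Leaf", "first", "second"); the object names follow FIG. 3A.

```python
def copy_in_field_order(heap, root, field_order):
    """heap maps an object name to (class name, {field name: referenced object name}).
    field_order maps a class name to the order in which its fields are visited."""
    to_space = []
    copied = set()

    def visit(name):
        if name is None or name in copied:
            return
        copied.add(name)
        to_space.append(name)             # copy the object itself
        cls, fields = heap[name]          # ascertain the object's class
        for field in field_order[cls]:    # visit fields in the class field order
            visit(fields.get(field))      # copy objects reachable through the field

    visit(root)
    return to_space


heap = {
    "A": ("Pair", {"first": "B", "second": "D"}),
    "B": ("Leaf", {}),
    "C": ("Leaf", {}),   # unreferenced, so never copied
    "D": ("Leaf", {}),
}
print(copy_in_field_order(heap, "A", {"Pair": ["first", "second"], "Leaf": []}))  # ['A', 'B', 'D']
print(copy_in_field_order(heap, "A", {"Pair": ["second", "first"], "Leaf": []}))  # ['A', 'D', 'B']
```

Changing only the class field order yields the two layouts of FIG. 3B and FIG. 3C, respectively.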
The present invention enables objects to be reordered in accordance with information that indicates proximity and frequency of references of fields of the objects in memory. In this manner, those objects referenced by these fields may be reordered in memory during garbage collection. Thus, objects accessed closely in time to one another may be stored in close proximity in memory. This is particularly advantageous in those systems implementing a cache, since this increases the probability that the objects will be retrieved and stored in the same cache line. Accordingly, the time to retrieve objects accessed in close proximity to one another may be dramatically decreased.