As object-based systems become widespread, large object stores are becoming more common. As processor speeds rise, these object stores need to provide fast access to large collections of persistent objects so that they do not become a bottleneck in system throughput. To date, most object stores have been implemented on stock hardware, by emulation in software. While this is acceptable as an initial solution, large performance gains may be had by using architectures more suited to the task at hand.
FIG. 1 illustrates an example of a conventional memory hierarchy for a multiprocessor. The multiprocessor includes two processors (10, 12), each connected to a Translation Look-aside Buffer (TLB) (14, 16). Each TLB (14, 16) is associated with an L1 Cache (18, 20). The L1 Caches (18, 20) are in turn connected to a single L2 Cache (22), which is connected to a memory (24).
The TLB (14, 16) holds a small number of recently used translations, and therefore covers only a subset of the virtual address space. Each translation maps a virtual address to a physical address. The translations may be computed (and entered into the TLB) either in software or in hardware. The L1 Cache (18, 20) is a form of fast memory holding recently accessed data, designed to speed up subsequent access to the same data. The L1 Cache (18, 20), specifically, is located on or close to the microchip containing the processor (10, 12). The L2 Cache (22) is similar to the L1 Cache (18, 20) except that it contains data that was not as recently accessed as the data in the L1 Cache (18, 20). Additionally, the L2 Cache (22) typically has a larger memory capacity and a slower access time. The memory (24) is typically random access memory (RAM).
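The role of the TLB described above can be illustrated with a minimal Python sketch. It is not taken from the figure: the class name `TLB`, the `page_table` dictionary standing in for the software or hardware translation mechanism, the 4096-byte `PAGE_SIZE`, and the least-recently-used replacement policy are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes; the text does not specify one


class TLB:
    """Small cache of recently used virtual-page to physical-frame translations."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity        # number of translations held (at least 1)
        self.page_table = page_table    # hypothetical full mapping: page -> frame
        self.entries = OrderedDict()    # cached translations, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        """Convert a virtual address into a physical address."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)             # mark as most recently used
        else:
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]  # compute and enter translation
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)      # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset


tlb = TLB(capacity=2, page_table={0: 7, 1: 3, 2: 9})
paddr = tlb.translate(1 * PAGE_SIZE + 20)  # virtual page 1 maps to frame 3
```

Because only a few translations are held at a time, a reference to a page outside that small set incurs the cost of recomputing the translation.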
When a load request is generated on the conventional architecture shown in FIG. 1, a virtual address is sent from the processor (10, 12) to the corresponding TLB (14, 16), i.e., Processor A (10) sends the virtual address to Translation Look-aside Buffer A (14) and Processor B (12) sends the virtual address to Translation Look-aside Buffer B (16). The TLB (14, 16) converts the virtual address into a physical address that is subsequently sent to the L1 Cache (18, 20). Associated with the L1 Cache is an L1 Cache tag array. The L1 Cache tag array is an index of data stored in the L1 Cache (18, 20). If the physical address, sent from the TLB (14, 16) to the L1 Cache (18, 20), is present in the L1 Cache tag array, then the datum corresponding to the physical address is retrieved and sent to the requesting processor (10, 12). If the physical address is not present in the L1 Cache tag array, then the L1 Cache (18, 20) forwards the physical address to the L2 Cache (22). Similarly, the L2 Cache (22) is associated with an L2 Cache tag array.
If the physical address is found in the L2 Cache tag array, then a cache line associated with the physical address is retrieved and sent to the L1 Cache (18, 20). The cache line is the unit of transfer between the L2 Cache (22) and the L1 Cache (18, 20). Once the L1 Cache (18, 20) receives the cache line, the L1 Cache retrieves and forwards the requested datum within the cache line to the requesting processor (10, 12).
If the physical address is not found in the L2 Cache tag array, then the L2 Cache (22) forwards the physical address to memory (24). Once the physical address is found in memory (24), the entire cache line on which the requested datum is located is retrieved and sent to the L2 Cache (22). The L2 Cache (22) subsequently forwards the entire cache line to the appropriate L1 Cache (18, 20). Upon receipt of the entire cache line, the L1 Cache (18, 20) forwards the requested datum within the cache line to the appropriate processor (10, 12).
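The load path described above, starting from the physical address produced by the TLB, can be sketched in Python as follows. This is a simplified model under stated assumptions, not the hardware itself: the names `Cache`, `Memory`, and `load`, the 64-byte `LINE_SIZE`, and the `trace` list are hypothetical, each tag array is modelled as a dictionary keyed by line address, and the caches are unbounded, so eviction is not modelled.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


class Cache:
    """Toy cache level whose tag array is a dict keyed by line address."""

    def __init__(self, name, next_level):
        self.name = name
        self.next_level = next_level  # the next cache level, or the memory
        self.lines = {}               # line address -> bytes of the cache line

    def fetch_line(self, line_addr, trace):
        if line_addr in self.lines:               # address found in the tag array
            trace.append(self.name + " hit")
        else:                                     # not found: forward the address
            trace.append(self.name + " miss")
            self.lines[line_addr] = self.next_level.fetch_line(line_addr, trace)
        return self.lines[line_addr]


class Memory:
    """Backing store that always returns the entire requested cache line."""

    def __init__(self, data):
        self.data = data

    def fetch_line(self, line_addr, trace):
        trace.append("memory")
        return self.data[line_addr:line_addr + LINE_SIZE]


def load(l1, paddr, trace):
    """Return the byte at physical address paddr, going through the L1 cache."""
    line_addr = paddr - paddr % LINE_SIZE
    line = l1.fetch_line(line_addr, trace)
    return line[paddr % LINE_SIZE]   # the requested datum within the line


memory = Memory(bytes(range(256)))
l2 = Cache("L2", memory)                         # single shared L2 cache
l1_a, l1_b = Cache("L1-A", l2), Cache("L1-B", l2)
```

In this model, a first load through `l1_a` misses in both caches and reaches memory, a second load to the same line hits in `l1_a`, and a load of that line through `l1_b` misses in its own L1 cache but hits in the shared L2 cache.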
There are three existing approaches to implementing an object store. Two are software-based, and map the object store onto a conventional memory hierarchy, such as that described above. The third is hardware-based.
In the first approach, a location-independent object ID (OID) is used to index a data structure known as an object table. The object table maps the OID to the virtual address of the start of the object. In this scheme, two memory references are required to access an object: one to index the object table, and one to index the object given its base address from the object table. Each of these accesses proceeds in the manner described above (from the processor through the TLB to the L1 cache, and thence to the L2 cache and the memory as necessary).
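The two memory references of the object-table scheme can be made concrete with a short Python sketch. The names `CountingMemory`, `load_field`, and `OBJECT_TABLE_BASE`, and the layout of one table word per OID starting at address 0, are assumptions made for illustration only.

```python
OBJECT_TABLE_BASE = 0  # assumed: the object table starts at address 0, one word per OID


class CountingMemory:
    """Flat word-addressed memory that counts the references made to it."""

    def __init__(self, size):
        self.words = [0] * size
        self.references = 0

    def read(self, addr):
        self.references += 1
        return self.words[addr]


def load_field(mem, oid, offset):
    """Read one word of an object identified by a location-independent OID."""
    base = mem.read(OBJECT_TABLE_BASE + oid)  # first reference: index the object table
    return mem.read(base + offset)            # second reference: index the object


mem = CountingMemory(64)
mem.words[OBJECT_TABLE_BASE + 3] = 40  # object with OID 3 starts at address 40
mem.words[42] = 99                     # word at offset 2 within that object
value = load_field(mem, 3, 2)
```

After the call, `mem.references` is 2, reflecting the extra access that the object table adds to every object reference.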
In the second approach, a reference to an object is a direct pointer to the start of the object. This eliminates the extra access required by the object table in the first approach, but means that an object cannot be relocated within the address space (e.g., for compaction or clustering) without all references to that object being updated. In contrast, when using an object table, only the address in the object table needs to change when an object is relocated.
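The difference in relocation cost between the two software approaches can be sketched in Python. The function names and the representation of the heap as a dictionary from an object's base address to the list of addresses it references are hypothetical, chosen only to keep the example small.

```python
def relocate_with_table(object_table, oid, new_base):
    """Object-table scheme: only the table entry changes; OIDs held elsewhere stay valid."""
    object_table[oid] = new_base
    return 1  # number of words updated


def relocate_with_pointers(heap, old_base, new_base):
    """Direct-pointer scheme: every reference to the object must be found and rewritten.

    heap maps each object's base address to the list of base addresses it references.
    """
    updated = 0
    for refs in heap.values():
        for i, target in enumerate(refs):
            if target == old_base:
                refs[i] = new_base
                updated += 1
    heap[new_base] = heap.pop(old_base)  # move the object itself
    return updated


heap = {100: [200, 300], 200: [300], 300: []}
pointer_updates = relocate_with_pointers(heap, 300, 400)  # scans the whole heap
table_updates = relocate_with_table({7: 300}, 7, 400)     # touches a single entry
```

With direct pointers, the work grows with the number of references to the relocated object and requires a scan to find them, whereas the object table confines the change to one entry.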
A third approach is to build a hardware object cache (see Ifor W. Williams and Mario I. Wolczko, An object-based memory architecture, in Alan Dearle, Gail M. Shaw, and Stanley B. Zdonik, editors, Implementing Persistent Object Bases: Principles and Practice (Proceedings of the Fourth International Workshop on Persistent Object Systems), pages 114-130, Martha's Vineyard, Mass., September 1990). This approach does not use the conventional memory hierarchy; instead, the processor and caches are modified to use object addresses directly. An object address includes an OID and an offset. In this scheme, the memory system can only store objects; there is no provision for non-object data. The OID and a portion of the offset are used by the tag array to locate a cache line that contains the requested word. The low-order bits of the offset are then used to obtain the requested word within the cache line. The object cache is typically implemented in hardware, with software used to manage filling the cache and evictions from the cache. Further, if there is a cache miss, i.e., a load request that cannot be satisfied by the cache, a software translator (not shown) converts the object address into a physical address prior to sending the physical address to the memory.
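The lookup performed by such an object cache can be approximated by the following Python sketch. It is an illustration under assumptions rather than the cited design: the class name `ObjectCache`, the 8-word `LINE_WORDS`, the `translate` callable standing in for the software translator, and the requirement that objects begin on a line boundary in physical memory are all hypothetical, and eviction is not modelled.

```python
LINE_WORDS = 8  # assumed number of words per cache line


class ObjectCache:
    """Cache tagged by object address (OID plus offset), not by physical address."""

    def __init__(self, translate, memory):
        self.translate = translate  # hypothetical software translator: (oid, offset) -> address
        self.memory = memory        # list of words indexed by physical address
        self.lines = {}             # tag (oid, line index) -> words of the cache line
        self.misses = 0

    def load(self, oid, offset):
        # The high part of the offset selects the line; the low-order bits select the word.
        line_index, word = divmod(offset, LINE_WORDS)
        tag = (oid, line_index)
        if tag not in self.lines:
            # Cache miss: only now is the object address converted to a physical address.
            self.misses += 1
            paddr = self.translate(oid, line_index * LINE_WORDS)
            self.lines[tag] = self.memory[paddr:paddr + LINE_WORDS]
        return self.lines[tag][word]


bases = {5: 16}                      # assumed placement: object 5 starts at address 16
memory = list(range(64))             # each word holds its own address, for easy checking
cache = ObjectCache(lambda oid, off: bases[oid] + off, memory)
word = cache.load(5, 10)             # second line of object 5, word 2 within the line
```

On a hit, no translation takes place at all, which is the main contrast with the conventional hierarchy, where every access first passes through the TLB.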