1. Field of the Invention
The present invention relates to computer systems that support references to objects defined within an object-oriented programming system. More specifically, the present invention relates to an efficient design for a store queue within an object-addressed memory hierarchy in a computer system.
2. Related Art
Out-of-order processors commonly include a store queue to facilitate maintaining program order and memory consistency. During program execution, entries for store instructions are inserted into the queue in order as the store instructions are fetched. A subsequent load instruction compares its address against the addresses of all older outstanding stores in the store queue to determine whether the load depends on a stored value that has not yet been written out to the memory hierarchy. Unfortunately, the memory address associated with a store may not be available until some time after the store is fetched (for example, if the address depends on other pending instructions). Consequently, subsequent loads may be delayed while awaiting calculation of the store's address, because it is unknown whether the loads depend on the store.
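The load check described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not a description of any particular hardware design: the names `StoreEntry` and `resolve_load` and the three outcome labels are hypothetical, and the sketch assumes the conservative policy in which a load stalls whenever it reaches an older store whose address is still unknown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical outcome labels used only in this sketch.
STALL = "stall"        # load must wait for a store address or store data
FORWARD = "forward"    # load takes its value from a store queue entry
FROM_CACHE = "cache"   # no older store matches; load reads the cache


@dataclass
class StoreEntry:
    address: Optional[int] = None  # None until the store address is computed
    data: Optional[int] = None     # None until the store data is available


def resolve_load(load_address: int,
                 older_stores: List[StoreEntry]) -> Tuple[str, Optional[int]]:
    """Decide how a load is satisfied, given older stores in program order
    (oldest first). Assumes a conservative, non-speculative policy."""
    # Scan from youngest to oldest so the most recent matching store wins.
    for entry in reversed(older_stores):
        if entry.address is None:
            # Unknown address: the load might depend on this store.
            return (STALL, None)
        if entry.address == load_address:
            if entry.data is None:
                return (STALL, None)  # match found, data not yet ready
            return (FORWARD, entry.data)
    return (FROM_CACHE, None)
```

In this model, a single older store with an uncomputed address is enough to stall the load, even if that store ultimately writes to an unrelated location. That lost opportunity is the delay the paragraph above describes.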
Conventional memory hierarchies form memory addresses by summing two operands, a base and an offset. For accesses to objects, the offset is commonly a constant, but the pointer to the object is a variable which is contained in a register. Even though the offsets may be known to be distinct, the store-load dependence cannot be resolved without the base pointers, because every bit in the store's address depends on the value of the pointer. For example, referring to FIG. 1, suppose a processor issues a store instruction (step 102) and then issues a load instruction (step 104). Also suppose the load generates an address A1, which is calculated by adding a pointer (from REG1) to an offset IMM1 (an immediate operand with a constant value) (step 106). At this point, if the store address is not yet available in the store queue, the load must wait for the store address (step 108). When the store finally generates the store address A2 (step 110), which is calculated by adding the address in REG2 to IMM2, the system compares the load address A1 to A2 and to the store addresses of all other older outstanding store instructions contained in the store queue (step 112). If A1 does not match any of these store addresses, the system retrieves the data item for the load from the cache (step 114). Otherwise, the system retrieves the data item for the load from the matching store queue entry as soon as the stored data item is available (step 116). Note that making the load instruction wait for the address of the preceding store instruction to be generated can significantly reduce instruction-level parallelism and processor throughput.
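A small numeric illustration may help show why distinct offsets alone do not resolve the dependence in a base-plus-offset scheme. The register and immediate values below are invented purely for illustration, and the helper name `effective_address` is hypothetical.

```python
def effective_address(base: int, offset: int) -> int:
    """Conventional address formation: base pointer plus constant offset."""
    return base + offset


# Illustrative values only (not taken from FIG. 1).
REG1, IMM1 = 0x1000, 8   # load:  A1 = REG1 + IMM1
REG2, IMM2 = 0x1004, 4   # store: A2 = REG2 + IMM2

A1 = effective_address(REG1, IMM1)
A2 = effective_address(REG2, IMM2)

# The offsets differ, yet the two accesses touch the same location.
offsets_differ = IMM1 != IMM2
addresses_match = A1 == A2

# With the same base pointer, the same two offsets give distinct addresses.
same_base_distinct = (effective_address(REG1, IMM1)
                      != effective_address(REG1, IMM2))
```

Because the same pair of offsets can produce either matching or non-matching addresses depending on the base pointers, the comparison in step 112 cannot be made until both REG1 and REG2 are known.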
Hence, what is needed is a mechanism that facilitates early determination of store-load dependencies when accessing object fields.