1. Field of the Present Invention
The present invention generally relates to the field of computer architecture and more particularly to a method and circuit for improving the efficiency of retiring store operations in microprocessor based computers.
2. History of Related Art
In typical modern microprocessor designs, cache-able store instructions are executed and retired to cache memory in program order. Because the number of load operations exceeds the number of stores by a significant margin in typical codes sequences and because many load operations may be speculatively executed to take advantage of processor parallelism, cache access arbitration schemes commonly assign relatively low priority to store operations. This prioritization hierarchy can potentially result in a backlog of executed store operations awaiting an opportunity to access the cache. The constraint of in-order execution and retirement is accommodated by placing completed store instructions in a completed store queue where they await resolution of conflicts from higher priority cache access requests. Higher priority cache accesses may occur in the form of snoop requests, cache status bits updates, and other cache accesses depending upon the environment. As a result, a large number of store instructions may become stockpiled in the store queue, especially in processor intensive applications such as multi-processor systems, thereby making it imperative to take maximum advantage of each opportunity to retire store operations to cache.
Conventional microprocessor architectures, unfortunately, do not typically handle the retiring of store operations in optimal fashion. Referring to FIG. 4 of the drawings, a timing representation of a store queue of a conventional microprocessor architecture is presented. For each cycle, the state of selected locations of the microprocessor are detailed. The xe2x80x9cBSTxe2x80x9d represents the location within the store queue designed to hold the oldest pending store operation. In a typical microprocessor, the BST contents are transferred to a latch if a resource conflict is encountered during an attempt to access the cache from the store queue. In FIG. 4, a resource conflict denoted by reference number 402 is detected in cycle 0. In response to the resource conflict, the microprocessor transfers the BST contents (identified as op0) to the latch and shifts the next oldest pending operation (op1) to the BST. Thus, in cycle 1, op1 resides in BST as indicated by reference numeral 408 while op0 is found in the latch as indicated by reference numeral 406. Because op0 is no longer present within the store queue, a select signal SEL is set to indicate that the next pending store operation retired must be selected from the latch. In the example of FIG. 4, no resource conflict exists during cycle 1. Accordingly, the cache is accessed from the latch with op0 as indicated by reference number 412. The result of the cache access (i.e., hit/shared hit/miss, etc.) is not known until the following cycle 2. When the cache access is returned as a hit indicated by reference numeral 414, the select signal may be returned to 0 in the following cycle so that subsequently selected store operations are retired from the cache. Unfortunately, this architecture insures that no cache access may be attempted during cycle 2, despite the absence of a resource conflict, because the unknown result of the cache access prohibits updating the select signal until the following cycle. Thus, an opportunity to retire a pending store operation in cycle 2 has gone unfulfilled. Therefore, it would be desirable to provide an architecture in which the retiring of pending operations is handled in a more efficient manner without incurring any performance degradation and without significantly increasing the cost or complexity of the circuit.
The problems identified above are in large part addressed by a method and corresponding circuit for retiring executed operations to cache in an efficient manner by maintaining a store machine preferred state when a resource conflict preventing the store machine from accessing the cache is detected. This permits the store machine of the present invention to retire an operation in a cycle immediately following resolution of the resource conflict.
Broadly speaking, the present invention contemplates a method of retiring operations to a cache. Initially, a first operation is queued in a stack such as the store queue of a retire unit. The first operation is then copied, in a first transfer, to a latch referred to as the miss latch in response to a resource conflict that prevents the first operation from accessing the cache. The first operation is maintained in the stack for the duration of the resource conflict. When the resource conflict is resolved, the cache is accessed, in a first cache access, with the first operation from the stack. Preferably, the first operation is removed from the stack when the resource conflict is resolved and the first cache access is initiated. In the preferred embodiment, the first operation is maintained in the miss latch until the first cache access results in a cache hit. One embodiment of the invention further includes accessing the cache, in a first miss access, with the first operation from the miss latch in response to a cache miss that resulted from the first cache access. In a presently preferred embodiment, a second access is executed to access the cache with a second operation queued in the stack in response to a cache hit resulting from the first cache access. The first and second cache accesses preferably occur in consecutive cycles. Typically, the first and second operations are store operations that are queued in the stack in program order. In one embodiment the first operation is removed from the stack upon resolving of the resource conflict.
The present invention still further contemplates a system for retiring operations to a cache memory. The system includes a stack that is configured to save a first operation destined for the cache memory. A miss latch is coupled to the stack and configured to receive a first operation from the stack. A multiplexer of the system includes a first input connected to the stack, a second input coupled to the miss latch, an output connected to the cache memory, and a select input. A control circuit is coupled to the select input of the multiplexer. The control circuit is configured to select the first input of the mux and initiate copying, in a first transfer, of the first operation from the stack to the miss latch while maintaining the first operation in the stack. The first transfer occurs in response to a resource conflict preventing the stack from accessing the cache.
The control circuit preferably continues to select the first input of the mux for the duration of the resource conflict. In this manner, the stack acts as the source of a first access of the cache following a resolution of the resource conflict. In one embodiment, the system is further configured to access the cache, in a first cache access, with the first operation from the stack, in response to detecting a resolution of the resource conflict. The system preferably maintains the first operation in the miss latch until the first cache access results in a cache hit. In one embodiment, the control circuit selects the second input of the mux if the first cache access results in a cache miss. The system preferably accesses the cache, in a second cache access, with a second operation from the stack, in response to a cache hit resulting from the first cache access.
The present invention further contemplates a computer system including a processor, a cache memory, and a system memory. The processor is coupled to a processor bus via a bus interface unit. The cache memory is interfaced to the processing unit and the bus interface unit and the system memory coupled to bus interface unit. The processor includes a control circuit, a store queue, and a miss latch. The store queue is configured to save a first operation destined for the cache memory, and the control circuit is configured to copy the first operation to the miss latch, in response to a resource conflict preventing the store queue from accessing the cache, while maintaining the first operation in the store queue.
The store queue is suitably configured in one embodiment to save a second operation and the control circuit is configured to remove the first operation from the store queue in response to a first access of the cache memory after the resource conflict is resolved. In one embodiment, a second access of the cache memory with the second operation from the stack follows the first access. In this embodiment the first and the second accesses preferably occur in consecutive cycles of a clock signal driving the control circuit. The processor suitably further includes a mux for selecting between the miss latch and the store queue as a source for accessing the cache memory. In such an embodiment, a select input of the mux is driven by the control circuit and the control circuit is configured to select the store queue for at least a duration of the resource conflict. The control circuit is configured to select the miss latch upon detecting a cache miss resulting from an access of the cache memory from the stack.