A computer program is an ordered set or sequence of instructions to be processed, or executed, by a computer processor. The processor fetches the program instructions and places them in an instruction queue. Normally, instructions are fetched and issued sequentially, with breaks in the sequence occurring when a branch or jump instruction is encountered. The order in which the instructions are fetched is the program order.
Many modern microprocessors allow instructions to execute out-of-order from the queue. In particular, instructions are executed from the queue, out of program order, depending on, for example, (i) register dependencies and/or (ii) memory dependencies. A register dependency relates to the availability of registers required by a subject instruction. A memory dependency relates to a memory address that must be calculated when the subject instruction executes and therefore cannot be known when instructions are scheduled for execution.
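To illustrate how register dependencies alone can reorder issue, here is a minimal sketch of an issue queue. It assumes a hypothetical `ready_at` map giving the cycle at which each register's value becomes available, a one-cycle result latency, and a policy of issuing the oldest ready instruction each cycle. Memory dependencies are deliberately not modeled.

```python
def issue_order(queue, ready_at, max_cycles=1000):
    """Model out-of-order issue driven only by register dependencies.

    queue: list of (dest_reg, [src_regs]) in program order.
    ready_at: hypothetical map from register name to the cycle its value is available.
    Each cycle, the oldest instruction whose sources are all available issues,
    and its destination becomes available one cycle later (assumed latency).
    """
    ready_at = dict(ready_at)
    pending = list(range(len(queue)))
    order = []
    cycle = 0
    while pending:
        if cycle > max_cycles:
            raise RuntimeError("some source register never becomes available")
        for i in pending:
            dest, srcs = queue[i]
            if all(ready_at.get(s, float("inf")) <= cycle for s in srcs):
                pending.remove(i)            # safe: we break right after removing
                order.append(i)
                ready_at[dest] = cycle + 1   # result usable on the next cycle
                break
        cycle += 1
    return order

# r1 is assumed to arrive late (for example, after a cache miss).
queue = [("r2", ["r1"]), ("r3", ["r0"]), ("r4", ["r3"])]
print(issue_order(queue, {"r0": 0, "r1": 5}))  # [1, 2, 0]
```

The two younger instructions issue while the oldest waits for r1, so issue order differs from program order.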
Thus, on the one hand, the out-of-order execution of instructions improves performance because it allows more instructions to complete in the same amount of time by efficiently distributing instructions among the computing resources of the microprocessor. On the other hand, problems may occur when executing load and store instructions out-of-order.
A data cache stores data that has been recently used by a processor and is likely to be used again. When the processor executes a program instruction, it first looks for the data in the data cache. If the data is not found in the cache, the required data is retrieved from main memory and placed in the cache. The general term "memory" as used herein refers to both cache and main memory.
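The lookup order described above (cache first, then main memory with a fill) can be sketched as follows. This is a simplified model under stated assumptions: main memory is a plain dictionary, the cache holds individual addresses rather than cache lines, and there is no capacity limit or eviction. The class name `DataCache` is hypothetical.

```python
class DataCache:
    """Hypothetical data cache in front of a dict standing in for main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> value
        self.lines = {}                 # cached address -> value (no eviction)
        self.misses = 0

    def read(self, address):
        # Look for the data in the data cache first.
        if address in self.lines:
            return self.lines[address]
        # Miss: retrieve the data from main memory and place it in the cache.
        self.misses += 1
        value = self.main_memory[address]
        self.lines[address] = value
        return value

cache = DataCache({0x100: 7})
print(cache.read(0x100), cache.read(0x100), cache.misses)  # 7 7 1
```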
The terms load, load instruction and load operation instruction are used herein interchangeably and refer to instructions which cause data to be loaded, or read, from cache or main memory. Similarly, store, store instruction and store operation instruction are used interchangeably and refer to instructions which cause data to be written to memory.
When a load instruction issues before an older store instruction referencing the same address, the load may retrieve an incorrect value because the data that the store is to write has not yet been written to that address.
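The hazard can be reproduced with a toy model in which memory is a dictionary and each operation is executed in a chosen order. The program below is hypothetical: a store of 5 to address 0x40 followed in program order by a load from 0x40, with memory initially holding 0.

```python
def run(ops, memory, order):
    """Execute ops in the given order; return the value each load observed."""
    results = {}
    for i in order:
        kind, addr, value = ops[i]
        if kind == "store":
            memory[addr] = value
        else:  # load
            results[i] = memory[addr]
    return results

program = [("store", 0x40, 5), ("load", 0x40, None)]
print(run(program, {0x40: 0}, order=[0, 1]))  # {1: 5}  program order
print(run(program, {0x40: 0}, order=[1, 0]))  # {1: 0}  load issued first: stale value
```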
To further compound the problem of out-of-order execution of loads and stores, there may be multiple prior stores to the same address that are still pending when a load that needs to read that address issues. It can also be the case that a prior store was for a data size that is smaller than the data size of a subsequent load instruction.
The present invention addresses the above-noted problems by providing a bypass mechanism that compares the address of each load with a set of recent stores that have not yet updated memory. When the load address matches one of these recent stores, that store supplies the desired load data, so the data need not be retrieved from memory.
Accordingly, in a computing system that includes an execution unit for executing load and store instructions and a data cache subsystem, a bypass method for accessing the data cache subsystem comprises (a) providing a store queue for holding issued stores, the store queue having at least one store queue entry comprising a store queue address, and (b) providing a store data buffer having at least one store data entry corresponding to the store queue entry and comprising at least one data byte. An address of an issuing load is compared against the store queue address of each store queue entry. In response to an address match between the issuing load and a particular store queue entry (referred to as the "address-matching store queue entry"), the store data entry in the store data buffer that corresponds to the address-matching store queue entry is passed to the execution unit when the issuing load is younger in program order than the address-matching store queue entry.
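A minimal sketch of the store queue, store data buffer, and load bypass follows. Program order is modeled with a hypothetical integer `seq` tag (smaller means older). The rule for choosing among several older matching stores (forward the youngest one) is an assumption added for illustration; the text above does not specify it.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StoreQueueEntry:
    seq: int       # hypothetical program-order tag: smaller = older
    address: int   # store queue address

class StoreQueue:
    def __init__(self):
        self.entries: List[StoreQueueEntry] = []  # store queue
        self.data_buffer: List[bytes] = []        # store data buffer, same index

    def issue_store(self, seq: int, address: int, data: bytes) -> None:
        self.entries.append(StoreQueueEntry(seq, address))
        self.data_buffer.append(data)

    def bypass(self, load_seq: int, load_address: int) -> Optional[bytes]:
        """Return bypassed store data, or None if the load must read the cache."""
        best = None
        for i, entry in enumerate(self.entries):
            # Address match, and the load is younger than the store.
            if entry.address == load_address and load_seq > entry.seq:
                # Assumption: among several older matches, forward the youngest.
                if best is None or entry.seq > self.entries[best].seq:
                    best = i
        return None if best is None else self.data_buffer[best]

sq = StoreQueue()
sq.issue_store(1, 0x80, b"\x11")
sq.issue_store(3, 0x80, b"\x22")
print(sq.bypass(4, 0x80))  # b'"'  (0x22, from the store with seq 3)
```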
According to an aspect of the invention, each store queue entry and the issuing load include a data size indicator. After the data bypass, the data size indicator of the issuing load is compared against the data size indicator of the address-matching store queue entry. A trap is signaled when the two data size indicators differ. The trap signal indicates that the data provided by the bypass mechanism was insufficient to satisfy the requirements of the load instruction.
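The size check can be added to the bypass as a second step, after the data has been selected. In this sketch the data size indicator is assumed to be a byte count, the trap is modeled as a Python exception named `DataSizeTrap` (hypothetical), and the youngest-older-store selection rule is again an illustrative assumption.

```python
from dataclasses import dataclass

class DataSizeTrap(Exception):
    """Models the trap raised when bypassed data cannot satisfy the load."""

@dataclass
class Store:
    seq: int       # hypothetical program-order tag: smaller = older
    address: int
    size: int      # data size indicator, assumed here to be a byte count
    data: bytes

def bypass_with_size_check(stores, load_seq, load_address, load_size):
    candidates = [s for s in stores if s.address == load_address and load_seq > s.seq]
    if not candidates:
        return None  # no match: the load reads the data cache instead
    match = max(candidates, key=lambda s: s.seq)  # assumption: youngest older store
    data = match.data  # the data is bypassed first...
    # ...then the data size indicators are compared.
    if load_size != match.size:
        raise DataSizeTrap(f"load size {load_size} != store size {match.size}")
    return data

stores = [Store(1, 0x100, 1, b"\xaa")]
try:
    bypass_with_size_check(stores, 2, 0x100, 4)  # 4-byte load after 1-byte store
except DataSizeTrap as trap:
    print("trap:", trap)
```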
According to another aspect of the invention, a physical address of the issuing load is compared against the store queue address of each store queue entry. In response to a physical address match between a first address portion of the issuing load and a particular store queue entry, a trap is signaled when there is a mismatch between a second address portion of the issuing load and the physical-address-matching store queue entry and the issuing load is younger in program order than the physical-address-matching store queue entry.
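The text does not say which bits form the first and second address portions. Purely for illustration, this sketch assumes the first portion is the low `SPLIT_BITS` bits and the second is the remaining high bits, with `SPLIT_BITS = 12` chosen arbitrarily. Store queue entries are modeled as hypothetical `(seq, physical_address)` tuples, smaller `seq` meaning older.

```python
SPLIT_BITS = 12  # hypothetical boundary between the two address portions

def first_portion(addr: int) -> int:
    return addr & ((1 << SPLIT_BITS) - 1)  # assumed: low bits

def second_portion(addr: int) -> int:
    return addr >> SPLIT_BITS              # assumed: remaining high bits

def partial_match_trap(store_queue, load_seq: int, load_paddr: int) -> bool:
    """Return True if a trap should be signaled for this load.

    Trap when the first portions match, the second portions differ,
    and the load is younger in program order than the store.
    """
    for store_seq, store_paddr in store_queue:
        if (first_portion(store_paddr) == first_portion(load_paddr)
                and second_portion(store_paddr) != second_portion(load_paddr)
                and load_seq > store_seq):
            return True
    return False

print(partial_match_trap([(1, 0x1_0040)], 2, 0x2_0040))  # True
```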
According to a further aspect of the invention, each store queue entry includes a match status indicator. An address of an issuing store is compared against the store queue address of each store queue entry. In response to an address match between the issuing store and a particular store queue entry, the match status indicator is set for the address-matching store queue entry when the issuing store is younger in program order than the address-matching store queue entry; otherwise, the match status indicator is set for the issuing store. The store data entry in the store data buffer corresponding to the address-matching store queue entry is passed to the execution unit when the issuing load is younger in program order than the address-matching store queue entry and the match status indicator for the address-matching store queue entry is not set. Subsequently, a physical address of the issuing load is compared against the store queue address of each store queue entry and in response to a physical address match between the issuing load and a particular store queue entry, a trap is signaled when the issuing load is younger in program order than the physical-address-matching store queue entry and the match status indicator for the physical-address-matching store queue entry is set.
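The match status indicator can be sketched as a boolean per entry. Under the rule above, whenever two stores to the same address are in the queue, the older one is marked, so at most one unmarked entry exists per address and loads bypass only from it. As simplifying assumptions, one address field stands in for both the virtual and physical comparisons, and program order is a hypothetical integer `seq` (smaller means older). Read literally, `must_trap` also fires for a load younger than both of two same-address stores, even though that load bypassed correctly; the text does not say whether a successful bypass suppresses the trap, so the sketch leaves that open.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    seq: int              # hypothetical program-order tag: smaller = older
    address: int          # single address field used for both comparisons
    data: bytes
    match: bool = False   # match status indicator

class StoreQueue:
    def __init__(self):
        self.entries: List[Entry] = []

    def issue_store(self, seq: int, address: int, data: bytes) -> None:
        new = Entry(seq, address, data)
        for e in self.entries:
            if e.address == address:
                if seq > e.seq:
                    e.match = True    # issuing store is younger: mark the queued entry
                else:
                    new.match = True  # issuing store is older: mark the issuing store
        self.entries.append(new)

    def bypass(self, load_seq: int, address: int) -> Optional[bytes]:
        # Forward only from an older entry whose match indicator is clear.
        for e in self.entries:
            if e.address == address and load_seq > e.seq and not e.match:
                return e.data
        return None

    def must_trap(self, load_seq: int, paddr: int) -> bool:
        # Trap if the load is younger than a matching entry whose indicator is set.
        return any(e.address == paddr and load_seq > e.seq and e.match
                   for e in self.entries)

sq = StoreQueue()
sq.issue_store(1, 0x40, b"A")
sq.issue_store(5, 0x40, b"B")
print(sq.bypass(3, 0x40), sq.must_trap(3, 0x40))  # None True
```

A load between the two stores cannot bypass (the older store is marked and the younger store follows the load), and the physical-address check catches it with a trap.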