A computer system may be divided into three basic blocks: a central processing unit (CPU), memory, and input/output (I/O) units. These blocks are coupled to each other by a bus. An input device, such as a keyboard, mouse, stylus, analog-to-digital converter, etc., is used to input instructions and data into the computer system via an I/O unit. These instructions and data can be stored in memory. The CPU receives the data stored in the memory and processes the data as directed by a set of instructions. The results can be stored back into memory or outputted via the I/O unit to an output device, such as a printer, cathode-ray tube (CRT) display, digital-to-analog converter, etc.
The CPU receives data from memory as a result of performing load operations. Each load operation is typically initiated in response to a load instruction. The load instruction specifies an address to the location in memory at which the desired data is stored. The load instruction also usually specifies the amount of data that is desired. Using the address and the amount of data specified, the memory may be accessed and the desired data obtained.
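As a minimal illustration of a load operation specified by an address and an amount of data, the following sketch models memory as a flat byte array. The function name `load` and the byte-array model are assumptions made for illustration only and are not taken from the description above.

```python
def load(memory, address, size):
    """Return `size` bytes starting at `address` (hypothetical helper)."""
    # Reject requests that fall outside the modeled memory.
    if address < 0 or size < 0 or address + size > len(memory):
        raise IndexError("load outside of memory")
    return bytes(memory[address:address + size])


# Memory modeled as 16 bytes holding the values 0..15.
memory = bytearray(range(16))

# Load 4 bytes starting at address 4.
word = load(memory, 4, 4)
```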
The memory accessed in response to the load instruction may be the main system memory. Besides including a main system memory, many of today's memory systems also include a cache memory. A cache memory is a very fast local storage memory that is used by a CPU to hold copies of instructions, code or data that are frequently requested from the main memory by the CPU. Memory caches are commonly designed at two levels: a first level cache memory and a second level cache memory. Most recently, the use of third level cache memories has been discussed. The first level cache memory is usually integrated on the same integrated circuit die with the CPU, while the second and third level caches are typically integrated in separate dies, often separate chips. If the memory system includes cache memories, the cache memories are accessed before the main system memory in order to fulfill a load request.
Assuming that a computer system includes first and second level cache memories, when a load instruction is encountered, the CPU initially determines if the data resides in the first level cache. If it does (i.e., a hit), then the data is accessed and the load is completed. If it does not (i.e., a miss), then the CPU sends a request to the second level cache to determine if a copy of the data is currently being stored in the second level cache memory. If a copy of the data is contained within the second level cache memory, the data is returned to the CPU to complete the load and is stored in the first level cache memory. If a copy of the data is not present in the second level cache memory, then the memory request is sent to the main system memory to obtain the desired data. Subsequently, copies of the returned data are stored in both the first and second level cache memories.
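The lookup sequence described above may be sketched as follows. This is a simplified model under stated assumptions: each cache is an unbounded dictionary with no eviction policy, and the class name `MemoryHierarchy` and its fields are hypothetical names introduced only for illustration.

```python
class MemoryHierarchy:
    """Toy model of a first level cache, second level cache, and main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> value
        self.l1 = {}  # first level cache (assumed unbounded, no eviction)
        self.l2 = {}  # second level cache (assumed unbounded, no eviction)

    def load(self, address):
        """Return (value, source), where source names the level that supplied the data."""
        # First level hit: the load completes immediately.
        if address in self.l1:
            return self.l1[address], "L1"
        # First level miss, second level hit: return the data and copy it into L1.
        if address in self.l2:
            value = self.l2[address]
            self.l1[address] = value
            return value, "L2"
        # Miss in both caches: fetch from main memory and fill both cache levels.
        value = self.main_memory[address]
        self.l2[address] = value
        self.l1[address] = value
        return value, "memory"


hierarchy = MemoryHierarchy({0x100: 42})
first = hierarchy.load(0x100)   # misses both caches
second = hierarchy.load(0x100)  # now present in the first level cache
```

In this sketch the first load is served by main memory and the second by the first level cache, since the first load left copies in both cache levels.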
In the prior art, many of these memory subsystems can only accommodate one load operation at a time. This is normally not a problem where there is a hit and the data is forwarded from the cache memory to complete the load operation. However, if there is a cache miss, then a bus cycle must be started to obtain data from an external source. In this case, if another access is made to the cache memory while the other memory operation is pending, the cache memory typically will not accept it, particularly where the access misses the cache. This type of cache is often referred to as a blocking cache. It is desirable to be able to access a cache memory while the cache has other memory operations pending.
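The behavior of a blocking cache may be illustrated with the following sketch. It is a deliberate simplification: the model rejects every access while a miss is outstanding, whereas the description above notes only that such an access is typically not accepted. The class `BlockingCache` and its methods are hypothetical names used for illustration.

```python
class BlockingCache:
    """Toy blocking cache that accepts no access while a miss is pending."""

    def __init__(self):
        self.lines = {}      # address -> value
        self.pending = None  # address of the outstanding miss, if any

    def access(self, address):
        """Return 'hit', 'miss', or 'rejected'."""
        # A pending miss blocks all further accesses in this simplified model.
        if self.pending is not None:
            return "rejected"
        if address in self.lines:
            return "hit"
        # A miss starts a (modeled) bus cycle and leaves the cache blocked.
        self.pending = address
        return "miss"

    def fill(self, value):
        """Complete the outstanding miss with data from the external source."""
        self.lines[self.pending] = value
        self.pending = None


cache = BlockingCache()
cache.lines[0x10] = 1
r1 = cache.access(0x20)  # miss: a bus cycle is started
r2 = cache.access(0x10)  # would hit, but the cache is blocked
cache.fill(7)            # the miss completes
r3 = cache.access(0x10)  # accepted again
r4 = cache.access(0x20)  # the filled line now hits
```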
Some computer systems are capable of executing instructions out-of-order. In other words, the CPU in the computer system is capable of executing one instruction before a previously issued instruction. This out-of-order execution is permitted when there is no dependency between the two instructions. That is, the subsequently issued instruction does not rely on a previously issued unexecuted instruction for its resulting data or its implemented result. The CPU may also be capable of executing instructions speculatively, wherein conditional branch instructions may cause certain instructions to be fetched and issued based on a prediction of the condition. Therefore, depending on whether the CPU predicted correctly, the CPU will be either executing the correct instructions or not. Branch prediction and its relationship with speculative execution of instructions are well-known in the art. For a detailed explanation of speculative out-of-order execution, see M. Johnson, Superscalar Microprocessor Design, Prentice Hall, 1991. Speculative and out-of-order execution offer advantages over the prior art, including better use of resources. If multiple instructions are permitted to be executed at the same time, this performance benefit greatly increases.
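The dependency condition described above may be sketched as a check on whether a later instruction reads a result produced by an earlier, unexecuted instruction. The `Instr` tuple, the register names, and the function `depends_on` are assumptions introduced for illustration, and the sketch considers only this one kind of dependency.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination register and its source registers.
Instr = namedtuple("Instr", ["dest", "sources"])


def depends_on(later, earlier):
    """Return True if `later` reads the register that `earlier` writes."""
    return earlier.dest in later.sources


i1 = Instr("r1", ("r2", "r3"))  # r1 <- r2 op r3
i2 = Instr("r4", ("r1", "r5"))  # r4 <- r1 op r5, needs the result of i1
i3 = Instr("r6", ("r7", "r8"))  # r6 <- r7 op r8, independent of i1

i2_must_wait = depends_on(i2, i1)
i3_must_wait = depends_on(i3, i1)
```

Under these assumptions, `i3` could be executed before `i1`, while `i2` could not.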
Special considerations exist with respect to performing memory operations out-of-order in a computer system. Memory operations are ordered to ensure that the correct data is being transferred. For instance, if a store operation and a load operation have the same destination and source addresses respectively and the store operation precedes the load operation in the instruction stream, then the store operation must occur before the load operation to ensure the correct data will be subsequently loaded. If the load operation is allowed to be completed before the store operation, then the data loaded would more than likely not be the data that the store operation would have stored at the memory location. By using stale data, the computer system will not function as intended by the ordered sequence of instructions. However, out-of-order and concurrent execution of instructions may be very beneficial. Thus, it would be advantageous to execute memory operations out-of-order and concurrently except where their execution would create incorrect results.
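The ordering hazard described above may be sketched as follows. The example first shows a load that executes ahead of an earlier store to the same address and therefore obtains stale data, and then shows a simple address comparison that would hold such a load back. The function `can_issue_load` and the exact-address comparison are illustrative assumptions; the sketch does not model operand sizes or partially overlapping accesses.

```python
def can_issue_load(load_address, older_unexecuted_store_addresses):
    """Return True if no older unexecuted store targets the load's address."""
    return load_address not in older_unexecuted_store_addresses


# Program order: (1) store "new" to 0x40, (2) load from 0x40.
memory = {0x40: "old"}

# The load executes before the store: it observes stale data.
stale = memory[0x40]

# The store executes afterwards.
memory[0x40] = "new"

# A load executed after the store observes the intended data.
correct = memory[0x40]

# The check holds back a conflicting load but allows an unrelated one.
conflicting_allowed = can_issue_load(0x40, {0x40})
unrelated_allowed = can_issue_load(0x80, {0x40})
```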
The present invention provides a mechanism to perform memory operations out-of-order except where incorrect results would be created. The present invention provides a mechanism to prevent the out-of-order execution of a load operation when it is determined that the load operation loads data from a location to which an unexecuted store operation is directed. The present invention also provides a mechanism for loading data from an external memory when the data is not available in a local storage area.