1. Field of the Invention
The present invention generally relates to memory operations of processors, and more particularly, to a processor and a method for executing a load operation and a store operation of the processor.
2. Description of Related Art
Most current processors adopt an instruction pipeline architecture to increase performance. To reduce the time needed to obtain data from a memory, such a processor typically includes a data cache for temporary storage of data read from the memory. The data cache is divided into a data RAM and a tag RAM. There are generally two types of memory operations: a load (read) operation and a store (write) operation. During a load operation, the data RAM and the tag RAM can be read simultaneously. The read data is used directly if the tag comparison results in a cache hit and is discarded if the tag comparison results in a cache miss. During a store operation, on the other hand, the tag RAM must be read first so that the tag can be compared with the store address. The data is written into the data RAM only if the comparison results in a cache hit.
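The difference between the two access patterns can be illustrated with a minimal sketch. It assumes a direct-mapped cache holding one value per line; the names `DataCache`, `NUM_LINES`, and `fill`, as well as the step counts, are hypothetical and serve only to show that a load reads both RAMs in one step while a store that hits needs a tag read followed by a data write. It is not a cycle-accurate model of any particular processor.

```python
from dataclasses import dataclass, field

NUM_LINES = 4  # hypothetical number of lines in a direct-mapped cache


@dataclass
class DataCache:
    tag_ram: list = field(default_factory=lambda: [None] * NUM_LINES)
    data_ram: list = field(default_factory=lambda: [0] * NUM_LINES)

    def _split(self, address):
        # Assumed address layout: low part selects the line, high part is the tag.
        return address % NUM_LINES, address // NUM_LINES

    def fill(self, address, value):
        # Hypothetical helper that installs a line, standing in for a cache refill.
        index, tag = self._split(address)
        self.tag_ram[index] = tag
        self.data_ram[index] = value

    def load(self, address):
        # Tag RAM and data RAM are read in the same step.
        index, tag = self._split(address)
        stored_tag, data = self.tag_ram[index], self.data_ram[index]
        hit = stored_tag == tag
        # The data is used on a hit and discarded on a miss.
        return hit, (data if hit else None), 1

    def store(self, address, value):
        # Step 1: read the tag RAM and compare with the store address.
        index, tag = self._split(address)
        hit = self.tag_ram[index] == tag
        if hit:
            # Step 2: write the data RAM, only after a hit is known.
            self.data_ram[index] = value
        return hit, (2 if hit else 1)
```

In this sketch `load` returns a tuple of hit flag, data, and illustrative step count, and `store` returns the hit flag and step count. A store that hits always reports one more step than a load, which mirrors the ordering constraint described above.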
Because of this difference, a load operation takes less time to execute than a store operation. When a store operation is followed by a load operation, read/write competition may occur in the instruction pipeline, in which the load operation and the store operation attempt to access the data RAM concurrently. If the load operation then waits until the store operation completes, the load operation stalls, which decreases the processing efficiency of the instruction pipeline.
To address the stall problem, U.S. Pat. No. 6,434,665 discloses a store buffer for temporary storage of parameters of a store operation, such as its address and data. When read/write competition occurs in the data cache, the load operation can thus be executed before the data held in the store buffer is written into the data cache. However, this method can be used only when there is no memory dependency between the load operation and the store operation, that is, only when the address to be read in the load operation does not overlap the address to be written in the store operation. When such a memory dependency exists, the load operation must still wait until the store operation completes in order to read correct data, and the stall problem therefore remains.
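The behavior of such a store buffer can be sketched as follows. This is an illustrative model only, not the mechanism of the cited patent: the cache is reduced to a dictionary that always hits, addresses are compared for exact equality rather than byte-range overlap, and the names `StoreBuffer`, `depends`, `drain_one`, and `load` are hypothetical.

```python
from collections import deque


class StoreBuffer:
    def __init__(self):
        # Pending stores as (address, data), oldest first.
        self.pending = deque()

    def push(self, address, data):
        self.pending.append((address, data))

    def depends(self, load_address):
        # Memory dependency: a pending store targets the address being loaded.
        return any(addr == load_address for addr, _ in self.pending)

    def drain_one(self, cache):
        # Write the oldest pending store into the cache.
        address, data = self.pending.popleft()
        cache[address] = data


def load(cache, store_buffer, address):
    """Return (data, stall_steps) for a load that follows buffered stores."""
    stall_steps = 0
    # Without a dependency the loop is skipped and the load bypasses the stores.
    while store_buffer.depends(address):
        store_buffer.drain_one(cache)  # the load must wait for the store
        stall_steps += 1
    return cache.get(address), stall_steps
```

With one pending store to address `0x10`, a load from `0x20` completes with zero stall steps and leaves the store buffered, whereas a load from `0x10` must first drain the store and therefore stalls.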
To further address the stall problem, U.S. Pat. No. 6,141,747 proposes another method. In this method, the data in the store buffer is forwarded directly to the load operation when read/write competition occurs and there is a memory dependency between the load operation and the store operation. The load operation therefore does not have to wait until the data is written into the data cache. In this method, the data is stored in the store buffer in words of multiple bytes. However, a piece of data is not necessarily one or more whole words. For example, the data may be a half-word, or only one byte of the word may be valid. If the data required by the load operation is distributed over multiple entries of the store buffer, a complex assembling mechanism is required to combine the scattered data from those entries into the data to be forwarded to the load operation. If the store buffer cannot provide the complete data required by the load operation, the partial data in the store buffer must first be written into the data cache, and only then can the data be read from the data cache in the load operation, which also stalls the instruction pipeline.
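The assembling step described above can be illustrated with a small sketch. It assumes 4-byte words, one valid flag per byte in each store buffer entry, and entries ordered from oldest to youngest; the function name `forward_word` and the entry layout are hypothetical and are not taken from the cited patent.

```python
WORD_BYTES = 4  # assumed word size in bytes


def forward_word(entries, word_address):
    """Assemble one word from store buffer entries, or return None.

    entries: list ordered oldest to youngest, each a tuple of
             (word_address, data_bytes, valid_flags).
    """
    assembled = [None] * WORD_BYTES
    # Scan youngest first so the most recent store wins for each byte.
    for addr, data, valid in reversed(entries):
        if addr != word_address:
            continue
        for i in range(WORD_BYTES):
            if valid[i] and assembled[i] is None:
                assembled[i] = data[i]
    if any(b is None for b in assembled):
        # The buffer cannot supply the complete word; in the scheme described
        # above, the load would have to wait for the cache to be updated.
        return None
    return bytes(assembled)
```

For example, two half-word entries for the same word address, one valid in bytes 0 and 1 and the other in bytes 2 and 3, together yield a complete word, while either entry alone yields `None`. Even this simplified version has to visit every matching entry and merge at byte granularity, which suggests why a hardware assembling mechanism becomes complex.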