1. Technical Field
The present invention relates to systems and methods for bus transactions, and in particular, to systems and methods for handling bus transactions to a memory system.
2. Background Art
Modern high-performance processors are designed to execute multiple instructions on each clock cycle. To this end, they typically include extensive execution resources to facilitate parallel processing of the instructions. To be effective, the execution resources must receive data and instructions at a rate comparable to the rate at which the processor executes instructions. It is the function of the memory system to keep the processor's execution resources supplied with data and instructions.
The memory system typically includes a hierarchy of caches, e.g. L0, L1, L2 . . . , and a main memory. The storage capacities of the caches generally increase from L0 to L2, et seq., as does the time required by succeeding caches in the hierarchy to return data to the processor. For example, an L0 cache may return data in 1 or 2 clock cycles, an L1 cache may return data in 4 to 8 clock cycles, and an L2 cache may return data in 10 or more clock cycles. A data request propagates through the cache hierarchy, beginning with the smallest, fastest structure, until the data is located or the caches are exhausted. In the latter case, the requested data is returned from main memory. The latency for a transaction to main memory can be on the order of 100-200 clock cycles.
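The lookup order described above can be illustrated with a minimal sketch. The function name `lookup`, the `contents` mapping, and the specific cycle counts are assumptions made for illustration only; the cycle counts are single values picked from within the ranges quoted above and do not describe any particular processor.

```python
# Illustrative latencies in clock cycles, chosen from within the ranges
# quoted in the text (assumed values, not measurements).
LEVELS = [("L0", 2), ("L1", 6), ("L2", 12)]
MAIN_MEMORY_LATENCY = 150


def lookup(address, contents):
    """Probe the caches from smallest and fastest to largest and slowest.

    `contents` maps a level name to the set of addresses that level
    currently holds. Returns (source, latency_in_cycles).
    """
    for name, latency in LEVELS:
        if address in contents.get(name, set()):
            return name, latency
    # The caches are exhausted, so the data is returned from main memory.
    return "memory", MAIN_MEMORY_LATENCY


print(lookup(0x40, {"L1": {0x40}}))
print(lookup(0x80, {}))
```

In this model, a request that misses every cache level pays the main memory latency, which is the case the remainder of this section is concerned with.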
Even with the best system designs, cache misses occur and data must be retrieved from main memory. The significant penalty for cache misses (100-200 clock cycles) places a premium on handling transactions to main memory efficiently. For example, when a request to load an operand misses in a cache, the operand is typically returned to the cache along with data from adjacent memory addresses. Enough data is returned to fill one or more "lines" of the cache, i.e. one or more cache lines. The spatially local nature of most programs means that the data from adjacent memory addresses is likely to be requested as well. If the data from an adjacent address is requested before the cache line is returned, multiple bus transactions may be generated to the same cache line in memory.
Where there are multiple misses to a single cache line, it is inefficient to generate a separate cache request for each miss. Separate requests consume bandwidth on the memory bus, and if they target the same location in memory, they cannot be pipelined without adding transaction management hardware. On the other hand, combining multiple cache misses that target data in the same cache line into a single bus transaction places demands on the bus controller. For example, the target bytes within the cache line need to be identified along with the register for which they are destined. The signal lines necessary to characterize the requests for the bus controller consume area on the silicon die, and provide additional constraints on signal routing.
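The bookkeeping that combining requests entails can be sketched as follows. The 64-byte line size, the function names `split_address` and `coalesce`, and the register names are hypothetical and chosen only for illustration. The sketch groups misses by cache line and records, for each miss, the byte offset within the line and the destination register.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def split_address(address):
    """Split an address into (cache line base address, byte offset in line)."""
    offset = address % LINE_SIZE
    return address - offset, offset


def coalesce(misses):
    """Group misses so that each cache line needs only one bus transaction.

    `misses` is a list of (address, destination_register) pairs.
    Returns a dict mapping each line base address to the list of
    (byte_offset, destination_register) pairs that the line must satisfy.
    """
    transactions = {}
    for address, register in misses:
        line, offset = split_address(address)
        transactions.setdefault(line, []).append((offset, register))
    return transactions


pending = coalesce([(0x1008, "r4"), (0x1010, "r5"), (0x2000, "r6")])
for line, targets in pending.items():
    print(hex(line), targets)
```

Here three misses yield two bus transactions, but every transaction must carry an offset and a register for each miss it covers. That per-request information is what would otherwise have to be conveyed to the bus controller.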
The present invention addresses these and other problems associated with combining multiple data requests into a single bus transaction.
A system and method are provided for efficiently processing a bus transaction generated by a cache miss.
On a cache miss, information on the data request that missed in the cache is mapped to a bus transaction. The information is stored and a pointer to the stored information is forwarded to a bus controller for servicing by the mapped bus transaction.
For one embodiment of the invention, the data request information is provided to a load miss buffer for storage and to a secondary miss system for mapping. The secondary miss system provides the bus controller with a pointer to the request information. The pointer is stored in an external bus logic buffer, at an entry associated with the mapped bus transaction.
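As a software analogy only, the arrangement described in this embodiment might be modeled as in the sketch below. The class `MissHandler`, its method names, the 64-byte line size, and the use of list indices as pointers are all assumptions introduced for illustration; the embodiment itself concerns hardware buffers, and this sketch does not represent its actual implementation.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


class MissHandler:
    def __init__(self):
        # Load miss buffer: holds the full information for each request.
        self.load_miss_buffer = []
        # Bus logic buffer: one entry per outstanding bus transaction,
        # keyed by line address, holding only pointers into the
        # load miss buffer.
        self.bus_entries = {}

    def on_miss(self, address, register):
        """Store the request and forward only a pointer to the bus side.

        Returns (pointer, is_new_transaction).
        """
        offset = address % LINE_SIZE
        line = address - offset
        pointer = len(self.load_miss_buffer)
        self.load_miss_buffer.append({"offset": offset, "register": register})
        # Mapping step: reuse a pending transaction for this line if any.
        is_new = line not in self.bus_entries
        self.bus_entries.setdefault(line, []).append(pointer)
        return pointer, is_new

    def on_line_return(self, line, data):
        """Follow the stored pointers to deliver bytes to their registers."""
        delivered = []
        for pointer in self.bus_entries.pop(line, []):
            info = self.load_miss_buffer[pointer]
            delivered.append((info["register"], data[info["offset"]]))
        return delivered


handler = MissHandler()
handler.on_miss(0x1008, "r4")   # opens a transaction for line 0x1000
handler.on_miss(0x1010, "r5")   # maps onto the same transaction
print(handler.on_line_return(0x1000, bytes(range(LINE_SIZE))))
```

The point of the indirection is that the bus-side entry holds only small pointers, while the byte offsets and destination registers stay in the load miss buffer until the line returns.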