A typical information handling system is depicted in FIG. 1 and includes at least one central processing unit (CPU) 10 with one or more integrated data transfer units 11, one or more levels of cache memory 13, and a number of other units such as one or more execution units (not shown) or a memory management unit (not shown). CPU 10 is interconnected via system bus 12 to random access memory (RAM) 14, read only memory (ROM) 16, and input/output (I/O) adapter 18 for connecting peripheral devices such as disc units 20 and tape drives 40 to bus 12, user interface adapter 22 for connecting keyboard 24, mouse 26, speaker 28, microphone 32, and/or other user interface devices such as a touch screen device (not shown) to bus 12, communication adapter 34 for connecting the information handling system to a data processing network, and display adapter 36 for connecting bus 12 to display device 38.
Cache memory 13 is one part of the overall concept of storage hierarchies, in which faster but less dense memories are placed closer to CPU 10, and slower but more dense or bigger memories back them up. Cache memory 13 may have one or more levels of storage hierarchy, wherein the data most frequently requested by CPU 10 is stored in a first level of cache memory, the next most frequently requested data is stored in a second level of cache memory, and the least frequently requested data is stored in a main memory such as RAM 14 or ROM 16 located outside of CPU 10. When CPU 10 needs to make a request for data from memory, data transfer units 11 search cache memory 13 for the address of the desired data. If the data is available, a hit occurs and the data is transferred. If the data is not in cache memory 13, then a miss or cache fault occurs. If cache memory 13 has multiple levels, each level is searched one at a time until a hit occurs.
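The level-by-level search described above can be illustrated with a minimal software sketch. It is only an analogy for the hardware behavior: the function name `lookup`, the use of dictionaries to stand in for cache levels, and the example addresses are all assumptions made for illustration and do not appear in the figures.

```python
def lookup(address, levels, main_memory):
    """Search each cache level in order; fall back to main memory on a full miss.

    `levels` is a list of dicts standing in for cache levels (hypothetical model).
    """
    for depth, level in enumerate(levels, start=1):
        if address in level:
            # Hit: the data is available at this level and is transferred.
            return level[address], f"hit in level {depth}"
    # Miss (cache fault) in every level: the data comes from main memory.
    return main_memory[address], "miss in all cache levels"


# Hypothetical contents: level one holds address 0x10, level two holds 0x20.
level_one = {0x10: "A"}
level_two = {0x20: "B"}
main_memory = {0x10: "A", 0x20: "B", 0x30: "C"}

first = lookup(0x10, [level_one, level_two], main_memory)
second = lookup(0x20, [level_one, level_two], main_memory)
third = lookup(0x30, [level_one, level_two], main_memory)
```

In this sketch the first request hits in level one, the second misses level one and hits in level two, and the third misses both levels.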
Referring now to FIG. 2, a portion of a state of the art central processing unit (CPU) 100 is depicted. Reference numerals used in FIG. 2 which are like or similar to the reference numerals used in FIG. 1 are intended to indicate like or similar components. CPU 100 includes data transfer units 111a, 111b, a level one cache 113a, and a level two cache 113b. Data transfer units 111a and 111b are operably associated with level one cache 113a for transferring data within CPU 100. Data transfer units 111a, 111b include circuitry for requesting data to be transferred by data transfer units 111a, 111b from level one cache 113a. Data transfer units 111a, 111b are preferably load units; however, data transfer units 111a, 111b may be load/store units or store units. Level one cache 113a stores a first set of data for use by CPU 100, and level two cache 113b stores a second set of data for use by CPU 100. Level one cache 113a and level two cache 113b include addressable memory locations or cache lines such that data is stored in a memory location or a cache line within level one cache 113a and level two cache 113b having a unique address. CPU 100 further includes miss queues 150a, 150b. Miss queue 150a is operably coupled to data transfer unit 111a, and miss queue 150b is operably coupled to data transfer unit 111b. Miss queues 150a, 150b are also operably coupled to level two cache 113b.
CPU 100 further includes a data output port 152a operably associated with data transfer unit 111a and a data output port 152b operably associated with data transfer unit 111b. Data output port 152a outputs data to be transferred by data transfer unit 111a from level one cache 113a when the data requested by data transfer unit 111a is located in level one cache 113a. In a like manner, data output port 152b outputs data to be transferred by data transfer unit 111b from level one cache 113a when the data requested by data transfer unit 111b is located in level one cache 113a. Data output ports 152a, 152b may be connected to other units within CPU 100, such as general purpose registers (GPRs), for transferring the data to the other units. CPU 100 further includes a data formatter or rotater 154a operably associated with data output port 152a and data transfer unit 111a for formatting the data prior to transfer by data transfer unit 111a. A data formatter or rotater 154b is operably associated with data output port 152b and data transfer unit 111b for formatting the data prior to transfer by data transfer unit 111b. Whereas the data transferred by data transfer units 111a or 111b from level one cache 113a is an entire cache line, the data outputted through data output ports 152a or 152b is a specific operand or operands extracted from the cache line by data formatters 154a or 154b. In other words, data formatters 154a or 154b are hardware mechanisms used to extract a desired set of data bytes from a potentially larger set of data bytes, such as are found in a cache line.
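The role of a data formatter, extracting a desired set of bytes from a full cache line, can be sketched in software as a simple slice. This is an illustration only: the 32-byte line size, the function name `extract_operand`, and the offset and length parameters are assumptions, since the text does not specify a line size or how the operand position is encoded.

```python
LINE_SIZE = 32  # hypothetical cache line size in bytes; not given in the text


def extract_operand(cache_line: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at `offset` within one cache line."""
    if len(cache_line) != LINE_SIZE:
        raise ValueError("expected exactly one full cache line")
    if offset < 0 or length < 0 or offset + length > LINE_SIZE:
        raise ValueError("operand must lie entirely within the cache line")
    return cache_line[offset:offset + length]


# A full line is transferred from the cache, but only a 4-byte operand is output.
line = bytes(range(LINE_SIZE))
operand = extract_operand(line, offset=8, length=4)
```

Here the whole line stands for what the cache supplies, and `operand` stands for what would appear at the data output port.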
CPU 100 further includes a data output port 156 operably associated with level two cache 113b for outputting data to be transferred by one of data transfer units 111a, 111b from level two cache 113b when the data requested by one of the data transfer units 111a, 111b is located in level two cache 113b. Level two cache 113b outputs data through a level two cache output 157. Level two cache 113b outputs an entire cache line of data. A data formatter or rotater 158 is operably associated with data output port 156 and output 157 of level two cache 113b for formatting the data prior to outputting the data through data output port 156. Data formatter 158 extracts the specific data needed from the cache line outputted by level two cache 113b for forwarding to other units in CPU 100 via data output port 156. The data is outputted from level two cache 113b and data output port 156 without further interaction with data transfer units 111a, 111b. CPU 100 includes a reload latch 160 operably coupled to level one cache 113a and level two cache 113b for storing entire cache lines of data outputted from level two cache 113b so that such data can be transferred by a data bus 164 and stored in level one cache 113a to thereby update level one cache 113a to avoid subsequent cache misses.
In operation, when multiple operations or requests for data from data transfer units 111a, 111b to a single cache line in level one cache 113a miss the cache, or in other words, when the address for the data is not found in level one cache 113a, the requests for such data are sent to the next hierarchical memory level of CPU 100, level two cache 113b. Requests for data from data transfer units 111a, 111b for which the data was not located in level one cache 113a are stored in miss queues 150a, 150b. Requests for data stored in miss queues 150a, 150b are forwarded to level two cache 113b one at a time, and level two cache 113b is searched for the requested data address; if the address is found, a hit is produced, and the data can be forwarded to rotater 158 and outputted through data output port 156. If there are successive requests for data in the same memory location or cache line of level two cache 113b and the data is not located in level one cache 113a, multiple cache misses occur, and the return of data to the remaining units of CPU 100 via data output ports 152a, 152b, or 156 is delayed. A bottleneck occurs when multiple cache misses to a single cache line in level one cache 113a occur before level one cache 113a receives a copy of the data from level two cache 113b or from other lower hierarchical memory levels. Multiple requests for the same address of data in a single cache line may occur if a "do" loop is used. Each request stored in miss queues 150a, 150b for the same address of data must access level two cache 113b separately, assuming the data is located in level two cache 113b, and the data for each request is outputted via data output port 156 one at a time.
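The serial behavior described above can be made concrete with a small sketch that counts level two accesses. It is a software analogy, not a model of the circuitry: the 32-byte line size, the single merged queue, the function names, and the example addresses are assumptions, and every request is assumed to hit in level two.

```python
from collections import deque

LINE_SIZE = 32  # hypothetical cache line size in bytes


def line_address(address):
    """Round an address down to the start of its cache line."""
    return address - (address % LINE_SIZE)


def drain_serially(miss_queue, level_two):
    """Forward queued misses to level two one at a time.

    Every request performs its own level two access, even when several
    requests fall in the same cache line.
    """
    accesses = 0
    results = []
    while miss_queue:
        address = miss_queue.popleft()
        line = level_two[line_address(address)]  # one access per request
        accesses += 1
        results.append(line[address % LINE_SIZE])  # formatter picks the byte
    return results, accesses


# Three misses to the same line, as might arise from a loop over nearby data.
level_two = {0x40: bytes(range(LINE_SIZE))}
queue = deque([0x40, 0x44, 0x48])
results, accesses = drain_serially(queue, level_two)
```

In this sketch three requests to one line cost three level two accesses, which is the repeated work the paragraph identifies as the bottleneck.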
What is needed is an apparatus and method for recognizing that multiple requests for the same cache line are stored in the miss queue such that only one access to the level two cache is needed to retrieve the data and such that the data can be outputted and forwarded to other units of the CPU in parallel or more than one at a time.
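The stated need can be illustrated, in software terms only, by grouping queued requests by cache line so that each line is fetched once and all of its requests are satisfied from that single fetch. This sketch is not the apparatus or method of any particular design; the 32-byte line size, the function name `drain_grouped`, and the example addresses are assumptions, and every request is assumed to hit in level two.

```python
from collections import defaultdict

LINE_SIZE = 32  # hypothetical cache line size in bytes


def drain_grouped(miss_queue, level_two):
    """Fetch each distinct cache line once and serve all requests to it."""
    by_line = defaultdict(list)
    for address in miss_queue:
        by_line[address - address % LINE_SIZE].append(address)

    accesses = 0
    results = {}
    for base, addresses in by_line.items():
        line = level_two[base]  # a single access per distinct line
        accesses += 1
        for address in addresses:
            # All requests to this line are served from the same fetch.
            results[address] = line[address - base]
    return results, accesses


level_two = {
    0x40: bytes(range(LINE_SIZE)),
    0x60: bytes(range(LINE_SIZE, 2 * LINE_SIZE)),
}
requests = [0x40, 0x44, 0x60, 0x48]
results, accesses = drain_grouped(requests, level_two)
```

Four requests spanning two lines need only two level two accesses in this sketch, compared with four when each request is handled separately.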