Most of today's computer systems use various types of cache memory to improve system performance. A data cache is a small, high-speed memory device that temporarily holds data needed by the central processing unit (CPU) and other system devices. By anticipating data requirements, collecting data ahead of time (i.e., pre-fetching), and storing the data in a data cache, the time-consuming step of retrieving the data from the computer's main memory "on the fly" is often eliminated.
However, for a data cache to be useful (and thereby avoid the latencies incurred when retrieving data from a large main memory), the cache must first be efficiently loaded with the data needed by the CPU. The process of loading data into a cache is performed by the computer's line fill control logic and is termed "filling" a line. By this process, a cache is loaded, line by line, with data from either the main memory or a higher level cache, for use by a specific processor or set of processors attached to this cache. Each line of data contains a predetermined number of bytes of information (e.g., 128 bytes). During a line fill operation, each line of data can be broken up into smaller "data packets" of a predetermined size (e.g., quad-words consisting of 16 bytes or oct-words consisting of 32 bytes).
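The packet arithmetic can be illustrated with a short sketch. It uses the example sizes given above (a 128-byte line, 16-byte quad-words, 32-byte oct-words); the function name `split_line` and the requirement that the packet size divide the line size evenly are assumptions made for illustration, not a description of any particular line fill control logic.

```python
LINE_SIZE = 128  # bytes per cache line (example value from the text)


def split_line(line: bytes, packet_size: int) -> list[bytes]:
    """Break one cache line into fixed-size data packets (hypothetical helper)."""
    # Assumption: packets are equal-sized, so the size must divide the line.
    if len(line) % packet_size != 0:
        raise ValueError("packet size must divide the line size evenly")
    return [line[i:i + packet_size] for i in range(0, len(line), packet_size)]


line = bytes(range(LINE_SIZE))      # stand-in contents for one cache line
quad_words = split_line(line, 16)   # 16-byte packets
oct_words = split_line(line, 32)    # 32-byte packets
print(len(quad_words), len(oct_words))  # 8 4
```

Under these example sizes, a single line fill moves eight quad-word packets or four oct-word packets.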
If the data needed by the CPU is not in the data cache, a "cache miss" occurs and the computer system is then forced to look in main memory for the data. Each time a cache miss occurs, the system must allocate a system resource such as a cache miss (Cmiss) sequencer to track the retrieval of the data from main memory. Because a typical computer system has a limited number of such resources allocated to track and retrieve data from main memory, the CPU is often forced to wait until one is available when a subsequent cache miss occurs.
Cmiss sequencers are a limited resource. When all of them are tied up with line fill operations, processor performance is directly affected, because the processor must stall whenever it requires a Cmiss sequencer and none is free. Each Cmiss sequencer consists of a significant amount of hardware responsible for tracking an outstanding cache miss, so most processor implementations utilizing such systems have only a small number of sequencers (e.g., three). Efficient use of the Cmiss sequencers is therefore generally considered to be a key to achieving a high level of processor performance.
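The effect of a small sequencer pool on stalls can be sketched with a toy model. Everything here is an assumption for illustration: the function `stall_cycles`, a fixed line fill latency, and a processor that stops issuing further misses while it waits for a free sequencer. It is not a model of any specific hardware.

```python
import heapq


def stall_cycles(miss_times, fill_latency, num_sequencers=3):
    """Total cycles stalled waiting for a free Cmiss sequencer (toy model)."""
    busy_until = []  # min-heap of cycles at which busy sequencers free up
    stalled = 0
    for t in miss_times:
        # Assumption: earlier stalls push every later miss back in time.
        now = t + stalled
        # Release sequencers whose line fills have completed.
        while busy_until and busy_until[0] <= now:
            heapq.heappop(busy_until)
        if len(busy_until) == num_sequencers:
            # All sequencers are tied up: wait for the earliest to finish.
            free_at = heapq.heappop(busy_until)
            stalled += free_at - now
            now = free_at
        # Allocate a sequencer to track this miss until its fill completes.
        heapq.heappush(busy_until, now + fill_latency)
    return stalled


# Four misses on consecutive cycles, each fill taking 10 cycles.
print(stall_cycles([0, 1, 2, 3], fill_latency=10))                    # 7
print(stall_cycles([0, 1, 2, 3], fill_latency=10, num_sequencers=4))  # 0
```

In this model the fourth miss finds all three sequencers busy and waits seven cycles for the first fill to complete; with a fourth sequencer the same miss pattern produces no stall.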
Accordingly, without some means of delivering data more efficiently to a level-one data cache (L1 Dcache) from an intermediate cache that receives data at different speeds from different devices, computer system performance will be negatively impacted.