The present invention relates to the field of computer systems and, more particularly, to compressing cache data transfers.
Throughout the development of computer systems, a primary emphasis has been placed on increasing the speed of such systems and their ability to handle larger and more complicated programs while reducing their cost. To increase the capability of a computer system, it is necessary both to increase the size of the random access memory (RAM) so that larger programs may be run and to increase the speed at which that RAM can be accessed. The straightforward method of increasing access speed is to use faster components. However, such fast components are more expensive than slower memory components.
Because of the cost of high-speed RAM, advanced computer systems conventionally use a high-speed cache memory arrangement to increase the operational speed of the memory system. A cache memory arrangement provides a small amount of especially fast memory in addition to the standard RAM. When commands are issued and data is used, the information is read from the RAM and stored in this cache memory. As each new read or write command is issued, the system looks to the fast cache memory to determine whether the information is stored there. If the information is available in the cache memory, access to the RAM is not required, and the command may be processed or the data accessed much more quickly. If the information is not available in the cache memory, the new data can be copied from the main memory and stored in the cache memory, where it can be accessed and remain available for later use by the system. In well-designed memory systems, the requested information is found in the cache memory more than 90% of the time on average. Consequently, use of the cache memory substantially speeds the overall operation of the memory in the computer system.
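The benefit of a high hit rate can be made concrete with the usual average-access-time estimate. The sketch below is an illustration only: the function name effective_access_time and the 10 ns cache and 100 ns main-memory timings are hypothetical, and it assumes a miss costs one cache probe plus one main-memory access.

```python
def effective_access_time(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Average access time for a single cache in front of main memory.

    A hit costs one cache access; a miss costs the cache probe plus a
    main-memory access, after which the line is kept in the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * (cache_ns + memory_ns)


# Hypothetical timings: 10 ns cache, 100 ns main memory, 90% hit rate.
print(f"{effective_access_time(0.9, 10.0, 100.0):.1f} ns")  # prints "20.0 ns"
```

Under these assumed timings, a 90% hit rate brings the average access to about 20 ns, versus 100 ns if every access went to main memory.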
To further enhance the operating speed of the computer system, it has been found desirable to integrate a small amount of extremely fast cache memory directly on the processor chip. For example, a small, fast cache memory of 8 kilobytes may be provided on the chip along with the other elements of a Central Processing Unit (CPU). Such an arrangement can greatly increase the operating speed of the system for information used repeatedly by various processes.
Today, cache memories are commonly designed at two levels: a first level (L1) cache and a second level (L2) cache. An L1 cache is a single layer of high speed memory between a microprocessor and the main system dynamic RAM (DRAM). L1 caches hold copies of the code and data most frequently requested by the microprocessor and are typically small, ranging from 4 kilobytes to 64 kilobytes in size. An L2 cache, on the other hand, is a second layer of high speed memory between the L1 cache and the main system DRAM. L2 caches also hold copies of code and data frequently requested by the microprocessor. The L2 cache handles the random memory requests that miss in the L1 cache. To simplify the handling of these L1 misses, the L2 cache typically includes all the data of the L1 cache and more. As a result, the L2 cache is almost always larger than the L1 cache, typically ranging in size from 64 kilobytes to 512 kilobytes. Main memory satisfies the demands of caches and vector units and often serves as the interface for one or more peripheral devices. Most often, main memory consists of core memory or a dedicated data storage device such as a disk drive unit.
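The two-level arrangement, with the L2 cache holding everything in the L1 cache, can be modeled in a few lines. This is a minimal sketch under stated assumptions: the class names LRUCache and InclusiveHierarchy are hypothetical, both levels use LRU replacement over whole line numbers, and inclusion is maintained by back-invalidating an L1 line whenever the L2 evicts it, which is one common way to keep the property, not necessarily the one any particular controller uses.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """A tiny fully associative cache of line numbers with LRU replacement."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # line number -> None, oldest first

    def contains(self, line: int) -> bool:
        return line in self.lines

    def touch(self, line: int) -> Optional[int]:
        """Insert or refresh a line; return the evicted line number, if any."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return None
        self.lines[line] = None
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None

    def invalidate(self, line: int) -> None:
        self.lines.pop(line, None)


class InclusiveHierarchy:
    """L1 backed by a larger L2 that always holds a superset of L1's lines."""

    def __init__(self, l1_lines: int, l2_lines: int):
        if l2_lines < l1_lines:
            raise ValueError("an inclusive L2 must be at least as large as L1")
        self.l1 = LRUCache(l1_lines)
        self.l2 = LRUCache(l2_lines)

    def access(self, line: int) -> str:
        if self.l1.contains(line):
            source = "L1 hit"
        elif self.l2.contains(line):
            source = "L2 hit"  # an L1 miss serviced by the L2
        else:
            source = "memory"  # missed in both levels
        victim = self.l2.touch(line)
        if victim is not None:
            self.l1.invalidate(victim)  # back-invalidate so L1 stays a subset of L2
        self.l1.touch(line)  # an L1 eviction needs no action: the L2 still holds that line
        return source

    def is_inclusive(self) -> bool:
        return all(self.l2.contains(line) for line in self.l1.lines)
```

For example, with a 2-line L1 and a 4-line L2, the access sequence 1, 2, 3, 1, 1 yields three main-memory fetches, then an L2 hit (line 1 was pushed out of the small L1 but is still in the L2), then an L1 hit.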
The performance of a cache is affected by its organization. Three types of organization are most commonly used: fully associative, set associative, and direct mapped (one-way set associative). In a fully associative cache memory, each item of information from the main memory system is stored as a unique cache entry. There is no relationship between the location of the information in the data cache RAM and its original location in the main system memory. If there are x storage locations in the cache, the cache will remember the last x main system memory locations accessed by the microprocessor. With a fully associative cache, each storage location can hold information from any location in the main system memory. As a result, the cache requires complex tag entries (to map the complete main system memory space), resulting in very complex and expensive cache comparison logic.
Set associative cache organizations divide the data cache RAM into banks of memory, or "sets" (also referred to as ways). A 2-way set associative cache divides the data cache RAM into two sets, a 4-way set associative cache into four sets, and so on. Each location in a memory page can map only to a single location in a cache way. Which location in the cache is used for a given memory location is commonly determined by the system memory address modulo the size of the cache divided by the number of sets. For example, in a 2-way set associative cache memory, each location in a main system memory page can map to a location in either of the two cache sets. When the microprocessor makes a memory request, the set associative cache compares the memory request with the tag entry at the appropriate location in each of its sets to determine whether the information is in the cache (i.e., a hit). The cache therefore performs one comparison per way, for a total number of comparisons equal to the number of sets.
For example, in a 2-way set associative cache memory, the cache would only have to make two parallel comparisons to determine if the information requested is stored in the cache.
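The set associative lookup can be sketched as a tag store indexed by the address modulo the way size. This is an illustrative model only: the class name SetAssociativeCache, the 8-kilobyte size, 32-byte lines, and round-robin replacement are assumptions, and the data arrays are omitted so that only the tag comparisons are shown.

```python
from typing import List, Optional, Tuple


class SetAssociativeCache:
    """Tag store of an N-way set associative cache (data arrays omitted)."""

    def __init__(self, cache_bytes: int = 8192, ways: int = 2, line_bytes: int = 32):
        self.ways = ways
        self.line_bytes = line_bytes
        # Locations per way: the cache size divided by the number of ways, in lines.
        self.rows = cache_bytes // (ways * line_bytes)
        # tags[way][row] is the tag of the line held there, or None if empty.
        self.tags: List[List[Optional[int]]] = [[None] * self.rows for _ in range(ways)]
        self.next_victim = [0] * self.rows  # round-robin replacement pointer per row

    def split(self, address: int) -> Tuple[int, int]:
        """Return (tag, row) for a byte address."""
        line = address // self.line_bytes
        return line // self.rows, line % self.rows

    def lookup(self, address: int) -> bool:
        tag, row = self.split(address)
        # One tag comparison per way; hardware performs these in parallel.
        return any(self.tags[way][row] == tag for way in range(self.ways))

    def fill(self, address: int) -> None:
        """Install the line on a miss, replacing ways in round-robin order."""
        if self.lookup(address):
            return
        tag, row = self.split(address)
        way = self.next_victim[row]
        self.tags[way][row] = tag
        self.next_victim[row] = (way + 1) % self.ways
```

With the default 2-way, 8-kilobyte configuration, addresses 0 and 4096 fall on the same row but can coexist, one in each way; a third address on that row (such as 8192) must displace one of them.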
A direct mapped cache organization (1-way set associative) uses the entire data cache RAM as one bank of memory, or set. Each location in any main system memory page maps directly to only a single location in the data cache RAM. Which location in the cache is used for a given memory location is commonly determined by the system memory address modulo the cache size.
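For a direct mapped cache, the "address modulo the cache size" rule reduces to splitting the address into tag, index, and offset fields. The sketch below assumes a hypothetical 8-kilobyte cache with 32-byte lines; the names direct_mapped_fields and DirectMappedCache are illustrative only.

```python
from typing import List, Optional, Tuple


def direct_mapped_fields(address: int, cache_bytes: int = 8192,
                         line_bytes: int = 32) -> Tuple[int, int, int]:
    """Split a byte address into (tag, index, offset) for a direct mapped cache."""
    offset = address % line_bytes
    index = (address % cache_bytes) // line_bytes  # address modulo the cache size, in lines
    tag = address // cache_bytes  # distinguishes addresses that share an index
    return tag, index, offset


class DirectMappedCache:
    def __init__(self, cache_bytes: int = 8192, line_bytes: int = 32):
        self.cache_bytes = cache_bytes
        self.line_bytes = line_bytes
        self.tags: List[Optional[int]] = [None] * (cache_bytes // line_bytes)

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, the resident line is replaced."""
        tag, index, _ = direct_mapped_fields(address, self.cache_bytes, self.line_bytes)
        hit = self.tags[index] == tag  # exactly one tag comparison
        self.tags[index] = tag
        return hit
```

Only one comparison is needed per access, but two addresses exactly one cache size apart (for example 4660 and 4660 + 8192) share an index and evict each other on alternating accesses.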
In the prior art, a separate cache controller is used to provide access to the L2 cache. The cache controller is separate from the processor in the computer system, usually implemented as a separate chip. The cache controller implements very complicated logic. Most processor systems contain two such controllers, one to control the L1 cache within the processor and the other to control the L2 cache in the system. The design of these two controllers is a compromise between performance and the complexity of the state that must be shared between them. Such a hierarchy of caches would provide the highest overall performance if both cache controllers had access to information from both cache memories and from the processor and bus accesses.
Another problem with the prior art is that the L2 cache is on the system bus, so access to the L2 cache is limited by the speed of the system bus. For instance, if the system bus is running at 10 MHz, accesses to the L2 cache cannot be performed at a rate faster than 10 MHz. It would be advantageous for the processor to be able to access the L2 cache at a rate faster than that of the system bus in order to increase the overall speed of the system. With this in mind, a single controller has been developed to control both the L1 and L2 caches in a system implementing both. Details of this controller are disclosed in copending U.S. Pat. Nos. 5,832,534 and 5,903,908 of Singh et al., both assigned to the same entity as the present application.
A problem still exists with buses overloaded by the transfer of large amounts of information in a computer system. As signal handling requirements increase, modern computer systems can become bogged down by overloaded bus lines as internal components compete for bus access. Thus, a need exists to reduce the traffic on shared bus lines in a computer system. As will be seen, the present invention fulfills this need in a simple and elegant manner.
A method and apparatus are provided for reducing the number of data and instruction transfers among components of a computer system. A sideband communication line is provided to transfer information pertaining to redundant data strings occurring in a cacheline from a source cache agent to a destination cache agent. If redundant data strings occur in a cacheline, the transfer of one or more portions of the cacheline from the source to the destination is cancelled. Redundancy logic is provided to detect occurrences of redundant data strings in a given cacheline, to generate and transfer redundancy bits when predetermined redundant data strings occur, and to decode the redundancy bits at the destination cache agent to determine whether redundant data strings occur in subsequent portions of cachelines to be transferred. Alternative embodiments are provided in which the redundancy logic operates in parallel with the data and instruction buses or serially with them.
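The redundancy scheme summarized above can be illustrated with a small software model. This is a minimal sketch, not the claimed implementation: the 8-byte chunk size, the two predetermined redundant strings (an all-zero chunk and an all-one chunk), the 2-bit-per-chunk sideband code, and the names encode_cacheline and decode_cacheline are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

CHUNK_BYTES = 8  # hypothetical bus width: one 64-bit transfer per cacheline portion
# Hypothetical predetermined redundant strings; code 0 means "portion sent on the bus".
PATTERNS: Tuple[bytes, ...] = (bytes(CHUNK_BYTES), b"\xff" * CHUNK_BYTES)


def encode_cacheline(line: bytes) -> Tuple[List[int], List[bytes]]:
    """Source agent: return (sideband redundancy codes, portions still to transfer)."""
    if len(line) % CHUNK_BYTES:
        raise ValueError("cacheline length must be a multiple of CHUNK_BYTES")
    codes: List[int] = []
    sent: List[bytes] = []
    for i in range(0, len(line), CHUNK_BYTES):
        chunk = line[i:i + CHUNK_BYTES]
        if chunk in PATTERNS:
            codes.append(PATTERNS.index(chunk) + 1)  # transfer of this portion is cancelled
        else:
            codes.append(0)
            sent.append(chunk)
    return codes, sent


def decode_cacheline(codes: Sequence[int], sent: Sequence[bytes]) -> bytes:
    """Destination agent: rebuild the cacheline from sideband codes and received portions."""
    received = iter(sent)
    out = bytearray()
    for code in codes:
        # A nonzero code regenerates the predetermined string locally instead of
        # waiting for it on the data bus.
        out += next(received) if code == 0 else PATTERNS[code - 1]
    return bytes(out)
```

In this model, a 32-byte cacheline whose portions are zeros, b"ABCDEFGH", all-ones, and zeros needs only one data-bus transfer instead of four, at the cost of 2 sideband bits per portion.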