The present invention generally relates to the field of multi-processor environments. More particularly, the present invention relates to the use of an improved cache line directory entry to increase overall performance in a multi-processor environment.
In a multi-processor environment, processors and other devices, such as I/O devices, are interconnected via a bus, and each device may include a cache memory. Each device that contains a cache memory is referred to as a caching agent or a node. Because cache memory is typically faster than main memory, caches are often used to improve overall computer system performance.
In a multi-processor environment, several processors or I/O devices may each contain a "copy" of a particular cache line of data. Because several caching agents may each contain a copy of a particular cache line of data simultaneously, a system is required to control revisions to the data. In a multi-processor environment, typically only one "copy" can be a read-write copy at any one time, all other copies being read-only copies. Typically, if one caching agent requests a read-write copy of data, all other caching agents give up their copies of the cache line. In this manner, revisions to the cache lines are controlled. This system of controlling revisions to cache lines is called cache coherency and requires communication over the processor bus. Therefore, maintaining cache coherency creates processor bus traffic, which may slow down the system.
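The revision-control behavior described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the coherency protocol of any particular system: the `LineState` class, the `request` function, the agent names, and the rule that a read-only request downgrades an existing writer are all hypothetical. The sketch counts one coherency message per copy that must be released or downgraded.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LineState:
    """Hypothetical bookkeeping for one cache line (not from the source)."""
    sharers: Set[str] = field(default_factory=set)  # agents holding any copy
    writer: Optional[str] = None                    # holder of the read-write copy


def request(line: LineState, agent: str, read_write: bool) -> int:
    """Grant a copy to `agent` and return the coherency messages it cost."""
    messages = 0
    if read_write:
        # Every other holder must give up its copy of the line.
        messages = len(line.sharers - {agent})
        line.sharers = {agent}
        line.writer = agent
    else:
        if line.writer is not None and line.writer != agent:
            # Assumption: the current writer is downgraded to read-only.
            messages = 1
            line.writer = None
        line.sharers.add(agent)
    return messages


line = LineState()
request(line, "cpu0", read_write=False)
request(line, "cpu1", read_write=False)
cost = request(line, "nic", read_write=True)  # cpu0 and cpu1 must release
```

In this model, read-only requests against a line with no writer cost nothing, while a single read-write request costs one message per other holder, which is the bus traffic the remainder of this section seeks to reduce.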
Caching agents may request a read-only copy of a cache line or a read-write copy of a cache line. Some caching agents rarely write to the cache line, yet still request the cache line with read-write permission. When a caching agent requests a cache line with read-write permission but does not write data to the cache line, many useless cache coherency operations may occur over the processor bus. For example, if an I/O device, such as a small computer system interface (SCSI) bus or an Ethernet network interface card (NIC), requests a read-write copy of a cache line of data, the I/O device will receive the cache line with read-write permission and all other devices with a copy of that cache line will be forced to release their copies. However, such an I/O device rarely writes to a cache line; rather, it typically sends the data to another device for viewing purposes. The forced release of the other copies is therefore a useless cache coherency operation.
Cache-based computer systems lack explicit software-controlled mechanisms that would enable software to modify cache line requests in order to improve overall system performance. Such a mechanism would therefore be very desirable.
The present invention adds bits, referred to herein as affinity bits, to a cache line directory entry for more efficient use of the bus that interconnects processors, I/O devices, and memory devices in a multi-processor environment. Specifically, the improved cache line directory entry includes at least one processor affinity bit and/or at least one I/O affinity bit. The processor affinity and I/O affinity bits are used to intelligently modify requests for cache lines in accordance with a predefined "affinity" for certain types of requests, creating more efficient bus usage by minimizing cache coherency operations.
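One way to picture a directory entry carrying affinity bits is as a small packed integer. The following Python sketch is only an illustration: the bit positions, the eight presence bits, and the names `PRESENCE_MASK`, `PROC_AFFINITY_BIT`, `IO_AFFINITY_BIT`, `set_bit`, and `has_bit` are assumptions made for this example and are not specified in the text.

```python
# Hypothetical layout of one directory entry (assumed for illustration):
#   bits 0-7 : presence bits, one per caching agent holding a copy
#   bit  8   : processor affinity bit
#   bit  9   : I/O affinity bit
PRESENCE_MASK = 0xFF
PROC_AFFINITY_BIT = 1 << 8
IO_AFFINITY_BIT = 1 << 9


def set_bit(entry: int, bit: int, value: bool) -> int:
    """Return a copy of `entry` with `bit` set or cleared."""
    return (entry | bit) if value else (entry & ~bit)


def has_bit(entry: int, bit: int) -> bool:
    """Report whether `bit` is set in `entry`."""
    return bool(entry & bit)


entry = 0b00000101                            # agents 0 and 2 hold a copy
entry = set_bit(entry, IO_AFFINITY_BIT, True)  # mark an I/O affinity
```

Keeping the affinity bits alongside the presence bits means that the logic which already consults the entry on every request can read the affinity at the same time, without a second lookup.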
In accordance with an aspect of the present invention, if an I/O affinity bit is in a "read-only" affinity state, and an I/O device requests a read-write copy of a cache line of data, the read-write request will be converted to a read-only request. In this case, there is an affinity toward the read-only request. Since the I/O device receives a read-only copy of the cache line, no further bus actions are required to maintain cache coherency.
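The I/O conversion just described, and its effect on bus traffic, can be sketched as follows. This is a simplified model under stated assumptions: the function names `effective_io_request` and `copies_released`, the string request types, and the count of three other holders are hypothetical and chosen only for illustration.

```python
def effective_io_request(requested: str, io_read_only_affinity: bool) -> str:
    """Convert an I/O read-write request to read-only when the affinity is set."""
    if requested == "read-write" and io_read_only_affinity:
        return "read-only"
    return requested


def copies_released(request_type: str, other_holders: int) -> int:
    """Copies other agents must give up: all of them for read-write, else none."""
    return other_holders if request_type == "read-write" else 0


other_holders = 3  # assumed number of agents already holding the line

without_affinity = copies_released(
    effective_io_request("read-write", io_read_only_affinity=False), other_holders
)
with_affinity = copies_released(
    effective_io_request("read-write", io_read_only_affinity=True), other_holders
)
```

Under these assumptions, the same read-write request forces three releases when the affinity bit is clear and none when it is set, because the converted request leaves the existing read-only copies in place.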
In accordance with another aspect of the present invention, if a processor affinity bit is in a "read-write" affinity state, and a processor requests a read-only copy of a cache line of data, the read-only request will be converted to a read-write request. In this case there is an affinity toward the read-write request. Since the processor is likely to modify the cache line and the processor receives a read-write copy of the cache line, this minimizes the actions required for cache coherency.
Alternatively, if a particular processor typically performs only read instructions, rather than write instructions, a processor affinity bit may be set to the read-only affinity state, triggering a conversion from a read-write request to a read-only request.
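The two processor affinity states described above can be sketched as a single conversion function. The `Access` enumeration and the `convert_processor_request` function are hypothetical names, and the use of `None` to represent a processor with no configured affinity is an assumption of this sketch, not something the text specifies.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    READ_ONLY = "read-only"
    READ_WRITE = "read-write"


def convert_processor_request(
    requested: Access, affinity: Optional[Access]
) -> Access:
    """Return the request actually issued for a processor.

    With no affinity (None, an assumption of this sketch) the request passes
    through unchanged; otherwise it is converted to the affinity state.
    """
    return requested if affinity is None else affinity


# A processor that is likely to write: read-only is upgraded to read-write.
upgraded = convert_processor_request(Access.READ_ONLY, Access.READ_WRITE)

# A processor that mostly reads: read-write is downgraded to read-only.
downgraded = convert_processor_request(Access.READ_WRITE, Access.READ_ONLY)
```

A request that already matches the affinity state is returned unchanged, so the conversion only alters requests that run against the configured affinity.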
Further, according to the present invention, a multi-processor system may be "tuned" by setting the affinity bits to achieve improved bus performance by minimizing cache coherency operations. Also, one or more affinity bits may be provided. For example, one affinity bit may be provided for each processor in the multi-processor system. Software may be provided to modify the affinity bits and/or to modify a cache line directory entry to facilitate system recovery from a failed caching agent.
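As an illustration of how software might tune a system with one affinity bit per processor, consider the following Python sketch. The text does not say how affinity values are chosen, so the write-fraction heuristic, the 0.5 threshold, and the names `tune_affinity_mask` and `drop_failed_agent` are all assumptions. The second function shows one plausible form of the recovery step, clearing a failed agent's presence bit from a directory entry.

```python
from typing import Sequence


def tune_affinity_mask(write_fraction: Sequence[float], threshold: float = 0.5) -> int:
    """Build a mask with bit i set (read-write affinity) for processor i.

    `write_fraction[i]` is an assumed measurement: the share of processor i's
    cache line requests that were followed by a write.
    """
    mask = 0
    for cpu, fraction in enumerate(write_fraction):
        if fraction >= threshold:
            mask |= 1 << cpu
    return mask


def drop_failed_agent(presence_mask: int, agent: int) -> int:
    """Clear a failed agent's presence bit so its copy is no longer tracked."""
    return presence_mask & ~(1 << agent)


# Processors 0 and 2 write often, so they receive a read-write affinity.
mask = tune_affinity_mask([0.9, 0.1, 0.5, 0.0])

# Agent 1 has failed; remove it from an entry that tracked agents 0, 1 and 3.
recovered = drop_failed_agent(0b1011, agent=1)
```

Because the affinity bits are ordinary state in the directory entry, a tuning routine of this kind could be rerun as workloads change, without altering the caching agents themselves.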
Other features and advantages of the present invention will become evident hereinafter.