The present invention relates generally to computer systems. More particularly, the invention relates to a mechanism that improves system performance by utilizing sub-cacheline transactions.
A current trend in the design of I/O systems is to use a cache in the host bridge for transferring data to and from I/O devices. The presence of one or more caches in the host bridge means that the host bridge has to participate in cache coherency actions, including resolving conflicts when the same cacheline is accessed by multiple caches. Such conflicts can arise because different portions of the same cacheline can be used by different devices for different purposes. For instance, the lower bytes of a cacheline can be used by one I/O device to control memory bus traffic, whereas the upper bytes of the same cacheline can be used by another device to control traffic to the processor bus.
A problem that often arises in a cache coherent I/O system is the increased bandwidth consumed by accessing an entire cacheline from the main memory. In some applications, only a portion of the cacheline is needed, yet the entire cacheline is transmitted to or from the main memory. For example, in a system where a cacheline is 64 bytes wide, an I/O device may only need 16 bytes of the cacheline. However, when the host bridge fetches the cacheline from main memory, all 64 bytes are transmitted from the main memory to the host bridge. This increases the bandwidth consumed on the interconnect that connects the host bridge and the main memory and hence decreases the overall I/O performance. Accordingly, there is a need to overcome this shortcoming.
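The cost described above can be made concrete with a short calculation. The following is a minimal sketch using only the 64 byte cacheline and 16 byte request given in the example; the function name `excess_transfer` is hypothetical and is introduced purely for illustration.

```python
CACHELINE_BYTES = 64   # cacheline width from the example above
REQUESTED_BYTES = 16   # bytes the I/O device actually needs


def excess_transfer(line_bytes, needed_bytes):
    """Return (extra bytes moved, fraction of the transfer that is unused)
    when a whole cacheline is fetched to satisfy a smaller request."""
    if not 0 < needed_bytes <= line_bytes:
        raise ValueError("request must be between 1 byte and one cacheline")
    extra = line_bytes - needed_bytes
    return extra, extra / line_bytes


extra_bytes, unused_fraction = excess_transfer(CACHELINE_BYTES, REQUESTED_BYTES)
print(extra_bytes, unused_fraction)  # 48 0.75
```

In this example, three quarters of the data moved across the interconnect is never used by the requesting device.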
In summary, the technology of the present invention pertains to a mechanism that allows an I/O device to obtain a portion of cacheable data that is less than the size of the cacheline. In this manner, the bandwidth consumed between the I/O bridge and the main memory is reduced, along with the bandwidth consumed in the memory subsystem, thereby improving the overall performance of the I/O subsystem.
A computer system utilized herein has an I/O subsystem connected by a high speed interconnect to a memory controller unit having access to main memory. The I/O subsystem includes one or more I/O devices connected to an I/O bridge that is connected to the high speed interconnect. The I/O bridge has one or more cache units that store cacheable data which is shared within the computer system.
An I/O device can request access to cacheable data by making a DMA read or write request to its associated I/O bridge unit. The I/O bridge unit may have one or more cache units that service the DMA requests. Each cache unit includes a cache controller unit and a cache having tag, status, and data units. Each cacheline in the data unit comprises a predetermined number of bytes (a power of 2) and has an associated line in the tag and status units. A tag line includes the cacheline address, and the status line includes status bits indicating a number of states associated with the cacheline.
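The relationship between the tag, status, and data units can be illustrated with a small data structure. The following is a sketch only: the 64 byte line size, the class name `CacheLineEntry`, and the helper names are assumptions made for illustration and are not taken from the specification. Because the line size is a power of 2, the cacheline address and the byte offset within the line can be separated by simple masking.

```python
from dataclasses import dataclass, field

CACHELINE_BYTES = 64  # assumed line size; must be a power of 2


def line_address(byte_address, line_bytes=CACHELINE_BYTES):
    """Clear the low-order offset bits to obtain the cacheline address."""
    return byte_address & ~(line_bytes - 1)


def line_offset(byte_address, line_bytes=CACHELINE_BYTES):
    """Keep only the low-order bits: the byte offset within the cacheline."""
    return byte_address & (line_bytes - 1)


@dataclass
class CacheLineEntry:
    """Hypothetical grouping of one tag line, one status line, one data line."""
    tag: int                      # cacheline address held in the tag unit
    status: int = 0               # status bits held in the status unit
    data: bytearray = field(
        default_factory=lambda: bytearray(CACHELINE_BYTES)
    )                             # cacheline contents held in the data unit


entry = CacheLineEntry(tag=line_address(0x1234))
print(hex(entry.tag), line_offset(0x1234))  # 0x1200 52
```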
An I/O device can request a portion of data that is less than the size of the cacheline, which is herein referred to as a sub-cacheline. For illustration purposes, an exemplary cacheline can be 64 bytes, with each sub-cacheline being 16 bytes aligned on a 16-byte boundary. The status line includes a valid bit for each 16-byte sub-cacheline. Each valid bit indicates whether the corresponding sub-cacheline is valid or invalid.
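With the exemplary sizes above, a 64 byte cacheline holds four sub-cachelines, so four valid bits suffice. The following sketch shows one way the valid bits might be indexed and tested. The bit ordering (bit 0 for the lowest-addressed sub-cacheline) and all function names are assumptions for illustration; the specification does not prescribe an encoding.

```python
LINE_BYTES = 64  # exemplary cacheline size
SUB_BYTES = 16   # exemplary sub-cacheline size


def sub_cacheline_index(offset):
    """Index (0..3) of the sub-cacheline containing a byte offset in the line."""
    if not 0 <= offset < LINE_BYTES:
        raise ValueError("offset lies outside the cacheline")
    return offset // SUB_BYTES


def touched_mask(offset, length):
    """Bit mask of the sub-cachelines overlapped by a byte range in the line."""
    if length <= 0 or offset < 0 or offset + length > LINE_BYTES:
        raise ValueError("range must lie within one cacheline")
    first = offset // SUB_BYTES
    last = (offset + length - 1) // SUB_BYTES
    mask = 0
    for index in range(first, last + 1):
        mask |= 1 << index
    return mask


def range_is_valid(valid_bits, offset, length):
    """True only if every sub-cacheline overlapped by the range is valid."""
    needed = touched_mask(offset, length)
    return (valid_bits & needed) == needed


valid_bits = 0b0010  # only the second sub-cacheline (bytes 16..31) is valid
print(range_is_valid(valid_bits, 16, 16))  # True
print(range_is_valid(valid_bits, 8, 16))   # False: bytes 8..15 are not valid
```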
An I/O device can initiate a DMA read request for a sub-cacheline that is not resident in the I/O bridge. The cache controller obtains the requested sub-cacheline from main memory and stores it in the I/O bridge unit's cache unit. An I/O device can also initiate DMA write requests for a portion of data within a sub-cacheline as well as for one or more sub-cachelines. Other I/O devices can read or write to other sub-cachelines in the same cacheline provided that the requested sub-cachelines are valid. The 16-byte valid bits are used to track the validity of each sub-cacheline.
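The DMA read path described above can be sketched as a toy model. This is an illustrative simplification under stated assumptions, not the claimed implementation: the cache holds a single cacheline, main memory is modeled as a `bytearray`, a different cacheline simply replaces the resident one without write-back, and the class name `SubCachelineCache` and the counter `bytes_fetched` are hypothetical. The write path and coherency actions are omitted.

```python
LINE_BYTES = 64  # exemplary cacheline size
SUB_BYTES = 16   # exemplary sub-cacheline size


class SubCachelineCache:
    """Toy one-line cache that fills individual sub-cachelines on demand."""

    def __init__(self, memory):
        self.memory = memory      # stand-in for main memory
        self.tag = None           # address of the resident cacheline
        self.valid = 0            # one valid bit per sub-cacheline
        self.data = bytearray(LINE_BYTES)
        self.bytes_fetched = 0    # bytes moved from main memory so far

    def dma_read(self, address):
        """Return the 16-byte sub-cacheline that contains `address`."""
        line_addr = address & ~(LINE_BYTES - 1)
        index = (address & (LINE_BYTES - 1)) // SUB_BYTES
        if self.tag != line_addr:
            # Assumption: replace the line and invalidate all sub-cachelines.
            self.tag = line_addr
            self.valid = 0
        start = index * SUB_BYTES
        if not (self.valid >> index) & 1:
            # Miss: fetch only the requested sub-cacheline, not the whole line.
            src = line_addr + start
            self.data[start:start + SUB_BYTES] = self.memory[src:src + SUB_BYTES]
            self.valid |= 1 << index
            self.bytes_fetched += SUB_BYTES
        return bytes(self.data[start:start + SUB_BYTES])


memory = bytearray(range(256))
cache = SubCachelineCache(memory)
first = cache.dma_read(0x50)   # miss: fetches bytes 0x50..0x5F only
again = cache.dma_read(0x58)   # hit: same sub-cacheline, no further fetch
print(cache.bytes_fetched, bin(cache.valid))  # 16 0b10
```

In this model, two reads that fall in the same sub-cacheline move 16 bytes from main memory in total, whereas a full-line fetch would have moved 64.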