Field of the Invention
The present invention is related to the field of computer systems. More specifically, the present invention is related to caching input/output data between main memory and direct (virtual) memory access devices.
Traditionally, control of data movement between the external devices and the main memory subsystem is accomplished in either of two ways. First, data movement can be controlled by the CPU directly reading from the device (to internal CPU registers) or writing from registers to the device. This type of control is called Programmed I/O. In the second type of control, data movement is controlled, for the most part, by the external device itself. This type of control is called Direct Memory Access (DMA), or, if the device accesses memory through virtual addresses (as is the case in the preferred embodiment), Direct Virtual Memory Access (DVMA). Coordination between the external device and the CPU is typically handled either by message passing or through interrupts.
In a typical computer system configuration, the DVMA data is transferred to the memory subsystem over the same data paths which are normally used by the CPU. In particular, this includes the optional central cache. The central cache is designed as a fast access temporary buffer of either instructions or data or both. Sharing the central cache between DVMA devices and the CPU creates three effects. First, and most important, only one device can access the central cache at a time. This implies that heavy I/O data transfers through DVMA will reduce, potentially in a significant manner, the effective cache bandwidth seen by the CPU. Second, the use of cache lines to temporarily buffer DVMA data precludes their use for CPU instructions or data, resulting in lower CPU cache "hit" rates. Third, when DVMA devices write to memory through the central cache, the cache block used to buffer the DVMA data is shared with the CPU and other DVMA devices. Therefore, the possibility always exists that the cache block used by the DVMA device may be displaced from the central cache by a miss caused by an access to the same address from either the CPU or another DVMA device. In such a case, the cache block could be written back as a partially filled line. One approach, read-then-write back, first reads the full line from main memory into the cache block before DVMA data is buffered, so that a later write back of the entire line assures that modified data in a partially filled block is properly written to main memory. A second approach to ensure that modified data is properly written is to record, with "byte marks", which bytes of data are modified within the cache block. At the time that data from the cache block is written back into main memory, only bytes whose "byte marks" are set are actually written into memory. However, this approach entails the addition of complex control logic and typically slows block write operations to memory.
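The "byte marks" technique described above can be sketched in software as follows. This is a minimal illustrative model only; the structure names, the line size, and the use of one mark byte per data byte are assumptions made for clarity, not details of the invention or of any particular hardware implementation.

```c
#include <stdint.h>

#define LINE_SIZE 16  /* illustrative cache line size in bytes */

/* A cache block buffering DVMA write data, with one "byte mark"
 * per data byte recording which bytes have been modified. */
struct cache_line {
    uint8_t data[LINE_SIZE];
    uint8_t byte_marks[LINE_SIZE];  /* 1 = byte modified, 0 = not */
};

/* Record a DVMA write of one byte into the cache block,
 * setting the corresponding byte mark. */
void line_write_byte(struct cache_line *line, int offset, uint8_t value)
{
    line->data[offset] = value;
    line->byte_marks[offset] = 1;
}

/* Write the block back to main memory: only bytes whose marks are
 * set are written, so a partially filled line cannot overwrite
 * valid memory bytes that the device never modified. */
void line_write_back(const struct cache_line *line, uint8_t *memory_block)
{
    for (int i = 0; i < LINE_SIZE; i++)
        if (line->byte_marks[i])
            memory_block[i] = line->data[i];
}
```

The per-byte conditional in the write-back loop corresponds to the extra control logic noted above: real hardware must gate each byte lane individually, which is what complicates and slows block writes to memory.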
Providing a separate I/O cache buffer for DVMA data may only partially correct the problems noted above. The most significant problem posed with a separate write back I/O cache buffer is interference within the I/O cache, with multiple devices trying to access data that maps to a single cache data buffer. This occurrence has a catastrophic effect on system performance. More specifically, the mapping algorithm used by write back caches typically uses the low order address bits of the access to determine which cache data buffer is used. This mechanism allows two distinct devices to perform operations that map to the same cache data buffer. The result is a direct conflict for the use of that cache data buffer, and performance that is potentially worse than if no cache were present. This problem can be addressed by adding associativity to the mapping algorithm for the cache, but that requires a significant increase in the cost and complexity of the cache element.
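The direct-mapped conflict described above can be illustrated with a short sketch. The number of cache data buffers, the line size, and the addresses are illustrative assumptions chosen for the example, not parameters of the invention.

```c
#include <stdint.h>

#define NUM_LINES 16   /* illustrative: 16 cache data buffers */
#define LINE_SIZE 16   /* illustrative line size in bytes */

/* Direct-mapped selection: the low order address bits (just above
 * the byte offset within the line) choose the cache data buffer. */
unsigned cache_index(uint32_t addr)
{
    return (addr / LINE_SIZE) % NUM_LINES;
}
```

Any two accesses whose addresses differ by a multiple of NUM_LINES * LINE_SIZE select the same buffer; two DVMA devices streaming through such addresses repeatedly displace each other's partially filled line, producing the catastrophic thrashing noted above.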
Additionally, data consistency must be maintained between the I/O and the central cache. Traditional solutions to this problem place the burden of maintaining consistency either on the operating system, which causes severe performance degradation, or on the system hardware, which also increases the cost and complexity of the cache design.
Thus, it is desirable to provide an I/O cache system for a computer system. It is further desirable that the I/O cache system facilitate a combined hardware and software solution to the data coherency problem which lessens the burden on the operating system, and yet requires minimal increases in the cost and complexity of the cache design. As will be disclosed, the I/O cache system for direct (virtual) memory access I/O devices of the present invention achieves these objects and the desired results described above.