1. Field of the Invention
The present invention relates generally to a system and method for improving the efficiency of programmed input/output (PI/O) and polling of input/output (I/O) interfaces in a system with large direct-mapped data caches. More particularly, it relates to such a system and method which does not require the use of explicit cache management instructions. Most especially, the invention relates to such a system and method which combines use of direct-mapped caches, a large number of cache lines, high cache miss penalties relative to instruction times, and a lack of direct memory access I/O.
2. Description of the Prior Art
There are several ways to execute I/O operations in a computer system. One that is often attractive is "memory-mapped" I/O, in which I/O device registers appear in the same physical address space as main memory and may thus be accessed via normal load/store instructions. Memory-mapped I/O devices typically decode physical memory addresses and respond to addresses in specific ranges.
In processors with data caches, one problem with this approach is that the goal of the cache, which is to suppress references to main memory, conflicts with the goal of instructions used to access the I/O device registers, which is to cause an I/O access for every load or store instruction. Another way of stating this problem is that software which is polling an I/O device register must guarantee that the polled address is not valid in the data cache, or the software will not see the actual register value.
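The polling pattern described above can be sketched in C as follows. This is only an illustrative sketch: the function name `poll_until_ready`, the `STATUS_READY` bit, and the poll limit are hypothetical and do not come from any particular device. Note that `volatile` only prevents the compiler from keeping the value in a processor register; it does not affect the hardware data cache, which is exactly the problem described in this section.

```c
#include <stdint.h>

/* Hypothetical "ready" bit in a device status register (assumption). */
#define STATUS_READY 0x1u

/*
 * Polls the register at status_reg until any bit in ready_mask is set,
 * or until max_polls loads have been issued.
 * Returns the number of loads issued on success, or -1 on timeout.
 *
 * The volatile qualifier forces one load instruction per iteration.
 * Whether each load reaches the device still depends on the polled
 * address not being valid in the data cache.
 */
static int poll_until_ready(volatile const uint32_t *status_reg,
                            uint32_t ready_mask, int max_polls)
{
    for (int polls = 1; polls <= max_polls; polls++) {
        if (*status_reg & ready_mask)
            return polls;
    }
    return -1;
}
```

On real hardware, `status_reg` would point into the address range decoded by the device; in a test it can simply point at an ordinary variable standing in for the register.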
Typical ways of dealing with this problem are:
Non-cached regions of physical address space for I/O device registers; the cache is disabled.
Explicit cache management operations where the I/O software can ask that a particular cache line be invalidated, possibly causing a write-back.
Indirect cache management instructions useful with direct-mapped caches, where the software generates a reference to a region of the physical address space known to collide with the cache line being "managed," thus causing the line to be invalidated. This other region can be called a "reserved" region, although it might be used independently for normal memory.
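The indirect approach relies on the fact that, in a direct-mapped cache, two physical addresses that differ by a multiple of the cache size map to the same line. A minimal sketch of finding such a colliding address is given below. The cache geometry (`LINE_BYTES`, `NUM_LINES`), the function names, and the requirement that the reserved region be aligned to the cache size are all assumptions made for illustration, and the sketch assumes a physically indexed cache.

```c
#include <stdint.h>

/* Hypothetical cache geometry, chosen only for illustration. */
#define LINE_BYTES   256u                      /* assumed line size        */
#define NUM_LINES    4096u                     /* assumed number of lines  */
#define CACHE_BYTES  (LINE_BYTES * NUM_LINES)  /* 1 MiB with these values  */

/* Index of the line that a physical address maps to in a
   direct-mapped, physically indexed cache. */
static uint32_t cache_index(uint64_t paddr)
{
    return (uint32_t)((paddr / LINE_BYTES) % NUM_LINES);
}

/*
 * Returns the line-aligned address inside the reserved region that maps
 * to the same cache line as paddr. A load from the returned address
 * displaces whatever that line currently holds, including a stale copy
 * of an I/O register.
 * reserved_base is assumed to be aligned to CACHE_BYTES.
 */
static uint64_t colliding_address(uint64_t paddr, uint64_t reserved_base)
{
    uint64_t offset_in_cache = paddr % CACHE_BYTES;
    return reserved_base + (offset_in_cache - offset_in_cache % LINE_BYTES);
}
```

For example, with these assumed values an I/O register at 0x1F012340 and a reserved region at 0x40000000 give a colliding address of 0x40012300; both map to line index 0x123.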
Current trends in processor design are changing several system parameters. Cache lines are getting larger. Next generation systems may have a 256 byte second-level cache line. This implies the use of write-back rather than write-through caches. Memory latencies are getting longer in relation to instruction rate. A cache refill on next generation systems might take as long as 200 instruction cycles.
These changes affect the performance of traditional means of dealing with the memory-mapped I/O problem. Using uncached addresses is simple, but because it generates a cache miss for every I/O instruction, bandwidth for programmed I/O (PI/O) data transfer is reduced to a tiny fraction of the memory system bandwidth. In next generation systems, this fraction might be 1/32 of the basic bandwidth.
Explicit cache management instructions can provide accurate control over the disposition of cache lines, but create some additional complexity in the central processing unit (CPU) and cache implementations, and are not present in all architectures. Indirect (implicit) cache management suffers from high latencies because, in general, it requires a reference to the reserved region for each reference to an I/O register. It thus requires two cache misses and refills per I/O reference. One can do better for PI/O data transfer by making the I/O device's data buffer register as wide as a cache line. Then, almost half of the memory system bandwidth is available for data transfer. The other half is still used for refilling from the reserved region. It is clear from this discussion that improvement is required in the traditional means of dealing with memory-mapped I/O for use in next generation computer systems.
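The bandwidth fractions mentioned above can be reproduced with a simple model, sketched below. The model is an assumption, not part of the original discussion: it treats every miss as costing one full cache-line refill and ignores instruction overhead, so the usable fraction is the number of useful bytes moved per I/O reference divided by the bytes the memory system could have moved in the same time. The function name and the 8 byte register width are likewise assumptions; 8 bytes is simply the width that yields 1/32 with a 256 byte line.

```c
/*
 * Fraction of memory system bandwidth available for PI/O data transfer
 * under a simplified model (assumption): each miss costs one full
 * cache-line refill, and instruction overhead is ignored.
 *
 *   bytes_per_access   useful bytes transferred per I/O reference
 *   misses_per_access  cache misses (refills) incurred per I/O reference
 *   line_bytes         cache line size in bytes
 */
static double pio_bandwidth_fraction(unsigned bytes_per_access,
                                     unsigned misses_per_access,
                                     unsigned line_bytes)
{
    return (double)bytes_per_access /
           ((double)misses_per_access * (double)line_bytes);
}
```

Under these assumptions, an uncached 8 byte register with a 256 byte line gives `pio_bandwidth_fraction(8, 1, 256)`, or 1/32. Indirect cache management with a line-wide data buffer register gives `pio_bandwidth_fraction(256, 2, 256)`, or 1/2, which corresponds to the "almost half" figure once instruction overhead is taken into account.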