1. Field of the Invention
The present invention relates to the field of computer systems. More particularly, the present invention relates to methods employed by the computer systems to input/output data.
2. Art Background
Traditionally, data transfers between memory and an input-output (I/O) device of a computer system are accomplished in one of three ways:
1. Programmed I/O. In this case, all data transfers between the memory and the I/O device are completely controlled by the central processing unit (CPU), or more precisely, by a program executed by the CPU.
2. Interrupt I/O. In this case, all data transfers between the memory and the I/O device are initiated by the I/O devices through interrupts. In response, the CPU suspends whatever it is currently doing and attends to the needs of the I/O device.
3. Direct Memory Access (DMA). In this case, all data transfers between the memory and the I/O device are accomplished without involving the CPU.
The DMA approach provides a much faster way of moving data between the memory and the I/O device. Typically, a direct memory access controller is employed. Upon request of the I/O device, the DMA controller suppresses the CPU, takes control of the system bus and causes data to be transferred between the memory and the I/O device. The source/destination memory locations and the amount of data transferred are controlled by the starting address and word count provided to the DMA controller.
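By way of illustration only, the role of the starting address and word count may be sketched as follows. This is a minimal software model, not a description of any particular DMA controller: the function name dma_transfer, the use of Python lists to stand in for the memory and the I/O device, and the bounds checks are all assumptions made for the example.

```python
def dma_transfer(memory, device_words, start_address, word_count):
    """Model a device-to-memory DMA transfer (hypothetical helper).

    memory        -- list standing in for main memory, one word per element
    device_words  -- list of words supplied by the I/O device
    start_address -- first memory location to be written
    word_count    -- number of words to transfer
    Returns the address following the last word written.
    """
    if start_address < 0 or word_count < 0:
        raise ValueError("start address and word count must be non-negative")
    if start_address + word_count > len(memory):
        raise ValueError("transfer would run past the end of memory")
    if word_count > len(device_words):
        raise ValueError("device did not supply enough words")

    address = start_address   # models the controller's address register
    remaining = word_count    # models the controller's word count register
    index = 0
    while remaining > 0:
        memory[address] = device_words[index]  # one word per bus cycle
        address += 1
        index += 1
        remaining -= 1
    return address


if __name__ == "__main__":
    memory = [0] * 8
    dma_transfer(memory, [7, 8, 9], start_address=2, word_count=3)
    print(memory)
```

In this sketch the CPU's only involvement is to supply the two parameters; the loop itself stands in for work done by the controller while it holds the system bus.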
Most DMA controllers are not involved in the actual transfer of the data. Data are transferred directly between the memory and the I/O device under the control of the DMA controller. The DMA controller, the CPU, the memory and the I/O device are all coupled to a system bus. Alternatively, some DMA controllers are involved in the actual transfer of data. After taking control of the system bus, the DMA controller causes data to be transferred from the memory (or the I/O device) to itself and then re-transmits the data to the I/O device (or the memory). In that case, the I/O device is coupled to the DMA controller instead of the system bus. By involving the DMA controller in the actual data transfer, the DMA controller gains the ability to interpret or process the data being transferred. However, the DMA controller pays for the added ability by sacrificing performance. For either type of DMA controller, by duplicating the internal logic of the DMA controller, a DMA controller may support multiple I/O devices.
The memory typically operates at a substantially slower speed than the CPU. In order to avoid having the CPU idle too often while waiting for data or instructions from the memory, a cache which can operate at a higher speed than the memory is often used to buffer the data and the instructions between the memory and the CPU. At any particular point in time, a subset of the instructions and data stored in memory is also stored in the cache. Thus, if data are allowed to be transferred into memory directly from an I/O device under the control of a DMA controller, the copies of those data stored in the cache may be outdated. Therefore, the CPU must be protected from getting stale data from the cache.
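The stale data problem described above may be illustrated with a small software model. The class CachedMemory and its methods cpu_read and dma_write are hypothetical names chosen for this sketch, and the cache is assumed to be a simple address-to-value mapping with no replacement policy.

```python
class CachedMemory:
    """Toy model of a memory with a CPU-side cache and no coherency mechanism."""

    def __init__(self, size):
        self.memory = [0] * size
        self.cache = {}  # address -> cached value (assumed unbounded)

    def cpu_read(self, address):
        # On a miss, fill the cache from memory; on a hit, memory is not consulted.
        if address not in self.cache:
            self.cache[address] = self.memory[address]
        return self.cache[address]

    def dma_write(self, address, value):
        # The DMA transfer updates memory only; the cache is not informed.
        self.memory[address] = value


if __name__ == "__main__":
    system = CachedMemory(4)
    system.cpu_read(1)        # caches the initial value at address 1
    system.dma_write(1, 42)   # I/O data arrive in memory behind the cache
    print("memory:", system.memory[1], "CPU sees:", system.cpu_read(1))
```

After the DMA write, the memory holds the new value while the CPU continues to read the old value from its cache, which is the condition the CPU must be protected from.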
The simplest and most common solution to the problem is to limit direct data transfer between memory and I/O devices to a designated area of memory only, commonly referred to as the I/O address space. The data in the I/O address space are not cached, thereby eliminating the possibility of the CPU getting stale data from the cache. The CPU reads/writes I/O data directly from/to the I/O address space using special I/O instructions supported by the CPU. Alternatively, a cache coherency mechanism must be provided to ensure the cached data are current. However, cache coherency mechanisms are expensive and are often not cost effective for uniprocessor computer systems.
A similar problem exists on multiprocessor computer systems where each processor has its own private cache and shares a common memory. Since the copy of data in a processor's private cache may be outdated by a modification made by another processor, each processor must be protected from getting stale data from its private cache. However, the simple solution of not caching data shared among the processors is not acceptable, since it would substantially undermine the performance gained by having multiple processors in the system. A cache coherency mechanism, although costly, is nevertheless provided to ensure the data in all the private caches are current. Typically, it involves snooping operations performed on the private caches to check whether copies of the altered data are being maintained in the other private caches. Detected copies are invalidated. Thus, it would be desirable if the cache coherency mechanism provided on these multiprocessor computer systems could be exploited to allow data to be transferred directly between memory and I/O devices and at the same time be cacheable.
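The conventional snoop-and-invalidate behavior described above may be sketched as follows. The sketch models the prior art mechanism only; it assumes a write-through policy and whole-word cache entries, and the names PrivateCache, SharedBus and snoop_invalidate are hypothetical.

```python
class PrivateCache:
    """Private cache of one processor: address -> value."""

    def __init__(self):
        self.lines = {}

    def snoop_invalidate(self, address):
        # Discard the local copy, if any, of a location altered elsewhere.
        self.lines.pop(address, None)


class SharedBus:
    """Shared memory plus one private cache per processor (write-through assumed)."""

    def __init__(self, size, processor_count):
        self.memory = [0] * size
        self.caches = [PrivateCache() for _ in range(processor_count)]

    def read(self, processor, address):
        cache = self.caches[processor]
        if address not in cache.lines:
            cache.lines[address] = self.memory[address]  # fill on a miss
        return cache.lines[address]

    def write(self, processor, address, value):
        self.memory[address] = value
        self.caches[processor].lines[address] = value
        # Snoop every other private cache and invalidate detected copies.
        for index, cache in enumerate(self.caches):
            if index != processor:
                cache.snoop_invalidate(address)


if __name__ == "__main__":
    bus = SharedBus(size=4, processor_count=2)
    bus.read(1, 0)       # processor 1 caches address 0
    bus.write(0, 0, 5)   # processor 0 alters it; processor 1's copy is invalidated
    print(bus.read(1, 0))
```

Because the stale copy is removed rather than updated, the next read by the other processor misses in its private cache and is served from the shared memory.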
As will be disclosed, these objects and desired results are among the objects and desired results achieved by the present invention, which provides a method and apparatus for transferring cacheable I/O data between a memory and an I/O device while maintaining cache coherency.