1. Technical Field
The present invention relates to direct memory access (DMA) in general, and, in particular, to a method and apparatus for performing DMA write operations. Still more particularly, the present invention relates to a method and apparatus for invalidating cache lines during a DMA Write operation by a Peripheral Component Interconnect device.
2. Description of Related Art
Peripheral Component Interconnect (PCI) bus technology uses memory mapping techniques for performing input/output (I/O) operations and DMA operations. In a data processing system that is capable of handling PCI devices, a range of addresses called PCI address space is allocated within a system memory for all the PCI devices associated with the data processing system. Within the PCI address space, there is a region reserved by the operating system for programmable I/O operations that are performed by a processor to read or change the contents of PCI device registers within the PCI devices. In addition, a separate region is allocated within the PCI address space by the operating system for DMA accesses to the system memory by the PCI devices. The allocated addresses are dynamically mapped to a section of the system memory. Each PCI device can use the mapped addresses to perform DMA Read or Write operations by reading and writing directly in the PCI address space at the mapped addresses.
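The dynamic mapping described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the PCI specification or from this disclosure: the 4 KB page size, the `DmaWindow` class, its method names, and the example addresses are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; the text does not state one


class DmaWindow:
    """Hypothetical DMA region of the PCI address space, mapped page by page."""

    def __init__(self, base, num_pages):
        self.base = base            # first PCI address of the DMA region
        self.num_pages = num_pages  # size of the region in pages
        self.table = {}             # page index in region -> system memory page address

    def map_page(self, index, system_page_addr):
        """Dynamically map one page of the region to a system memory page."""
        if not 0 <= index < self.num_pages:
            raise ValueError("page index outside DMA region")
        if system_page_addr % PAGE_SIZE:
            raise ValueError("system page address must be page aligned")
        self.table[index] = system_page_addr

    def translate(self, pci_addr):
        """Translate a PCI address used by a device into a system memory address."""
        offset = pci_addr - self.base
        if not 0 <= offset < self.num_pages * PAGE_SIZE:
            raise ValueError("address outside DMA region")
        index, page_offset = divmod(offset, PAGE_SIZE)
        if index not in self.table:
            raise KeyError("PCI address is not currently mapped")
        return self.table[index] + page_offset


# Example: map page 2 of the region, then translate a device address in that page.
window = DmaWindow(base=0x8000_0000, num_pages=16)
window.map_page(2, 0x0012_3000)
system_addr = window.translate(0x8000_0000 + 2 * PAGE_SIZE + 0x40)
```

In this sketch a device only ever sees addresses inside the DMA region; the table lookup stands in for whatever translation mechanism an actual host bridge would use.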
DMA Write operations from each PCI device must be performed in a specific order as observed by any potential data consumer within the data processing system. Because the target location of a DMA Write operation may hold data that is also held in a cache memory of a processor, the DMA Write operation must, in order to maintain correct ordering, invalidate every copy of the data in all cache memories within the data processing system before it completes. Moreover, that completion must occur before any subsequent DMA Write operation from the same PCI device can become visible to any data consumer; otherwise, the ordering rules are violated.
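The invalidate-before-complete rule can be sketched with a toy software model. This is only an illustration under stated assumptions: the `System` class, the 128-byte line size, and the use of dictionaries for caches and memory are hypothetical and do not represent any actual coherence protocol.

```python
CACHE_LINE = 128  # assumed cache line size in bytes


class System:
    """Toy model: one shared memory plus several processor caches."""

    def __init__(self, num_caches):
        self.memory = {}                                   # line address -> data
        self.caches = [dict() for _ in range(num_caches)]  # per-processor cached lines
        self.visible_log = []                              # order in which DMA writes completed

    def cache_read(self, cache_id, line_addr):
        """Read through a cache, filling the line from memory on a miss."""
        cache = self.caches[cache_id]
        if line_addr not in cache:
            cache[line_addr] = self.memory.get(line_addr)
        return cache[line_addr]

    def dma_write(self, line_addr, data):
        """Invalidate every cached copy first, then complete the write."""
        if line_addr % CACHE_LINE:
            raise ValueError("address must be cache line aligned")
        for cache in self.caches:
            cache.pop(line_addr, None)       # invalidate any copy of the line
        self.memory[line_addr] = data        # the write completes only after invalidation
        self.visible_log.append(line_addr)   # later writes become visible after this one


system = System(num_caches=2)
system.memory[0] = "old"
stale = system.cache_read(0, 0)       # processor 0 now caches the old data
system.dma_write(0, "new")            # first DMA Write from the device
system.dma_write(CACHE_LINE, "flag")  # a subsequent DMA Write from the same device
fresh = system.cache_read(0, 0)       # the stale copy was invalidated, so this refills
```

Because each call to `dma_write` finishes its invalidations before returning, a consumer that observes the second write can no longer read a stale copy of the first line.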
In order to sustain full DMA Write throughput, the data processing system must balance the time required to resolve cache coherence against the amount of data transferred per DMA Write request. Typically, as a data processing system becomes larger, the time required to resolve cache coherence also increases, which effectively limits the bandwidth that a PCI device is able to achieve in the data processing system. One method of improving the bandwidth is to design the data processing system with a longer cache line so that more data can be invalidated per cache line invalidation request. However, a relatively long cache line has drawbacks. For example, an entire cache line of data must be transferred even when only a small portion of the cache line contains the required data, which effectively reduces bus bandwidth. A longer cache line also increases the likelihood of false sharing, in which multiple processors access different data within the same cache line.
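The trade-off above can be made concrete with a simplified model. It assumes, purely for illustration, that each DMA Write request invalidates exactly one cache line and that requests from a device are fully serialized; the function names and the example figures (500 ns coherence latency, 128-byte and 256-byte lines) are hypothetical and not taken from any measured system.

```python
def dma_write_bandwidth(line_bytes, coherence_ns):
    """Bytes per second when one line is invalidated per request and
    requests are fully serialized (a simplifying assumption)."""
    return line_bytes * 1_000_000_000 / coherence_ns


def useful_fraction(needed_bytes, line_bytes):
    """Fraction of transferred bytes that were actually needed when whole
    cache lines must be moved."""
    lines = -(-needed_bytes // line_bytes)  # ceiling division
    return needed_bytes / (lines * line_bytes)


# Doubling the line size doubles the modeled DMA Write bandwidth...
bw_128 = dma_write_bandwidth(128, 500)
bw_256 = dma_write_bandwidth(256, 500)

# ...but halves the useful share of each transfer when only 32 bytes are needed.
frac_128 = useful_fraction(32, 128)
frac_256 = useful_fraction(32, 256)
```

Under these assumptions the longer line raises the bandwidth ceiling imposed by coherence latency while lowering the efficiency of small transfers, which is the tension the paragraph describes.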
The present disclosure provides an improved method and apparatus for invalidating cache lines during a DMA Write operation by a PCI device.