1. Technical Field
The present invention relates in general to data processing systems and in particular to managing memory access in data processing systems. Still more particularly, the present invention relates to a system, method and computer program product for preserving the ordering of read and write operations in a direct memory access system by delaying read access.
2. Description of the Related Art
A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
One aspect of design that affects cache performance and design complexity is the handling of writes initiated by the processor or by an alternate bus master. Because two copies of a particular piece of data or instruction code can exist, one in system memory and a duplicate copy in the cache, writes to either the system memory or the cache memory can result in an inconsistency between the contents of the two storage units. For example, consider the case in which the same data is stored in both the cache memory and the system memory in association with a particular address. If the processor subsequently initiates a write cycle to store a new data item at that address, a cache write “hit” occurs and the processor proceeds to write the new data into the cache memory. Since the data is modified in the cache memory but not in the system memory, the cache memory and system memory become inconsistent. Similarly, in systems with an alternate bus master, direct memory access (DMA) write cycles to system memory by the alternate bus master modify data in system memory but not in the cache memory. Again, the data in the cache memory and system memory become inconsistent.
Inconsistency between data in the cache memory and data in system memory during processor writes can be prevented or handled by implementing one of several commonly employed techniques. In a first technique, a “write-through” cache guarantees consistency between the cache memory and system memory by writing the same data to both the cache memory and system memory. The contents of the cache memory and system memory are always identical, and so the two storage systems are always coherent. In a second technique, a “write-back” cache handles processor writes by writing only to the cache memory and setting a “dirty” bit to indicate cache entries that have been altered by the processor. When “dirty” or altered cache entries are later replaced during a “cache replacement” cycle, the modified data is written back into system memory.
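The two write policies described above can be illustrated with a minimal sketch. The `ToyCache` class, its method names, and the dictionary used as system memory are hypothetical and exist only for illustration; they do not correspond to any particular hardware design.

```python
class ToyCache:
    """Hypothetical single-level cache in front of a dict acting as system memory."""

    def __init__(self, memory, write_through):
        self.memory = memory              # address -> value (system memory)
        self.write_through = write_through
        self.lines = {}                   # address -> cached value
        self.dirty = set()                # addresses altered in cache only

    def write(self, addr, value):
        """Processor write that hits (or allocates) in the cache."""
        self.lines[addr] = value
        if self.write_through:
            # Write-through: update system memory at the same time.
            self.memory[addr] = value
        else:
            # Write-back: defer the memory update and mark the entry dirty.
            self.dirty.add(addr)

    def replace(self, addr):
        """Cache replacement: write a dirty entry back, then drop it."""
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


# Write-through keeps memory identical to the cache after every write.
wt_memory = {0x10: 1}
wt_cache = ToyCache(wt_memory, write_through=True)
wt_cache.write(0x10, 2)

# Write-back leaves memory stale until the dirty entry is replaced.
wb_memory = {0x10: 1}
wb_cache = ToyCache(wb_memory, write_through=False)
wb_cache.write(0x10, 2)
stale_value = wb_memory[0x10]   # still the old value at this point
wb_cache.replace(0x10)          # the write-back happens here
```

In this sketch the write-back cache trades a window of inconsistency for fewer writes to system memory, which is the behavior the dirty bit exists to track.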
Inconsistency between data in the cache memory and corresponding data in system memory during a DMA write operation is handled somewhat differently. Depending upon the particular caching architecture employed, one of a variety of bus monitoring or “snooping” techniques may be used. One such technique involves the invalidation of cache entries that become “stale” or inconsistent with system memory after a DMA write to system memory occurs. Another technique involves the “write-back” to system memory of all dirty memory blocks within the cache memory prior to the actual writing of data by the alternate bus master. After the dirty memory blocks that are targeted by the DMA write are written back to the system memory, the memory blocks are invalidated in the cache, and the write by the alternate bus master may be performed.
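The write-back-then-invalidate sequence for a DMA write can be sketched as follows. This is an illustrative model only: `SnoopingCache`, `dma_write`, and the representation of a memory block as a list of words are assumptions made for the example, and the sketch handles only the block targeted by the DMA write.

```python
class SnoopingCache:
    """Hypothetical cache that responds to snooped DMA writes."""

    def __init__(self):
        self.blocks = {}    # block address -> cached copy (list of words)
        self.dirty = set()  # block addresses altered in cache only

    def load(self, memory, block_addr):
        """Bring a copy of a memory block into the cache."""
        self.blocks[block_addr] = list(memory[block_addr])

    def cpu_write(self, block_addr, offset, value):
        """Processor write to a cached block (write-back policy)."""
        self.blocks[block_addr][offset] = value
        self.dirty.add(block_addr)

    def snoop_dma_write(self, memory, block_addr):
        """Write back the targeted block if dirty, then invalidate it."""
        if block_addr in self.dirty:
            memory[block_addr] = list(self.blocks[block_addr])
            self.dirty.discard(block_addr)
        self.blocks.pop(block_addr, None)


def dma_write(memory, caches, block_addr, offset, value):
    """Alternate bus master write: resolve coherence first, then write."""
    for cache in caches:
        cache.snoop_dma_write(memory, block_addr)
    memory[block_addr][offset] = value


memory = {0: [0, 0]}
cache = SnoopingCache()
cache.load(memory, 0)
cache.cpu_write(0, 0, 7)              # block 0 is now dirty in the cache
dma_write(memory, [cache], 0, 1, 9)   # DMA updates the other word
```

Because the DMA write here touches only part of the block, the prior write-back is what preserves the processor's modification to the other word; the invalidation then ensures the cache does not retain a stale copy.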
As systems become larger, the latency required to resolve cache coherence increases, and this latency can limit the bandwidth that a DMA device is able to achieve in the system. To sustain full DMA write throughput, the system must balance the amount of time needed to resolve cache coherence against the amount of data transferred per request. The traditional method of achieving this balance is to design the system with a larger cache line size, so that more data can be invalidated per cache line invalidation request. However, the major drawbacks of increasing the cache line size include trailing edge effects and the increased likelihood of false sharing of data within the larger cache lines.
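The trade-off between coherence latency and data per request can be made concrete with a simplified model. The sketch below assumes that a DMA device keeps a fixed number of requests in flight and that each request completes after one coherence-resolution latency, so throughput is approximately the bytes in flight divided by that latency. The function name and the figures used are hypothetical and are not taken from any measured system.

```python
def dma_write_bandwidth(bytes_per_request, coherence_latency_ns,
                        outstanding_requests=1):
    """Approximate sustained DMA write bandwidth in bytes per nanosecond.

    Simplifying assumption: every request occupies one of a fixed number
    of in-flight slots for exactly one coherence-resolution latency.
    """
    return outstanding_requests * bytes_per_request / coherence_latency_ns


# Hypothetical figures: 1000 ns to resolve coherence, one request in flight.
small_line = dma_write_bandwidth(128, 1000)   # 128-byte cache line
large_line = dma_write_bandwidth(256, 1000)   # 256-byte cache line

# If latency doubles, the line size must also double to hold bandwidth.
slower_system = dma_write_bandwidth(256, 2000)
```

Under these assumptions, bandwidth scales linearly with the data covered by each invalidation request, which is why enlarging the cache line is the traditional remedy when coherence latency grows.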
Therefore, there is a need for an improved system and method for increasing the throughput capacity of DMA devices without increasing the size of the cache line within the cache memory.