1. Technical Field
The present invention relates to pre-fetch operations in multiprocessor systems, and more particularly to data subscribe-and-publish mechanisms and methods for producer-consumer pre-fetch communications.
2. Discussion of Related Art
Pre-fetch operations have been widely used in modern computer systems to hide memory access latencies. In a multiprocessor system, a pre-fetch operation can be a vertical pre-fetch or a horizontal pre-fetch, the latter also referred to as a producer-consumer pre-fetch. A vertical pre-fetch can retrieve data from a high-level cache such as an L3 (level 3) cache to a low-level cache such as an L2 (level 2) cache. A horizontal pre-fetch can retrieve data from a cache in a producer node to a cache in a consumer node. A pre-fetch operation may involve both vertical and horizontal operations. In an SMP (symmetric multiprocessor) system, for example, a processor can issue a pre-fetch operation that retrieves data from the L2 cache in another processing node to the L1 (level 1) cache associated with the processor.
Pre-fetch operations can be invoked by software or by hardware. For example, the PowerPC® architecture comprises DCBT (data cache block touch) and DCBTST (data cache block touch for store) instructions that allow software to invoke pre-fetch operations at appropriate times. The IBM® POWER4 system comprises a data streaming pre-fetch mechanism that can, when a streaming access pattern is detected, retrieve data from memory to an L3 cache, from the L3 cache to an L2 cache, and from the L2 cache to an L1 cache.
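By way of illustration only, a software-invoked pre-fetch can be sketched in C using the `__builtin_prefetch` hint provided by the GCC and Clang compilers. This sketch is not taken from the systems discussed above; the function name `sum_with_prefetch` and the look-ahead distance `PREFETCH_DISTANCE` are hypothetical, and whether a given compiler lowers the hint to a DCBT-style instruction on a particular target is an assumption that depends on the compiler and its settings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance in elements; a real value would be tuned. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that elements further ahead will be read soon. */
long sum_with_prefetch(const long *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = pre-fetch for read; third argument 3 =
             * high temporal locality. The hint never changes the result. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The hint affects only timing. The function returns the same sum whether or not the compiler or the hardware acts on the pre-fetch.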
Pre-fetch operations in multiprocessor systems are subject to various inefficiencies due to a lack of coordination between a producer that generates data and a consumer that uses the data. In multiprocessor systems, pre-fetch operations can be consumer-initiated or producer-initiated, the latter also referred to as data pre-send or cache injection. The underlying cache coherence protocol may be made more complicated to deal with data pre-send operations, because a cache can receive data without a pending data request.
Regarding a consumer-initiated pre-fetch operation, if the consumer does not know when the data is to be produced at the producer side, it can be difficult for the consumer to invoke the data pre-fetch operation at an appropriate time. For example, if the consumer invokes the pre-fetch operation before the data is produced, the consumer would obtain a stale copy of the data. The stale copy of the data needs to be invalidated, resulting in extra cache coherence overhead.
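The timing problem described above can be illustrated with a toy model in C. The model is a simplified sketch under stated assumptions, not a description of any actual coherence protocol: it tracks a single cache line, and all names (`line_model`, `consumer_prefetch`, `producer_write`, `consumer_read`) are hypothetical. A producer write invalidates any valid copy held by the consumer, and the model counts those invalidations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one cache line shared by a producer and a consumer. */
struct line_model {
    int  memory_value;    /* latest value written by the producer */
    int  consumer_value;  /* copy held in the consumer's cache */
    bool consumer_valid;  /* whether the consumer's copy is valid */
    int  invalidations;   /* invalidations sent to the consumer */
};

void model_init(struct line_model *m, int initial)
{
    assert(m != NULL);
    m->memory_value = initial;
    m->consumer_value = 0;
    m->consumer_valid = false;
    m->invalidations = 0;
}

/* Consumer-initiated pre-fetch: copies whatever value exists right now. */
void consumer_prefetch(struct line_model *m)
{
    m->consumer_value = m->memory_value;
    m->consumer_valid = true;
}

/* Producer write: a valid consumer copy becomes stale and is invalidated. */
void producer_write(struct line_model *m, int value)
{
    m->memory_value = value;
    if (m->consumer_valid) {
        m->consumer_valid = false;
        m->invalidations++;
    }
}

/* Consumer read: returns true on a cache hit; on a miss, the line is
 * fetched on demand before the value is returned through *out. */
bool consumer_read(struct line_model *m, int *out)
{
    bool hit = m->consumer_valid;
    if (!hit) {
        consumer_prefetch(m);
    }
    *out = m->consumer_value;
    return hit;
}
```

In this model, calling `consumer_prefetch` before `producer_write` costs one invalidation and the later `consumer_read` still misses, whereas calling it after `producer_write` yields a hit with no invalidation.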
Regarding a producer-initiated pre-fetch operation, if the producer does not know where the potential consumer resides, it can be difficult for the producer to determine where newly produced data should be sent when it becomes available. Even if the producer knows where the consumer is, the producer may not know when it should send the data. For example, if the data sent to a consumer is not the final data to be produced, the data is not useful to the consumer and needs to be invalidated at the consumer side. Further, if the producer sends the data too early, the cache line holding the data at the consumer side can be replaced before the data is used by the consumer.
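The effect of sending data too early can likewise be illustrated with a toy model in C. The sketch assumes a small direct-mapped consumer cache, which is a simplification chosen for illustration; the names `toy_cache`, `toy_cache_install`, and `toy_cache_contains` and the size `NUM_SETS` are hypothetical. A pre-sent line is installed in the consumer cache, and any later fill that maps to the same set replaces it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SETS 4  /* hypothetical size of the toy consumer cache */

/* Direct-mapped cache: each set holds at most one line address. */
struct toy_cache {
    long tag[NUM_SETS];
    bool valid[NUM_SETS];
};

void toy_cache_init(struct toy_cache *c)
{
    assert(c != NULL);
    for (int i = 0; i < NUM_SETS; i++) {
        c->tag[i] = 0;
        c->valid[i] = false;
    }
}

/* Install a line, replacing whatever occupied its set. The same routine
 * models both a pre-sent line and the consumer's own demand fills. */
void toy_cache_install(struct toy_cache *c, long line_addr)
{
    assert(line_addr >= 0);
    int set = (int)(line_addr % NUM_SETS);
    c->tag[set] = line_addr;
    c->valid[set] = true;
}

/* Report whether a line is still present in the cache. */
bool toy_cache_contains(const struct toy_cache *c, long line_addr)
{
    assert(line_addr >= 0);
    int set = (int)(line_addr % NUM_SETS);
    return c->valid[set] && c->tag[set] == line_addr;
}
```

For example, if line 5 is pre-sent and the consumer then fills line 9 before using line 5, both lines map to set 1 and the pre-sent line is replaced. If the intervening fill is to line 6 instead, line 5 survives until it is used.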
Therefore, a need exists for an effective data pre-fetch mechanism to support producer-consumer pre-fetch communications.