Data producer-consumer techniques sometimes use a memory-based producer-consumer communication channel. When a producer and consumer do not operate synchronously, a queue may be used as a communications mechanism between them to absorb temporary differences between production and consumption. This buffering provided by the queue may be termed “elasticity”. Elasticity is needed when, for example, the queue contains received network packets and certain packets take longer to process than others; that is, processing is not synchronous (in “lockstep”) with the arrival rate. Elasticity also addresses buffering of data of varying size. For example, large packets may require more storage space in the queue than small packets. A computer system or other system implementing a producer-consumer channel may have a memory hierarchy comprising a plurality of memories, generally of inversely proportional speed and capacity, wherein smaller and faster memories are closer in time to a memory accessor and larger and slower memories are farther away. In general, the smaller and faster memories may be used to implement some sort of cache. Data migration techniques (e.g., external cache allocation (“cache push”) and prefetch) may be used to move data closer to the eventual consumer, but can lose their effectiveness when the target cache is not large enough to store the pending (elastic) data until the consumer can accept it.
A producer-consumer model may be similar to that of a FIFO buffer, such as one implemented as a ring data structure in memory. A ring consists of a range of memory locations, a produce pointer (tail) used to add new items to the ring, a consume pointer (head) used to identify the next valid item to remove from the ring, and some communication mechanism by which the producer and consumer indicate to each other that items have been added to or removed from the ring. There may also be some flow control mechanism, whether implicit or explicit. Another model is a linked list where the producer appends new entries to the tail of a list and the consumer removes entries from the head of the list. In this case, as with the FIFO buffer, there may be a mechanism for communicating when new entries have been added, but there is less need for flow control since the producer is limited only by the free records available to allocate and append to the list.
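By way of illustration only, the ring described above might be sketched as follows. This is a minimal single-threaded sketch, not a definitive implementation: the class name Ring, its method names, and the choice to keep one slot empty in order to distinguish a full ring from an empty one are assumptions made for the example, and the notification mechanism between producer and consumer is omitted.

```python
class Ring:
    """Fixed-capacity FIFO ring; one slot stays empty to tell full from empty."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0  # consume pointer: next valid item to remove
        self.tail = 0  # produce pointer: next free slot to fill

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % len(self.slots) == self.head

    def produce(self, item):
        """Add an item; return False (simple flow control) if the ring is full."""
        if self.is_full():
            return False
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        return True

    def consume(self):
        """Remove and return the oldest item, or None if the ring is empty."""
        if self.is_empty():
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        return item


ring = Ring(4)  # four slots, so at most three items are held at once
accepted = [ring.produce(p) for p in ("a", "b", "c", "d")]
drained = [ring.consume() for _ in range(4)]
```

In this sketch the refusal of the fourth item stands in for flow control: the producer must hold or discard data once the ring has no free slot, whereas a linked-list channel would instead allocate another record.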
The producer and the consumer can each be fixed-function units communicating through a memory-based queue. Each may also be a programmable processing element such as a central processing unit (CPU), and further may have a cache used to hide memory access latency. A major performance bottleneck in memory-based producer-consumer communication channels involving systems with caches is the cache miss taken at the consumer each time the consumer accesses newly produced information for the first time, also known as a “compulsory cache miss”. Previously proposed mechanisms that attempt to address this include external push delivery into a cache and external prefetch hints (“Direct Cache Access” or DCA) that cause a cache to pull in data prior to the actual CPU demand for it. In the case of an external push, data is sent to the cache before it is requested by the consumer. In the case of an external prefetch hint (DCA), the cache is instead given a hint suggesting that it prefetch certain data before the consumer requests it. For more information, see U.S. patent application Ser. No. 10/406,798, entitled “Cache Allocation,” filed Apr. 2, 2003 (Publication No. 2004-0199727) and U.S. patent application Ser. No. 11/021,143, entitled “Software Controlled Dynamic Push Cache,” filed Dec. 22, 2004 (Publication No. 2006-0136671). These mechanisms take advantage of knowledge at the producer that specific data will be relevant to the consumer in the near term, and strive to move that data closer to the consumer (that is, to a location with lower average miss cost) before the consumer is aware of, and accesses, the data.
There are a number of limitations to the current approaches. Push delivery can suffer from the elasticity problem (insufficient capacity for transient buffer growth) due to the small capacity of the lower level (closer) caches and the potential for variable processing time per packet. As a result, newly arrived data being pushed into a particular cache may displace older and more immediately relevant data that was previously pushed into that cache but has not yet been processed (consumed). At the same time, such “flooding” of the cache with pushes might displace the working set of other data that the processor has brought into its caches (that is, it may cause an increase in random victimization of cache lines). If the elasticity of the cache(s) is exceeded, the result is increased traffic to DRAM and high miss latency when the data is finally accessed. Publication No. US 2003-0107389A1 describes a mechanism for cooperative throttling of push, wherein the consumer and the producer cooperatively implement mechanisms to throttle pushing (instead spilling data to main memory or holding it in a large producer-local buffer) when the system determines by various means that push is less effective or even counter-productive.
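The displacement effect described above can be illustrated with a toy model. The following is a sketch under stated assumptions rather than a model of any real cache: the function name consumer_misses, the least-recently-used replacement policy, the one-line-per-packet sizing, and the fixed consumer lag (backlog) are all hypothetical simplifications, and displacement of the processor's other working set is not modeled.

```python
from collections import OrderedDict


def consumer_misses(cache_lines, backlog, packets):
    """Count consumer misses when each pushed packet waits `backlog` arrivals before use."""
    cache = OrderedDict()  # insertion order doubles as LRU order: oldest first
    misses = 0
    for n in range(packets + backlog):
        if n < packets:
            cache[n] = True                # producer pushes packet n into the cache
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # capacity exceeded: evict the oldest line
        ready = n - backlog                # packet the consumer reaches at this step
        if ready >= 0:
            if ready in cache:
                del cache[ready]           # hit: packet consumed, line freed
            else:
                misses += 1                # displaced before use: fetched from memory
    return misses


within_capacity = consumer_misses(cache_lines=4, backlog=2, packets=100)
beyond_capacity = consumer_misses(cache_lines=4, backlog=8, packets=100)
```

Under these assumptions, a backlog that fits within the cache produces no consumer misses, while a backlog larger than the cache causes almost every pushed packet to be displaced before it is consumed; only the last few packets, pushed after which no further arrivals occur, survive until use.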
External prefetch hints are also subject to such “cache flooding”, in which prefetched data displaces previously placed data. Another limitation of external prefetch hints is that a cache has limited resources to queue up pending activities. In general, a prefetch hint is given the lowest priority among pending cache requests waiting for access to the system, and prefetch hints can be safely dropped since doing so affects only performance, not correct functionality. In order to avoid complex and counter-productive flow-control mechanisms, externally generated prefetch hints are likely to be implemented as “fire-and-forget” operations for the sender and will only be accepted by the cache if there is space in a hardware queue to hold these requests. A cache also might drop a prefetch hint that has remained unserviced for a long time due to contention with higher priority requests for resources or for some other reason. Thus, the hint might be dropped because of a lack of space in the request queue, or due to contention for cache processing cycles. Both of these effects are due to policies that are at the microarchitecture level of the cache and are independent of progress by the consumer in processing the pending list entries. This means that the likelihood of an external prefetch hint successfully being processed by the target cache is subject to microarchitecture factors unrelated to channel-level flow control elasticity. For this reason, the achievable benefit from DCA is unpredictable.
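The “fire-and-forget” behavior described above might be sketched as follows. This is an illustrative model only: the class name HintQueue, the queue depth, and the service method are hypothetical, and a real cache would apply its own microarchitectural policies, including aging out hints that remain unserviced, which this sketch does not model.

```python
from collections import deque


class HintQueue:
    """Bounded queue of prefetch hints; hints that do not fit are silently dropped."""

    def __init__(self, depth):
        self.depth = depth
        self.pending = deque()
        self.dropped = 0

    def send_hint(self, address):
        """Fire-and-forget: the sender receives no acknowledgment either way."""
        if len(self.pending) >= self.depth:
            self.dropped += 1  # no space: hint discarded, correctness unaffected
            return
        self.pending.append(address)

    def service(self, slots):
        """Service up to `slots` hints with cycles not needed by higher priority requests."""
        served = []
        while slots > 0 and self.pending:
            served.append(self.pending.popleft())
            slots -= 1
        return served


queue = HintQueue(depth=2)
for addr in (0x100, 0x140, 0x180, 0x1C0):  # burst of four hints before any is serviced
    queue.send_hint(addr)
prefetched = queue.service(slots=4)
```

In this sketch the sender cannot tell which hints were accepted, and the number dropped depends only on the queue depth and the timing of service, not on how far the consumer has progressed through the channel.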