Modern computer systems include various components that are interconnected to communicate and perform tasks. Many system implementations include one or more processors and peripheral devices that may be coupled to the processor by way of one or more interconnect levels.
As these components may operate on common data, mechanisms to maintain a coherent view of such data may be implemented. In typical x86 computer systems, data is said to be cacheable and coherent when it may be stored at various storage locations within the system and mechanisms are implemented to maintain a coherent view of it. Alternatively, other data may be indicated to be non-cacheable/non-coherent, meaning that the data is not cached and is generally owned by a single entity, such that a view of the data may be maintained without coherence mechanisms.
In today's computer systems, cacheability and coherence choices are controlled on the processor side through an address range approach. On the input/output (IO) device side, however, the same choices are conveyed to hardware on a per-request basis through a request annotation approach, which uses different semantics from those used by the processor. This inconsistency between the processor side and the IO side in cacheability and coherence control has undesirable ramifications for application and device driver developers in terms of both system performance and debugging.
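The contrast between the two control styles can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the range table, the address boundaries, and the names `AddressRange`, `processor_mem_type`, `IoRequest`, and `io_mem_type` do not correspond to any particular processor or interconnect. The processor side answers the question by looking up an address in a range table, while the IO side answers it from a bit carried in each request.

```python
from dataclasses import dataclass
from enum import Enum


class MemType(Enum):
    CACHEABLE = "cacheable/coherent"
    UNCACHEABLE = "non-cacheable/non-coherent"


@dataclass(frozen=True)
class AddressRange:
    base: int
    limit: int          # inclusive upper bound of the range
    mem_type: MemType


# Hypothetical processor-side table: the attribute is a property of the address.
PROCESSOR_RANGES = [
    AddressRange(0x0000_0000, 0x7FFF_FFFF, MemType.CACHEABLE),
    AddressRange(0x8000_0000, 0xFFFF_FFFF, MemType.UNCACHEABLE),
]


def processor_mem_type(addr: int) -> MemType:
    """Processor side: look the address up in the range table."""
    for r in PROCESSOR_RANGES:
        if r.base <= addr <= r.limit:
            return r.mem_type
    raise ValueError(f"address {addr:#x} is not covered by any range")


@dataclass(frozen=True)
class IoRequest:
    addr: int
    coherent_hint: bool  # per-request bit, set by the device driver


def io_mem_type(req: IoRequest) -> MemType:
    """IO side: the attribute travels with the request, not with the address."""
    return MemType.CACHEABLE if req.coherent_hint else MemType.UNCACHEABLE
```

Because the two functions consult unrelated sources of truth, nothing in this sketch prevents them from disagreeing about the same address, which is the inconsistency described above.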
For example, when a device issues a request packet to an interconnect, a bit in the request packet carries a hint indicating whether the request must be kept coherent. This bit is set by the device driver programmer. The programmer therefore needs to be extremely careful about the cacheability and coherence attributes of memory locations, and in practice often makes assumptions about them; errors can lead to problems. In one case, if a page is tagged cacheable by a processor side mechanism but a device driver uses non-coherent memory requests, incorrect execution results may occur unless the processor side software flushes the shared memory regions out of all caches in advance. Such flushing, however, raises complexity and hinders performance. In another case, the IO device annotates a read request as cacheable/coherent but the processor indicates that the memory location is non-cacheable/non-coherent. Since the processor will never load this memory into its caches, the snoop requests spawned by the IO device-issued memory requests are meaningless. Nevertheless, these snoops consume system resources and degrade the performance of both processor threads and IO devices. A more serious effect occurs if the system caches coherent IO data. On such a system, the IO data is stored in the cache but will not be snooped by the processor, which can be a source of program errors.
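The two mismatch cases described above can be summarized in a small classification sketch. It is only an illustration under stated assumptions: the `Outcome` names and the `classify` function are hypothetical, and each side's attribute is reduced to a single boolean.

```python
from enum import Enum


class Outcome(Enum):
    CONSISTENT = "processor and IO attributes agree"
    STALE_DATA_RISK = "page is cacheable but the device request is non-coherent"
    WASTED_SNOOPS = "page is non-cacheable but the device request is coherent"


def classify(page_cacheable: bool, request_coherent: bool) -> Outcome:
    """Compare the processor-side page attribute with the IO request hint."""
    if page_cacheable == request_coherent:
        return Outcome.CONSISTENT
    if page_cacheable:
        # Caches may hold the data, yet the device bypasses coherence:
        # results may be incorrect unless software flushes the caches first.
        return Outcome.STALE_DATA_RISK
    # The processor never caches this location, so snoops triggered by the
    # request find nothing and only consume system resources.
    return Outcome.WASTED_SNOOPS
```

For instance, `classify(True, False)` corresponds to the case where processor side software would have to flush the shared region before the device accesses it, and `classify(False, True)` corresponds to the case of meaningless snoops.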