1. Field of the Invention
This invention is related to the field of processors, cache coherent communication among processors, and the use of retry in cache coherent communications.
2. Description of the Related Art
Processors are typically included in systems with other components and are configured to communicate with those components via an interconnect. The other components may be directly connected to the interconnect, or may be indirectly connected through other components. For example, many systems include an input/output (I/O) bridge connecting I/O components to the interconnect.
Typically, the processor includes an interface unit designed to communicate on the interconnect on behalf of the processor core. The processor core generates requests to be transmitted on the interconnect, such as read and write requests to satisfy load and store operations and instruction fetch requests. Additionally, most processors implement caches to store recently fetched instructions/data, and implement cache coherency to ensure coherent access by processors and other components even though cached (and possibly modified) copies of blocks of memory exist. Such processors receive coherency related requests from the interconnect (e.g. snoop requests to determine the state of a cache block and to cause a change in state of the cache block). Other components may also implement caching and/or cache coherent communication.
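As an illustration only, the snoop behavior described above can be sketched as follows. The sketch assumes a MESI-style set of cache block states, which the description above does not specify, and the names State, Cache, and snoop are hypothetical.

```python
from enum import Enum


class State(Enum):
    """Assumed MESI-style cache block states (not specified in the text)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Cache:
    """Hypothetical cache that tracks only the coherency state of each block."""

    def __init__(self):
        self.blocks = {}  # block address -> State

    def state(self, addr):
        # A block that is not present is treated as invalid.
        return self.blocks.get(addr, State.INVALID)

    def snoop(self, addr, invalidate):
        """Report the block's prior state and apply the requested state change.

        invalidate=True models a snoop for a request that needs exclusive
        access; invalidate=False models a snoop for a read that may share.
        """
        prior = self.state(addr)
        if prior is State.INVALID:
            return prior  # nothing cached, so nothing changes
        self.blocks[addr] = State.INVALID if invalidate else State.SHARED
        return prior
```

In this sketch, the returned prior state corresponds to the snoop response that determines the state of the cache block, and the assignment corresponds to the change in state caused by the snoop.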
A problem arises in such systems when a given cache block is being shared by two or more processors or other devices, especially if memory latencies are long (which is typically the case). A first processor/device initiates a transaction to read the block, for example. Then, a second processor/device initiates a transaction to read the same block before the first processor/device receives the block from memory.
In some systems, the first processor/device responds to the second processor/device's transaction, indicating that it will provide the block (after it receives the block from memory). The second processor/device records a “link” to the first processor/device to remember that the first processor/device will be providing the data. If multiple devices make such requests, a linked list of promises to provide the data is formed. Storing the linked list state across the devices may require an inefficient amount of storage. Additionally, ensuring that such a system functions properly without deadlock or loss of coherency is complicated.
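A minimal sketch of such a linked list of promises follows. It assumes that each new request is answered by the most recent prior requester, which the description above does not state, and the names Device, build_chain, link_storage, and deliver are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical requester; provider is the recorded "link"."""
    name: str
    provider: Optional["Device"] = None  # None: memory supplies the block
    has_block: bool = False


def build_chain(names: List[str]) -> List[Device]:
    """Issue one read per device, in order, for the same block.

    Assumption: the most recent requester promises the block to the next one.
    """
    devices = [Device(n) for n in names]
    tail = None
    for dev in devices:
        dev.provider = tail  # record the link to the promising device
        tail = dev
    return devices


def link_storage(devices: List[Device]) -> int:
    """Count the links that must be stored across all devices."""
    return sum(1 for dev in devices if dev.provider is not None)


def deliver(devices: List[Device]) -> List[str]:
    """Forward the block down the chain, starting from memory."""
    order = []
    for dev in devices:  # devices are held in request order
        if dev.provider is not None and not dev.provider.has_block:
            raise RuntimeError("promise cannot be kept: provider has no block")
        dev.has_block = True
        order.append(dev.name)
    return order
```

In this sketch, every requester after the first holds one link, so the stored state grows with the number of sharers, and each device receives the block only after every device ahead of it in the chain has received it.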
In other systems, transactions can be “retried” to be reattempted at a later time. However, with long memory latencies and many devices attempting to share a block, a large number of transactions may be initiated, only to be retried. The same device may initiate its transaction repeatedly, only to be retried. Bandwidth consumed by such transactions is wasted, and power consumption may be increased as well even though no useful work occurs as a result of the retried transactions.
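The cost of repeated retries can be illustrated with a simple counting model. The model is a sketch under stated assumptions rather than a description of any particular system: the first device issues its read at cycle 0, every other device first attempts the same block one cycle after the previous device, each retried device reattempts after a fixed interval, and every attempt made before the data returns from memory is retried. The function name and parameters are hypothetical.

```python
def count_retried_transactions(num_devices, memory_latency, retry_interval):
    """Count transactions that are retried before the block arrives.

    Assumptions (not taken from the text): device 0 issues at cycle 0 and
    its data returns at cycle memory_latency; device i first attempts at
    cycle i and reattempts every retry_interval cycles until the data has
    returned.
    """
    if retry_interval <= 0:
        raise ValueError("retry_interval must be positive")
    retried = 0
    for device in range(1, num_devices):
        cycle = device  # staggered first attempt
        while cycle < memory_latency:
            retried += 1  # this attempt consumes bandwidth but does no work
            cycle += retry_interval
    return retried
```

Under these assumptions, the number of retried transactions grows with both the memory latency and the number of devices sharing the block, while none of those transactions transfers data.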