1. Technical Field
This invention relates generally to transactions, such as memory requests and their responses, and more particularly to the temporary storage of remotely received transactions that conflict with other transactions already being processed, in that they relate to the same resources as those other transactions.
2. Description of the Prior Art
There are many different types of multi-processor computer systems. A Symmetric Multi-Processor (SMP) system includes a number of processors that share a common memory. SMP systems provide scalability for multithreaded applications and allow multiple threads to run simultaneously. As needs dictate, additional processors, memory or input/output (I/O) resources can be added. SMP systems usually range from two to 128 or more processors. One processor generally boots the system and loads the SMP operating system, which brings the other processors online. Without partitioning, there is only one instance of the operating system in memory. Since all processors access the same memory, sharing of data can be accomplished by simply placing the data in memory. The operating system uses the processors as a pool of processing resources, all executing simultaneously, where each processor either processes data or is in an idle loop waiting to perform a task. SMP system throughput increases as more processes are overlapped, until all processors are fully utilized.
A Massively Parallel Processor (MPP) system can use thousands or more processors. MPP systems use a different programming paradigm from the more common SMP systems. In an MPP system, each processor contains its own memory and its own copy of the operating system and application. Each subsystem communicates with the others through a high-speed interconnect. To use an MPP system effectively, an information-processing problem should be breakable into pieces that can be solved simultaneously, with the nodes explicitly communicating shared information via a message-passing interface over the interconnect. For example, in scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time.
A Non-Uniform Memory Access (NUMA) system is a multi-processing system in which memory is separated into distinct banks. NUMA systems are a type of SMP system. In Uniform Memory Access (UMA)-SMP systems, by contrast, all processors access a common memory at the same speed. NUMA systems are usually broken up into nodes containing one to eight, or more, processors. The nodes typically also contain a portion of the global memory. The memory local to a node typically is closer in physical and logical proximity, and thus is accessed faster than memory in more distant parts of the system. NUMA systems generally scale better to higher numbers of processors than UMA-SMP systems, because distributing the memory causes less contention in the memory controller.
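The effect of local versus distant memory on access time can be illustrated with a simple weighted average. The following is a minimal sketch only; the function name, its parameters, and any latency values passed to it are hypothetical and are not taken from any particular NUMA system.

```python
def average_access_latency(local_fraction, local_latency_ns, remote_latency_ns):
    """Weighted-average memory latency seen by a processor in a NUMA node.

    local_fraction is the share of accesses served by the node's own memory;
    the remainder goes to more distant memory. All names are hypothetical.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return (local_fraction * local_latency_ns
            + (1.0 - local_fraction) * remote_latency_ns)
```

Under this model, the average latency approaches the local latency as a larger share of accesses is kept within the node, which is why placement of data near the processors that use it matters in such systems.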
Multi-processor systems usually include one or more coherency controllers to manage memory transactions from the various processors and I/O. Transactions are requests or responses relative to memory or another type of resource. For instance, transactions may be requests to read data from, or write data to, memory or another type of resource, or may be responses issued after the requests have been processed. The coherency controllers negotiate multiple read and write requests emanating from the processors or I/O, and also negotiate the responses back to these processors or I/O. Usually, a coherency controller includes a pipeline, in which transactions, such as requests and responses, are input, and actions that can be performed relative to the memory for which the controller is responsible are output. Transaction conversion is commonly performed in a single stage of the pipeline, such that a transaction is converted to performable actions in one step.
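Single-stage transaction conversion, as described above, can be sketched as one function that maps an input transaction directly to a list of performable actions. This is an illustrative sketch under stated assumptions: the `Transaction` type, the `convert` function, and the action strings are all hypothetical and do not describe any actual controller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transaction:
    kind: str                   # "read" or "write" (hypothetical request kinds)
    address: int                # memory address the transaction relates to
    data: Optional[int] = None  # payload, used only by writes


def convert(txn: Transaction) -> List[str]:
    """Convert one transaction to performable actions in a single step."""
    if txn.kind == "read":
        return [f"read_mem(0x{txn.address:x})", "send_response(data)"]
    if txn.kind == "write":
        return [f"write_mem(0x{txn.address:x}, {txn.data})", "send_response(ack)"]
    raise ValueError(f"unknown transaction kind: {txn.kind}")
```

For example, `convert(Transaction("read", 0x40))` yields a memory read action followed by a response action, with no intermediate pipeline stage between input and output.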
Transactions may be remote, in that they originate from nodes other than the node that is to process the transactions. The transactions are thus received by the processing node from the originating nodes. If such transactions relate to resources of the processing node, such as the memory of this node, to which other transactions already being processed also relate, then the processing node sends retry responses to the originating nodes. A retry response indicates to an originating node that it is to retry the transaction at a later time. This approach for handling conflicting transactions is disadvantageous, however. It can cause undue bandwidth consumption on the interconnect that connects the nodes, and it adds to the latency of the retried transaction.
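The retry-based handling of conflicting transactions described above can be sketched as follows. This is a minimal model, not an implementation of any actual node: the `ProcessingNode` class, the `send_until_accepted` function, and the assumption that each attempt costs one request message and one response message on the interconnect are all hypothetical.

```python
class ProcessingNode:
    """Hypothetical processing node that answers conflicts with a retry response."""

    def __init__(self):
        # Resources (e.g., memory addresses) that a transaction in progress relates to.
        self.in_flight = set()

    def receive(self, resource):
        """Accept a remote transaction, or return a retry response on conflict."""
        if resource in self.in_flight:
            return "RETRY"
        self.in_flight.add(resource)
        return "ACCEPTED"

    def complete(self, resource):
        """Mark the transaction relating to `resource` as finished."""
        self.in_flight.discard(resource)


def send_until_accepted(node, resource, release_after):
    """Resend a transaction until it is accepted.

    The conflicting transaction is assumed to complete once `release_after`
    attempts have been bounced. Returns the number of interconnect messages
    used, assuming one request and one response per attempt.
    """
    attempts = 0
    while True:
        attempts += 1
        if node.receive(resource) == "ACCEPTED":
            return 2 * attempts
        if attempts >= release_after:
            node.complete(resource)
```

In this model, a transaction that is bounced three times before being accepted uses eight messages instead of two, which illustrates both the added interconnect traffic and the added latency of the retried transaction.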
Furthermore, the approach may be unfair to the originating nodes. An originating node may have a high priority transaction, for instance, that keeps getting bounced back with a retry response from the processing node. This may be because other transactions relating to the same resources fortuitously are being processed by the processing node each time the high priority transaction is sent by the originating node. For these and other reasons, therefore, there is a need for the present invention.
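The unfairness described above can be illustrated with a small deterministic model. It is a sketch under stated assumptions: time is divided into discrete slots, `busy_slots` holds the slots in which some other transaction relating to the same resource is being processed, and the originating node retries at a fixed interval. The function name and all parameters are hypothetical.

```python
def retries_before_acceptance(busy_slots, first_attempt, retry_interval, max_attempts):
    """Count the retry responses received before a transaction is accepted.

    Returns None if every one of the `max_attempts` attempts arrives while
    the resource is busy. Note that the transaction's priority is not a
    parameter: under a pure retry scheme it plays no part in the outcome.
    """
    for attempt in range(max_attempts):
        arrival = first_attempt + attempt * retry_interval
        if arrival not in busy_slots:
            return attempt
    return None
```

With `busy_slots = {0, 2, 4, 6, 8}`, a transaction first sent at slot 0 and retried every 2 slots is bounced on all five attempts, whereas the same transaction retried every 3 slots is accepted after a single retry. Acceptance thus depends on timing alone, not on how important the transaction is.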