Multi-processor computer systems include a number of processing nodes connected together by an interconnection network. Typically, each processing node includes one or more processors, a local memory, and an interface circuit connecting the node to the interconnection network. The interconnection network is used for transmitting packets of information between processing nodes.
Distributed, shared-memory multiprocessor systems include a number of processing nodes that share a memory physically distributed among those nodes. Such systems can often be scaled to handle increased demand by adding processing nodes or by adding processors within each node. In such a system, the processors may include one or more scalar processing units. These scalar processing units help control the loading of data from, and the storing of data to, the addressable memory space of the distributed-memory system.
To load and store data, scalar processing units need to identify the proper address space for that data. In the past, individual nodes have often lacked efficient or robust address-identification mechanisms. In addition, such nodes have often been unable to handle multiple outstanding memory requests effectively when the system is scaled to a large size, or to interact well with local cache requests and cache allocation.
Therefore, there is a need for a processing unit that addresses these and other shortcomings.