This invention relates generally to multi-processor computer systems, and more particularly to such systems in which there are a number of building blocks and each building block has one or more agents.
There are many different types of multi-processor computer systems. A symmetric multi-processor (SMP) system includes a number of processors that share a common memory. SMP systems are scalable: additional processors can be added as needs dictate. SMP systems usually range from two to thirty-two or more processors. One processor generally boots the system and loads the SMP operating system, which brings the other processors online. Without partitioning, there is only one instance of the operating system and one coherent shared memory space. The operating system uses the processors as a pool of processing resources, all executing simultaneously, where each processor either processes data or waits in an idle loop for a task. An SMP system runs faster whenever processes can be overlapped, that is, executed concurrently on different processors.
A massively parallel processor (MPP) system can use thousands of processors or more. MPP systems use a different programming paradigm than the more common SMP systems. In an MPP system, each processor has its own memory and its own copy of the operating system and application. Each subsystem communicates with the others through a high-speed interconnect. To use an MPP system effectively, an information-processing problem should be divisible into pieces that can be solved simultaneously. For example, in scientific environments, certain simulations and mathematical problems can be split apart and each part processed at the same time.
A non-uniform memory access (NUMA) system is a multi-processing system in which memory is separated into distinct banks. NUMA systems are similar to SMP systems. In SMP systems, however, all processors access a common memory at the same speed. By comparison, in a NUMA system, memory on the same processor board, or in the same building block, as the processor is accessed faster than memory on other processor boards, or in other building blocks. That is, local memory is accessed faster than distant shared memory. NUMA systems generally scale to higher numbers of processors better than SMP systems do. The term building block is used herein in a general manner, and encompasses a separable grouping of one or more processors, other hardware such as memory, and software, which can communicate with other building blocks.
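To make the local versus remote distinction concrete, the following minimal Python sketch models access cost in a NUMA system. It assumes, hypothetically, that each building block homes one contiguous 1 GiB slice of the address space and that local and remote accesses cost 100 ns and 300 ns. These constants and the helper names (`home_block`, `access_ns`, `average_ns`) are illustrative assumptions, not measurements or part of the described system.

```python
from typing import Iterable

# Hypothetical parameters, chosen only for illustration.
LOCAL_NS = 100              # cost of accessing memory homed in the processor's own building block
REMOTE_NS = 300             # cost of accessing memory homed in another building block
BYTES_PER_BLOCK = 1 << 30   # assumption: each building block homes 1 GiB of contiguous addresses


def home_block(addr: int) -> int:
    """Return the building block whose shared memory holds this address."""
    return addr // BYTES_PER_BLOCK


def access_ns(cpu_block: int, addr: int) -> int:
    """Local memory is cheaper than memory in another building block."""
    return LOCAL_NS if home_block(addr) == cpu_block else REMOTE_NS


def average_ns(cpu_block: int, addrs: Iterable[int]) -> float:
    """Average access cost for a processor in cpu_block touching the given addresses."""
    costs = [access_ns(cpu_block, a) for a in addrs]
    return sum(costs) / len(costs)
```

With three local accesses and one remote access, `average_ns(0, [0, 1, 2, 1 << 30])` gives 150.0; keeping data in the accessing processor's own building block lowers this average.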
A difficulty with nearly any multi-processor computer system that caches memory shared among its building blocks is that cache and memory coherence must be maintained for the system to perform properly and optimally. Coherence means that whenever a processor in any building block accesses a memory line, it sees the correct, current value of that line. This is difficult to ensure, because many copies of the memory line may exist within the system. There is the home memory line, stored in the local shared memory of one of the building blocks. There may also be one or more copies of the memory line stored in different caches of the building blocks of the system. Because the processors of a multi-processor system may use the same memory lines at almost the same time, it is important to notify them as soon as possible of changes to the memory lines that they are caching or for which they are responsible in their local shared memories. Otherwise, “stale” data may result, where a processor reads an incorrect value for a memory line.
For example, a memory line may be cached in a number of different caches of the multi-processor system. If one of the processors changes the data of the memory line in its cache, then the other processors should invalidate the memory line in their caches, if it is in fact stored there. That is, the other processors should ensure that they do not use the data of that memory line in their caches. To ensure that the process is completed properly, and that all the other processors have invalidated the memory line in their caches, these processors should send acknowledgments of the invalidation request, which are collected by a single entity. In the prior art, however, there is no way to dynamically assign which processor is responsible for collecting such acknowledgments. For this and other reasons, there is a need for the present invention.
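The invalidate-and-acknowledge exchange described above can be sketched in Python as a single-threaded simulation. Everything here is hypothetical: the `Agent` and `Collector` classes, the `write_line` function, and the use of plain dictionaries as caches are illustrative assumptions, and real hardware exchanges these as messages over the interconnect rather than as function calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Agent:
    """Hypothetical agent whose cache maps a memory-line address to its cached value."""
    ident: int
    cache: Dict[int, int] = field(default_factory=dict)

    def invalidate(self, addr: int) -> int:
        """Drop the line if cached and return an acknowledgment (this agent's id)."""
        self.cache.pop(addr, None)
        return self.ident


@dataclass
class Collector:
    """Counts acknowledgments against the set of agents that were asked to invalidate."""
    expected: Set[int]
    received: Set[int] = field(default_factory=set)

    def ack(self, ident: int) -> None:
        if ident not in self.expected:
            raise ValueError(f"unexpected acknowledgment from agent {ident}")
        self.received.add(ident)

    def complete(self) -> bool:
        return self.received == self.expected


def write_line(writer: Agent, agents: List[Agent], addr: int, value: int) -> Collector:
    """Invalidate every other cached copy, collect the acks, then let the writer update its copy."""
    sharers = {a.ident for a in agents if a is not writer and addr in a.cache}
    collector = Collector(expected=sharers)
    for a in agents:
        if a.ident in sharers:
            collector.ack(a.invalidate(addr))
    # The write may proceed only once every sharer has acknowledged.
    if not collector.complete():
        raise RuntimeError("write attempted before all acknowledgments arrived")
    writer.cache[addr] = value
    return collector
```

In this sketch the collector's role is fixed by the caller; the question the text raises is which agent should play that role, which this example deliberately leaves open.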
The invention relates to assigning a building block collector agent to receive acknowledgments from other building block agents. In a method of the invention, a memory-line request is received from a requestor agent, which is one of a number of agents. Each agent has a shared memory that is shared among the agents, as well as a cache that temporarily stores a limited number of memory lines from the shared memories of the other agents. The method dynamically assigns a collector agent, which is also one of the agents, to receive acknowledgments from the agents. This dynamic assignment is based on the type of the memory-line request, the global state of the caches of the agents, or both.
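The text states only that the collector is chosen from the request type and/or the global cache state; it does not give the rule. The Python sketch below shows one hypothetical policy of that shape. The request types, the MSI-style state letters ("M", "S", "I"), and the rules themselves (the requestor collects for exclusive requests; the home agent collects when a modified copy must be written back) are assumptions for illustration only, not the claimed method.

```python
from enum import Enum
from typing import Dict, Optional


class RequestType(Enum):
    # Hypothetical request types standing in for "the type of the memory-line request".
    READ_SHARED = "read_shared"        # requestor wants a readable copy
    READ_EXCLUSIVE = "read_exclusive"  # requestor intends to modify the line


def assign_collector(req: RequestType, requestor: int, home: int,
                     line_state: Dict[int, str]) -> Optional[int]:
    """Pick the agent that collects acknowledgments for one memory-line request.

    line_state maps agent id -> state of the line in that agent's cache ("M", "S", or "I"),
    standing in (by assumption) for the global state of the caches.
    Returns None when no acknowledgments need to be collected.
    """
    # Agents other than the requestor that currently hold a copy of the line.
    holders = [a for a, s in line_state.items() if a != requestor and s != "I"]
    if not holders:
        return None
    if req is RequestType.READ_EXCLUSIVE:
        # Every holder must invalidate; the requestor is the agent waiting on the result,
        # so in this hypothetical policy it counts the acknowledgments itself.
        return requestor
    # READ_SHARED: shared copies may stay; only a modified copy needs a write-back,
    # and in this hypothetical policy the home agent collects that response.
    if any(line_state[a] == "M" for a in holders):
        return home
    return None
```

For instance, with agents 0 and 1 holding the line shared and agent 2 requesting exclusive access, `assign_collector(RequestType.READ_EXCLUSIVE, 2, 0, {0: "S", 1: "S", 2: "I"})` returns 2, while a shared read in the same state returns `None` because no copy must change.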
A system of the invention includes building blocks and an interconnect that connects the building blocks. Each building block has an agent, a shared memory, and a cache. The shared memories are shared among the agents of the building blocks. The cache of an agent temporarily stores a limited number of memory lines from the shared memories of the other agents. The interconnect includes a manager that dynamically assigns one of the agents as the collector agent for receiving acknowledgments from the agents. This dynamic assignment is based on the type of a memory-line request, the global state of the caches of the building blocks, or both.