1. Field of the Invention
The present invention relates to the field of multiprocessor machines with symmetrical architectures comprising memories with uniform access and cache coherency, better known by the term “CC-UMA SMP” (Cache Coherent-Uniform Memory Access Symmetrical Multiprocessor).
More specifically, it concerns an interconnection architecture between a large number of processors organized into clusters of multiprocessor modules, also called nodes of a computer system.
2. Description of Related Art
In such machines, one or more levels of cache memory of each processor store recently acquired data so that it can be reused quickly, thus reducing subsequent contention for memory access.
When a processor does not find the desired data in its own cache memory, it transmits the address of the storage area in which it wishes to read the data via the system bus.
All the processors check their cache memories to see if they have the most recently updated copy of the data at this address.
If another processor has modified the data, it informs the requesting processor that the data can be retrieved from its cache memory instead of being requested from physical storage.
In order for a processor to be able to modify a storage area, it must obtain authorization to do so.
After such a modification, all the other processors that have a copy of the data that has just been modified must invalidate this copy in their respective cache memories.
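The read-miss and write-invalidate steps described above can be sketched as a minimal snooping model. This is an illustrative sketch only; the class names, the single valid/invalid state pair, and the bus interface are assumptions made for the example, not part of the protocol as standardized.

```python
from enum import Enum

class State(Enum):
    VALID = "valid"
    INVALID = "invalid"

class Cache:
    """Simplified per-processor cache: address -> (state, data)."""
    def __init__(self):
        self.lines = {}

    def snoop_write(self, addr):
        # Another processor modified this block: invalidate our copy.
        if addr in self.lines:
            self.lines[addr] = (State.INVALID, None)

class Bus:
    """System bus broadcasting each transaction to every cache."""
    def __init__(self, caches):
        self.caches = caches

    def write(self, writer, addr, data):
        # The writer has obtained authorization to modify the block;
        # all other caches holding a copy must invalidate it.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)
        writer.lines[addr] = (State.VALID, data)
```

In this sketch, a write by one processor is seen by every other cache on the bus, which is exactly the broadcast behavior that saturates the system bus when too many processors issue requests at the same time.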
By tracing the transactions passing through the system bus, a cache coherency control protocol keeps a record of all the transactions exchanged between the processors.
Conventionally, four fundamental states of the coherency protocol of the memories are distinguished, currently standardized under the code MESI, the abbreviation for “Modified state, Exclusive state, Shared state, Invalid state.”
Generally, managing exchanges between the memories of a computer system consists of updating or invalidating copies of a storage block.
If invalidation is chosen, it is known, for example, to add to the address of each block two bits representing the state of the block in the memory.
According to this protocol, and in each node, the state “M” corresponds to the case in which a given cache memory is the only holder of a modified copy of a block, called the reference copy, and in which there are no other valid copies in the system. In this case, the cache memory holding the copy in the state “M” is responsible for supplying the block and for updating the internal memory if it wants to evict this copy.

A copy having the exclusive state “E” is also a reference copy held by a cache memory, but in this case the internal memory holds an up-to-date copy of it. The shared state “S” relates to a copy that may have several holders, the up-to-date copy residing in the internal memory; the potential holders depend on the management strategy adopted. Finally, a copy is in the invalid state “I” when the cache memory holds a copy of the block that is not valid.

When a processor wants to access a block in the “I” state, its cache memory sends a request through the system bus to acquire the block. The other cache memories and the internal memory connected to the system bus learn of this request and react as a function of the state of the copy each of them holds. Thus, if a cache memory has the block in the modified state “M,” it supplies the block to the system bus, allowing the requesting processor to access the data and its cache memory to acquire the reference copy. There are still other optional states, which may or may not be standardized.
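The MESI state transitions described above can be summarized in a small sketch. The function names and the reduction to three events are illustrative assumptions; the sketch covers only the four standardized states, not the optional extensions mentioned at the end of the paragraph.

```python
from enum import Enum

class MESI(Enum):
    MODIFIED = "M"   # only valid copy; internal memory is stale, this cache must supply the block
    EXCLUSIVE = "E"  # only cached copy; internal memory is up to date
    SHARED = "S"     # possibly several holders; internal memory is up to date
    INVALID = "I"    # the cached copy is not valid

def on_remote_read(state):
    """New state of a local copy after another cache acquires the same block."""
    if state in (MESI.MODIFIED, MESI.EXCLUSIVE):
        # The holder supplies the block; both copies become shared.
        return MESI.SHARED
    return state

def on_remote_write(state):
    """Any local copy is invalidated when another processor modifies the block."""
    return MESI.INVALID

def on_local_write(state):
    """Once a processor has obtained authorization to modify, its copy becomes 'M'."""
    return MESI.MODIFIED
```

For example, a cache holding a block in the “M” state that observes a remote read supplies the block and drops to “S,” matching the reference-copy behavior described above.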
Management of this type is adopted particularly when several processors frequently modify the data sent via the system bus.
Since the storage areas are all accessible in the same way by all the processors, the problem of optimizing the location of the data in a given storage area no longer exists.
That is what characterizes uniform memory access (UMA).
This type of management is reliable as long as the system bus is not saturated by too many processors sending too many memory access requests at the same time.
With the increase in the level of component integration, it is possible to place more and more processors on the same card, while processor speed increases by 55% per year.
An article by Alan Charlesworth (Sun Microsystems) published in IEEE Micro, January/February 1998, pages 39–49, describes an architecture for interconnection between several nodes, adapted to such constraints.
This interconnection architecture uses a packet switching protocol that makes it possible to separate the requests and the responses, allowing transactions between several processors to overlap in the bus.
Furthermore, it uses an interleaving of several control busses. Thus, the utilization of four address busses makes it possible to check four addresses in parallel. The physical storage space is divided by four and each address bus checks a quarter of the memory.
With such an architecture, it is possible to connect between 24 and 64 processors using 4 address busses and an automatic switching device, also known as a “crossbar,” between the nodes, each node comprising a minimum of four processors.
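The four-way interleaving of the address busses can be illustrated by selecting a bus from the block address. The block size of 64 bytes used here is an assumption for the example; only the four-way split is taken from the architecture described above.

```python
NUM_ADDRESS_BUSSES = 4
BLOCK_SIZE = 64  # bytes per coherency block; illustrative value, not from the article

def address_bus_for(addr):
    """Return the index of the address bus that snoops this address.

    The physical storage space is divided by four, interleaved at block
    granularity, so each address bus checks a quarter of the memory and
    four addresses can be checked in parallel.
    """
    block = addr // BLOCK_SIZE
    return block % NUM_ADDRESS_BUSSES
```

Under this mapping, consecutive blocks fall on busses 0, 1, 2, 3, 0, … so simultaneous requests to different quarters of memory proceed in parallel.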
The interconnection of the modules uses the memory coherency protocol defined above and makes it possible to perform transactions between modules across two levels of interconnection:
- a first level of interconnection, at the node level, conveying the internal traffic of the node, issuing from the processors and the memories, to the address and data output ports of the node; and
- a second level of interconnection, at a higher level, transferring the addresses and the data between the various nodes of the system.
The memory access requests (commands and addresses) are broadcast to all of the nodes in order to consult their directories to locate the data that a processor of the system needs to access.
The memory access time is independent of the node in which the physical memory is located.
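The broadcast of a memory access request to every node, each consulting its own directory, can be sketched as follows. The dictionary-based directory layout and the fallback to physical memory are assumptions made for illustration, not the exact design of the article cited above.

```python
def broadcast_request(nodes, addr):
    """Broadcast the address to all nodes; each consults its directory.

    The node holding a modified ("M") copy responds with the data;
    otherwise the request falls through to physical memory (returned
    here as (None, None) for simplicity).
    """
    for node in nodes:
        entry = node["directory"].get(addr)
        if entry is not None and entry["state"] == "M":
            return node["name"], entry["data"]
    return None, None  # no modified copy: read from physical memory
```

Because every node sees every request, the access time does not depend on which node holds the physical memory, consistent with the UMA property described above.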
Two other types of interconnection may be distinguished, according to the type of information processed: interconnection in the address space and interconnection in the data space.
One of the disadvantages of such an architecture resides, in particular, in the utilization of a “crossbar,” which includes a large number of connection points and therefore takes up a lot of space, centralized on the backpanel to which the nodes constituting the machine are connected.
This architecture also requires an “active” backpanel, i.e., one that comprises logical components, as opposed to a “passive” backpanel without any logical components.
In this type of machine, it is necessary to duplicate the “crossbar” in order to obtain good availability, and in the event of a failure, part of the usable bandwidth is lost.
A machine of this type is therefore not conducive to the adaptation of new multiprocessor modules to different machine configurations and thus does not allow for the desired modularity.