The present invention relates, in the domain of computers, to multiprocessor systems formed by the union of modules (also called nodes) each having several processors. These systems are commonly called SMP (Symmetric Multi-Processing, or Symmetric multiprocessor). The invention more particularly relates to a multiprocessor computer system with several nodes, featuring a structure divided into modules enabling the number of the processors in the system to be increased by increasing the number of modules.
Computer systems of low processing power comprise a single processor with which a memory, input/output devices and mass storage systems (hard disks, optical storage, magnetic tapes and similar) are associated. When the required processing power exceeds what a single-processor computer system can provide, multiple processors must be coupled together by one or more buses.
As is known, SMP type systems feature a structure partitioned into modules or nodes. SMP (symmetric multiprocessing) technology is a method used in multiprocessor network servers. In an SMP server, the memories (and all the internal peripheral devices) are shared between all the processors, which use them jointly. An SMP type computer architecture multiplies the processors within a computer so as to increase the computing power. Increasing the number of processors enables a greater number of user or kernel processes to be executed simultaneously, each being allocated to one or other of the available processors. FIG. 6 illustrates an example of conventional SMP architecture. The publication “STARFIRE: Extending the SMP Envelope”, IEEE Micro, January-February 1998, Vol. 18, issue 1, pages 39-49, illustrates a type of SMP architecture with crossbar switching. Also known, for example from “The Stanford FLASH Multiprocessor” (21st ISCA Proceedings) or from U.S. Pat. No. 7,047,372, is a type of SMP architecture using directories referencing the memory addresses used by the different processors. In the existing range of NovaScale® servers marketed by the BULL Company, servers are provided with an SMP architecture that uses this type of directory.
The SMP type systems with several nodes require a cache consistency protocol to monitor, over time, the hosting locations of the memory addresses used by the different processors. This protocol is necessary in this type of system that uses a cache associated with each of the processors. As several processors can share a variable, it is possible to have several copies in several caches of the value of the variable that is shared in the memory. If one of the copies is modified by one of the processors, updates must be carried out in all the other caches where there is a copy of this variable if consistency is to be maintained. In SMP servers such as the NovaScale® 5005 servers of BULL Company, several processors forming the respective nodes are associated and the consistency of the data processed in the system is provided notably using an electronic chip typically grouping different identifiable processing agents of a cache consistency protocol. In an SMP system, a node can be defined as a topological group of agents/processors. From a functional viewpoint, the exchanges between agents from different nodes necessarily pass via an interconnection controller called a node controller NC. Physically, the different agents can be grouped on a same chip and therefore share the same links to communicate with the rest of the system.
The consistency protocols advantageously use directories to keep track of the shared information. In each node of such an SMP system known in the prior art as discussed herein with reference to FIG. 5, a memory controller (15) comprises a directory (150) managing the memory addresses within the node, and a node controller (20) that comprises a first directory (16) managing remote memory addresses that have been imported into the node (imported directory) and a second directory (17) managing the local addresses that have been exported to other nodes (exported directory). The shared information relating to a determined memory block (memory address) generally comprises a cache status of the block and the identity of the other nodes that share this block. Typically, the directories are distributed among all the nodes.
Another example of a prior art system is described in U.S. Pat. No. 7,017,011 which is assigned to the same assignee as named herein. This patent discloses a coherence controller adapted for connection to a plurality of processors equipped with a cache memory and with at least one local main memory. The coherence controller including a cache filter directory comprising a first filter directory SF designed to guarantee coherence between the local main memory and the cache memories of the local module. The cache filter directory includes a complementary filter directory ED which is handled like the cache filter directory SF for keeping track of the coordinates, particularly the addresses, of the lines or blocks of the local main memory copied from the local module into an external module and guarantees coherence between the local main memory and the cache memories of the local module and the external modules. Thus, the ED directory makes it possible to know if there are existing copies of the memory of the local module outside the module, and to propagate requests of local origin to the other modules or external modules only judiciously.
Cache consistency protocols are now well known and will not be described in detail herein. However, in order to explain the problem of the prior art systems that the present invention proposes to resolve, it is necessary to explain how such a protocol operates within the multiprocessor systems known in the prior art. U.S. Pat. No. 7,130,969 is cited herein as an example of a multiprocessor system featuring directories for cache consistency. The MESI or MESIF (Modified, Exclusive, Shared, Invalid, Forward) protocol of the INTEL Corporation is a non-restrictive example of a cache consistency protocol (reference can notably be made to U.S. Pat. No. 6,922,756 for the MESIF protocol).
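By way of illustration only, the four MESI states can be sketched as a small state machine for a single memory block shared by several caches. This is a simplified, invalidation-based sketch: the Forward state of MESIF, write-back traffic and message exchanges are omitted, and the names State, read and write are hypothetical and are not taken from any cited document.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def read(states, reader):
    """Return the per-cache states of one block after cache `reader` reads it."""
    new = dict(states)
    if new[reader] is not State.INVALID:
        return new  # read hit: the reader already holds a valid copy
    holders = [c for c in new if c != reader and new[c] is not State.INVALID]
    if holders:
        # Any existing holder (Modified, Exclusive or Shared) ends up Shared;
        # a Modified holder is assumed to write its data back first.
        for c in holders:
            new[c] = State.SHARED
        new[reader] = State.SHARED
    else:
        # No other cache holds the block: the reader gets the only copy.
        new[reader] = State.EXCLUSIVE
    return new


def write(states, writer):
    """Return the states after `writer` writes: all other copies are invalidated."""
    return {c: (State.MODIFIED if c == writer else State.INVALID) for c in states}


states = {c: State.INVALID for c in "ABC"}
states = read(states, "A")   # A holds the only copy
states = read(states, "B")   # A and B now share the block
states = write(states, "C")  # C modifies it; the copies in A and B are invalidated
print({c: s.value for c, s in states.items()})
```

The write step is the case described above: once one processor modifies a shared variable, every other cached copy must be invalidated (or updated) for consistency to be maintained.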
The prior SMP type systems implemented the directories in two ways: “full directory” and “sparse directory” systems. Full directory systems store the shared information as close as possible to each block of the main memory; these systems waste a significant amount of physical memory, as a directory entry is required for every block of the main memory even if that block is not held in any cache of the system.
Sparse directory systems are preferred, as they store sharing information only for the memory blocks that are held in cache by remote processors. Hence, the quantity of memory used to maintain the consistency of the shared information is directly proportional to the number of memory blocks that can be stored in the cache memory of a basic processor.
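The difference in scale between the two approaches can be illustrated by a rough count of directory entries. All the figures below (block size, memory size, cache size, number of processors) are assumed example values chosen for the arithmetic, and the function names are hypothetical.

```python
# Illustrative sizing only: every figure below is an assumed example value,
# not a parameter taken from any particular system.
BLOCK_SIZE = 64                    # bytes per memory block (cache line)
MAIN_MEMORY = 64 * 2**30           # 64 GiB of main memory
CACHE_PER_PROCESSOR = 16 * 2**20   # 16 MiB of cache per processor
PROCESSORS = 4                     # processors whose caches are tracked


def full_directory_entries(memory_bytes, block_size):
    """Full directory: one entry per block of main memory, cached or not."""
    return memory_bytes // block_size


def sparse_directory_entries(cache_bytes, processors, block_size):
    """Sparse directory: one entry per block the tracked caches can hold at once."""
    return (cache_bytes * processors) // block_size


full = full_directory_entries(MAIN_MEMORY, BLOCK_SIZE)
sparse = sparse_directory_entries(CACHE_PER_PROCESSOR, PROCESSORS, BLOCK_SIZE)
print(full, sparse, full // sparse)  # prints: 1073741824 1048576 1024
```

Under these assumptions the full directory needs about a thousand times more entries than the sparse one, because it scales with the main memory rather than with the caches.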
The directories correspond to tables specifying, for each of the cached blocks of the main memory, the one or more processors for which a copy is stored in cache memory. The directories, typically provided for each of the nodes, are stored in the integrated memory (in the cache) of a node controller. Separate memories of the RAM type, for example SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory), are used for the storage of sparse directory systems. These separate memories are interfaced with the directory controller of the node controller NC.
The directories can therefore be used by the node controllers to send messages called “snoops”, which consult the caches of the system that are likely to hold a copy in order to determine the status of the data in the processor caches. The directories enable the messages to be filtered so that only the relevant processors are addressed. It is understood that this construction enables data traffic to be reduced significantly.
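The traffic reduction obtained by filtering can be sketched as follows. This is a minimal illustration, assuming a directory that simply maps each block address to the set of processors holding a copy; the class SnoopFilter and its methods are hypothetical names.

```python
class SnoopFilter:
    """Toy directory: block address -> set of processors holding a copy."""

    def __init__(self, all_processors):
        self.all_processors = set(all_processors)
        self.sharers = {}

    def record_copy(self, address, processor):
        """Note that `processor` now caches the block at `address`."""
        self.sharers.setdefault(address, set()).add(processor)

    def snoop_targets(self, address, requester):
        """With a directory: snoop only the recorded holders, except the requester."""
        return self.sharers.get(address, set()) - {requester}

    def broadcast_targets(self, requester):
        """Without a directory: every other processor must be snooped."""
        return self.all_processors - {requester}


directory = SnoopFilter(range(8))
directory.record_copy(0x1000, 2)
directory.record_copy(0x1000, 5)

print(len(directory.broadcast_targets(2)))   # 7 snoops when broadcasting
print(directory.snoop_targets(0x1000, 2))    # {5}: a single snoop when filtering
print(directory.snoop_targets(0x2000, 0))    # set(): no snoop at all
```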
As shown in FIG. 5, the SMP servers known in the prior art typically comprise several nodes (10), each comprising a node controller (20) and connected to each other by an interconnection network (2) between the node controllers (20), for example a communication line of a connector or similar means of communication. The node controllers (20) are coupled for example to an input/output circuit (14) and to several processors (12). Each processor (12) is associated with at least one memory (13). In this type of server known in the prior art, each node controller (20) is equipped with an imported directory (16) and an exported directory (17), as shown in FIGS. 5 and 7. The node controller (20) of each node (10) cooperates with a memory controller (15) managing the memory addresses within the node, using a directory (150) referencing the memory addresses used by the processors (12) within the node (10), as shown in FIG. 5. Within the framework of the cache consistency protocol of an SMP machine, a difficulty arises: a large number of agents must be identified within a limited naming space. As a reminder, an agent is an entity that participates in the cache consistency protocol by sending and receiving packets and by applying the appropriate protocol processing to them. There are different types of agents, and each agent generally has an identifier that must enable it to be identified uniquely. However, two agents associated with the same processor can share the same identifier if it is possible to differentiate them systematically (for example because they are of different types).
The cache consistency protocol, notably the CSI protocol (Common System Interface), enables the use of two packet formats:
the standard header packets, and
the extended header packets.
The advantage of standard header packets is their reduced size. However, they have the disadvantage of offering a naming space limited to a certain number of identifiers used to identify the processors, the I/O hubs, the node controllers and the memory controllers. Within the framework of the design of large servers of the SMP type, where the number of agents to identify is large, this limitation requires the system to be divided into nodes each having their own CSI naming space. At the interface of these different nodes is placed a node controller used as a proxy (representative) for the other nodes. With reference to FIGS. 5 and 7, the controller (20) thus takes on the role of translating names from one naming space to the other.
A further issue in a large SMP server is the inflation of the traffic of snoop messages (messages of the cache consistency protocol used to consult the caches of the system likely to hold a copy of the memory address referenced by an agent, in order to determine the status of the data in the cache). A known solution is to mask from the agents within a node (10) the visibility of the agents of the other nodes. This type of solution addresses both the high number of agents and the high traffic of snoop messages. Hence, the snoop traffic is prevented from increasing proportionally to the number of processors in the system, and the response time to the snoops is prevented from increasing proportionally to the maximum distance between two processors of the system. It must be noted here that this distance can become great in a large SMP server due to the limited connectivity of the processors (12) and possibly of the node controllers (20).
In practice, this masking is performed in the node controller (20), which appears within the node (10) as a single agent performing accesses to the local addresses (on behalf of the processors and input/output hubs external to the node) and as a single memory controller containing all the remote addresses (i.e. the addresses corresponding to the memories external to the node (10) with which it is associated). It is understood here that the adjectives “local” and “remote”, with regard to an address, are used according to membership or non-membership of the node (10) considered. In other words, an address is local to a node A if it is hosted in a random access memory module associated with an agent belonging to the node A. Conversely, an address is remote with respect to a node A if it is hosted in a random access memory module associated with an agent not belonging to the node A.
The NC controller (20) thus receives the packets from within or from outside the node (10) as a recipient of the packet. Then it assigns a new identifier to these packets before they pass from within the node to outside or conversely. If all the identifiers of the target naming space have been used, it causes this packet to wait in an internal buffer memory.
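The renaming and buffering behaviour just described can be sketched as follows. This is a minimal illustration, not the controller's actual logic: the class IdTranslator, the string identifiers and the rule that an identifier is released when the corresponding transaction completes are all assumptions made for the example.

```python
from collections import deque


class IdTranslator:
    """Renames packets from a source naming space into a bounded target space."""

    def __init__(self, target_ids):
        self.free_ids = deque(target_ids)  # unused identifiers of the target space
        self.mapping = {}                  # target identifier -> original identifier
        self.waiting = deque()             # packets held in the internal buffer

    def forward(self, source_id):
        """Assign a target identifier, or buffer the packet if none is free."""
        if not self.free_ids:
            self.waiting.append(source_id)
            return None
        target_id = self.free_ids.popleft()
        self.mapping[target_id] = source_id
        return target_id

    def complete(self, target_id):
        """Release an identifier (assumed to happen when the transaction ends).

        Returns the original identifier and, if a packet was waiting, the
        (source, target) pair of the packet that could then be forwarded.
        """
        source_id = self.mapping.pop(target_id)
        self.free_ids.append(target_id)
        resumed = None
        if self.waiting:
            next_source = self.waiting.popleft()
            resumed = (next_source, self.forward(next_source))
        return source_id, resumed


translator = IdTranslator([0, 1])             # a target space with two identifiers
print(translator.forward("node0/cpu1/req7"))  # 0
print(translator.forward("node0/cpu2/req3"))  # 1
print(translator.forward("node0/cpu3/req9"))  # None: the packet waits in the buffer
print(translator.complete(0))                 # ('node0/cpu1/req7', ('node0/cpu3/req9', 0))
```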
With reference to FIG. 7, when an agent sends requests to the memories within the node (10), it is identified in a directory or table (150) of these memory controllers (15). Hence, these memory controllers (15) only have a rough view of the outside of the node (10). They only know that the data has been exported by an agent outside the node (10) without knowing which agent or in which external node it is found.
To implement the snoop filtering, the NC controller (20) of the node (10) implements two cache directories (17, 16), stored in the memory of the node controller (20). The first, called the exported directory (17), references the local addresses exported into the caches of the processors (12) of other nodes and makes it possible to know to which nodes these addresses were exported. The second, called the imported directory (16), references the remote addresses imported into the caches of the processors of the node (10) and makes it possible to know which agents imported these addresses.
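The role of the two directories can be sketched with two lookup tables. This is a simplified illustration: the class NodeDirectories, its method names, and the use of plain integers and strings for addresses, nodes and agents are hypothetical, and the cache status stored with each entry is omitted.

```python
class NodeDirectories:
    """Toy model of the exported and imported directories of one node controller."""

    def __init__(self):
        self.exported = {}  # local address  -> remote nodes holding a copy
        self.imported = {}  # remote address -> local agents holding a copy

    def export_block(self, local_address, remote_node):
        """A remote node has taken a copy of a block hosted in this node."""
        self.exported.setdefault(local_address, set()).add(remote_node)

    def import_block(self, remote_address, local_agent):
        """A local agent has taken a copy of a block hosted in another node."""
        self.imported.setdefault(remote_address, set()).add(local_agent)

    def remote_nodes_holding(self, local_address):
        """Nodes to snoop when a local address is accessed."""
        return set(self.exported.get(local_address, ()))

    def local_agents_holding(self, remote_address):
        """Local agents to snoop when another node accesses its own address."""
        return set(self.imported.get(remote_address, ()))


nc = NodeDirectories()
nc.export_block(0x4000, remote_node=2)
nc.export_block(0x4000, remote_node=3)
nc.import_block(0x9000, local_agent="cpu1")

print(sorted(nc.remote_nodes_holding(0x4000)))  # [2, 3]
print(nc.remote_nodes_holding(0x5000))          # set(): no snoop leaves the node
print(nc.local_agents_holding(0x9000))          # {'cpu1'}
```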
To provide acceptable performance, these two memory structures are implemented in RAM, which can notably be implemented using SRAM (Static Random Access Memory) technology in the chip. The tables (17, 16) are then dimensioned proportionally to the sizes of the processor caches. This type of memory is very fast and does not need refreshing. Nevertheless, it is also very expensive and bulky.
A problem that arises in such systems known in the prior art relates to the large size of memory that must be allocated to the imported and exported directories, and therefore to the extra cost that implementing these directories represents. Hence, when a system comprises a large number of processors, the node controllers must have sufficient memory to store all the imported and exported addresses. Indeed, the size of the imported directory of a node must be equal to the sum of the sizes of all the caches (3) of the processors (12) of this node. Likewise, the size of the exported directory of a node must be equal to the sum of the sizes of all the caches (3) of the processors (12) of all the other nodes of the system.
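The two sizing rules stated above can be worked through numerically. The node counts, the number of processors per node and the cache size used here are assumed example values, all caches are assumed to have the same size, and the function names are hypothetical.

```python
# Illustrative only: node counts, processors per node and cache size are
# assumed example values, not figures from any real server.
MIB = 2**20
CACHE_BYTES = 16 * MIB   # cache size of one processor (assumed identical for all)
PROCS_PER_NODE = 4


def imported_coverage(procs_per_node, cache_bytes):
    """Cache capacity the imported directory must cover: this node's own caches."""
    return procs_per_node * cache_bytes


def exported_coverage(nodes, procs_per_node, cache_bytes):
    """Cache capacity the exported directory must cover: caches of all other nodes."""
    return (nodes - 1) * procs_per_node * cache_bytes


for nodes in (2, 4, 8):
    imp = imported_coverage(PROCS_PER_NODE, CACHE_BYTES) // MIB
    exp = exported_coverage(nodes, PROCS_PER_NODE, CACHE_BYTES) // MIB
    print(f"{nodes} nodes: imported covers {imp} MiB, exported covers {exp} MiB")
```

With these assumptions the imported directory stays at 64 MiB of covered cache whatever the system size, whereas the exported directory grows from 64 MiB with two nodes to 448 MiB with eight: it scales with the number of other nodes.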
The system according to the invention aims precisely to avoid the disadvantage of the extra cost when these memory structures require a large size, for example for a large SMP server. It can even be quite simply impossible to implement the quantity of memory necessary (technological limits) by following the type of solution of FIGS. 5 and 7. Hence, the invention, by solving these problems of implementing the memories necessary for these directories, also aims to make it possible to create systems containing a larger number of processors than is allowed by the systems known by the prior art, such as those shown in FIGS. 5 and 7.
It can be recalled that the quantity of memory that it is possible to place in the “cache consistency controller” of an NC controller (120) is limited by:
the process used (etching fineness),
the chosen chip size,
the type of memory implemented (SRAM or DRAM).
Moreover, placing a part of the memory that the controller needs outside the chip incurs a significant cost in terms of response time, which makes this possibility unattractive. The lower performance obtained with a memory external to the chip would therefore limit the applications. In addition, this type of solution would result in a noticeable increase in the cost of the system (the cost of the external memory modules being added to the cost of the chip).
In this context, it is advantageous to propose an alternative enabling the disadvantages of the prior art to be overcome. Indeed, a system comprising the three types of directory described herein for the systems known in the prior art has the disadvantage of requiring a considerable size of memory at the level of the node controller. In particular, the exported directory contains the memory addresses that were exported to other nodes. It is therefore understood that the greater the number of nodes (and processors) in the system, the more storage space this exported directory requires.