1. Field of the Invention
The invention relates to a process for producing a machine with non-uniform memory access and cache coherency, and a machine adapted for implementing a process of this type, in the data processing field.
2. Description of Related Art
In the data processing field, it is possible to increase the power of a machine by increasing the number of processors of which it is composed. One type of machine, known as a symmetric multi-processor (SMP), allows the various processors in the same machine to access its memory symmetrically by means of a system bus. These are machines with uniform memory access, in that the memory access time is substantially the same for all the data accessed. However, the performance of such machines does not increase linearly with the number of processors. With a large number of processors, the machine must devote more of its resources to managing contention for access to those resources than remain available for running applications. As a result, the performance curve drops considerably when the number of processors exceeds an optimum value, often estimated to be on the order of four. The prior art offers various solutions to this problem.
One known solution consists of grouping a plurality of machines into clusters so that they communicate with one another through a network. Each machine has an optimal number of processors, for example four, and its own operating system. It establishes a communication with another machine every time it performs an operation on data maintained by that other machine. The time required for these communications and the need to work on consistent data cause latency problems for high-volume applications such as, for example, distributed applications, which require numerous communications. Latency is the time that elapses between the instant at which a request for access to the memory is sent and the instant at which a response to this request is received.
Another known solution is that of machines of the non-uniform memory access (NUMA) type. These are machines with non-uniform memory access, in that the memory access time varies according to the location of the data accessed. A NUMA type machine is constituted by a plurality of modules, each module comprising an optimal number of processors and a physical part of the total memory of the machine. A machine of this type has non-uniform memory access because it is generally easier for a module to access a physical part of the memory that it does not share with another module than to access a part that it shares. Although each module has a private system bus linking its processors and its physical memory, an operating system common to all the modules makes it possible to consider all of the private system busses as a single, unique system bus of the machine. Logical addressing assigns each address a place of residence at a predetermined physical memory location of a module. For a specific processor, accesses to a local memory part, physically located in the same module as the processor, are distinguished from accesses to a remote memory part, physically located in one or more modules other than the one in which the processor is located.
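The distinction between local and remote accesses can be illustrated with a minimal sketch. It assumes, purely for illustration, that the physical address space is divided into equal contiguous ranges, one per module, so that the place of residence of an address follows from integer division; the names MODULE_MEMORY_SIZE, NUM_MODULES, home_module and access_kind are hypothetical and do not come from the description above.

```python
"""Sketch: classifying a memory access as local or remote in a NUMA machine.

Assumption (not from the source): the machine's address space is split into
equal contiguous ranges, one per module, so the home module of an address is
address // MODULE_MEMORY_SIZE. All names here are hypothetical.
"""

MODULE_MEMORY_SIZE = 1 << 30  # hypothetical: 1 GiB of physical memory per module
NUM_MODULES = 4               # hypothetical module count


def home_module(address: int) -> int:
    """Return the module in which the given address physically resides."""
    if not 0 <= address < NUM_MODULES * MODULE_MEMORY_SIZE:
        raise ValueError(f"address {address:#x} outside machine memory")
    return address // MODULE_MEMORY_SIZE


def access_kind(processor_module: int, address: int) -> str:
    """'local' if the address resides in the processor's own module, else 'remote'."""
    return "local" if home_module(address) == processor_module else "remote"


if __name__ == "__main__":
    print(access_kind(0, 0x1000))                  # local
    print(access_kind(0, 2 * MODULE_MEMORY_SIZE))  # remote
```

Real machines may interleave addresses across modules differently; only the local/remote classification itself is taken from the text.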
One particular type of NUMA machine is the cache coherency non-uniform memory access (CCNUMA) type, that is, the type of machine having cache coherency. Because of a shared caching mechanism, at a given instant a valid, that is, up-to-date, copy of a memory block is not necessarily located at its physical memory location of residence. Thus, one or more up-to-date copies of the block can migrate from one module to another in response to application requests and system calls. The physical memory located within a specific module is the one that the module in question accesses the fastest, since it does so directly through its local system bus. The remote physical memory in another module is the one that the module in question accesses the slowest, since it requires one or more transactions between modules. The physical memory local to the module in question comprises a first part specifically assigned to the data blocks resident in this module, and a second part specifically assigned to copies of blocks resident in other modules. The second part of the physical memory constitutes a cache for the remote memory in the other modules.
A block resident in the first physical memory part is not immediately available if its contents do not constitute an up-to-date copy, which is the case, for example, if one or more other modules share this block and one of these other modules maintains a copy of it that is up-to-date in terms of memory coherency. In order to manage the sharing of blocks resident in its first physical memory part with other modules, the module in question has a local memory table LMD (Local Memory Directory). The table LMD is constituted by several lines, each of which is intended to reference a block resident in the module that is shared with one or more other modules. The more lines the table LMD contains, the more resident blocks can be shared by the other modules at a given instant. This is advantageous for the other modules, but can be less so for the module in question, since the up-to-date copies of its resident blocks risk being more widely scattered, which consequently requires longer access times. On the other hand, it is preferable to locate the table LMD in a fast access memory, since it is involved in access to the first physical memory part. The cost of fast access memories, for example static memories, makes it prohibitive to reference, in the table LMD, all of the blocks resident in the first physical memory part.
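The role of a fixed-size table LMD can be sketched as a bounded directory. This is a minimal sketch under stated assumptions, not the structure of any particular machine: it assumes a fully associative table keyed by block address, where each line records the sharing modules and which module, if any, holds the up-to-date copy, and it simply refuses to share a new block when every line is in use. The names LocalMemoryDirectory, LmdLine, share and locally_valid are hypothetical, and the coherency protocol transitions are not modelled.

```python
"""Sketch of a Local Memory Directory (LMD) with a fixed number of lines.

Assumptions (not from the source): fully associative table keyed by block
address; each line records which other modules share the block and which
module, if any, holds the up-to-date copy. When every line is in use, a new
block simply cannot be shared (a real design would have to recall a copy to
free a line). All names are hypothetical.
"""

from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LmdLine:
    sharers: Set[int] = field(default_factory=set)  # other modules holding a copy
    owner: Optional[int] = None  # module holding the up-to-date copy, if not home


class LocalMemoryDirectory:
    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines          # fixed by the cost of fast memory
        self.lines: Dict[int, LmdLine] = {}

    def share(self, block: int, module: int, exclusive: bool = False) -> bool:
        """Record that `module` takes a copy of resident `block`.

        Returns False if the block is not yet referenced and no line is free.
        """
        line = self.lines.get(block)
        if line is None:
            if len(self.lines) >= self.num_lines:
                return False  # table full: the block cannot be shared now
            line = self.lines[block] = LmdLine()
        if exclusive:
            # The requesting module will hold the only, up-to-date copy.
            line.sharers = {module}
            line.owner = module
        else:
            line.sharers.add(module)
        return True

    def locally_valid(self, block: int) -> bool:
        """A resident block is immediately usable unless another module owns it."""
        line = self.lines.get(block)
        return line is None or line.owner is None


if __name__ == "__main__":
    lmd = LocalMemoryDirectory(num_lines=2)
    print(lmd.share(0x40, module=1))                  # True
    print(lmd.share(0x80, module=2, exclusive=True))  # True
    print(lmd.share(0xC0, module=3))                  # False: no free line
    print(lmd.locally_valid(0x80))                    # False: module 2 owns it
```

With only two lines, the third request fails, which illustrates how the number of lines bounds how many resident blocks can be shared at a given instant.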
A block that is not resident in the first physical memory part is immediately available if an up-to-date copy of this block is accessible in the second physical memory part. In order to manage the presence of up-to-date copies in the second physical memory part, the module in question has a remote cache table RCT. The table RCT is constituted by a plurality of lines, each of which is intended to correspond to a location in the second physical memory part, each location being intended to contain a copy of a block referenced by this line. The table RCT therefore contains as many lines as there are locations in the second physical memory part. It is understood that the greater the size of the second physical memory part, the more copies of blocks resident in other modules it can contain. However, a second physical memory part large enough to contain copies of all the blocks resident in other modules at one time would be disproportionately large. On the other hand, it is preferable to locate the table RCT in a fast access memory, since it is involved in access to the second physical memory part. The cost of fast access memories, for example static memories, makes it prohibitive to reference, in the table RCT, all of the blocks resident in the other modules.
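The one-line-per-location correspondence of the table RCT can be sketched as follows. The text does not say how a remote block is assigned to a location, so this minimal sketch assumes a direct-mapped organization in which a block can only occupy the location whose index is its block number modulo the number of lines; the names RemoteCacheTable, RctLine, hit, fill and invalidate are hypothetical.

```python
"""Sketch of a remote cache table (RCT) indexing the second physical memory part.

Assumption (not from the source): the RCT is direct-mapped, so a remote block
can only occupy the location whose index is its block number modulo the
number of lines. Line i describes location i of the second memory part.
All names are hypothetical.
"""

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RctLine:
    block: Optional[int] = None  # block number of the remote block copied here
    valid: bool = False          # True if the copy is up to date


class RemoteCacheTable:
    def __init__(self, num_lines: int) -> None:
        # One line per location in the second physical memory part.
        self.lines: List[RctLine] = [RctLine() for _ in range(num_lines)]

    def _index(self, block: int) -> int:
        return block % len(self.lines)

    def hit(self, block: int) -> bool:
        """True if an up-to-date copy of `block` is available locally."""
        line = self.lines[self._index(block)]
        return line.valid and line.block == block

    def fill(self, block: int) -> Optional[int]:
        """Install a copy of `block`; return the block it displaced, if any."""
        line = self.lines[self._index(block)]
        evicted = line.block if line.valid and line.block != block else None
        line.block, line.valid = block, True
        return evicted

    def invalidate(self, block: int) -> None:
        """Mark the local copy of `block` as stale, if it is present."""
        line = self.lines[self._index(block)]
        if line.block == block:
            line.valid = False


if __name__ == "__main__":
    rct = RemoteCacheTable(num_lines=4)
    print(rct.hit(9))    # False: nothing cached yet
    print(rct.fill(9))   # None
    print(rct.fill(13))  # 9: blocks 9 and 13 map to the same location
    print(rct.hit(13))   # True
```

Because the table has a fixed number of lines, distinct remote blocks that map to the same location displace each other, so a smaller table RCT means more accesses that must go to the remote module.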
In producing a machine constituted by a plurality of modules, each of which comprises a table LMD and a table RCT, the constraints described above show that the size of these tables must be compatible with the desired performance of the machine. The problem is that this compatible size is difficult to determine a priori.