In the field of information processing, it is possible to increase the power of a machine by increasing the number of processors that make it up. One type of machine, known as a symmetrical memory processor (SMP), allows different processors of the same machine to gain symmetrical access to the memory by means of a system bus. These machines have uniform memory access, to the extent that the access time to the memory is substantially the same for all the data accessed. However, the performance curve of such machines does not increase linearly as a function of the number of processors. A high number of processors means that the machine spends more effort managing access to its resources, at the expense of the resources available to it for executing applications. The consequence is that the performance curve falls off considerably when the number of processors exceeds an optimal value, often estimated to be on the order of four. The state of the art has proposed various solutions to this problem.
One known solution consists of combining a plurality of machines into clusters and allowing them to communicate with one another by means of a network. Each machine has an optimal number of processors, for instance four, and its own operating system. A machine establishes communication with another machine every time it performs processing on data kept up to date by that other machine. The time required for these communications and the necessity of working with coherent data present problems of latency for high-volume applications, such as distributed applications that require numerous communications. The latency is the length of time between the moment when a memory access request is sent and the moment when the response to this request is received.
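The definition of latency given above can be illustrated with a minimal sketch. The function name `measure_latency` and the `send_request` callable are hypothetical and stand in for whatever mechanism actually issues the memory access request; the sketch only shows that latency is the interval between sending the request and receiving the response.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure_latency(send_request: Callable[[], T]) -> Tuple[T, float]:
    """Return the response and the elapsed time, in seconds, of one request.

    `send_request` is a hypothetical stand-in: it is assumed to block until
    the response to the request has been received.
    """
    sent_at = time.perf_counter()       # moment the request is sent
    response = send_request()           # blocks until the response arrives
    received_at = time.perf_counter()   # moment the response is received
    return response, received_at - sent_at
```

In a cluster, `send_request` would cross the network to another machine, so the measured interval would include the communication time discussed above.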
Another known solution is that of machines of the NUMA (non-uniform memory access) type. These machines have nonuniform memory access in the sense that the memory access time varies depending on the location of the data accessed. A machine of the NUMA type is made up of a plurality of modules, and each module includes an optimal number of processors and a physical portion of the total memory of the machine. Such a machine has nonuniform memory access because a module generally gains access more easily to a physical portion of the memory that it does not share with another module than to a physical portion that it does share. Although each module has a private system bus connecting its processors and its physical memory, an operating system common to all the modules makes it possible to consider all the private system buses as a single system bus for the machine. Logical addressing assigns each address a residence site, that is, a place in the physical memory of a given module. For a given processor, a distinction is made between accesses to a local memory portion, physically located in the same module as the processor, and accesses to a remote memory portion, physically located in one or more modules other than the one where the processor is located.
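The distinction between local and remote accesses can be sketched as follows. The layout is an assumption made purely for illustration: the total memory is taken to be divided into equal, contiguous portions, one per module, which the text above does not specify. The names `NumaLayout`, `home_module`, and `is_local` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaLayout:
    """Hypothetical layout: equal contiguous memory portions, one per module."""

    module_count: int
    bytes_per_module: int

    def home_module(self, address: int) -> int:
        """Return the module whose physical memory is the residence site."""
        total = self.module_count * self.bytes_per_module
        if not 0 <= address < total:
            raise ValueError("address outside the memory of the machine")
        return address // self.bytes_per_module

    def is_local(self, processor_module: int, address: int) -> bool:
        """True when the address resides in the processor's own module."""
        return self.home_module(address) == processor_module


# Example: four modules with 1 GiB of physical memory each (assumed values).
layout = NumaLayout(module_count=4, bytes_per_module=1 << 30)
```

With this layout, an access by a processor of module 1 to address `(1 << 30) + 5` is local, while the same access from module 0 is remote and requires transactions between modules.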
One particular type of NUMA machine is known as CCNUMA (cache coherency non-uniform memory access), that is, a machine with cache coherency. A shared cache mechanism means that at a given moment a valid copy, that is, an updated copy, of a block is not necessarily located in its physical memory location of residence. One or more updated copies of the block can thus migrate from one module to another in accordance with application requests and system calls. The physical memory located in a given module is the one to which the module in question can gain access fastest, because it can do so directly by means of its local system bus. The physical memory located at a distance in another module is the one to which the module in question gains access least rapidly, because access requires one or more transactions between modules. The physical memory that is local to the module in question includes a first portion especially assigned to the data blocks resident in this module, and a second portion especially assigned to the copies of blocks resident in other modules. The second physical memory portion constitutes a cache memory of the remote memory in the other modules.
A block that is resident in the first physical memory portion is not immediately available if its contents are not an updated copy, which is the case for example if one or more other modules are sharing this block and one of these other modules is holding an updated copy in terms of memory coherency. To manage the sharing of blocks residing in the first physical memory portion with other modules, the module in question has a local memory directory (LMD). The directory LMD is made up of a plurality of lines, each of which is intended to refer to one block residing in the module and shared with one or more other modules.
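The role of the directory LMD can be illustrated with a minimal sketch. It is not a description of any actual hardware: the names `LmdLine` and `LocalMemoryDirectory`, the use of an unbounded dictionary in place of a fixed number of lines, and the `exclusive` flag marking a module that holds the updated copy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LmdLine:
    """One line: the remote modules sharing a block resident in this module."""

    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None  # remote module holding the updated copy, if any


class LocalMemoryDirectory:
    """Hypothetical LMD, keyed by the number of a locally resident block."""

    def __init__(self) -> None:
        self._lines: Dict[int, LmdLine] = {}

    def record_share(self, block: int, module: int, exclusive: bool = False) -> None:
        """Note that a remote module shares the block (assumed interface)."""
        line = self._lines.setdefault(block, LmdLine())
        line.sharers.add(module)
        if exclusive:
            line.owner = module  # that module now holds the updated copy

    def locally_available(self, block: int) -> bool:
        """True when the resident contents are still an updated copy."""
        line = self._lines.get(block)
        return line is None or line.owner is None
```

A resident block with no line, or with sharers that only hold read copies, is immediately available; once another module holds the updated copy, the block must first be fetched back from that module.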
A block that does not reside in the first physical memory portion is immediately available if an updated copy of this block is accessible in the second physical memory portion. To manage the presence of updated copies in the second physical memory portion, the module in question has a remote cache table (RCT). The table RCT is made up of a plurality of lines, each of which is intended to correspond to a place in the second physical memory portion, each place being intended to contain one block copy referenced by this line.
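The correspondence between the lines of the table RCT and the places in the second physical memory portion can be sketched as follows. The direct-mapped organization (line index equal to the block number modulo the number of lines) is an assumption chosen for brevity, as are the names `RemoteCacheTable`, `lookup`, `install`, and `invalidate`; the text above does not specify how lines are selected.

```python
from typing import List, Optional


class RemoteCacheTable:
    """Hypothetical direct-mapped RCT: one line per place in the remote cache."""

    def __init__(self, line_count: int) -> None:
        if line_count <= 0:
            raise ValueError("line_count must be positive")
        # Each line holds the number of the remote block copied there, or None.
        self._tags: List[Optional[int]] = [None] * line_count

    def _index(self, block: int) -> int:
        return block % len(self._tags)

    def lookup(self, block: int) -> Optional[int]:
        """Return the place holding an updated copy of the block, or None."""
        index = self._index(block)
        return index if self._tags[index] == block else None

    def install(self, block: int) -> Optional[int]:
        """Reference a new copy; return the block it displaces, if any."""
        index = self._index(block)
        evicted = self._tags[index]
        self._tags[index] = block
        return evicted if evicted != block else None

    def invalidate(self, block: int) -> None:
        """Drop the copy when it is no longer an updated copy."""
        index = self._index(block)
        if self._tags[index] == block:
            self._tags[index] = None
```

A successful `lookup` corresponds to the case in which a remote block is immediately available locally; a miss means that the block must be obtained from the module where it resides.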
The advantage of machines with nonuniform memory access and cache coherency is that each module works on data blocks that reside in the first portion of its local memory, or on copies, held in the second portion of its local memory, of blocks that reside in the first memory portion of another module. A given module then has no need to communicate with other modules except to keep the copies it works on up to date, so as to assure data coherence. In terms of execution, it is thus fundamentally of no significance whether a data block resides in one module or another, because each module, if necessary, relocates copies of the blocks it needs to its local memory. However, to run the operating system common to all the modules, or certain applications of the distributed type, it is possible that some data may often be useful to all the modules. By way of non-limiting example, these data have to do with process allocation tables, open file tables, or tables of locks set on shared resources. Keeping these data coherent risks requiring numerous exchanges between modules and thus interfering with the increase in performance expected from such machines. The problem is that it is difficult to evaluate a priori the extent to which the data shared by a plurality of modules threaten to impede machine performance, because this impediment can also depend on the way in which the machine is used while applications are being run on it. Moreover, it would be wasteful to invest heavily in optimization for data that are not likely to impede performance, at the risk of ignoring data whose location threatens to impede performance more appreciably.