As is well known in the data processing field, the power of a machine can be increased by increasing the number of processors of which it is composed. One type of machine, known as a "Symmetrical Multiprocessor" (SMP), allows its different processors to access its memory symmetrically by means of a system bus. These are machines with uniform access memory, inasmuch as the access time to the memory is substantially the same for all of the data accessed.
For this reason, the architecture is called "UMA" ("Uniform Memory Access").
FIG. 1 attached to the present specification schematically illustrates an example of the "UMA" type architecture.
The data processing system 1, which hereinafter will be called the "SMP" module, comprises a certain number of central processing units ("CPUs"), or processors. Four central processing units are represented in the example of FIG. 1: 10 through 13. Associated with these central processing units 10 through 13 is a main memory 14 accessible by all of them via an access line L.
Since all the accesses take place within the module 1, that is, locally, and since the total available memory space is homogeneous in terms of access time (which constitutes the initial hypothesis, since this is a "UMA" architecture), the access time remains substantially the same, no matter which central processor 10 through 13 has sent a request.
Although only four central processors 10 through 13 have been represented in FIG. 1, it should be clear that this number is completely arbitrary. It can be increased or decreased. However, the performance of machines of this type does not increase linearly as a function of the number of processors. An increased number of processors causes the system to spend more time resolving contention for access to the resources that it has available for running applications. As a consequence, performance drops considerably when the number of processors exceeds an optimum value, often estimated at about 4. The prior art proposes various solutions to this problem.
One known solution consists of grouping a plurality of machines into clusters so as to have them communicate with one another through a network. Each machine has an optimal number of processors, for example four, and its own operating system. It establishes a communication with another machine every time it performs an operation on data maintained by this other machine. The time required for these communications and the need to work on consistent data cause latency problems for high-volume applications such as, for example, distributed applications which require numerous communications. Latency is the time that separates the instant at which a request for access to the memory is sent from the instant at which a response to this request is received.
Another known solution is that of machines with a "Non-uniform Memory Access" (NUMA) architecture. These are machines with non-uniform access memory, inasmuch as the access time to the memory varies depending on the location of the data accessed. A "NUMA" type machine is constituted by a plurality of modules, each module comprising an optimal number of processors and a physical part of the total memory of the machine. A machine of this type has non-uniform memory access because a module generally has easier and faster access to a physical part of the memory that it does not share with another module than to a part that it shares. Although each module has a private system bus linking its processors with its physical memory, an operating system common to all the modules allows all of the private system busses to be considered as a single, unique system bus of the machine. A logical address assigns a place of residence to a given physical memory location of a module. For a specific processor, accesses to a local memory part physically located in the same module as the processor are distinguished from accesses to a remote memory part, physically located in one or more modules other than the one in which the processor is located.
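The distinction between local and remote accesses can be sketched as follows. This is a minimal illustration only: the `Module` class, the two address ranges, and the function names are hypothetical and are not taken from the present description, which does not specify how physical addresses are distributed among the modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """One module of a NUMA machine, holding a contiguous physical range."""
    name: str
    base: int  # first physical address held by this module (assumed layout)
    size: int  # number of addresses held by this module (assumed layout)

    def holds(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


# Hypothetical two-module machine; the address ranges are illustrative only.
MODULES = [
    Module("Ma", base=0x0000, size=0x1000),
    Module("Mb", base=0x1000, size=0x1000),
]


def home_module(address: int) -> Module:
    """Return the module whose physical memory contains the address."""
    for module in MODULES:
        if module.holds(address):
            return module
    raise ValueError(f"address {address:#x} is not backed by any module")


def classify_access(processor_module: str, address: int) -> str:
    """Classify an access as 'local' or 'remote' for a processor's module."""
    if home_module(address).name == processor_module:
        return "local"
    return "remote"
```

Under these assumptions, a processor of the module Ma accessing address 0x0800 performs a local access, whereas the same processor accessing address 0x1800 performs a remote access.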
FIG. 2 attached to the present description schematically illustrates an example of this type of architecture, that is, a "NUMA" architecture. To simplify the drawing, it has been assumed that the data processing system 1' comprises only two modules, Ma and Mb, of the above-mentioned "SMP" type, and that the two modules are identical. It must be understood, however, that the data processing system 1' can comprise a greater number of modules and that the modules Ma and Mb can be different (particularly in terms of the number of central processors).
The module Ma comprises four central processors 10a through 13a, and a main memory 14a. Likewise, the module Mb comprises four central processors 10b through 13b, and a main memory 14b. The two memories 14a and 14b (and more generally the n main memories) communicate with one another by means of what is called a "link" 2, generally via so-called remote cache memories 15a and 15b, respectively. The link 2 does not correspond to simple physical links, but comprises various standard electronic circuits (control circuits, interface circuits, etc.) that do not need to be described any further because they are well known in the prior art.
It is easy to understand that, in an architecture of this type, if an application is running in the module Ma, for example, the access time to the "near" memory 14a (local access) is, a priori, less than the access time to the "far" memory 14b located in the module Mb, no matter which central processor 10a through 13a is involved. It is specifically necessary to pass through the link 2 when the data are physically stored in another module, which substantially increases the transfer time.
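The effect of the link 2 on the access time can be illustrated by a simple weighted average. The following is only a sketch under stated assumptions: the function name, the parameter names, and any access times passed to it are hypothetical, and the model ignores the remote cache memories 15a and 15b.

```python
def mean_access_time(local_fraction: float,
                     local_time: float,
                     remote_time: float) -> float:
    """Average memory access time seen by an application.

    local_fraction: share of accesses served by the "near" memory (0.0 to 1.0).
    local_time:     time for one local access (arbitrary unit, assumed).
    remote_time:    time for one access through the link (same unit, assumed).
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie between 0.0 and 1.0")
    remote_fraction = 1.0 - local_fraction
    return local_fraction * local_time + remote_fraction * remote_time
```

With this model, the average access time equals the local access time only when every access is local, and it grows toward the remote access time as the share of accesses passing through the link increases.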
In modern data processing systems, the allocation of the memory for a given application is carried out on the basis of a virtual memory space. This allocation is placed under the control of the operating system or "OS". A dynamic correspondence is then established between the virtual memory space and the physical memory. For this purpose, it is customary to use address correspondence tables, or dynamic "mapping" tables. Various types of memory configurations have been proposed, including organization by regions or by segments. To explain the concepts without in any way limiting the scope of the invention, the case of a "segment" type configuration will be described below. In practice, a segment is defined as a space of contiguous virtual addresses, of fixed and predetermined length.
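Because a segment has a fixed and predetermined length, a virtual address can be decomposed into a segment number and an offset within that segment. The sketch below illustrates this decomposition; the segment length of 256 MiB and the function name are assumptions chosen for the example and are not specified in the present description.

```python
SEGMENT_SIZE = 1 << 28  # hypothetical fixed segment length (256 MiB)


def split_virtual_address(virtual_address: int) -> tuple[int, int]:
    """Return (segment number, offset within the segment)."""
    if virtual_address < 0:
        raise ValueError("a virtual address cannot be negative")
    # Segments are contiguous and of equal length, so integer division
    # gives the segment number and the remainder gives the offset.
    return divmod(virtual_address, SEGMENT_SIZE)
```

For example, under this assumption the address `SEGMENT_SIZE + 5` belongs to segment 1 at offset 5.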
More precisely, in the prior art, the above-mentioned dynamic correspondence or "mapping" is carried out in accordance with rules common to all the applications, no matter what their types, without taking into account the location of the physical memory. In practice, if a process intends to access a virtual address and no entry in the address correspondence table is found, an exception is generated, which is formalized by the detection of a page fault, according to the "UNIX" (registered trademark) terminology. The term "page" can be defined more generally as being a "range of contiguous addresses." A page constitutes a subdivision of a segment. However, for purposes of simplification, the term "page" will be used below. After a detection of a page fault, a device called a handler allocates physical memory in accordance with the above-mentioned common rules. This simple allocation method is entirely suitable for the standard "SMP" machines of the above-mentioned "UMA" type, since the average access time to the memory is uniform.
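The sequence described above (lookup in the correspondence table, page fault, allocation by the handler) can be sketched as follows. This is an illustrative model and not an actual operating system interface: the class and method names, the page size, and the "take the next free frame" rule are all assumptions standing in for the common allocation rules, which the present description does not detail.

```python
PAGE_SIZE = 4096  # hypothetical page size, in addresses


class PageFault(Exception):
    """Raised when the correspondence table has no entry for a page."""

    def __init__(self, page: int):
        super().__init__(f"no mapping for virtual page {page}")
        self.page = page


class AddressSpace:
    def __init__(self, free_frames):
        self.page_table = {}                  # virtual page -> physical frame
        self.free_frames = list(free_frames)  # frames not yet allocated

    def lookup(self, virtual_address: int) -> int:
        """Translate an address, raising PageFault if no entry is found."""
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.page_table:
            raise PageFault(page)
        return self.page_table[page] * PAGE_SIZE + offset

    def handle_page_fault(self, page: int) -> None:
        """Assumed common rule: allocate the next free frame, without
        regard to the location of the physical memory."""
        if not self.free_frames:
            raise MemoryError("no free physical frame")
        self.page_table[page] = self.free_frames.pop(0)

    def access(self, virtual_address: int) -> int:
        """Translate an address, invoking the handler on a page fault."""
        try:
            return self.lookup(virtual_address)
        except PageFault as fault:
            self.handle_page_fault(fault.page)
            return self.lookup(virtual_address)
```

In this model the handler applies the same rule to every application and ignores which module holds the allocated frame, which is harmless when the access time is uniform.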
On the other hand, with an architecture of the "NUMA" type as described in regard to FIG. 2, for which the access time is no longer uniform, the need arises for a memory allocation process that minimizes the negative impact on the performance of the system.
In the prior art, processes for this purpose have been proposed. For example, it has been proposed that the memory allocation rules be modified in order to obtain an optimization, but once modified, the rules remain identical for all the applications. Moreover, this process has additional drawbacks. The modified rules may be advantageous for one specific application, but inappropriate or even dangerous for another.
It has also been proposed that a specific "Application Program Interface" (API), adapted so as to define a specific algorithm associated with a given application, be used to establish the correspondence ("mapping") between the virtual memory space and the physical memory. In this case, however, it is necessary to modify both the corresponding applications and the resident part of the operating system ("kernel"). Therefore, this method cannot be applied on its own to existing programs. In any case, it lacks flexibility and its effectiveness is limited.