In the following description, “computer system” is understood as being any system capable of executing the instructions of a computer program, and for this purpose having at least one processor including at least one central processing unit, said central processing unit also being called the calculation core of the processor, or CPU. It will be noted that a processor may have only one calculation core, or a plurality of calculation cores. In increasing order of complexity, a computer system according to the invention can be composed of a microprocessor with one or more calculation cores, a single microcomputer having one or more microprocessors, or a more complex arrangement in which a plurality of microcomputers are interconnected via a data transmission network. The complexity of a computer system in which the invention can be implemented depends primarily on the application to be carried out.
“Computer process,” more generally referred to as “process,” is understood as a set of instructions to be executed by a central processing unit (i.e. a calculation core) of a processor in a specifically allocated memory space, with the possible aid of other resources. Because a computer program itself is composed of a structured set of instructions, a computer process can then be considered as an instance of a computer program to be executed or as a portion of said computer program. Since the same program can be executed multiple times (in parallel, successively, on the same processor or on different processors), it can therefore generate a plurality of computer processes.
A computer process is not necessarily composed of a single linear sequence of instructions, but can launch several such sequences asynchronously. A “thread” is then understood as one such linear sequence of instructions participating in the execution of a computer process. From the point of view of the execution of instructions, a computer process can therefore always be considered as being a thread or a set of threads.
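By way of illustration only, the relationship between a process and its threads can be sketched as follows. The function name `run_threads` and the use of Python's standard `threading` module are assumptions made for this example and are not part of the description; the sketch simply shows several linear sequences of instructions running within one process and sharing its memory space.

```python
import os
import threading


def run_threads(count):
    """Start `count` threads in the current process and record where each ran."""
    results = {}              # shared memory: every thread writes into this dict
    lock = threading.Lock()   # protects the shared dict against concurrent writes

    def worker(index):
        # Each worker is one linear sequence of instructions (one thread).
        with lock:
            results[index] = os.getpid()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for index, pid in sorted(run_threads(3).items()):
        print(f"thread {index} ran in process {pid}")
```

Every thread reports the same process identifier, which reflects the fact that the threads all belong to a single computer process.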
“Central processing unit having interleaved execution of a plurality of threads throughout a plurality of virtual processors from said same central processing unit” is understood as being a central processing unit having a mechanism for increasing the parallel execution of threads by sharing some of its internal resources, particularly its execution pipeline, its registers and its cache memories. Such a central processing unit therefore has as many virtual processors as threads that can simultaneously share its resources. From the user's point of view, everything takes place as though, instead of only one central processing unit, there were several, more specifically as many as its number of virtual processors.
This mechanism is generally referred to as hyper-threading or simultaneous multi-threading. One objective of central processing units implementing this mechanism is to take advantage of the periods of inactivity that arise while a thread waits for data from the shared memory space. Specifically, when a thread is in a waiting situation, the central processing unit that executes it automatically switches to executing another thread, thus giving the impression of having several different virtual processors.
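As a hedged illustration of how virtual processors appear to software, the following sketch groups the logical CPUs reported by a Linux kernel according to the calculation core they share. It assumes a Linux system that exposes the sysfs file `thread_siblings_list` under `/sys/devices/system/cpu/cpuN/topology/`; the helper names `parse_cpu_list` and `sibling_groups` are hypothetical and chosen for this example only.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0-3,8,10-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share one calculation core."""
    groups = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                groups.add(tuple(parse_cpu_list(handle.read())))
        except OSError:
            continue  # CPU offline or file unreadable: skip it
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("calculation core with virtual processors:", list(group))
```

On a system where the mechanism is activated, each printed group would typically contain more than one logical CPU number; where it is deactivated or absent, each group contains a single entry.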
The Linux (registered trademark) operating system currently supports such a mechanism and, when it is executed on a microprocessor in which a hyper-threading mechanism is activated, therefore presents the user with the N virtual processors of this microprocessor. It also advantageously implements a load-balancing algorithm during the execution of threads, which must therefore take into account the specificities of the virtual processors: in particular, a virtual processor should not be relieved of load by moving its threads to another virtual processor of the same central processing unit. This limitation particularly burdens the load-balancing algorithm.
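The constraint described above can be sketched, purely for illustration, as a selection rule. This is not the load-balancing algorithm of the Linux kernel; the function `pick_migration_target` and the dictionaries `load` and `core_of` are hypothetical names introduced here to show why the balancer has to know which virtual processors belong to the same central processing unit.

```python
def pick_migration_target(source, load, core_of):
    """Pick the least-loaded virtual processor located on a different core.

    `load` maps each virtual processor to its current load, and `core_of`
    maps each virtual processor to the calculation core that hosts it.
    Returns None when no virtual processor on another core is less loaded.
    """
    candidates = [
        cpu for cpu in load
        if core_of[cpu] != core_of[source] and load[cpu] < load[source]
    ]
    if not candidates:
        return None
    # Break ties on the CPU number so the choice is deterministic.
    return min(candidates, key=lambda cpu: (load[cpu], cpu))


if __name__ == "__main__":
    # Hypothetical topology: two cores, two virtual processors each.
    core_of = {0: 0, 1: 0, 2: 1, 3: 1}
    load = {0: 5, 1: 0, 2: 2, 3: 3}
    print(pick_migration_target(0, load, core_of))  # prints 2
```

In this usage example, virtual processor 1 is idle but is skipped because it shares a core with the source, so virtual processor 2 is selected instead.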
According to a first specific example, this limitation is particularly problematic in real-time calculation or embedded processing applications, for example for microprocessors involved in mobile telephone applications.
According to a second specific example, in a supercomputer environment of the HPC type (High Performance Computing) involving a plurality of processing nodes organized in clusters of servers, the user wishes to have even finer control of the placement of his applications on the central processing units, for example in order to take advantage of shared caches that favor one communication mechanism or another.
There are thus possibilities of fine placement of the threads via system calls of an application programming interface for managing these threads, which allow a user (i.e. the programmer) to specify the behavior of a task scheduler of the operating system with respect to a thread. For the application concerned, however, this involves knowing the topology of the computer system and carrying out a placement that can sometimes conflict with other software layers.
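One example of such an interface, given here only as a sketch and assuming a Linux system, is the CPU affinity call exposed in Python as `os.sched_setaffinity` and `os.sched_getaffinity`. The helper name `pin_current_process` is hypothetical. The example also makes the drawback visible: the caller has to choose a logical CPU number itself, which presupposes knowledge of the topology.

```python
import os


def pin_current_process(cpu):
    """Restrict the caller to one logical CPU and return the previous CPU set.

    Passing 0 as the identifier designates the caller itself; on Linux the
    underlying system call then applies to the calling thread.
    """
    previous = os.sched_getaffinity(0)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    target = min(allowed)  # arbitrary choice made for the example
    saved = pin_current_process(target)
    print("now restricted to:", sorted(os.sched_getaffinity(0)))
    os.sched_setaffinity(0, saved)  # restore the original CPU set
```

If another software layer (a runtime, a batch scheduler, a container manager) also sets affinities, the two placements can override one another, which is the kind of conflict mentioned above.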
Indeed, taking into account the specificities of the virtual processors appears to be complex, and becomes truly problematic, and even impossible to manage automatically, in an HPC supercomputer application.
It is therefore desirable to provide a system for managing the interleaved execution of threads that makes it possible to overcome at least partially the above-mentioned problems and limitations.