Field of the Invention
The present invention concerns the management of standby in microprocessors and more particularly a method of optimizing management of standby of a microprocessor enabling the implementation of several logical cores, for example a microprocessor implementing a technology known under the name simultaneous multi-threading, in particular in the context of high performance computing, as well as a computer program implementing such a method.
Description of Related Technology
High Performance Computing (HPC) is being developed for university research and industry alike, in particular in technical fields such as aeronautics, energy, climatology and life sciences. Modeling and simulation make it possible in particular to reduce development costs and to accelerate the placing on the market of innovative products that are more reliable and consume less energy. For research workers, high performance computing has become an indispensable means of investigation.
This computing is generally conducted on data processing systems called clusters. A cluster typically comprises a set of interconnected nodes. Certain nodes are used to perform computing tasks (compute nodes), others for storing data (storage nodes) and one or more others manage the cluster (administration nodes). Each node is for example a server implementing an operating system such as Linux (Linux is a trademark). The connection between the nodes is, for example, made using Ethernet or Infiniband links (Ethernet and Infiniband are trademarks).
FIG. 1 is a diagrammatic illustration of an example of a topology 100 for a cluster, of fat-tree type. The latter comprises a set of nodes of general reference 105. The nodes belonging to the set 110 are compute nodes here whereas the nodes of the set 115 are service nodes (storage nodes and administration nodes). The compute nodes may be grouped together in sub-sets 120 called compute islands, the set 115 being called a service island.
The nodes are linked together by switches, for example hierarchically. In the example illustrated in FIG. 1, the nodes are connected to first level switches 125 which are themselves linked to second level switches 130 which in turn are linked to third level switches 135.
As illustrated in FIG. 2, each node generally comprises one or more microprocessors, local memories and a communication interface. More specifically, the node 200 here comprises a communication bus 202 to which there are connected
central processing units (CPUs) or microprocessors 204;
components of random access memory (RAM) 206, comprising registers adapted to record variables and parameters created and modified during the execution of programs (as illustrated, each random access memory component may be associated with a microprocessor); and,
communication interfaces 208 adapted to send and to receive data.
The node 200 furthermore possesses here internal storage means 212, such as hard disks, able in particular to contain the executable code of programs.
The communication bus allows communication and interoperability between the different elements included in the node 200 or connected to it. The microprocessors 204 control and direct the execution of the instructions of portions of software code of the program or programs. On powering up, the program or programs which are stored in a non-volatile memory, for example a hard disk, are transferred into the random access memory 206.
To improve the performance of each node, the microprocessors used are often multi-core microprocessors, that is to say microprocessors comprising several cores which can be used in parallel.
Furthermore, certain microprocessors comprise several logical cores, each physical core being adapted to implement several logical cores. This technology, called simultaneous multi-threading (or hyperthreading according to Intel's implementation, Intel being a trademark), enables several elementary processes, called threads, to be executed, practically in parallel, in a physical core of a microprocessor (the execution contexts are loaded at the same time and the threads share the executing kernel. A physical core implementing this technology is thus generally perceived as a dual-core by the logical layer utilizing the physical core.
A physical core implementing this technology comprises resources shared between the logical cores and resources specific to each logical core. The shared resources are typically execution units, cache memories and the bus interfaces. The specific resources are in particular the registers for data and instructions of the logical core, for segments and for control as well as the interrupt controller (called APIC, standing for Advanced Programmable Interrupt Controller).
However, whereas this technology makes it possible to significantly improve the performance of a microprocessor for particular applications, in particular image processing applications, it has been observed that the performance was only slightly improved, or even degraded, for other applications, in particular scientific computing applications. It is thus generally deactivated in the clusters used for high performance computing, which goes against the principle that the clusters are optimized to use their resources as well as possible.