Technological Field
The present application relates to the configuration and maintenance of computer systems and more particularly to a method, a computer readable medium and a device for the configuration or maintenance of a computer system in a cluster, making versatile use of an operating system that uses only the random access memory (RAM) of the computer system on which it operates.
Description of Related Technology
High performance computation, also known as HPC (abbreviation for High Performance Computing), is being developed both for research in universities and for industry, in particular in technical fields such as the automotive, aeronautical, power, climatology and life sciences fields. Modeling and simulation make it possible in particular to reduce the development costs and to bring innovative products that are more reliable and more energy-efficient to the market more quickly. For researchers, high performance computing has become an indispensable research tool.
These computations are generally implemented on data processing systems called clusters (sometimes referred to as “server clusters” or “server farms”). A cluster typically comprises a set of computer systems implemented in interconnected nodes. Certain nodes are typically used for performing computation tasks (computation nodes), others for data storage (storage nodes), and one or more others manage the cluster (administration nodes). Each node is for example a server running an operating system such as Linux (Linux is a trade mark). The connection between the nodes is achieved for example by utilizing Ethernet communication links and interconnecting networks, for example InfiniBand (Ethernet and InfiniBand are trade marks).
FIG. 1 shows schematically an example of a topology 100 of a cluster, of the fat-tree type. The latter comprises a set of nodes generically referenced 105. The nodes belonging to the set 110 are computation nodes while the nodes of the set 115 are service nodes (storage nodes and administration nodes). The computation nodes can be grouped in subsets 120 called computation islands, the set 115 being called a service island.
The nodes are linked to each other by switches, for example hierarchically. In the example shown in FIG. 1, the nodes are connected to first-level switches 125 which are themselves linked to second-level switches 130 which are in turn linked to third-level switches 135.
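By way of illustration only, the switch hierarchy described above can be sketched in Python as follows. The sketch assumes, as a simplification, that each switch has a single uplink (a real fat-tree generally provides several uplinks per switch). The class names, the node and switch names and the function `switches_between` are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Switch:
    name: str
    level: int                         # 1, 2 or 3, as in the example hierarchy
    parent: Optional["Switch"] = None  # simplifying assumption: one uplink only


@dataclass
class Node:
    name: str
    role: str       # "computation", "storage" or "administration"
    switch: Switch  # first-level switch the node is attached to


def path_to_top(node: Node) -> List[str]:
    """Names of the switches from the node's first-level switch upwards."""
    path = []
    sw = node.switch
    while sw is not None:
        path.append(sw.name)
        sw = sw.parent
    return path


def switches_between(a: Node, b: Node) -> int:
    """Number of switches traversed between two nodes via their lowest common switch."""
    path_a, path_b = path_to_top(a), path_to_top(b)
    for i, name in enumerate(path_a):
        if name in path_b:
            # i switches below the common one on a's side, the common switch
            # itself, and path_b.index(name) switches below it on b's side.
            return i + path_b.index(name) + 1
    raise ValueError("the two nodes share no switch")


# Hypothetical miniature cluster: one third-level switch, two second-level
# switches and three first-level switches.
top = Switch("L3-0", 3)
mid_a, mid_b = Switch("L2-0", 2, top), Switch("L2-1", 2, top)
leaf_a, leaf_b, leaf_c = (Switch("L1-0", 1, mid_a),
                          Switch("L1-1", 1, mid_a),
                          Switch("L1-2", 1, mid_b))

n0 = Node("n0", "computation", leaf_a)
n1 = Node("n1", "computation", leaf_a)
n2 = Node("n2", "computation", leaf_b)
s0 = Node("s0", "storage", leaf_c)
```

In this sketch, two nodes attached to the same first-level switch traverse a single switch, whereas a computation node and a node located in another branch of the hierarchy traverse switches of all three levels.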
The purpose of the boot process of a computer system is to obtain an operating system that can be accessed via a permanent or removable storage peripheral, which then makes it possible to load and execute application programs. This operating system is obtained by means of a simpler program called a bootloader, executed using the Basic Input Output System (BIOS) generally contained in a read only memory of the motherboard of the computer system.
During this boot phase, all the essential software components that are necessary for the operation of the computer system are loaded into random access memory.
An example architecture for a computer system belonging to a cluster is shown in FIG. 2.
The device 200 contains a communication bus 202 allowing data exchange with elements external to the device 200 (input/output bus) and a communication bus 204 dedicated to data exchange with a memory.
Connected to the bus 204 are a read only memory 206 (Read Only Memory (ROM), or Electrically-Erasable Programmable Read-Only Memory (EEPROM)), containing the BIOS program of the system, and a random access memory (RAM) 208, comprising registers suitable for recording variables and parameters created and modified during the execution of programs, as well as an operating system (typically comprising at least one kernel and one file system).
One or more microprocessors or central processing units (CPU) 210 as well as a communication interface 212 suitable for transmitting and receiving data over a network are connected to the buses 202 and 204. The communication interface 212 comprises an expansion ROM 214, which contains a program allowing the operating system to be booted over a communication network.
It is noted that a boot process known as Pre-boot eXecution Environment (PXE), or an open source implementation of PXE (gPXE), can be used for loading software components into the random access memory 208 from a remote storage device, via a communication network.
The communication buses 202 and 204 allow communication and interoperability between the different elements included in the device 200 or linked thereto. The representation of the buses is non-limitative and, in particular, the central processing units are capable of communicating instructions to any element of the device 200 directly or via another element of the device 200.
In order to be executed, the executable code stored, for example, in the ROM 206 or in the expansion ROM 214 is typically loaded into the RAM 208.
Thus, the central processing units 210 command and direct the execution of the instructions or portions of software code of the program(s) which are stored in the random access memory 208 from the expansion ROM 214, the ROM 206 or any other, local or remote storage element.
Some important tasks for the configuration and maintenance of computer systems, for example updating the BIOS and modifying Desktop Management Interface (DMI) fields, which comprise information such as the serial numbers of machines, require an operating environment of the Disk Operating System (DOS) type when a read only memory of the EEPROM type is used.
In fact, although there are other environments for these configuration and maintenance operations, these environments have drawbacks, in particular in terms of cost. While the cost of tools of the DOS type is typically included in the licence of the BIOS, paid for each machine, it is generally necessary to acquire specific licences for the configuration and maintenance tools used in other environments, for example in the Linux environment. Such licences can exceed ten dollars annually per machine. As a result, the costs for a cluster can reach several tens of thousands of dollars annually.
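The order of magnitude mentioned above can be checked with a short calculation. The following Python sketch is given purely as an illustration; the function name `annual_licence_cost` and the cluster size of 3,000 machines are hypothetical assumptions, only the figure of ten dollars per machine per year being taken from the text.

```python
def annual_licence_cost(machine_count: int, cost_per_machine: int) -> int:
    """Yearly licence cost, in dollars, for a cluster of machine_count machines."""
    if machine_count < 0 or cost_per_machine < 0:
        raise ValueError("counts and costs must be non-negative")
    return machine_count * cost_per_machine


# Hypothetical cluster of 3,000 machines at ten dollars per machine per year.
example_cost = annual_licence_cost(3000, 10)
```

With these assumed values the yearly cost is 30,000 dollars, which is consistent with the several tens of thousands of dollars stated above.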
However, the size of the data and of the programs used for the execution of configuration and maintenance tasks for computer systems no longer permits the use of floppy disk type storage media, the storage capacity of which (typically of the order of 1.44 MB) is too small. As a result, the execution of configuration and maintenance tasks for computer systems in a DOS-type environment typically requires a storage medium such as a Universal Serial Bus (USB) key, a hard disk, a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc Read-Only Memory (DVD-ROM). Although offering advantages, such media are difficult to maintain, for example when an update for a new BIOS has to be added, and require manual intervention, machine by machine, which is not easily compatible with use in server farms or clusters.
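The capacity limitation of floppy disks referred to above can be illustrated as follows. This Python sketch assumes the nominal formatted capacity of a 3.5-inch high-density floppy disk (1,440 KiB, that is 1,474,560 bytes); the function names and the payload size of 8 MiB are hypothetical and serve only as an example.

```python
FLOPPY_CAPACITY_BYTES = 1440 * 1024  # nominal "1.44 MB" floppy: 1,474,560 bytes


def fits_on_floppy(payload_bytes: int) -> bool:
    """True if the payload fits on a single floppy disk."""
    return payload_bytes <= FLOPPY_CAPACITY_BYTES


def floppies_needed(payload_bytes: int) -> int:
    """Smallest number of floppy disks able to hold the payload (integer ceiling)."""
    return -(-payload_bytes // FLOPPY_CAPACITY_BYTES)


# Hypothetical maintenance payload (BIOS image plus tools) of 8 MiB.
payload = 8 * 1024 * 1024
```

Under these assumptions the payload does not fit on one floppy disk and would have to be split over six disks, which illustrates why larger storage media are typically required.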