A supercomputer is a computer designed to attain the highest performance achievable with the techniques known at the time of its design, particularly in terms of data-processing speed (computing speed). It is also called a high-performance computer.
Nowadays, the volume of data processed by applications running on this type of computer commonly reaches several hundred gigabytes (GB). It may reach several terabytes (TB) for certain scientific computing applications (for example earthquake analysis, meteorology or molecular modelling), or for stochastic computation in the fields of finance or insurance. Such data volumes obviously cannot be stored permanently in the central memory of the computer, or random access memory (RAM), which is very expensive. Moreover, RAM is a volatile memory, whereas storage that is persistent in the long term, making it possible to save data that has been processed or is to be processed for future use, is also necessary.
That is why the data are stored in secondary, non-volatile memories in the form of files, that is to say structured sequences of data blocks. The secondary memories are mass storage components, for example hard disk drives (HDD). A data block is the smallest unit of data that the storage system is capable of managing. The content of these blocks, simple sequences of binary data, may be interpreted according to the file format as characters, integers or floating-point numbers, machine operation codes, memory addresses, etc.
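The point that a block is only a sequence of bytes, given meaning by the file format, can be illustrated with a minimal Python sketch (the byte values chosen here are purely illustrative):

```python
import struct

# A small "data block": 8 raw bytes with no intrinsic meaning.
block = bytes([0x41, 0x42, 0x43, 0x44, 0x00, 0x00, 0x80, 0x3F])

# Interpreted as characters: the first 4 bytes are ASCII text.
as_text = block[:4].decode("ascii")

# Interpreted as two 32-bit little-endian integers.
as_ints = struct.unpack("<2i", block)

# Interpreted as a 32-bit little-endian float: the last 4 bytes encode 1.0.
as_float = struct.unpack("<f", block[4:])[0]

print(as_text, as_ints, as_float)
```

The same eight bytes thus yield text, integers or a floating-point value depending solely on how the reader chooses to decode them.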
The secondary memories together form a mass storage system (or storage) providing persistent storage of data in the supercomputer. Data exchanges between the central memory and this storage system take place by transfer of blocks.
The function of a file system (FS) is to provide access to the contents of the files held in the storage system with the best possible speed and reliability. Accessing a file located at a given location of the storage system comprises opening the file and carrying out read or write operations on its data. These operations enable, for example, the file to be recorded, copied, moved to another location, or deleted. They are triggered by corresponding instructions of a computer program when it is executed by one or more processors of the supercomputer. These instructions specify an access path to the file. Such an access path is formed, for example, of a file name preceded by a list of nested directories, separated by backslashes (under Windows) or by slashes (under UNIX).
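A minimal Python sketch of this kind of access, assuming the storage is mounted as an ordinary directory tree (the directory and file names are hypothetical):

```python
import os
import tempfile

# Hypothetical access path: a file name preceded by its directories.
root = tempfile.mkdtemp()
path = os.path.join(root, "results", "run01.dat")
os.makedirs(os.path.dirname(path), exist_ok=True)

# Opening the file and writing data into it.
with open(path, "wb") as f:
    f.write(b"simulation output")

# Opening the file again and reading its contents back.
with open(path, "rb") as f:
    data = f.read()

print(data)
```

The program never manipulates blocks directly: the file system translates these open, read and write calls into block transfers on the underlying storage components.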
The design of a supercomputer must ensure that large volumes of data can be read, transferred and stored rapidly, that is to say with relatively high access rates to the storage components, of the order of several tens or even several hundreds of gigabytes per second (GB/s). Yet the cost of a supercomputer depends greatly on the capacity and the throughput offered by the storage system.
It is known that jobs using the data stored on the storage components of the storage system are sometimes sensitive to the speed of data access, and sometimes much less so. In other words, the run time of certain jobs (which determines the efficiency of the supercomputer) is relatively little affected by the access rate offered by the storage system.
For example, the storage of data representing the application (program and libraries), compilation data, summaries of results, or certain visualisation results may be served by a relatively slow storage system. By contrast, intermediate computing data and output or visualisation data are more sensitive to the performance of the storage system.
To take account of the aforementioned considerations, HPC environments are often composed of two or more independent storage sub-systems. Owing to the respective technologies of the storage components associated with these sub-systems, one of them is relatively fast but of moderate capacity, while the other is relatively slow but of very high capacity. Put another way, the design of the supercomputer seeks a compromise between the respective capacities of a first storage system offering a good access rate but which is expensive, and a second storage system which has a greater capacity but a lower access rate, and which is cheaper.
Nevertheless, even setting aside the different access profiles to be managed, for example at the level of inputs/outputs (I/O), the cost of deploying these two types of storage components as two separate storage systems is not optimal. Indeed, the overall capacity of such a composite storage system is greater than that of a standard storage system, that is to say one that is uniform in its storage-component technology and hence in its level of access performance (notably in terms of throughput and speed of access to data).
This solution implies copying data between the two types of storage components, according to the following outline:
1. Preparation of data sets on the first-level storage system (the relatively slower one);
2. Copying of the data from the first-level storage to the second level (the relatively faster one);
3. Running of jobs on the data present on the second-level storage system; and, optionally,
4. Copying of the data back to the first-level storage system.
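The four steps of this staging outline can be sketched in Python, assuming the two tiers are mounted as ordinary directories (the paths and the stand-in computation are hypothetical):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mount points for the two storage tiers.
slow_tier = Path(tempfile.mkdtemp(prefix="tier1_slow_"))
fast_tier = Path(tempfile.mkdtemp(prefix="tier2_fast_"))

# 1. Prepare a data set on the first-level (slower) storage.
input_file = slow_tier / "input.dat"
input_file.write_bytes(b"1 2 3 4")

# 2. Copy the data to the second-level (faster) storage.
staged = fast_tier / "input.dat"
shutil.copy(input_file, staged)

# 3. Run the job on the fast tier (a stand-in computation here).
values = [int(v) for v in staged.read_bytes().split()]
result = fast_tier / "output.dat"
result.write_bytes(str(sum(values)).encode())

# 4. Optionally, copy the result back to the first-level storage.
shutil.copy(result, slow_tier / "output.dat")
```

Steps 2 and 4 are pure data movement: they consume time and I/O bandwidth without contributing to the computation itself, which is precisely the overhead discussed below.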
Consequently, many data exchanges are necessary between the different storage sub-systems, which also implies an additional cost in time and in extra operations.