With the development of information technologies, people generate, store, and process more and more data. The number of files grows significantly as the data volume increases, which poses severe challenges to the metadata storage system of a cluster file system based on a conventional dynamic random access memory (DRAM) + hard disk drive (HDD) storage architecture.
In terms of storage scale, the total volume of metadata is continuously growing. The number of files that need to be stored in the cluster file system is increasing. In particular, with the rapid development of Internet applications, files exist in various forms, such as emails, photos, videos, and reports. As the total data volume increases, the number of files nearly doubles each year, which makes the total metadata volume of the cluster file system expand rapidly. In terms of metadata operation performance, the requirements are also rising. High-performance computing is gradually shifting from CPU-intensive to I/O-intensive workloads. The I/O efficiency of the system has an important effect on overall performance, which requires the storage system to deliver very high metadata operation performance. Further, the rapid development of the Internet also leads to higher requirements for the metadata operation performance of a mass storage system.
These challenges mean that a metadata storage system based on the conventional DRAM+HDD storage architecture can no longer meet the requirements. The reason is that the total metadata volume is too large for the DRAM to serve all metadata requests, so some metadata I/O requests are sent to the HDD, and the extremely high I/O delay of the HDD becomes a performance bottleneck of the system.
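The bottleneck described above can be illustrated with a simple weighted-average model. This is a minimal sketch that is not taken from the original text: the function name `average_latency`, its parameters, and the assumption that every request is served either by the DRAM or by the HDD are hypothetical, and the latencies are in arbitrary units.

```python
def average_latency(hit_ratio, dram_latency, hdd_latency):
    """Expected latency per metadata request under an assumed two-level model.

    hit_ratio    -- fraction of requests served from DRAM (0.0 to 1.0)
    dram_latency -- latency of a DRAM access, arbitrary units
    hdd_latency  -- latency of an HDD access, same units
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Requests that miss the DRAM pay the full HDD latency.
    return hit_ratio * dram_latency + (1.0 - hit_ratio) * hdd_latency
```

Under this model, when the HDD latency is much larger than the DRAM latency, even a small fraction of requests that miss the DRAM dominates the average, which is consistent with the HDD becoming the bottleneck as the metadata volume outgrows the DRAM.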
Compared with the HDD, a solid-state drive (SSD) has the advantages of higher bandwidth and lower delay, but storing metadata on the SSD also raises many problems. In the metadata storage organization of a cluster file system based on the conventional DRAM+HDD storage architecture, the metadata is either stored in a directory-tree-based structure or stored in a database. As a result, the metadata I/O mode is mainly small-granularity random I/O, which is not suitable for the SSD. The reason is that the performance of the SSD depends on the I/O mode, and the sequential I/O performance of the SSD is better than its random I/O performance. Further, small-granularity random writes may shorten the service life of the SSD and may cause fragmentation of the SSD, which has a negative effect on subsequent I/O operations. In addition, the SSD is expensive, and a single SSD has a small storage capacity, both of which may limit the use of the SSD. For metadata storage organization structures based on novel memory media such as non-volatile random access memory (NVRAM) and phase-change RAM (PRAM), whether the metadata is stored separately, stored after compression, or stored jointly with small files, the upper layer performs byte-based addressing, so the access mode is still mainly small-granularity random I/O, which is not suitable for the SSD either.
In an existing SSD storage system designed for a specific load, specific optimization is performed based on the I/O characteristics of the SSD. For example, a write buffer is used to convert small-granularity random writes into large-granularity sequential writes, which fully exploits the performance of the SSD and protects its service life. However, the design and implementation of such a system depend on the characteristics of its load, and the system is simplified according to those characteristics. Because the metadata storage system of the cluster file system has its own performance requirements and I/O load characteristics, such a system cannot be directly applied to metadata storage either.
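The write-buffer idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of any particular existing system: the class name `WriteBuffer`, the length-prefixed record layout, and the byte threshold are hypothetical, and an in-memory `io.BytesIO` object stands in for the SSD.

```python
import io


class WriteBuffer:
    """Collects small writes in memory and flushes them as one large append."""

    def __init__(self, device, flush_threshold=4096):
        self.device = device                  # binary file-like object (stand-in for the SSD)
        self.flush_threshold = flush_threshold
        self.pending = []                     # buffered (key, value) records
        self.pending_bytes = 0
        self.flush_count = 0                  # number of large writes issued so far

    def write(self, key: bytes, value: bytes) -> None:
        # A small write only touches memory until the threshold is reached.
        self.pending.append((key, value))
        self.pending_bytes += len(key) + len(value)
        if self.pending_bytes >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        chunk = bytearray()
        for key, value in self.pending:
            # Hypothetical record layout: two 4-byte lengths, then key and value.
            chunk += len(key).to_bytes(4, "big")
            chunk += len(value).to_bytes(4, "big")
            chunk += key + value
        self.device.write(bytes(chunk))       # one large sequential write
        self.flush_count += 1
        self.pending.clear()
        self.pending_bytes = 0


device = io.BytesIO()
buf = WriteBuffer(device, flush_threshold=64)
for i in range(10):
    buf.write(f"k{i:02d}".encode(), b"v" * 12)
buf.flush()                                   # push out any remaining records
```

In this sketch, ten small writes reach the device as a few large appends rather than ten separate writes. A real system would also need an index to locate records for reads and a recovery path for data still in the buffer, which is part of why such designs are tied to the characteristics of a specific load.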
In existing storage systems that adopt DRAM+SSD+HDD three-tier storage, there are generally three design policies: the SSD acts as a buffer for the HDD; the HDD acts as a write buffer for the SSD; or the data is placed on both the SSD and the HDD. Under the first and third policies, the SSD may wear out quickly because it receives a large number of small-granularity random I/Os, so the service life of the SSD cannot be ensured. The second policy, in which the HDD acts as the write buffer of the SSD, has two main problems for the metadata storage application of the cluster file system. First, the data is finally placed on the SSD, so a large number of SSDs are required when the data scale is very large, which increases the system cost. Second, if the HDD acts as the write buffer, some metadata read requests may be sent to the HDD, which significantly increases the delay of metadata read requests. Because the read operation is synchronous, an extremely high read request delay may degrade system performance. Therefore, the second policy cannot meet the requirements of the metadata storage system of the cluster file system either.