The development of computer and network technologies and the informatization of human life have led users to require storage systems with increasingly large storage capacities and increasingly high performance. Storage systems have accordingly developed from a memory built into a computer to an independent storage system, such as a storage array or network attached storage (NAS), and then to a large-scale distributed file storage system. As the degree of digitization increases, stored objects have also changed from mainly structured data to mainly unstructured file data, such as pictures or micro videos. This places higher requirements on the file data access performance of the storage system, and improving the access performance of a large-scale distributed file storage system has therefore become a top priority in the current storage field.
A distributed file storage system includes multiple storage server nodes. The storage servers are interconnected by a network with low latency and high throughput (for example, an InfiniBand (IB) network or 10 Gigabit (G) Ethernet) to form a cluster, and together constitute a large-scale network redundant array of inexpensive disks (RAID); in addition, all the storage servers externally provide a data read-write service simultaneously. When file data is stored in the distributed file storage system, striping is performed on the file data by using an algorithm such as a cross-node RAID algorithm (for example, RAID5, RAID6, or RAIDZ) or an erasure code algorithm. That is, the file data is divided into multiple data strips, corresponding parity strips are generated, and the data strips and the parity strips are then stored on the storage servers of the corresponding nodes. When the stored file data is read, a certain quantity of data strips and parity strips are read from the storage server nodes and used to reconstruct the original file data requested by a user.
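The striping and reconstruction described above can be illustrated with a minimal sketch. It assumes the simplest possible scheme, a single XOR parity strip in the manner of RAID5, and is not the algorithm of any particular storage system. The function names `stripe` and `rebuild` are hypothetical, and node placement and networking are omitted.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe(data: bytes, n_data: int):
    """Divide data into n_data equal-length data strips plus one XOR parity strip.

    Assumption for illustration: the last strip is zero-padded so that
    all strips have the same length.
    """
    strip_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(strip_len * n_data, b"\0")
    strips = [padded[i * strip_len:(i + 1) * strip_len] for i in range(n_data)]
    parity = reduce(xor_bytes, strips)
    return strips, parity


def rebuild(strips, parity: bytes, original_len: int) -> bytes:
    """Reconstruct the original data when at most one data strip is missing (None)."""
    missing = [i for i, s in enumerate(strips) if s is None]
    if len(missing) > 1:
        raise ValueError("a single parity strip can recover only one lost strip")
    strips = list(strips)
    if missing:
        present = [s for s in strips if s is not None]
        # XOR of the parity strip with all surviving strips yields the lost strip.
        strips[missing[0]] = reduce(xor_bytes, present, parity)
    return b"".join(strips)[:original_len]


data = b"hello distributed storage"
strips, parity = stripe(data, 4)

# Simulate the loss of the node that holds data strip 2.
damaged = list(strips)
damaged[2] = None
restored = rebuild(damaged, parity, len(data))
```

In this sketch each of the four data strips and the parity strip would be placed on a different storage server node, so the loss of any one data strip can be tolerated. RAID6 and erasure code algorithms generate more than one parity strip and use different arithmetic, which this example does not cover.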
As the size of the cluster in the distributed file storage system increases, file data is divided into an increasingly large quantity of data strips during striping in order to improve the space utilization of the entire distributed file storage system, and the quantities of disk input/output (IO) operations and network IO operations performed during read-write operations increase accordingly. The growing quantity of data strips, and of the IO operations needed to access them, therefore imposes a heavy burden on the access performance of the distributed file storage system in a small-file scenario.
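The trade-off can be made concrete with a rough calculation. The sketch below assumes a simplified cost model that the text does not state: writing one file costs one disk IO operation and one network IO operation per strip, with every strip on a different node. The function name `stripe_cost` and the chosen strip counts are hypothetical.

```python
def stripe_cost(file_size: int, n_data: int, n_parity: int) -> dict:
    """Estimate the cost of writing one file under the simplified model.

    Assumption: one disk IO and one network IO per strip, so the IO count
    equals the total number of strips.
    """
    total_strips = n_data + n_parity
    return {
        # Fraction of stored bytes that is user data rather than parity.
        "space_utilization": n_data / total_strips,
        "io_operations": total_strips,
        # Payload carried by each strip (ceiling division).
        "bytes_per_strip": -(-file_size // n_data),
    }


small_file = 64 * 1024  # a 64 KiB file

narrow = stripe_cost(small_file, n_data=4, n_parity=1)
wide = stripe_cost(small_file, n_data=16, n_parity=2)
```

Under these assumptions, the wider layout raises space utilization from 4/5 to 16/18, but the same 64 KiB file then requires 18 IO operations instead of 5, each carrying only 4 KiB instead of 16 KiB. For a small file, the fixed cost of each IO operation tends to outweigh the small payload it carries.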