A distributed file system is a distributed storage system built on the local storage of multiple computer storage nodes. Compared with traditional storage systems, a distributed file system offers advantages such as high storage cost-effectiveness and strong scalability.
Current distributed file systems are usually composed of a metadata node and multiple storage nodes. The metadata node stores, among other things, the data blocks that make up each file, the names of the storage nodes where the respective data blocks reside, and the data blocks held by each storage node. The storage nodes store the actual data blocks; the size of each data block is usually 64 MB or 128 MB. The storage nodes regularly report their locally stored data blocks to the metadata node, so that the metadata node knows the storage locations of the data of all files in the distributed file system. When a client needs to access data in the distributed file system, it first acquires or establishes the positions of the file's data blocks through the metadata node, and then communicates directly with the storage nodes where the corresponding data blocks reside to perform operations such as reading or writing data blocks.
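The bookkeeping kept by the metadata node can be illustrated with a minimal in-memory sketch. The class name `MetadataNode`, the methods `handle_block_report` and `locate`, and the helper `blocks_needed` are hypothetical names chosen for illustration and do not belong to any particular file system; the sketch assumes that a report from a storage node lists every block that node currently holds.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the typical block sizes


@dataclass
class MetadataNode:
    # file name -> ordered list of block IDs that make up the file
    file_blocks: dict = field(default_factory=dict)
    # block ID -> names of the storage nodes that reported holding it
    block_locations: dict = field(default_factory=dict)
    # storage node name -> block IDs listed in its most recent report
    node_blocks: dict = field(default_factory=dict)

    def handle_block_report(self, node, block_ids):
        """Replace what is known about `node` with its latest full report."""
        # Blocks the node no longer reports are removed from its locations.
        for stale in self.node_blocks.get(node, set()) - set(block_ids):
            self.block_locations.get(stale, set()).discard(node)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(node)
        self.node_blocks[node] = set(block_ids)

    def locate(self, file_name):
        """Return (block ID, sorted storage node names) for each block of a file."""
        return [
            (block_id, sorted(self.block_locations.get(block_id, set())))
            for block_id in self.file_blocks[file_name]
        ]


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of fixed-size blocks needed for `file_size` bytes (ceiling division)."""
    return -(-file_size // block_size)
```

A client would call `locate` once to learn where the blocks of a file reside and would then contact the listed storage nodes directly, without routing the block data through the metadata node.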
Current distributed file systems usually store each data block as a plurality of copies in order to improve system reliability and data availability. For example, three or more copies of the same data block may be stored at different storage nodes, with each copy stored at one storage node. As such, the content of each file is stored at a plurality of storage nodes. If an individual storage node breaks down, the data of the whole file can still be acquired from the storage nodes storing the other copies.
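How multiple copies keep data available when a storage node breaks down can be sketched as a read that falls back from one copy to the next. The exception `StorageNodeDown`, the function `read_block`, and the `fetch` callable are hypothetical and stand in for whatever transport a real system would use.

```python
class StorageNodeDown(Exception):
    """Raised when a storage node cannot be reached (hypothetical)."""


def read_block(block_id, replica_nodes, fetch):
    """Try each node holding a copy in turn; return the first successful read.

    `fetch(node, block_id)` is an assumed callable that returns the block
    bytes or raises StorageNodeDown if the node is unavailable.
    """
    failed = []
    for node in replica_nodes:
        try:
            return fetch(node, block_id)
        except StorageNodeDown:
            failed.append(node)  # remember the failure and try the next copy
    raise StorageNodeDown(f"all copies of {block_id} unavailable: {failed}")
```

With three copies, the read fails only if all three storage nodes holding the block are unavailable at the same time.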
When the client needs to write a file into the distributed file system, the client first communicates with the metadata node to acquire the positions of the blocks corresponding to the file, namely, a list of the storage nodes that are to store those blocks. The storage node list indicates that the same data needs to be written into different storage nodes, i.e., the data has a plurality of copies at different storage nodes. The client then selects, from the returned storage node list, the storage node closest to it in terms of network address, writes the data into that storage node, and at the same time informs that storage node of the other storage nodes into which the data also needs to be written. That storage node then repeats the above storage procedure, and so on, until all storage nodes in the storage node list have completed storage.
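The chained write procedure can be sketched as follows. The text does not define how closeness of network addresses is measured, so the sketch assumes an illustrative metric: the XOR of two IPv4 addresses taken as integers, where a smaller value means a longer shared address prefix. The names `address_distance`, `StorageNode`, and `client_write` are hypothetical, and a plain dictionary stands in for the network.

```python
import ipaddress


def address_distance(a, b):
    """Assumed metric: XOR of two IPv4 addresses as integers (smaller is closer)."""
    return int(ipaddress.ip_address(a)) ^ int(ipaddress.ip_address(b))


class StorageNode:
    def __init__(self, address, cluster):
        self.address = address
        self.cluster = cluster  # address -> StorageNode, stands in for the network
        self.blocks = {}
        cluster[address] = self

    def write(self, block_id, data, remaining):
        """Store the block locally, then forward it to the closest remaining node.

        Returns the addresses that stored the block, in the order they were written.
        """
        self.blocks[block_id] = data
        if not remaining:
            return [self.address]
        next_addr = min(remaining, key=lambda addr: address_distance(self.address, addr))
        rest = [addr for addr in remaining if addr != next_addr]
        return [self.address] + self.cluster[next_addr].write(block_id, data, rest)


def client_write(client_addr, block_id, data, node_list, cluster):
    """Write to the node closest to the client and hand it the rest of the list."""
    first = min(node_list, key=lambda addr: address_distance(client_addr, addr))
    rest = [addr for addr in node_list if addr != first]
    return cluster[first].write(block_id, data, rest)
```

In this sketch the client sends the data only once; each storage node is responsible for passing the data on to the next node, so the client's outgoing bandwidth does not grow with the number of copies.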