In the prior art, to guarantee efficient data processing and centralized management of metadata, a large-scale distributed file system for data processing is generally designed with one centralized metadata management server (such as the File Location Register (FLR)) and a plurality of data file storage servers (such as File Access Servers (FASs)).
To access data, a user first queries the FLR, via the File Access Client (FAC), for the specific storage location of the data; the FAC then initiates a read or write request to the specific FAS. The FAS manages a data file by dividing the file data into individual CHUNKs, so that each file consists of a plurality of CHUNKs. CHUNKs are matched to their file by a uniform identifier, the FILEID: each file has a FILEID different from that of every other file, and the CHUNKID of each CHUNK is the FILEID plus the CHUNK number. The distribution information of all CHUNKs in a file is stored uniformly in a database and managed by the FLR.
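The CHUNK naming and lookup scheme described above can be illustrated with a minimal Python sketch. It is not taken from any actual FLR or FAS implementation: the chunk size, the "-" separator inside the CHUNKID, the class name `FLR` with its `register` and `lookup` methods, and the FAS names are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes; hypothetical and deliberately tiny for readability


def make_chunk_id(file_id: str, chunk_number: int) -> str:
    """Compose a CHUNKID as FILEID + CHUNK number (the "-" separator is assumed)."""
    return f"{file_id}-{chunk_number}"


def split_into_chunks(file_id: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Divide file data into CHUNKs, keyed by CHUNKID in file order."""
    chunks = {}
    for number, offset in enumerate(range(0, len(data), chunk_size)):
        chunks[make_chunk_id(file_id, number)] = data[offset:offset + chunk_size]
    return chunks


@dataclass
class FLR:
    """Toy stand-in for the metadata server: maps each CHUNKID to the FASs holding it."""
    locations: dict = field(default_factory=dict)

    def register(self, chunk_id: str, fas_names: list) -> None:
        self.locations[chunk_id] = list(fas_names)

    def lookup(self, chunk_id: str) -> list:
        return self.locations[chunk_id]


# A 10-byte file with FILEID "F001" becomes three CHUNKs.
chunks = split_into_chunks("F001", b"abcdefghij")

# Record where each CHUNK lives; the alternating placement is arbitrary.
flr = FLR()
for i, chunk_id in enumerate(chunks):
    flr.register(chunk_id, [f"FAS{i % 2}"])
```

In this sketch a client would call `flr.lookup("F001-1")` to learn which FAS to contact before issuing the read or write request, mirroring the FAC-to-FLR-to-FAS flow in the text.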
In a large-capacity cluster system, CHUNKs are generally backed up redundantly, that is, copies of the same CHUNK are stored in a plurality of FASs. However, in the prior art it is hard to maintain consistency among the several copies of a CHUNK. This is a significant problem, which appears mainly in the following situations: during a write operation, how to guarantee that the corresponding copies in a plurality of FASs are written simultaneously; if one FAS is abnormal or broken, how to reconstruct the data in that FAS; and if the FLR becomes abnormal during the writing process, how to guarantee consistency between the FLR record and the FASs.
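The first of these situations can be made concrete with a small Python sketch. It assumes a naive write loop that simply tries each replica in turn; the `FAS` class, its `healthy` flag, and the function `naive_replicated_write` are hypothetical names used only to show how copies can diverge, not a description of any particular prior-art system.

```python
class FAS:
    """Toy storage server; `healthy` is a hypothetical flag that simulates a fault."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.chunks = {}

    def write(self, chunk_id, data):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        self.chunks[chunk_id] = data


def naive_replicated_write(replicas, chunk_id, data):
    """Write to each replica in turn and return the names of replicas that failed."""
    failed = []
    for fas in replicas:
        try:
            fas.write(chunk_id, data)
        except IOError:
            failed.append(fas.name)
    return failed


replicas = [FAS("FAS0"), FAS("FAS1", healthy=False), FAS("FAS2")]

# All three copies start out identical.
for fas in replicas:
    fas.chunks["F001-0"] = b"old"

# One FAS is down during the update, so its copy is left stale.
failed = naive_replicated_write(replicas, "F001-0", b"new")
versions = {fas.name: fas.chunks["F001-0"] for fas in replicas}
```

After the write, `versions` holds two different contents for the same CHUNKID, and nothing in this naive scheme records which copy is current. That gap is the consistency problem the paragraph above describes.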
Because a massive number of CHUNKs is involved, general check methods such as MD5 cannot be applied to the CHUNKs in the prior art, since doing so would severely degrade processing performance.
Therefore, the prior art should be improved and developed.