With the development of the Internet, HDFS (Hadoop Distributed File System) has emerged. HDFS segments a file to be stored into blocks of 64 MB each. On the one hand, in order to enhance the security of data and reduce the risk of data loss, a plurality of copies of each block of the file may be saved; the data of all copies of a block is identical, and each copy is stored in a different DataNode. On the other hand, HDFS stores attribute information of the file in the NameNode, wherein the attribute information includes the size of the file, the number of blocks, the location of each copy corresponding to each block, and so on.
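The storage mode described above can be illustrated with a minimal Python sketch. It is not HDFS code: the names `FileMetadata` and `build_metadata`, the replication factor of 3, and the round-robin placement of copies are assumptions made only for illustration. The sketch shows how a file size maps to a number of 64 MB blocks and what kind of attribute information a NameNode-like record would hold.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size mentioned in the text


@dataclass
class FileMetadata:
    """Hypothetical record of the attribute information kept per file."""
    size: int
    block_count: int
    # replica_locations[i] lists the DataNodes holding the copies of block i
    replica_locations: list = field(default_factory=list)


def build_metadata(file_size, datanodes, replication=3):
    """Compute the block count and assign each copy to a different DataNode.

    The round-robin placement is an assumption for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as copies")
    block_count = math.ceil(file_size / BLOCK_SIZE)
    locations = []
    for block in range(block_count):
        # Consecutive offsets modulo the node count give distinct DataNodes.
        copies = [datanodes[(block + r) % len(datanodes)]
                  for r in range(replication)]
        locations.append(copies)
    return FileMetadata(file_size, block_count, locations)
```

Under these assumptions, a 200 MB file stored across four DataNodes yields four blocks (three full blocks and one partial block), each with three copies on three distinct DataNodes.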
Based on the above-described storage mode of HDFS, when reading the contents of a file, a client of HDFS first obtains from the NameNode a block list of the file and the locations of all the copies corresponding to each block. For a block having a plurality of copies, the client calculates the distance to each copy of the block and then sorts the copies according to the distances, for example, from near to far. The nearest copy is then selected and downloaded; if the nearest copy fails to be downloaded, the next copy in the sorted order is selected and downloaded. All the blocks of the file are obtained in the same manner.
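The reading procedure described above can be sketched as follows. This is an illustrative sketch rather than the HDFS client implementation: `download_block`, `download_file`, and the `distance` and `fetch_block` callables are hypothetical names, and a failed download is assumed to be signalled by an `OSError`.

```python
def download_block(replicas, distance, fetch):
    """Try the copies of one block nearest-first.

    replicas: DataNodes holding a copy of the block
    distance: callable mapping a DataNode to its distance from the client
    fetch:    callable that downloads the block from a DataNode and raises
              OSError on failure (assumption)
    Returns (node, data) for the first copy that downloads successfully.
    """
    for node in sorted(replicas, key=distance):  # nearest copy first
        try:
            return node, fetch(node)
        except OSError:
            continue  # fall back to the next copy in sorted order
    raise OSError("all copies of the block failed to download")


def download_file(block_replicas, distance, fetch_block):
    """Download every block of a file in order and concatenate the data.

    block_replicas[i] is the list of DataNodes holding block i, as would be
    returned by the NameNode; fetch_block(node, index) downloads block index.
    """
    parts = []
    for index, replicas in enumerate(block_replicas):
        _, data = download_block(
            replicas, distance, lambda node: fetch_block(node, index))
        parts.append(data)
    return b"".join(parts)
```

Note that the ordering depends entirely on the `distance` callable, and every client with a similar distance estimate will prefer the same DataNode.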
However, in the process of implementing the present disclosure, the existing method for downloading a file based on HDFS has been found to have the following problems. First, physical distance has only a minor influence, and network distance changes as the network status constantly changes, so the distance between the client and the copies cannot be accurately calculated. Second, if many clients prefer the same DataNode but that DataNode can respond to the download of only one client at a time, the other clients can only wait for their downloads, which greatly reduces the download efficiency of a file.