In recent years, with continuous progress of social economies and rapid development of science and technology, data scales of large, medium, and small enterprises rapidly expand accordingly. How to improve storage efficiency and an access rate of big data has an important practical application value. A distributed file system is a file management system suitable for big data storage. In this system, a physical storage resource is unnecessarily connected to a local node, and is connected to multiple nodes using a computer network. In the distributed file system, a big data block is divided into multiple small data blocks and the small data blocks are stored on multiple nodes such that the distributed file system has relatively high fault tolerance and a relatively high throughput. A HADOOP distributed file system (HDFS) is a commonly used file system that has high fault tolerance, can be deployed on a cheap machine, and is extremely suitable for use in a large-scaled data set. In the HDFS, data is grouped into a data block and the data block is stored on a disk of a data node (DN). An application task may read data from the data block on the disk, or write data into the data block on the disk. However, in a task of analyzing large-scale data, the application task needs to repeatedly read data on the disk and repeatedly write data to the disk. Consequently, data input/output (IO) takes a large amount of time, and a task runtime is excessively long.
An IO rate of a memory is much faster than an IO rate of a disk. Therefore, when performing resource scheduling, the HDFS at a current stage counts a historical quantity of times that each data block is accessed within a preset time period, then determines a data block that is accessed more frequently as a hotspot data block, and moves the hotspot data block into a memory of a DN. In this way, the application task can access the hotspot data block by directly using the memory of the DN, thereby improving data IO efficiency.
However, a historical quantity of times that a data block is accessed cannot accurately reflect a hotspot degree of the data block. Even if the historical quantity of times that the data block is accessed is relatively large, a quantity of times that the data block is accessed after the data block is moved into the memory may be extremely small. In this case, if the data block is moved into the memory, not only data IO efficiency cannot be prominently improved, but also memory resources are unnecessarily wasted.