In a distributed storage system, multiple node devices, each with a data storage function, are connected to form a cluster. All node devices are connected through both a front-end network and a back-end network. The front-end network carries requests and data exchanged between user services and the distributed storage system, and the back-end network carries requests and data exchanged between node devices inside the distributed storage system.
In a distributed storage system, user data is divided into stripes, and the data strips of each stripe are distributed to, and stored on, hard disks of different node devices. To access user data, an application server first sends an access request to one node device through the front-end network. That node device then reads the data strips holding the user data into the local node device through the back-end network, restores the user data from the data strips using a Redundant Array of Independent Disks (RAID) algorithm or an erasure code algorithm, and returns the user data to the application server through the front-end network.
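The paragraph above names RAID or erasure code algorithms without fixing a specific one. As a minimal sketch, assuming the simplest single-parity (RAID-5-style XOR) code and a hypothetical placement of one strip per node device, striping user data and restoring it after losing one strip could look like this. The function names are illustrative only; production systems usually use codes such as Reed-Solomon that tolerate more than one lost strip.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size data strips (zero-padded) plus one XOR parity strip."""
    strip_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(strip_len * k, b"\x00")
    strips = [padded[i * strip_len:(i + 1) * strip_len] for i in range(k)]
    return strips + [reduce(xor_bytes, strips)]


def restore(strips: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one of the k + 1 strips is missing (None)."""
    missing = [i for i, s in enumerate(strips) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can recover only one lost strip")
    strips = list(strips)
    if missing:
        # XOR of all surviving strips (data and parity) equals the lost strip.
        strips[missing[0]] = reduce(xor_bytes, [s for s in strips if s is not None])
    # Drop the parity strip, join the data strips, and strip the zero padding.
    return b"".join(strips[:-1])[:length]


# Usage: hypothetical placement with strip i stored on node i (3 data nodes + 1 parity node).
user_data = b"hello distributed storage"
stripe = make_stripe(user_data, k=3)
stripe[1] = None  # simulate that node 1's strip is unavailable
print(restore(stripe, len(user_data)))  # b'hello distributed storage'
```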
A caching technology is used in the above user data access process. In one caching method, each node device caches hot-spot data blocks of the local node device in its cache. When a node device needs to obtain a data stripe, it obtains the data blocks that constitute the data stripe from the caches of the respective node devices. If a required data block cannot be obtained from those caches, the node device further accesses the hard disks of the node devices and obtains the data block from the hard disks. The node device then aggregates and rebuilds the obtained data blocks and performs a redundancy check on them to obtain the data stripe. In another caching method, each node device caches hot-spot file data stripes that it identifies from its own access statistics. When a node device needs to obtain a data stripe, it first looks for the data stripe in its own cache. If the required data stripe is not in its cache, the node device obtains the data strips of the data stripe from the hard disks of the node devices in the distributed storage system.
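The two read paths described above can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not from the source: disks and caches are plain dictionaries, the stripe width equals the number of node devices with strip j owned by node j, and rebuilding and redundancy checking are reduced to concatenation. The class and function names are hypothetical.

```python
from typing import Dict, List, Tuple


# Hypothetical model: strip j of every stripe is owned by node j, and every node
# keeps a disk, a block cache (method 1), and a stripe cache (method 2) as dicts.
class Node:
    def __init__(self) -> None:
        self.disk: Dict[Tuple[str, int], bytes] = {}
        self.block_cache: Dict[Tuple[str, int], bytes] = {}
        self.stripe_cache: Dict[str, bytes] = {}


def read_via_block_caches(nodes: List[Node], stripe_id: str) -> Tuple[bytes, int]:
    """Method 1: fetch each block from its owner's cache, else from its owner's disk.

    Returns the aggregated stripe and the number of disk reads. Rebuild and
    redundancy checking are omitted in this sketch.
    """
    blocks, disk_reads = [], 0
    for j, owner in enumerate(nodes):
        key = (stripe_id, j)
        if key in owner.block_cache:
            blocks.append(owner.block_cache[key])
        else:
            blocks.append(owner.disk[key])
            disk_reads += 1
    return b"".join(blocks), disk_reads


def read_via_stripe_cache(nodes: List[Node], local: int, stripe_id: str) -> Tuple[bytes, int]:
    """Method 2: try the requesting node's own stripe cache, else read all strips from disks."""
    cached = nodes[local].stripe_cache.get(stripe_id)
    if cached is not None:
        return cached, 0
    strips = [owner.disk[(stripe_id, j)] for j, owner in enumerate(nodes)]
    return b"".join(strips), len(strips)


# Usage: three nodes, one stripe "s1" split into three strips.
nodes = [Node() for _ in range(3)]
for j, strip in enumerate([b"ab", b"cd", b"ef"]):
    nodes[j].disk[("s1", j)] = strip
nodes[0].block_cache[("s1", 0)] = b"ab"  # node 0 caches its hot block
nodes[1].stripe_cache["s1"] = b"abcdef"  # node 1 caches the whole stripe

print(read_via_block_caches(nodes, "s1"))     # (b'abcdef', 2)
print(read_via_stripe_cache(nodes, 1, "s1"))  # (b'abcdef', 0)
print(read_via_stripe_cache(nodes, 0, "s1"))  # (b'abcdef', 3)
```

The second counter in each result shows the trade-off: method 1 still touches the disk for every block not cached by its owner, while method 2 avoids the disk entirely only when the requesting node itself happens to hold the stripe.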
At present, the data caching technology adopted in a distributed storage system is one of the two caching methods above or a combination of them. With these methods, each node device determines from access statistics which of the content stored on its hard disk is hot-spot content and caches that content. Because each node device makes this caching decision independently, the same content may be cached on several node devices, so part of the total cache capacity holds duplicate copies rather than distinct content. As a result, the cache utilization rate of the node devices is low.
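To make the utilization problem concrete, here is a minimal simulation, assuming each node device ranks stripes only by the accesses it receives itself and caches a fixed number of them. The node names, request lists, capacity, and the utilization metric (distinct cached stripes divided by total cache slots) are illustrative assumptions, not definitions from the source.

```python
from collections import Counter
from typing import Dict, List


def independent_caches(requests: Dict[str, List[str]], capacity: int) -> Dict[str, List[str]]:
    """Each node caches its own top-`capacity` stripes by local access count only."""
    return {
        node: [sid for sid, _ in Counter(accesses).most_common(capacity)]
        for node, accesses in requests.items()
    }


def utilization(caches: Dict[str, List[str]], capacity: int) -> float:
    """Assumed metric: distinct cached stripes divided by total cache slots across all nodes."""
    distinct = {sid for cached in caches.values() for sid in cached}
    return len(distinct) / (capacity * len(caches))


# Usage: the same hot stripes (s1, s2) are requested through every node.
requests = {
    "A": ["s1", "s1", "s2", "s3"],
    "B": ["s1", "s2", "s2", "s4"],
    "C": ["s1", "s2", "s5"],
}
caches = independent_caches(requests, capacity=2)
print(caches)                  # {'A': ['s1', 's2'], 'B': ['s2', 's1'], 'C': ['s1', 's2']}
print(utilization(caches, 2))  # 0.3333333333333333
```

Because every node sees s1 and s2 as hot, all six cache slots hold only two distinct stripes, and the colder stripes s3, s4, and s5 are not cached on any node.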