A distributed data storage system is a storage system composed of multiple storage devices interconnected by a network. In such a system, data is backed up on multiple data nodes. The data nodes of a conventional distributed data storage system usually include multiple master nodes; each master node saves one part of all the data and is connected to a group of slave nodes. To read data, a user executes a data read operation directly on a master node; to write data, the user executes a data write operation on the master node, and the data is then copied from the master node to its slave nodes, so that each slave node saves the same data copy as the master node to which it is connected. When the master node fails, one slave node is promoted to master node through master-slave switching, thereby ensuring that read and write operations continue normally. In this conventional master-slave storage system, each slave node must be configured with hardware whose performance is similar to that of the master node so that it can take over from a failed master node, which leads to high hardware cost. In addition, because current networks are usually loosely networked, connection interruptions and timeouts often occur between nodes, which results in frequent switching between master and slave nodes and degrades system performance.
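The master-slave behavior described above can be illustrated with a minimal in-memory sketch. The class and method names (`Node`, `MasterSlaveGroup`, `_ensure_master`) are hypothetical and chosen only for illustration; failure is modeled by an `alive` flag rather than by real network detection.

```python
class Node:
    """A data node holding an in-memory key-value copy (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class MasterSlaveGroup:
    """One master node with its group of slave nodes."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def _ensure_master(self):
        # Master-slave switching: promote the first live slave
        # when the current master has failed.
        if self.master.alive:
            return
        for i, slave in enumerate(self.slaves):
            if slave.alive:
                self.master = self.slaves.pop(i)
                return
        raise RuntimeError("no live node available")

    def write(self, key, value):
        # Write on the master, then copy the data to every live slave.
        self._ensure_master()
        self.master.data[key] = value
        for slave in self.slaves:
            if slave.alive:
                slave.data[key] = value

    def read(self, key):
        # Reads are served directly by the master.
        self._ensure_master()
        return self.master.data[key]
```

In this sketch every slave holds a full copy of the master's data and must be able to serve all reads and writes after promotion, which is why the slave hardware has to match the master's.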
To solve the problems of the conventional master-slave storage system, the prior art provides a management solution based on an (N, W, R) strategy. This solution abandons the concept of master and slave nodes, and each data node saves one part of all the data. For given data X, N data nodes are allocated as copy nodes for storing the data X; that is, N copies of the data X are saved in the data storage system. When a write operation is performed on the data X, the write operation can end only after W copy nodes complete the write operation on the data; when a read operation is performed on the data X, the data X must be read from R copy nodes. N, W and R satisfy the relationship W+R>N, which ensures that any R copy nodes include at least one of the W copy nodes of the latest completed write, so that at least one of the R copies read is the latest version.
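A minimal sketch of the (N, W, R) strategy follows. It assumes that each copy carries an integer version number issued by a single counter, and that a write reaches exactly W randomly chosen copy nodes; both are simplifications for illustration, and the name `QuorumStore` is hypothetical.

```python
import random


class QuorumStore:
    """Toy (N, W, R) store: N copy nodes, W-node writes, R-node reads."""

    def __init__(self, n, w, r):
        if w + r <= n:
            raise ValueError("W + R must be greater than N")
        self.n, self.w, self.r = n, w, r
        # Each copy node maps key -> (version, value).
        self.copies = [{} for _ in range(n)]
        # Assumption: one global version counter per key.
        self.versions = {}

    def write(self, key, value):
        version = self.versions.get(key, 0) + 1
        self.versions[key] = version
        # The write ends once W copy nodes have stored the new version.
        for i in random.sample(range(self.n), self.w):
            self.copies[i][key] = (version, value)
        return version

    def read(self, key):
        # Read R copy nodes and keep the copy with the highest version.
        chosen = random.sample(range(self.n), self.r)
        found = [self.copies[i][key] for i in chosen if key in self.copies[i]]
        if not found:
            raise KeyError(key)
        return max(found, key=lambda copy: copy[0])[1]
```

Because W+R>N, the R nodes chosen by `read` always overlap the W nodes of the most recent `write`, so the highest version among the R copies is the latest one. The constructor rejects parameter choices that would break this guarantee.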
During implementation of the above solution, the inventor finds that the prior art has at least the following problems. First, in the (N, W, R) strategy based management solution, the latest version of the data can be determined only after the read operation is performed on R copy nodes, so the efficiency of the read operation is very low. Furthermore, a data storage system usually needs to support complicated condition queries on the data, that is, selecting, through data traversal, the data that satisfies a specified query condition and executing a computation or write operation on the selected data. In the (N, W, R) strategy based management solution, however, every piece of data has copies saved in N copy nodes; as a result, when a complicated condition query is performed, R copy nodes must be traversed for each piece of data before its latest version can be determined. The amount of data traversal is therefore extremely large, which makes the solution difficult to implement in practical applications.
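The traversal cost of a condition query under the (N, W, R) strategy can be made concrete with a small sketch. It assumes each copy node is a dictionary mapping a key to a `(version, value)` pair and that the first R copy nodes are the ones read; the function name `condition_query` and this data layout are hypothetical.

```python
def condition_query(copies, keys, r, predicate):
    """Return (matching keys, number of copy reads) for a condition query.

    copies: list of dicts, one per copy node, mapping key -> (version, value).
    """
    reads = 0
    matches = []
    for key in keys:
        latest = None
        # R copy nodes must be read before the latest version is known.
        for copy in copies[:r]:
            reads += 1
            entry = copy.get(key)
            if entry is not None and (latest is None or entry[0] > latest[0]):
                latest = entry
        # The query condition is evaluated only on the latest version.
        if latest is not None and predicate(latest[1]):
            matches.append(key)
    return matches, reads
```

The read count grows as the number of data items multiplied by R, whereas a master-slave system needs a single read per item on the master.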