A distributed storage system is typically provided with a redundant data storage mechanism; that is, multiple copies of the same data are stored on different nodes, which gives the system high reliability of data storage and high throughput of data retrieval. However, this also raises the issue of keeping the copies of the data synchronized, which may be referred to as version control.
In a centralized storage system, version control is relatively simple: because there is only one clock source, the modification time of each copy can be taken as the criterion, and the most recently modified copy is the latest version. In a distributed system, however, it is relatively difficult to keep the clocks at the respective nodes precisely synchronized, and it is consequently very difficult to establish a method for version control of copies, even though this is a very important issue. For example, if copy 1 of a person's account shows a balance of 10 Yuan while copy 2 shows a balance of 20 Yuan, it may be difficult for the distributed system to determine the person's actual balance. Version control is therefore an issue that urgently needs to be addressed in a distributed system.
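To make the clock-skew problem concrete, the following minimal Python sketch resolves two copies by "last writer wins" on each node's local timestamp. The scenario is hypothetical: it assumes node B's clock lags true time by 10 s, so the copy that B actually wrote later carries the smaller timestamp and is discarded.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    node: str
    balance: int       # account balance in Yuan
    timestamp: float   # time of the last write, read from the node's own clock


def latest_by_timestamp(copies):
    """Treat the copy with the largest local timestamp as the latest version."""
    return max(copies, key=lambda c: c.timestamp)


# Hypothetical true order of events: node A writes 10 Yuan at true time 100.0,
# then node B writes 20 Yuan at true time 105.0.
clock_skew = {"A": 0.0, "B": -10.0}              # assumed: B's clock runs 10 s slow
copy1 = Copy("A", 10, 100.0 + clock_skew["A"])   # stale copy, stamped 100.0
copy2 = Copy("B", 20, 105.0 + clock_skew["B"])   # newest copy, stamped 95.0

winner = latest_by_timestamp([copy1, copy2])
print(winner.node, winner.balance)  # the stale copy from A wins
```

With a single clock source the skew would be zero and the rule would pick copy 2; the wrong result comes entirely from the unsynchronized clocks.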
Version control in the “Advanced Replication” feature of the existing distributed system “Oracle” relies on the two-phase commit protocol. FIG. 1 illustrates a flow chart of the two-phase commit protocol in the prior art. In the protocol illustrated in FIG. 1, the activities of the respective resource managers are controlled by a separate software component, the transaction coordinator, as follows: the transaction coordinator instructs the resource managers to prepare to commit a transaction (Prepare); the resource managers respond to the transaction coordinator (Vote Commit); the transaction coordinator collects the responses of the resource managers and notifies them of the result of the transaction (Global Commit); and the transaction coordinator receives the acknowledgements of the resource managers (ACK). As illustrated in FIG. 1, the version control method of Oracle is simple: the time of the coordinator is taken as the version number of the data. However, if the coordinators are not time-synchronized, earlier data is very likely to overwrite the latest data during data recovery of the system. Time synchronization is therefore required to solve the version synchronization problem in the two-phase commit method, which may reduce the availability of the system and makes Two-Phase Commit (2PC) very costly.
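The message flow of FIG. 1 can be sketched as a single-process Python simulation. This is a simplified illustration, not Oracle's implementation: the `ResourceManager` class, the `two_phase_commit` function and the `healthy` flag are hypothetical, and the coordinator's clock is passed in as a parameter so that its reading, used as the version number, can be fixed for the example.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ResourceManager:
    """Hypothetical in-memory participant holding one copy of the data."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy
        self.store: Dict[str, Tuple[float, object]] = {}  # key -> (version, value)
        self.pending: Optional[Tuple[str, object, float]] = None

    def prepare(self, key: str, value: object, version: float) -> bool:
        """Phase 1 (Prepare -> Vote Commit): stage the write and vote."""
        if not self.healthy:
            return False  # a failed node cannot vote to commit
        self.pending = (key, value, version)
        return True

    def finish(self, commit: bool) -> Optional[str]:
        """Phase 2 (Global Commit/Abort -> ACK): apply or discard the staged write."""
        if not self.healthy:
            return None  # no acknowledgement from a failed node
        if commit and self.pending is not None:
            key, value, version = self.pending
            self.store[key] = (version, value)
        self.pending = None
        return "ACK"


def two_phase_commit(managers: List[ResourceManager], key: str, value: object,
                     clock: Callable[[], float] = time.time):
    """Coordinator: uses its own local clock reading as the data version number."""
    version = clock()
    votes = [rm.prepare(key, value, version) for rm in managers]
    commit = all(votes)  # global commit only if every copy voted to commit
    acks = [rm.finish(commit) for rm in managers]
    return commit, version, acks


# Usage with a fixed, hypothetical coordinator clock reading.
rms = [ResourceManager("rm1"), ResourceManager("rm2")]
ok, version, acks = two_phase_commit(rms, "balance", 20, clock=lambda: 1000.0)
```

Because the version is simply the coordinator's clock reading, a second coordinator whose clock lags would stamp a newer write with a smaller version, and any recovery step that keeps the highest version would then restore the older data. The sketch also shows that one failed participant aborts the whole write.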
For version updates in existing distributed redundant storage systems, the time synchronization methods generally include master-slave time synchronization (which is relevant to the invention), time synchronization based on the Byzantine protocol, and convergence-function time synchronization, among which the Network Time Protocol (NTP) is the most widely applied. Master-slave time synchronization requires a fixed server, which synchronizes its own time via a satellite or over a connection to the Internet, and each client interacts with the server to synchronize its own time.
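The client side of master-slave synchronization can be illustrated with the standard NTP on-wire estimate (RFC 5905): from the four timestamps of one request/response exchange, the client estimates its offset from the server and the round-trip delay. This is a simplified sketch assuming equal outbound and return network delays; the function names and the millisecond values are hypothetical.

```python
def ntp_offset_and_delay(t1: int, t2: int, t3: int, t4: int):
    """Estimate clock offset and round-trip delay from one exchange.

    t1: client transmit time (client clock)
    t2: server receive time (server clock)
    t3: server transmit time (server clock)
    t4: client receive time (client clock)
    A positive offset means the server clock is ahead of the client clock.
    Assumes the outbound and return network delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def corrected_time(local_time: int, offset: float) -> float:
    """Client time adjusted toward the server (master) clock."""
    return local_time + offset


# Hypothetical exchange in milliseconds: the client clock is 5000 ms behind
# the server, each network leg takes 100 ms, and the server spends 10 ms.
offset, delay = ntp_offset_and_delay(95_000, 100_100, 100_110, 95_210)
print(offset, delay)  # 5000.0 200
```

The estimate is exact here only because the two network legs were chosen to be equal; asymmetric paths introduce an error of up to half the difference between them.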
In the course of making the invention, the inventors identified at least the following problems in the prior art.
1. Poor scalability: existing version control methods for distributed systems place very high demands on clock synchronization and are difficult to apply to systems with tens of thousands or hundreds of thousands of nodes.
2. Low availability of the system: in existing distributed systems all copies must be kept at the latest version, so an operation such as a data modification fails if the node holding any one of the copies fails. Node failures are very common in large-scale networking, so keeping every mirror of the data at the latest version can markedly degrade system performance and greatly reduce the availability of the system.
3. Poor applicability: for version updates in existing distributed systems, time is synchronized via a satellite or over a connection to the Internet; this solution is relatively complex and has poor applicability.
4. High cost and considerable effort: time synchronization places relatively high demands on hardware in large-scale networking of a distributed system, and current time synchronization requires the master and backup clocks to be set manually; this heavy dependence on manual configuration makes large-scale networking laborious.
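The availability cost described in problem 2 can be quantified with a short calculation. Assuming, as a simplification not stated above, that each node is independently available with probability p, a write that requires all n copies to be current succeeds only with probability p to the power n, so every added copy lowers the chance of success. The function name and the value p = 0.99 are hypothetical.

```python
def write_all_availability(p: float, n: int) -> float:
    """Probability that all n copies are reachable, assuming each node is
    independently available with probability p (a simplifying assumption)."""
    return p ** n


# Availability of a write that must update every copy, for several copy counts.
for n in (1, 3, 5):
    print(n, round(write_all_availability(0.99, n), 4))
```

With p = 0.99, three copies already bring the success probability down to about 0.970, and the effect compounds as the number of copies and nodes grows.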