A distributed file system, or a system operating on top of one, must meet every requirement related to high availability, including leader election, leader notification, a fencing (Fence) mechanism, replication and persistence of data streams, compaction of the data streams, and the like.
Current distributed file systems typically adopt one of the following two high-availability design schemes:
1. A combination scheme: for example, leader election relies on a distributed service framework (ZooKeeper), a heartbeat-based scheme, or another independently designed scheme; fencing relies on a third-party system or an independently designed scheme; data stream replication relies on an independently designed replication scheme; and the like.
2. A unitary scheme: for example, the system is built on a distributed consistency protocol (Raft), typically through one of the various libraries that implement the Raft protocol, and the Raft log streams are stored locally.
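By way of illustration only, the Fence requirement common to both schemes can be sketched with monotonically increasing epoch numbers: each election yields a larger epoch, and storage rejects writes that carry an older one. The sketch below is a minimal Python example under stated assumptions. The names `EpochIssuer`, `FencedStore`, `StaleEpochError`, and `demo` are hypothetical and do not correspond to any ZooKeeper or Raft library API.

```python
class StaleEpochError(Exception):
    """Raised when a write carries an epoch older than one already seen."""


class EpochIssuer:
    """Hypothetical stand-in for an election service.

    Each call to elect() represents one completed election and returns
    a strictly larger epoch number for the new leader.
    """

    def __init__(self):
        self._epoch = 0

    def elect(self):
        self._epoch += 1
        return self._epoch


class FencedStore:
    """Hypothetical storage node that fences off deposed leaders."""

    def __init__(self):
        self.highest_epoch = 0
        self.records = []

    def append(self, epoch, record):
        # Reject any writer whose epoch is older than the newest one seen.
        if epoch < self.highest_epoch:
            raise StaleEpochError(f"epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch
        self.records.append((epoch, record))


def demo():
    """Two successive leaders write; a late write from the first is refused."""
    issuer, store = EpochIssuer(), FencedStore()
    old_leader = issuer.elect()
    store.append(old_leader, "written by first leader")
    new_leader = issuer.elect()
    store.append(new_leader, "written by second leader")
    try:
        store.append(old_leader, "late write from first leader")
    except StaleEpochError:
        pass  # the deposed leader is fenced off
    return store.records
```

In a combination scheme, the election service and the storage check are typically separate components that must agree on the epoch, which is one source of the architectural complexity discussed below.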
However, both of the above high-availability schemes have drawbacks. The combination scheme often depends on a third party, or even on multiple third parties at the same time, and contains various independently designed schemes, so that the overall architecture of the system is complex, carries a high stability risk, and is difficult to evolve. In the unitary scheme, a distributed file system implemented on the Raft protocol is very slow to rebuild a missing replica, and it is difficult to implement a fast load balancing mechanism. In particular, it is difficult to implement system-wide splitting and merging of data streams and to cooperate with the distributed file system. For example, for a database suited to unstructured data storage (HBase), if the data streams use the Raft protocol and are still stored on the distributed file system, a large amount of extra traffic overhead is incurred; if they are not stored on the distributed file system, the stateless character of the HBase nodes is lost.
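The extra traffic overhead mentioned for the HBase example can be estimated with a rough model, given here only as a sketch. It assumes that every replica written by the distributed file system costs one network transfer (a pipelined write with no local short-circuit) and that each Raft member persists its own copy of the log stream to the file system. The function names and the replica counts of 3 are illustrative assumptions, not figures from any particular system.

```python
def transfers_direct(dfs_replicas):
    """Transfers for one record written straight to the file system:
    client -> first replica -> ... -> last replica."""
    return dfs_replicas


def transfers_raft_on_dfs(raft_replicas, dfs_replicas):
    """Transfers when each Raft member also persists its log to the file system."""
    # The leader ships the record to every follower.
    leader_to_followers = raft_replicas - 1
    # Every member, leader included, then writes its copy to the file system.
    member_to_dfs = raft_replicas * dfs_replicas
    return leader_to_followers + member_to_dfs


direct = transfers_direct(3)              # 3 transfers
layered = transfers_raft_on_dfs(3, 3)     # 2 + 9 = 11 transfers
```

Under these assumptions, layering Raft on the distributed file system moves each record 11 times instead of 3 and stores 9 physical copies instead of 3.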