Apache Hadoop project (hereinafter “Hadoop”) is an open-source software framework for developing software for reliable, scalable and distributed processing of large data sets across clusters of commodity machines. Hadoop includes a distributed file system, known as Hadoop Distributed File System (HDFS). HDFS links together the file systems on local nodes to form a unified file system that spans the entire Hadoop cluster. Apache HBase (hereinafter “HBase”) is a project built on Hadoop. HBase is a scalable, distributed, NoSQL (Not-Only Structured Query Language) datastore that supports structured data storage for large tables.
HBase is usually run on multiple machines or cluster. Sometimes unexpected problems can occur which can corrupt the HBase and lead to down time and data loss. The problems can arise from risks within the cluster, outside the cluster or from users. Repairs for these potentially disastrous situations usually require time-consuming manual analysis and expert knowledge of HBase's internals.