The present invention relates generally to database architectures and operation, and more specifically to enhancing performance of an HBase on data in data regions that experience an abundance of reads and writes.
Database structures have evolved as their uses, contents, sizes, and supporting hardware have evolved. Traditional Relational Database Management Systems (RDBMS) are oriented toward maintaining the integrity of database transactions in real time (e.g., an airline reservation system or financial market transaction system), contain tables of structured data on which involved operations may be performed during a query, and are based on the concept of a shared disk subsystem. A RDBMS may be accessed with Structured Query Language (SQL), a data manipulation language, to manage, access, and operate on a content of the RDBMS. RDBMS was originally formed to run on a single machine and doesn't easily leverage the performance and economic advantages of running on a cluster of many smaller and cheaper machines. As software and hardware capabilities and interest in analyzing and manipulating large amounts of diversified and sometimes unstructured data (termed big data) has increased, scalable, distributed databases have been created using dynamic, semi-structured or wide column table formats that are populated with persistent data. For example, horizontally scalable databases called noSQL databases have become popular. The name noSQL is an ill-defined term given to a growing number of mostly non-relational open-sourced databases that may not provide atomicity, consistency, isolation and durability guarantees that are key attributes of classic relational database systems, but are amenable to scaling horizontally, i.e., by using more machines. NoSQL databases are finding significant and growing use for large data volumes and real-time web applications on large distributed systems.
There are several types of noSQL databases that can be classified by the data model they use. Bigtable and HBase are examples of noSQL databases that use a key-value data model, a model that can provide high performance, high scalability, high flexibility, and design simplicity. HBase is a sparse, distributed, persistent, multidimensional sorted map, which is indexed by row key, column key, and timestamp. A key-value database enables an application to store data in a schema-less way with no database-aware data structure that is built into the database. Key-value database systems are often distributed among multiple computing systems for to enhance performance and often exhibit high performance on sequential writes or sequential reads, but much lower performance when servicing a mixture of reads and writes to a same data region in the database. HBase is implemented using Hadoop as a scalable resilient underlying file system.