A Key-Value Store is a promising alternative to traditional relational database management systems (RDBMS). Many application systems that do not require complex SQL queries, such as web-based applications and services (e.g., Amazon's Dynamo, Google's Bigtable, and Facebook), use a Key-Value Store to store and access their data. Typically, data in a Key-Value Store are organized in a table structure with rows and columns, where each row represents a key-value pair (one column serves as the key and the remaining columns as the value).
User queries submitted to a Key-Value Store may vary significantly. For instance, one query may access all columns in a table (referred to as full-record access), whereas another query may access only a subset of the columns (referred to as partial-record access). Full-record access is typical in OLTP (online transaction processing) applications, such as online shopping and online gaming, where insert and update operations require the entire record to be read or written. Partial-record access is typical in OLAP (online analytical processing) applications, such as data mining and other business intelligence tools, where only a few attributes (columns) of a table are required, even if the table consists of many attributes. Accordingly, two types of table layout schemes, i.e., a row-oriented layout scheme and a column-oriented layout scheme, can be found in the prior art. In the row-oriented layout scheme, table data are stored row-by-row, where the entire record of a row is stored contiguously. In the column-oriented layout scheme, table data are stored column-by-column, where attribute values belonging to the same column are stored contiguously. It should be noted that the row-oriented layout scheme is optimized for full-record access (adding or modifying a record requires a single access), but may read unnecessary data for a query that requests only a subset of the columns. In contrast, the column-oriented layout scheme is optimized for partial-record access (only the relevant columns need to be read), but is inefficient for inserting or deleting a record (each such write requires one access per column).
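The trade-off between the two layout schemes can be illustrated with a minimal in-memory sketch. This is not an implementation from the prior art: the class names `RowStore` and `ColumnStore`, the `COLUMNS` schema, and the idea of counting "attribute values touched" as a stand-in for storage accesses are all assumptions made for illustration.

```python
from typing import Any, Dict, Sequence, Tuple

# Hypothetical three-column schema, used only for this illustration.
COLUMNS: Tuple[str, ...] = ("name", "email", "balance")


class RowStore:
    """Row-oriented layout: each record is kept as one contiguous tuple."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[Any, ...]] = {}

    def insert(self, key: str, record: Sequence[Any]) -> int:
        self.rows[key] = tuple(record)
        return 1  # the whole record is written in a single access

    def read(self, key: str, columns: Sequence[str]) -> Tuple[Dict[str, Any], int]:
        record = self.rows[key]
        values = {c: record[COLUMNS.index(c)] for c in columns}
        # The entire record is fetched even if only some columns are wanted.
        return values, len(record)


class ColumnStore:
    """Column-oriented layout: each column is kept in its own container."""

    def __init__(self) -> None:
        self.cols: Dict[str, Dict[str, Any]] = {c: {} for c in COLUMNS}

    def insert(self, key: str, record: Sequence[Any]) -> int:
        for column, value in zip(COLUMNS, record):
            self.cols[column][key] = value
        return len(COLUMNS)  # one access per column

    def read(self, key: str, columns: Sequence[str]) -> Tuple[Dict[str, Any], int]:
        values = {c: self.cols[c][key] for c in columns}
        return values, len(columns)  # only the requested columns are touched
```

Under these assumptions, inserting a record costs one access in `RowStore` and three in `ColumnStore`, while reading only `balance` touches three attribute values in `RowStore` and one in `ColumnStore`.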
Recently, efforts have been made to support both row-oriented and column-oriented layout schemes in one system, such as U.S. Pat. No. 7,024,424 (“Storage of Row-Column Data”), U.S. Pat. No. 7,447,839 (“System for a Distributed Column Chunk Data Store”), and U.S. Pat. No. 7,548,928 (“Data Compression of Large Scale Data Store in Sparse Tables”). However, none of these can dynamically change the table layout scheme according to the user access pattern. On the other hand, Fractured Mirrors (see, e.g., “A Case for Fractured Mirrors”, VLDB 2002) stores a table in a row-oriented layout scheme on one disk, and mirrors the table in a column-oriented layout scheme on another disk. A full-record access query is served by the table in the row-oriented layout scheme, while a partial-record access query is served by the table in the column-oriented layout scheme. One drawback of Fractured Mirrors is that, no matter how the user access pattern changes, both layout schemes coexist and must be maintained for a table.
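The query routing described for Fractured Mirrors can be sketched as follows. This is a simplified, single-process illustration and not the design from the cited paper: the class name `FracturedMirrorTable`, its methods, and the use of in-memory dictionaries in place of two disks are assumptions.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


class FracturedMirrorTable:
    """Keeps two copies of a table: one row-oriented, one column-oriented."""

    def __init__(self, columns: Sequence[str]) -> None:
        self.columns = tuple(columns)
        # Stand-in for the disk holding the row-oriented copy.
        self.row_mirror: Dict[str, Tuple[Any, ...]] = {}
        # Stand-in for the disk holding the column-oriented copy.
        self.col_mirror: Dict[str, Dict[str, Any]] = {c: {} for c in self.columns}

    def put(self, key: str, record: Sequence[Any]) -> None:
        if len(record) != len(self.columns):
            raise ValueError("record does not match the table schema")
        # Every write updates both mirrors, regardless of the access pattern.
        self.row_mirror[key] = tuple(record)
        for column, value in zip(self.columns, record):
            self.col_mirror[column][key] = value

    def get(
        self, key: str, wanted: Optional[Sequence[str]] = None
    ) -> Tuple[str, Dict[str, Any]]:
        """Return (name of the mirror that served the query, result)."""
        if wanted is None or set(wanted) == set(self.columns):
            # Full-record access: served by the row-oriented mirror.
            return "row", dict(zip(self.columns, self.row_mirror[key]))
        # Partial-record access: served by the column-oriented mirror.
        return "column", {c: self.col_mirror[c][key] for c in wanted}
```

Note that `put` always writes to both mirrors, which reflects the drawback stated above: both layouts are maintained even if the workload only ever issues one kind of query.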
To adapt to the user access pattern, Fine-Grained Hybrid designs, such as Data Morphing (see, e.g., “Data Morphing: An Adaptive, Cache-Conscious Storage Technique”, VLDB, 2003) and U.S. Pat. No. 7,580,941 (“Automated Logical Database Design Tuning”), were proposed to store a table in a row-oriented layout scheme on disk, and to dynamically reorganize the table data, based on user queries, into a column-oriented layout scheme in RAM.
The Fine-Grained Hybrid design is limited to one storage node. Extension of this design to a distributed storage system, in cooperation with data replication and failure recovery, is unknown and nontrivial. However, to accommodate exponentially growing data, it is valuable for a key-value store to be able to scale to multiple storage nodes and distribute the table data for better performance.