Modern systems and services demand high-performance data management. Numerous database systems have been developed to handle ever-growing volumes and uses of data. Column-oriented database systems, or column-stores, for instance, have seen a recent resurgence, particularly in performance-intensive use cases. More formally, a column-store can be defined as having one or more tables, where each table is a collection of related columns of equal length and a tuple represents a single row within the table. Thus, a tuple consists of values aligned across the columns and can be retrieved from the set of table columns using a single row-id, or tuple index, value.
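By way of illustration only, the definition above can be sketched in Python as follows. The class name `ColumnTable`, the method `tuple_at`, and the sample data are hypothetical names chosen for this sketch and do not describe any particular column-store implementation.

```python
from typing import Any, Dict, List, Tuple


class ColumnTable:
    """Hypothetical in-memory table held as a set of equal-length columns."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        # Every column must have the same length so that row-ids line up.
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have equal length")
        self.columns = columns
        self.num_rows = lengths.pop() if lengths else 0

    def tuple_at(self, row_id: int) -> Tuple[Any, ...]:
        """Assemble the tuple whose values sit at row_id in every column."""
        if not 0 <= row_id < self.num_rows:
            raise IndexError(row_id)
        return tuple(values[row_id] for values in self.columns.values())


# Illustrative data only.
table = ColumnTable({
    "id": [1, 2, 3],
    "name": ["ann", "bob", "cy"],
    "age": [31, 45, 27],
})
row = table.tuple_at(1)  # values at position 1 of each column
```

In this sketch a single integer row-id is sufficient to reassemble a tuple because the values of a given row occupy the same position in every column.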
As compared with conventional row-oriented database systems, which have traditionally been more popular, column-stores can fulfill read requests faster and with fewer disk input/output (I/O) operations per transaction. Column-stores can make use of a decomposed storage model (DSM), in which data is persisted in column-oriented storage blocks rather than in row-oriented or other storage blocks. Because a read request can be implemented as a scan of a (typically small) identified subset of the columns in a table, fewer block reads and corresponding disk I/O operations are needed to fulfill the request than with a row-based structure. On the other hand, column-stores have been recognized as having higher costs than some other database systems when it comes to updating data. Consequently, recent implementations of column-store systems tend to focus on read-only or read-mostly database applications such as data warehousing, data mining, and other application areas having a relatively high ratio of read requests to update requests.
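The difference in block reads can be illustrated with a simplified count, sketched below in Python. The sketch assumes fixed-width values, no compression, and rows or values that never span a block boundary; the function names, column widths, and block size are hypothetical figures chosen for the example.

```python
import math
from typing import Dict, List


def blocks_read_row_store(num_rows: int, row_width: int, block_bytes: int) -> int:
    """Blocks touched by a full scan when whole rows are packed into blocks."""
    rows_per_block = block_bytes // row_width
    return math.ceil(num_rows / rows_per_block)


def blocks_read_column_store(num_rows: int, widths: Dict[str, int],
                             wanted: List[str], block_bytes: int) -> int:
    """Blocks touched when only the requested columns are scanned."""
    total = 0
    for name in wanted:
        values_per_block = block_bytes // widths[name]
        total += math.ceil(num_rows / values_per_block)
    return total


# Hypothetical table: four fixed-width columns, 64 bytes per row in total.
widths = {"id": 8, "name": 32, "age": 8, "city": 16}
num_rows = 1_000_000
block_bytes = 8192

row_blocks = blocks_read_row_store(num_rows, sum(widths.values()), block_bytes)
col_blocks = blocks_read_column_store(num_rows, widths, ["age"], block_bytes)
```

Under these assumptions, a scan of the single `age` column touches 977 blocks, whereas a scan of the same rows in a row-oriented layout touches 7813 blocks, because each row block also carries the three columns the query does not need.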
In conventional implementations, a variety of design and operational techniques have been employed in an attempt to enhance the read-oriented performance of column-store databases. For instance, column-store databases can allow sorting, or ordering, of column-store tuples to improve read-oriented performance. Under a defined sort ordering, tuples are stored in an order given by the sequence of attribute values corresponding to a specified sort key for the table. Scans of the sorted table can then be restricted to a fraction of the disk blocks, which is particularly advantageous where a scan query contains range or logical predicates that depend on any prefix of the sort key attributes. Other conventionally employed techniques include data compression, clustering, and replication.
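As a minimal sketch of how a sort order can restrict a scan, the following Python example locates the rows matching a range predicate by binary search over a sorted key column and then derives the blocks that hold them. It assumes a sort key of a single attribute (the simplest prefix) and a fixed number of values per block; the function names and figures are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def matching_row_range(sorted_keys: List[int], low: int, high: int) -> Tuple[int, int]:
    """Return the half-open row-id range [start, stop) with low <= key <= high."""
    start = bisect_left(sorted_keys, low)
    stop = bisect_right(sorted_keys, high)
    return start, stop


def blocks_for_range(start: int, stop: int, values_per_block: int) -> range:
    """Return the block numbers that hold rows start through stop - 1."""
    if start >= stop:
        return range(0)
    first = start // values_per_block
    last = (stop - 1) // values_per_block
    return range(first, last + 1)


# Hypothetical sorted key column: 1000 rows stored 100 values per block.
keys = list(range(1000))
start, stop = matching_row_range(keys, 250, 349)
blocks = blocks_for_range(start, stop, values_per_block=100)
```

Here the predicate `250 <= key <= 349` resolves to rows 250 through 349, which lie in blocks 2 and 3 only, so two of the ten blocks need to be read. Without the sort order, matching values could sit in any block and every block would have to be scanned.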
Like reference numbers and designations in the various drawings indicate like elements.