Based on the storage format of data records, a relational database system can have either row store or column store to save data records. In a row store database, data records are arranged in row format. A row store usually delivers good performance for online transaction processing (OLTP) transactions which often contain INSERT/DELETE/UPDATE operations. For instance, information can be changed for an entire row that involves one disk I/O operation. On the other hand, in a column store database, data records are arranged in column format. A column store delivers good performance for online analytical processing (OLAP) queries, as it need only read those columns necessary to process a query which also significantly reduces disk I/O operations.
Since row store and column store are good for different types of queries, it has been proposed to contain both row store and column store, i.e. hybrid row/column store, in a database system to handle a mixed workload. The goal is to achieve good OLTP performance and output real time (or almost real time) analytic results on a single database system. The common way to build a column store database from a row store database is to extract data records from row store, transform them, and load the data into attribute vectors for respective columns. That is, the column store is built statically before any analytic queries are received or executed. For example, an ETL (Extract-Transform-Load) process is executed during off-peak periods where the database is unavailable for online access in order to transfer the data to the column store database.
However, the typical approach to build a column store database is problematic as it necessarily overbuilds the database to include data that is irrelevant for any subsequent query or queries. In particular, it is often difficult to foresee which queries will be run on the database, and to which data those queries are directed. As such, the column store most likely will be over-built in order to accommodate all potential queries (e.g., building an attribute vector or column for every attribute in the row store database).
In addition, with increasing demand for constant access to database records by customers and companies located around the world, there is probably no good time window long enough to perform an ETL process to transfer data from row store into column store. That is, there are no longer any off-peak periods and not enough time to build a column store database without severely affecting the accessibility of the database system.
Furthermore, traditional hybrid row/column store database systems cannot provide real-time analytics. That is, analytic queries cannot be executed in real time, since the queries must wait for ETL operations to be completed during their scheduled times before query execution.
As such, existing hybrid row/column store database systems still rely on a predictive approach for guessing which attributes will be accessed during execution of queries, and tend to overbuild its column store databases before any query is received or executed. In addition, to facilitate speed of access, these traditional column store databases are built in main memory, such as random access memory (RAM), but will suffer a huge performance penalty during a system crash when the entire column store database is erased. As a result, the system remains down during a system recovery process, wherein the entire column store database is rebuilt.
It would be advantageous to build a column store database from a row store database that is not overbuilt for the queries requiring access to data, and that provides real-time query analytic execution.