Traditionally, data has been stored in either row-major format or column-major format. However, neither storage format seems to be optimal for meeting a variety of data access requirements. For example, some operations may be performed faster on row-major data than on column-major data, while other operations may be performed faster on column-major data than on row-major data. This is often because some data access operations place demands on CPU resources, while others place demands on I/O resources. Hence, balancing CPU load against disk load may be challenging when determining a format for storing the data.
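The trade-off between the two formats may be illustrated with a minimal in-memory sketch. The table contents, the column order (id, name, price), and all function names below are hypothetical and serve only to show which layout has to touch more data for a given operation.

```python
# Hypothetical three-row table with columns (id, name, price).
rows = [
    (1, "apple", 0.50),
    (2, "pear", 0.75),
    (3, "plum", 0.20),
]


def to_column_major(row_data):
    """Pivot a list of row tuples into one list per column."""
    return [list(col) for col in zip(*row_data)]


columns = to_column_major(rows)


def fetch_row(row_data, i):
    # Row-major: a single record already holds every field of the row.
    return row_data[i]


def fetch_row_from_columns(column_data, i):
    # Column-major: the row has to be reassembled from every column.
    return tuple(col[i] for col in column_data)


def sum_price_row_major(row_data):
    # Row-major: every full record is visited to read one field.
    return sum(record[2] for record in row_data)


def sum_price_column_major(column_data):
    # Column-major: only the price column is read.
    return sum(column_data[2])
```

In this sketch a single-row lookup is cheapest against the row-major layout, while the aggregate over one column is cheapest against the column-major layout.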
One solution includes creating two copies of the data on a disk, one copy in each format. This approach is often referred to as a fractured-mirror approach. However, although the fractured-mirror approach seems to solve the data access dilemma for query processing, the approach may be expensive to implement. For example, the cost of doubling disk capacity and replicating the data may be high. Also, the time needed to load the data onto the disk may be significant.
Another solution to the data storage problem may include creating columnar indexes on the most commonly used columns of the data, and relying on the indexes to speed up access to the data. However, that approach may also be costly and time-consuming.
Other solutions may be based on a Hybrid Columnar Compression (HCC) approach. In HCC, the data in a set of blocks is pivoted into column-major format. The column runs may be limited to a default length that is specified locally for the blocks. This approach allows both efficient row-major access, by accessing a few contiguous blocks, and efficient table-scan access, by navigating directly to the required columns and performing fast columnar processing of operations on those columns.
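The idea of pivoting a bounded group of rows into column runs may be sketched as follows. This is a simplified illustration only, not the actual HCC implementation: compression is omitted, and the names `build_units`, `rows_per_unit`, `fetch_row`, and `scan_column` are assumptions introduced for the example.

```python
def build_units(rows, rows_per_unit):
    """Split rows into fixed-size groups and pivot each group into column runs."""
    units = []
    for start in range(0, len(rows), rows_per_unit):
        chunk = rows[start:start + rows_per_unit]
        # Each unit holds one run per column, limited to rows_per_unit values.
        units.append([list(col) for col in zip(*chunk)])
    return units


def fetch_row(units, rows_per_unit, i):
    """Row access: only the single unit that contains row i is touched."""
    unit = units[i // rows_per_unit]
    offset = i % rows_per_unit
    return tuple(run[offset] for run in unit)


def scan_column(units, col_index):
    """Table scan: only the requested column run is read from each unit."""
    for unit in units:
        yield from unit[col_index]


# Hypothetical table with columns (id, region, amount).
sample_rows = [
    (1, "east", 10),
    (2, "west", 20),
    (3, "east", 30),
    (4, "north", 40),
    (5, "west", 50),
]
sample_units = build_units(sample_rows, rows_per_unit=2)
```

Because each run is bounded in length, reassembling one row stays local to a single unit, while a scan of one column skips the other runs in every unit.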
Yet other solutions focus on reducing the CPU cost of processing common expressions. This may be addressed by indexing the expressions using functional indexes, or by materializing those expressions during the load of virtual columns.
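Both techniques trade work at load time for cheaper evaluation at query time, which may be sketched as follows. The expression (an upper-cased name), the column name `name_upper`, and the helper functions are hypothetical, and treating the materialized expression as an extra stored column is an assumption made for illustration.

```python
from collections import defaultdict


def expr(row):
    # Hypothetical common expression evaluated by many queries.
    return row["name"].upper()


def load_with_materialized_expr(raw_rows):
    """Evaluate the expression once per row at load time and store the result."""
    return [dict(row, name_upper=expr(row)) for row in raw_rows]


def build_functional_index(raw_rows):
    """Map each expression value to the ids of the rows that produce it."""
    index = defaultdict(list)
    for row_id, row in enumerate(raw_rows):
        index[expr(row)].append(row_id)
    return index


raw = [{"name": "ada"}, {"name": "Bob"}, {"name": "ADA"}]
loaded = load_with_materialized_expr(raw)
name_index = build_functional_index(raw)
```

A query that filters on the expression can then read the stored `name_upper` value or look up `name_index` directly, instead of re-evaluating the expression for every row.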