In recent years, central processing units (CPUs) have generally gained the most performance from increasing the number of processor cores rather than from increasing clock rates. Accordingly, to maximize performance, modern software takes advantage of multi-core CPUs by allowing parallel execution and by using architectures that scale well with the number of cores. For data management systems, taking full advantage of parallel processing capabilities generally requires partitioning stored data into sections, or “partitions,” on which calculations can be executed in parallel.
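The partition-then-combine pattern described above can be sketched as follows. This is a minimal illustration, not a description of any particular data management system: the helper names `split_into_partitions` and `parallel_sum` are hypothetical. Threads are used here only to show the structure. In CPython the global interpreter lock limits CPU-bound thread parallelism, so a real speedup would need a process pool or native code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_partitions(values: Sequence[int], num_partitions: int) -> List[Sequence[int]]:
    """Hypothetical helper: split values into up to num_partitions contiguous slices."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    size = -(-len(values) // num_partitions)  # ceiling division
    if size == 0:
        return []  # empty input yields no partitions
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values: Sequence[int], num_partitions: int = 4) -> int:
    """Hypothetical helper: sum each partition on its own worker, then combine."""
    partitions = split_into_partitions(values, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Each worker computes a partial sum over one partition.
        partial_sums = list(pool.map(sum, partitions))
    # Combining the partial results gives the same answer as a serial sum.
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1_000_001))
    print(parallel_sum(data, num_partitions=8))  # 500000500000
```

Because summation is associative, the partial results can be combined in any order, which is what makes this operation easy to partition.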
A database program or database management system generally presents data as two-dimensional tables of columns and rows. However, the data are typically stored as one-dimensional strings. A row-based store typically serializes the values of one row together, then the values of the next row, and so on, while a column-based store serializes the values of one column together, then the values of the next column, and so on.
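The difference between the two serialization orders can be sketched with a small in-memory table. This is an illustrative assumption only: a Python list stands in for the one-dimensional storage string, and the function names are hypothetical.

```python
from typing import Any, List, Sequence

# Each inner sequence is one row of the table (id, name, city).
Table = Sequence[Sequence[Any]]


def serialize_row_store(table: Table) -> List[Any]:
    """Row-based layout: all values of row 0, then all values of row 1, and so on."""
    return [value for row in table for value in row]


def serialize_column_store(table: Table) -> List[Any]:
    """Column-based layout: all values of column 0, then column 1, and so on."""
    if not table:
        return []
    num_columns = len(table[0])
    return [row[col] for col in range(num_columns) for row in table]


if __name__ == "__main__":
    table = [
        [1, "Ann", "Oslo"],
        [2, "Bob", "Rome"],
        [3, "Cy", "Oslo"],
    ]
    print(serialize_row_store(table))
    # [1, 'Ann', 'Oslo', 2, 'Bob', 'Rome', 3, 'Cy', 'Oslo']
    print(serialize_column_store(table))
    # [1, 2, 3, 'Ann', 'Bob', 'Cy', 'Oslo', 'Rome', 'Oslo']
```

In the column layout, all values of one column are contiguous and share a type, which is the property the following paragraph relies on.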
Column-based storage can facilitate executing operations in parallel on multiple processor cores. In a column store, data are already vertically partitioned, so operations on different columns can readily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core. In addition, operations on a single column can be parallelized by partitioning the column into multiple sections that are processed by different processor cores. Because column data are typically of uniform type, column-based data stores can offer storage size optimizations that are not available in row-based data stores. For example, some modern compression schemes exploit the similarity of adjacent values to achieve compression. To improve compression of column-based data, typical approaches involve sorting the rows.
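Run-length encoding is one simple scheme that exploits adjacent similarity, and it shows why sorting helps. The sketch below is an illustration under assumptions: it uses run-length encoding as a stand-in for "modern compression schemes" and sorts a single column. In a real table, sorting the rows by this column would permute the other columns in the same way.

```python
from itertools import groupby
from typing import Hashable, List, Sequence, Tuple


def run_length_encode(column: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Collapse each run of adjacent equal values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(runs: Sequence[Tuple[Hashable, int]]) -> List[Hashable]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


if __name__ == "__main__":
    city = ["Oslo", "Rome", "Oslo", "Rome", "Oslo", "Paris", "Rome", "Paris"]
    unsorted_runs = run_length_encode(city)
    sorted_runs = run_length_encode(sorted(city))
    print(len(unsorted_runs))  # 8: no two adjacent values are equal
    print(sorted_runs)         # [('Oslo', 3), ('Paris', 2), ('Rome', 3)]
    # Encoding is lossless: decoding restores the (sorted) column exactly.
    assert run_length_decode(sorted_runs) == sorted(city)
```

Unsorted, the column compresses to no fewer entries than it started with; after sorting, eight values collapse into three runs.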
Column-based storage typically makes searching a table quite efficient, but at the expense of insertion speed for new records. In column-based storage, tables are generally organized into one or more main partitions and one or more delta partitions that retain insertions, deletions, changes, etc. to the data stored in the main partition(s). New records are inserted into the delta partition rather than into the main partition so that the main partition can retain a compressed structure that maximizes the efficiency of searches on the data in that column. Direct insertion of new records into a compressed main partition is generally not possible because the compression changes the structure of the column. A search of the current data in the column requires traversal of the one or more main partitions followed by traversal of the delta partition(s). As the delta partition or partitions grow, the search process becomes progressively less efficient, because an ever larger amount of less well-structured data in the delta partition(s) must be traversed in response to each query. Accordingly, delta partitions must be periodically merged into the main partition(s).
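The main/delta organization and the periodic merge can be sketched for a single integer column. This is a simplified model under stated assumptions: the class `ColumnWithDelta` is hypothetical, a sorted list stands in for the compressed, read-optimized main partition, and deletions and changes are not modelled.

```python
from bisect import bisect_left, bisect_right
from typing import List


class ColumnWithDelta:
    """Hypothetical single-column store: a sorted main partition (standing in
    for a compressed structure) plus an append-only delta partition."""

    def __init__(self, values: List[int]) -> None:
        self.main: List[int] = sorted(values)  # read-optimized, never modified in place
        self.delta: List[int] = []              # new insertions land here

    def insert(self, value: int) -> None:
        # Appending to the delta avoids restructuring the main partition.
        self.delta.append(value)

    def count_equal(self, value: int) -> int:
        # Main partition: binary search over sorted values, O(log n).
        in_main = bisect_right(self.main, value) - bisect_left(self.main, value)
        # Delta partition: unsorted, so a linear scan, O(m). This cost grows
        # with every insert until the next merge.
        in_delta = sum(1 for v in self.delta if v == value)
        return in_main + in_delta

    def merge(self) -> None:
        # Delta merge: rebuild the main partition including the delta values,
        # then start over with an empty delta.
        self.main = sorted(self.main + self.delta)
        self.delta = []


if __name__ == "__main__":
    col = ColumnWithDelta([5, 1, 3, 3])
    col.insert(3)
    col.insert(7)
    print(col.count_equal(3), len(col.delta))  # 3 2
    col.merge()
    print(col.count_equal(3), len(col.delta))  # 3 0
    print(col.main)                            # [1, 3, 3, 3, 5, 7]
```

Search results are identical before and after the merge; only the cost changes, since the merge moves values from the linearly scanned delta into the binary-searchable main partition.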