To maximize performance, optimize the use of computing resources, or for various other reasons, modern software architectures frequently take advantage of multi-core CPUs and/or distributed processing systems by supporting parallel execution and by scaling well with the number of cores and/or computing nodes. For data management systems, approaches to taking full advantage of the parallel processing capabilities and main system memory available across a number of distributed systems can include partitioning stored data into sections, or “partitions,” on which calculations can be executed in parallel and which can be stored and/or operated on across a distributed network of computing nodes.
A database program or database management system generally displays data as two-dimensional tables formed of columns and rows. However, data are typically stored as one-dimensional strings. A row-based store typically serializes the values in a row together, then the values in the next row, and so on, while a column-based store serializes the values of a column together, then the values of the next column, and so on.
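As a non-limiting illustration of the two serialization orders described above, the following Python sketch flattens a small table both ways. The table contents and the function names `serialize_row_store` and `serialize_column_store` are hypothetical and chosen only for this example; an actual store would typically also encode and compress the values.

```python
# Hypothetical three-row table with columns (id, country, sales).
table = [
    (1, "Norway", 100),
    (2, "Sweden", 250),
    (3, "Finland", 175),
]


def serialize_row_store(rows):
    """Flatten row by row: all values of the first row, then the next row."""
    return [value for row in rows for value in row]


def serialize_column_store(rows):
    """Flatten column by column: all values of the first column, then the next."""
    # zip(*rows) transposes the list of rows into a sequence of columns.
    return [value for column in zip(*rows) for value in column]


row_layout = serialize_row_store(table)
column_layout = serialize_column_store(table)

print(row_layout)     # [1, 'Norway', 100, 2, 'Sweden', 250, 3, 'Finland', 175]
print(column_layout)  # [1, 2, 3, 'Norway', 'Sweden', 'Finland', 100, 250, 175]
```

In the column layout, the values of each column are contiguous, which is what allows a scan or aggregation of a single column to read one unbroken run of the serialized data.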
Column-based storage can facilitate execution of operations in parallel using multiple processor cores and/or more than one computing node and can also enable efficient data storage. In a column store, data are already vertically partitioned, so operations on different columns can readily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core or computing node. In addition, operations on a given column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores or computing nodes. Partitioning refers generally to splitting one or more columns of a column-store database table horizontally (e.g. by making one or more divisions along a vertical length of the column into two or more sub-columns or partitions). In this manner, large columns (or tables of more than one column) can be broken down into smaller, more manageable parts. For example, partitioning can be used to limit the amount of data to be loaded into memory at any given processing node or to be transferred between nodes. Partitioning is typically used in multiple-host systems, but it may also be beneficial in single-host systems.
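The parallelization of an operation on a single column can be sketched as follows, under stated assumptions. The helper names `split_column` and `parallel_sum` are hypothetical, the column is split into contiguous chunks of near-equal length, and worker threads stand in for the processor cores or computing nodes; in a real system each partition might reside on a different node, and threads in some Python runtimes do not provide true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def split_column(column, num_partitions):
    """Split a column into contiguous partitions of near-equal length."""
    size, extra = divmod(len(column), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each receive one additional value.
        end = start + size + (1 if i < extra else 0)
        partitions.append(column[start:end])
        start = end
    return partitions


def parallel_sum(column, num_partitions=4):
    """Aggregate each partition independently, then combine the partial results."""
    partitions = split_column(column, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_sums = list(pool.map(sum, partitions))
    return sum(partial_sums)


sales = list(range(1, 11))          # hypothetical column holding the values 1..10
partitions = split_column(sales, 3)
total = parallel_sum(sales, 3)

print(partitions)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
print(total)       # 55
```

Because each worker touches only its own partition, the amount of data any one worker must hold in memory is bounded by the partition size rather than by the size of the full column.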
Partitioning of a column can be based on specified criteria applied to split the database table. In general, a partitioning key is used to assign values in the column to a partition based on one or more criteria. Commonly used approaches include range partitioning, list partitioning, hash partitioning, round robin partitioning, and composite partitioning. In range partitioning, a partition can be defined by determining whether the partitioning key falls within a certain range. For example, a partition can be created to include all rows in which values in a column containing postal codes are between 70000 and 79999. In list partitioning, a partition can be assigned a list of values, and the partition can be chosen if the partitioning key has one of the values on the list. For example, a partition built to include data relating to Nordic countries can include all rows in which a column of country names contains one of the text string values Iceland, Norway, Sweden, Finland, Denmark, etc. In hash partitioning, the value of a hash function can determine membership in a partition. For example, for a partitioning scheme in which there are four partitions, the hash function can return a value from 0 to 3 to designate one of the four partitions. Round robin partitioning can be used to distribute storage and/or processing loads among multiple data partitions, servers, and/or server processes according to a pre-set rotation among them. As an example, with three partitions, a first data unit can be directed to the first partition, a second data unit to the second partition, a third data unit to the third partition, a fourth data unit to the first partition again, and so forth. In composite partitioning, certain combinations of other partitioning schemes can be allowed, for example by first applying a range partitioning and then a hash partitioning.
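The partitioning approaches described above can be illustrated with the following minimal Python sketch. All function names are hypothetical. The use of `zlib.crc32` as the hash function, the five-member Nordic list, and the 10000-wide postal code ranges in the composite example are assumptions made only for illustration and are not prescribed by any particular database system.

```python
import zlib

# Illustrative list for list partitioning; an actual list may contain other values.
NORDIC = {"Iceland", "Norway", "Sweden", "Finland", "Denmark"}


def in_range_partition(postal_code, low=70000, high=79999):
    """Range partitioning: the key belongs if it lies within [low, high]."""
    return low <= postal_code <= high


def in_list_partition(country, members=NORDIC):
    """List partitioning: the key belongs if it is one of the listed values."""
    return country in members


def hash_partition(key, num_partitions=4):
    """Hash partitioning: map the key to a partition index 0..num_partitions-1."""
    # crc32 is used because it is stable across runs, unlike built-in hash() on str.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def round_robin_partition(items, num_partitions=3):
    """Round robin: assign items to partitions in a fixed rotation."""
    partitions = [[] for _ in range(num_partitions)]
    for index, item in enumerate(items):
        partitions[index % num_partitions].append(item)
    return partitions


def composite_partition(postal_code, key, num_hash_partitions=4):
    """Composite: apply a range partitioning first, then a hash partitioning."""
    range_bucket = postal_code // 10000  # assumed range width of 10000
    return (range_bucket, hash_partition(key, num_hash_partitions))


print(in_range_partition(75000))                      # True
print(in_list_partition("Norway"))                    # True
print(round_robin_partition(["a", "b", "c", "d"]))    # [['a', 'd'], ['b'], ['c']]
```

In this sketch, the round robin result shows the fourth data unit returning to the first partition, matching the rotation described above, and `composite_partition` returns a pair identifying first the range bucket and then the hash bucket within it.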