In large commercial database systems it is often beneficial to partition the table of a database into smaller tables or segments, such that each smaller table or segment is capable of being individually accessed within a processing node. This promotes reduced input and output when only a subset of the partitions is referenced and improves overall database performance.
A popular approach to segmenting databases is referred to as row (or horizontal) partitioning. Here, rows of a database are assigned to a processing node (by hashing or randomly) and partitioned into segments within that processing node of the database system.
Another approach is to group columns together into segments (referred to as column or vertical partitioning), where each group of columns for rows assigned to a processing node are partitioned into segments within that processing node of the database system.
Both row and column partitioning have advantages to improving overall database performance.
In addition, a recent approach combines both horizontal and vertical portioning together. In particular, the approach finds a value within a container row associated with a specific row identifier (SRowld). The row identifier can be used to read the container row that has a beginning row identifier (BRowld), that is the highest row identifier less than or equal to SRowld. To find the value associated with SRowld, SRowld−BRowld+1 is calculated, call this n, and the nth value in the container row then needs to be found. Presence bits, VLC bits, run length bits (that is, the autocompression bits) may occur in the container row and must be checked in sequence to find this nth value since values may be omitted or multiple occurrences of a value compressed to one occurrence of the value (in the case of run length bits that indicate a run length greater than one). While checking, a pointer to the current column partition value must be incremented when the value is present. A container row can represent 1000's of values. Sequencing though all these bits (there is one set for each value represented) to determine the corresponding value could take a long time particularly for large commercial databases.