A database is an organized collection of data. A database typically organizes data in a way that corresponds to the logical arrangement of the data. This facilitates operations on the data, for example, looking up values in a database, adding data to the database, sorting the data in the database, or summarizing relevant data in the database. A database management system (“DBMS”) mediates interactions between a database and its users and applications in order to organize, create, update, capture, analyze and otherwise manage the data in the database.
Some DBMSs have implemented column-oriented storage of data in a database. A database that uses column-oriented storage is a column-store database, which can include one or more tables. In a column-store database, a table of data is partitioned into separate columns, and the values of each column are stored contiguously in storage or memory. The columns of a table typically have the same length (that is, the same number of records, or rows). The columns are independent, in that a column does not necessarily have to be written directly after the column that precedes it in the table. Column-oriented storage is efficient when aggregating values in a single column, since only that column's values need to be read.

Column-oriented storage also facilitates compression. Within a column of a database table, values may repeat, and in many cases the number of distinct values in a column is smaller than the number of rows in the table. To reduce the amount of memory used to store column data, a DBMS can represent the set of distinct values in a dictionary, which is an auxiliary data structure that maps value identifiers (“value IDs”), often integers, to distinct values. When analyzing data in a column-store database, a user or application may request that the DBMS sort values of a column that have been compressed using a dictionary. However, existing approaches to sorting dictionary-compressed values are inefficient in many scenarios.
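For illustration only, the dictionary compression described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of this document or of any particular DBMS: the function names `dictionary_encode` and `sort_by_decoding` are hypothetical, the dictionary is assumed to list distinct values in first-occurrence order (so value IDs do not reflect the order of the values), and the sort shown is one straightforward way to sort dictionary-compressed values, namely by looking up each row's value ID in the dictionary to recover the original value before comparing.

```python
from typing import List, Sequence, Tuple


def dictionary_encode(column: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: build a dictionary of distinct values and
    replace each row value with its integer value ID."""
    # Assumption: distinct values are kept in first-occurrence order,
    # so value ID i maps to dictionary[i] but IDs are not value-ordered.
    dictionary = list(dict.fromkeys(column))
    value_id_of = {value: vid for vid, value in enumerate(dictionary)}
    value_ids = [value_id_of[value] for value in column]
    return dictionary, value_ids


def sort_by_decoding(dictionary: Sequence[str], value_ids: Sequence[int]) -> List[int]:
    """Hypothetical baseline: return row positions ordered by decoded value.

    Every sort key requires a dictionary lookup to recover the original
    value from the row's value ID. Python's sort is stable, so rows with
    equal values keep their original relative order.
    """
    return sorted(range(len(value_ids)), key=lambda row: dictionary[value_ids[row]])


if __name__ == "__main__":
    column = ["red", "blue", "red", "green", "blue", "red"]
    dictionary, value_ids = dictionary_encode(column)
    print(dictionary)  # ['red', 'blue', 'green']
    print(value_ids)   # [0, 1, 0, 2, 1, 0]
    order = sort_by_decoding(dictionary, value_ids)
    print(order)       # [1, 4, 3, 0, 2, 5]
    print([dictionary[value_ids[r]] for r in order])
```

In this sketch the six-row column is stored as three distinct strings plus six small integers, which is the memory saving the text describes; the cost is that sorting must consult the dictionary for every row.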