Embodiments of the invention relate to reduced fixed length sort of variable length columns.
A DataBase Management System (DBMS) may use Structured Query Language (SQL) statements. The SQL statements have evolved into a standard language for DBMS software. The DBMS uses SQL statements for storing and retrieving data in a database. The database is organized into tables that consist of rows (also referred to as tuples or records) and columns (also referred to as fields or attributes) of data.
A table in a database may be accessed using an index. An index is an ordered set of references (e.g., pointers) to the rows in the table. The index is used to access each row in the table using a key (i.e., one of the fields or attributes of the row, which corresponds to a column). The term “key” may also be referred to as “index key” for an index. Without an index, finding a row requires a scan (e.g., linearly) of an entire table. Indexes provide an alternate technique to accessing data in a table. Users can create indexes on a table after the table is built. An index is based on one or more columns of the table.
A query may be described as a request for information from a database based on specific conditions. A query typically includes one or more predicates. A predicate may be described as an element of a search condition that expresses or implies a comparison operation (e.g., A=3).
For variable length data in variable length columns of a table, such as variable length character (VARCHAR), variable length graphic (VARGRAPHIC), and variable length binary (VARBINARY), the table is created with variable length column definitions that exceed the length required to store actual data. This is often to support future requirements for data growth or application flexibility by not imposing arbitrary limits on the amount of data that a user can input.
A DBMS may use sort keys to improve sort performance, since byte-by-byte comparisons may be performed efficiently. Such sort keys may be fixed length sort keys. A table in a database may have several columns, and at least some of these columns are used for the sort key for that table. The sort key may be said to be concatenated in that several columns are used for the sort key. For variable length data, such as variable length character (VARCHAR), variable length graphic (VARGRAPHIC), and variable length binary (VARBINARY), a sort process will first pad out the columns for this data to their fixed length so that the sort key comparison occurs with each row having exactly the same key length. For example, if a name is VARCHAR and defined as having 128 bytes, and an actual name has 100 bytes, then the sort processes pads the rest of the name field (28 bytes) with, for example, blanks or spaces.
A DBMS stores variable length data in indexes and on data pages with their actual length of the data and a numeric prefix that indicates the length, rather than storing the data as its full defined length. Therefore, there is no downside for data storage of 10 bytes of actual data being stored in a VARCHAR(128) field (i.e., a field having 128 bytes) versus a VARCHAR(512) field (i.e., a field having 512 bytes), since both will only store the actual 10 bytes of data with a prefix that stores the length of the data row.
However, the downside of over-allocating the column lengths for fixed length sort is that the sort key is padded to its full defined length, meaning that the same 10 bytes will be padded with an extra 118 bytes for VARCHAR(128), or an extra 502 bytes for VARCHAR(512). This has a significant impact on the amount of memory and other resources (e.g., disk storage, intermediate work files, etc.) required to sort the rows (if sort cannot be contained in memory).
Some existing solutions use a radix sort. With this, it is recommended to group varying length columns of a sort key with same length together and separately perform a radix sort on each group of columns of the sort key for each length, in order to avoid processing the whole list of columns of the sort key on every sorting pass.