The literature lists many standalone compression techniques, such as value list compression, run length encoding, trim length encoding, NULL compression, UTF-8 compression, and delta compression. Each of these has variants. For example, a trimmed value can be expressed using either the number of bytes remaining or the number of bytes trimmed.
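The two trim variants mentioned above can be illustrated with a minimal sketch. It assumes fixed-width values padded on the right with a single pad byte; the function names `trim_encode` and `trim_decode` and the `store_remaining` flag are hypothetical and chosen only for illustration.

```python
def trim_encode(value: bytes, pad: bytes = b" ", store_remaining: bool = True):
    """Strip trailing pad bytes from a fixed-width value (illustrative only).

    Returns (length_field, payload). The length field holds either the
    number of bytes remaining or the number of bytes trimmed.
    """
    payload = value.rstrip(pad)
    remaining = len(payload)
    trimmed = len(value) - remaining
    return (remaining if store_remaining else trimmed), payload


def trim_decode(length_field: int, payload: bytes, width: int,
                pad: bytes = b" ", store_remaining: bool = True) -> bytes:
    """Rebuild the fixed-width value from either variant of the length field."""
    remaining = length_field if store_remaining else width - length_field
    return payload[:remaining].ljust(width, pad)


value = b"abc" + b" " * 5                        # 8-byte fixed-width field
print(trim_encode(value))                         # (3, b'abc'): bytes remaining
print(trim_encode(value, store_remaining=False))  # (5, b'abc'): bytes trimmed
```

Both variants carry the same information when the column width is known. Which one yields the smaller length field depends on whether values tend to be mostly data or mostly padding.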
It is possible to combine standalone techniques to form more complex combined techniques. For example, value list and run length can be combined with trim to compress a “container row” (the term TERADATA® Corporation uses for what is conventionally referred to as a “columnar row”). Data in a container row is serialized (i.e., converted from an n-dimensional table configuration to a one-dimensional configuration suitable for storage in a computer memory) by column instead of by row. As a result, a container row is more likely to hold data of a single type (e.g., integer, character, etc.), making compression simpler in some cases. A container row may contain multiple columns and multiple data types. Even in that case, though, compressing a container row is likely to be simpler than compressing a conventional row because fewer data types are likely to be subject to compression.
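The difference between serializing by row and serializing by column can be sketched as follows. The two-column table and the function names are hypothetical; the sketch only shows how column-wise serialization groups values of the same type together.

```python
# Hypothetical two-column table: (integer id, two-character state code).
rows = [(1, "NY"), (2, "NY"), (3, "CA")]


def serialize_by_row(rows):
    """Flatten the table one row at a time, so data types interleave."""
    return [field for row in rows for field in row]


def serialize_by_column(rows):
    """Flatten the table one column at a time, so each type is contiguous."""
    return [field for column in zip(*rows) for field in column]


print(serialize_by_row(rows))     # [1, 'NY', 2, 'NY', 3, 'CA']
print(serialize_by_column(rows))  # [1, 2, 3, 'NY', 'NY', 'CA']
```

In the column-wise output the repeated "NY" values are adjacent, which is the kind of layout that value list and run length techniques can exploit.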
It is possible to stack standalone compression techniques and/or combined standalone compression techniques. Standalone techniques and their combinations with user-specified techniques can be stacked to compress a container row. For example, value list and run length can be combined with trim, or user-specified table value list compression can be combined with value list or with value list and run length.
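One way such a stack might look is sketched below: trim is applied first, then value list, then run length. This is an illustrative ordering under simplifying assumptions (a fixed-width character column whose width is known from the schema, so trimmed padding can be restored on decompression), not a description of any particular product's implementation. All function names are hypothetical.

```python
from itertools import groupby


def trim(values, pad=" "):
    """Trim: drop trailing pad characters from each fixed-width value."""
    return [v.rstrip(pad) for v in values]


def value_list(values):
    """Value list: replace each value with an index into a list of distinct values."""
    dictionary, index, codes = [], {}, []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes


def run_length(codes):
    """Run length: collapse consecutive repeats into (code, count) pairs."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


def compress_column(values):
    """Stack the three techniques: trim, then value list, then run length."""
    dictionary, codes = value_list(trim(values))
    return dictionary, run_length(codes)


column = ["NY  ", "NY  ", "CA  ", "CA  ", "CA  ", "NY  "]
print(compress_column(column))
# (['NY', 'CA'], [(0, 2), (1, 3), (0, 1)])
```

Changing the order of the stages, or dropping one of them, yields a different compressed form, which is why the number of candidate stacks grows quickly.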
The above-mentioned standalone techniques, their combinations, and further combinations with user-specified techniques create a large number of possible ways to compress a container row. Selecting a technique from among this large space of possibilities is a challenge.