The disclosure relates generally to database systems, and more particularly to table compression in a database.
Across all areas of industry, the amount of data to be stored is growing rapidly. This also applies to the field of relational databases, where ever-growing amounts of data have to be managed. Relational databases may be used for transactional systems as well as for decision support systems. A relational database may store data in the form of tables including rows and columns. It has been observed that the rows of relational databases often contain repeated data patterns.
Databases may typically be stored on hard disks. Although hard disk prices are constantly decreasing, data volumes keep increasing, and database sizes thus grow disproportionately. In order to save disk space for data in databases, data compression technologies have been introduced. Existing compression methods may replace the most frequent patterns in the data with shorter symbols and use dictionaries to map the symbols to the replaced patterns.
A common approach for such a compression method may include the use of fixed-length symbols instead of variable-length symbols, as this may simplify an implementation of the compression method. For example, a symbol length of 8 bits may allow replacing up to 256 of the most frequent patterns in the data, whereas a symbol length of 16 bits may allow replacing up to 65,536 of the most frequent patterns in the data.
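The fixed-length scheme described above may be illustrated by the following minimal sketch. It is not taken from any particular database product. It assumes that each "pattern" is a whole value, and the names `build_dictionary`, `compress` and `decompress` are hypothetical. For simplicity, values that do not fit into the dictionary are kept as tagged literals rather than being encoded with an escape symbol.

```python
from collections import Counter


def build_dictionary(values, symbol_bits):
    """Map the most frequent values to fixed-length integer symbols.

    A symbol of `symbol_bits` bits can address at most 2**symbol_bits
    patterns (8 bits -> 256, 16 bits -> 65,536).
    """
    capacity = 2 ** symbol_bits
    most_common = Counter(values).most_common(capacity)
    return {value: symbol for symbol, (value, _count) in enumerate(most_common)}


def compress(values, dictionary):
    # Replaced values become ("sym", n); all others stay as ("lit", value).
    # The tagging is an assumption of this sketch, not a real storage format.
    return [
        ("sym", dictionary[v]) if v in dictionary else ("lit", v)
        for v in values
    ]


def decompress(encoded, dictionary):
    # Invert the dictionary so symbols can be mapped back to patterns.
    reverse = {symbol: value for value, symbol in dictionary.items()}
    return [reverse[x] if kind == "sym" else x for kind, x in encoded]
```

With a 1-bit symbol, for instance, only the two most frequent values of a column would be replaced, and every other value would remain uncompressed.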
However, there may be a trade-off between the symbol length and the size of a related compression dictionary. Shorter symbols may require less space, but may reduce the total number of patterns that can be replaced. Longer symbols may allow more patterns to be replaced, but may require more space, both per symbol and for the larger dictionary.
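This trade-off may be made concrete with a rough size estimate. The following sketch is a toy cost model under stated assumptions: the function name `estimated_size_bits` is hypothetical, the dictionary is assumed to cost only the bytes of the patterns it stores, and any overhead for marking unreplaced patterns is ignored.

```python
def estimated_size_bits(pattern_counts, symbol_bits):
    """Estimate compressed size in bits for a given symbol length.

    pattern_counts maps each pattern (bytes) to its occurrence count.
    """
    capacity = 2 ** symbol_bits
    # Rank patterns by frequency; only the top `capacity` get a symbol.
    ranked = sorted(pattern_counts.items(), key=lambda kv: kv[1], reverse=True)
    replaced, literal = ranked[:capacity], ranked[capacity:]

    # Every occurrence of a replaced pattern costs one symbol.
    data_bits = sum(count * symbol_bits for _p, count in replaced)
    # Unreplaced patterns are stored verbatim at 8 bits per byte.
    data_bits += sum(count * len(p) * 8 for p, count in literal)
    # The dictionary stores each replaced pattern once.
    dictionary_bits = sum(len(p) * 8 for p, _count in replaced)
    return data_bits + dictionary_bits
```

Under this model, a symbol length that is too small leaves frequent patterns unreplaced, while one that is too large spends bits on every symbol and on dictionary entries for rare patterns.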
There are several disclosures related to data compression in relational databases. Document US 2009/0193041 A1 discloses obtaining a workload specification for a database. Based on the workload specification, candidates of the tables may be identified and ranked. A compression impact may be evaluated for the candidates of the table. A design for the database may be developed specifying at least one of: (i) which of the tables should be compressed, and (ii) which of the tables should not be compressed.
Document US 2008/0294676 A1 discloses methods and apparatus for compression of tables based on the occurrence of values. In general, a number representing the number of occurrences of a frequently occurring value in a group of adjacent rows of a column may be generated. Furthermore, a vector representing whether the frequently occurring value exists in a row of the column may be generated, and the number and the vector may be stored to enable searches of the data represented by the number and the vector.
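The general idea of an occurrence count combined with a presence vector may be illustrated loosely as follows. This sketch is only an interpretation of the summary above and is not the method of the cited document. The names `encode_block` and `decode_block`, the separate list of exceptions, and the assumption of a non-empty group of rows are all hypothetical.

```python
from collections import Counter


def encode_block(column_values):
    """Encode a non-empty group of adjacent rows of one column."""
    # Most frequent value in the group and how often it occurs.
    frequent, count = Counter(column_values).most_common(1)[0]
    # One bit per row: 1 if the row holds the frequent value, else 0.
    bit_vector = [1 if v == frequent else 0 for v in column_values]
    # Remaining values are kept in row order (assumption of this sketch).
    exceptions = [v for v in column_values if v != frequent]
    return frequent, count, bit_vector, exceptions


def decode_block(frequent, bit_vector, exceptions):
    # Walk the bit vector and pull the next exception for every 0 bit.
    rest = iter(exceptions)
    return [frequent if bit else next(rest) for bit in bit_vector]
```

In such a layout, a search for the frequent value could be answered from the count and the bit vector alone, without reading the individual rows.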
However, there may be a need to overcome the limited number of replacement symbols available at a given symbol length, and hence to overcome the resulting limits on compression rates.