1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to numeric, decimal and date field compression in database systems.
2. Description of Related Art
The ability to manage massive amounts of information has become a necessity for business today. With the advent of data warehouses, it is not uncommon for databases to store very large tables that comprise gigabytes, terabytes, petabytes or more, because businesses are retaining enormous amounts of data and then mining it to identify business value. Regulatory and legal retention requirements are also leading businesses to keep years of historical data accessible to data warehouses.
Compression is used to reduce storage cost by storing more logical data per unit of physical capacity. Performance may be improved as well, because there is less physical data to retrieve from data storage devices. Performance may be further enhanced since data can remain compressed while cached in memory. Consequently, there are many benefits to the use of compression in data warehouses.
A problem arises, however, in that most data warehouses offer only a few types of compression. Generally, different types of data require different types of compression for optimal performance. Consequently, most data types are not optimally compressed in most data warehouses.
Consider, for example, the following SQL statement which defines a decimal column in a table using the CREATE TABLE command:
CREATE TABLE tab1 (col1 DECIMAL (p,s))
In this example, “tab1” is the table name, and “col1” is a first column of the table having a numeric data type as indicated by the DECIMAL mnemonic (or alternatively, the DEC or NUMERIC mnemonics) with a fixed precision p and a scale s.
The precision p is the maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point, wherein, in one embodiment, the precision has a value from 1 through a maximum precision of 18 and a default of 5 (although other embodiments may have different values). The scale s is the maximum number of decimal digits that can be stored to the right of the decimal point, wherein the scale must be a value from 0 through p, the scale can be specified only if the precision p is specified, and the default scale is 0; therefore, 0<=s<=p.
The amount of storage required for this data type varies, based on the precision. In one embodiment, decimal values are scaled by the power of ten equal to the number of fractional digits. The scaled decimal value is stored as a two's complement binary number in one, two, four, or eight bytes. The number of bytes used for the scaled decimal value depends on the total number of digits defined for the column.
Specifically, depending on the precision of the column (e.g., 1 to 18), storage of a decimal value requires between 1 and 8 bytes. The following table shows the number of bytes used to store decimal values.
TABLE 1Number of Defined DigitsNumber of Bytes1 to 213 to 425 to 9410 to 188
However, the same amount of storage is used, based on the column definition, regardless of the actual data values stored in the column. Specifically, every row in the table consumes the same number of bytes for the column, even if the value in the row could be stored in a smaller number of bytes. Consequently, significant compression can be performed on the actual data values.
In the prior art, a number of solutions have been presented for the compression of such data. For example, the VARDECIMAL data type is available, wherein the VARDECIMAL columns of each row consume only the space that is required to contain the data, plus 2 bytes of overhead to store an offset to the data. In another example, a length field may precede the compressed data, which is stored in a variable length field.
In yet another example, a multi-value compression may be performed, where the compressed value is explicitly defined in the column definition. This requires specifying each value in the compression specification and each value takes one bit in every data row.
Some database vendors offer solutions based on specific values across many rows in the table. However, this type of solution would not work well for situations where the rows in the table have different values, like birthdays or sales price.
Notwithstanding these various solutions, there remains a need in the art for improved data compression functions. Specifically, there remains a need for improved compression of short fields, such as decimals, numerics, and dates. The present invention satisfies this need.