The amount of data generated, collected, and saved by businesses is increasing at an unprecedented rate. Businesses retain enormous amounts of detailed data, such as call detail records, transaction histories, and web clickstreams, and then mine it to identify business value. Regulatory and legal retention requirements also oblige businesses to maintain years of accessible historical data.
As businesses enter an era of petabyte-scale data warehouses, advanced technologies such as data compression are increasingly used to maintain enormous data volumes in the warehouse effectively. Data compression reduces storage cost by storing more logical data per unit of physical capacity. Performance is also improved because less physical data must be retrieved during database queries.
Currently, most database systems provide compression features. For example, database systems provided by Teradata Corporation support block level compression (BLC), multi-value compression (MVC), and algorithmic compression (ALC) controlled at the column level. Various compression algorithms can be used for BLC and ALC, but different algorithms usually achieve dramatically different compression results on data with different characteristics. Determining which compression algorithm to use for a particular data type or situation is a very common problem for database providers and users, and making the best choice often requires a significant amount of manual effort.
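The dependence of compression results on data characteristics can be illustrated with a minimal sketch. It assumes Python's standard-library codecs (zlib, bz2, lzma) as stand-ins for candidate algorithms; it is not the BLC or ALC implementation of any database system, and the names `CODECS`, `compressed_sizes`, `best_codec`, and `make_samples` are hypothetical helpers introduced only for this illustration.

```python
import bz2
import lzma
import random
import zlib

# Hypothetical candidate set: standard-library codecs standing in for the
# algorithms a database might offer. Illustrative assumption only.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes produced by each candidate codec."""
    return {name: len(compress(data)) for name, compress in CODECS.items()}


def best_codec(data: bytes) -> str:
    """Return the name of the codec that yields the smallest output."""
    sizes = compressed_sizes(data)
    return min(sizes, key=sizes.get)


def make_samples() -> dict[str, bytes]:
    """Build inputs with different lengths and redundancy (seeded, repeatable)."""
    rng = random.Random(0)
    return {
        "short text": b"abc",
        "long repetitive": b"ab" * 50_000,
        "long random": bytes(rng.getrandbits(8) for _ in range(50_000)),
    }


if __name__ == "__main__":
    # Print the original size, per-codec compressed sizes, and the winner.
    for label, data in make_samples().items():
        print(label, len(data), compressed_sizes(data), best_codec(data))
```

Running the script prints, for each sample, the original size, the size produced by each codec, and the codec with the smallest output. Trying every candidate in this way is the kind of manual comparison described above, and the fixed per-stream overhead of each format matters most for very short inputs.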
Described below is an improved method and system for data compression that uses multiple encoding tables and leverages different compression algorithms to achieve improved compression performance for both short and long data streams.