A database may refer to a collection of related records that is created and managed by a database management system. One type of database is a “relational database.” A relational database may refer to a database that maintains a set of separate, related files or tables and combines data elements from those tables for queries and reports when required.
The present invention is directed to a relational database that stores a particular type of data, referred to herein as “analytic data.” Analytic data may refer to data that is collected in order to be analyzed. For example, stock transaction data may be analyzed for trends such as the age group of the individuals engaged in stock transactions. In another example, insurance data may be analyzed to determine whether it is profitable to maintain particular individuals as customers. In another example, data may be analyzed to detect fraud.
Often the data stored in these related tables is “compressed” in order to maximize the amount of data stored in a given amount of disk space. Data compression may refer to the process of encoding information, through specific encoding schemes, using fewer bits than an unencoded representation (the original format of the data) would use. For example, an article could be encoded with fewer bits under the convention that the word “compression” is encoded as “comp.” Once the analytic data is compressed, the compressed analytic data may be “read-only.” Read-only may refer to data that will not change after it is compressed. As used herein, “compressed data” refers to “compressed analytic data.” Likewise, as used herein, “database” and “relational database” refer to a “read-only database” and a “read-only relational database,” respectively.
When a user desires to access the data in the database, the compressed data needs to be “decompressed.” Decompression may refer to the act of reversing the effects of data compression, which restores the data to the form it had prior to being compressed. In this manner, the user is able to retrieve the requested data in its original form.
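The compression convention and its reversal described above can be illustrated with a minimal Python sketch. It is illustrative only and is not the encoding scheme of the present invention: the substitution table `CODES` and the functions `compress` and `decompress` are hypothetical names, and the sketch assumes that words are separated by single spaces and that no code already occurs as a word in the input.

```python
# Hypothetical substitution table taken from the example in the text:
# the word "compression" is stored as the shorter code "comp".
CODES = {"compression": "comp"}


def compress(text, codes=CODES):
    """Replace each whole word found in the table with its shorter code."""
    words = text.split(" ")
    # If a code already appeared as a word, decoding would be ambiguous.
    clash = set(words) & set(codes.values())
    if clash:
        raise ValueError(f"codes already occur in the input: {sorted(clash)}")
    return " ".join(codes.get(word, word) for word in words)


def decompress(text, codes=CODES):
    """Reverse compress() by mapping each code back to its original word."""
    reverse = {code: word for word, code in codes.items()}
    return " ".join(reverse.get(word, word) for word in text.split(" "))


original = "data compression trades space for time and compression ratios vary"
packed = compress(original)
restored = decompress(packed)
print(packed)
print(restored == original)
```

Because the table maps whole words only, a word followed directly by punctuation is left unchanged in this sketch.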
The present invention is directed to a decompression approach that does not decompress entire rows of compressed data in a relational database table at once. Instead, the approach selectively decompresses column data in relational data tables, for the rows that are used by a specific query, as that query is being processed.
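The idea of selective decompression can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the present invention: the class `CompressedTable`, its `select` method, and the sample stock-transaction rows are hypothetical, and each cell is compressed separately with Python's standard `zlib` module purely as a stand-in for whatever encoding is actually used. The sketch decompresses the predicate column for each row and decompresses the requested columns only for rows that satisfy the query.

```python
import zlib


class CompressedTable:
    """Hypothetical table whose cells are individually compressed."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        # zlib is only a stand-in for the real column encoding.
        self.cells = [
            [zlib.compress(str(value).encode("utf-8")) for value in row]
            for row in rows
        ]
        self.decompress_calls = 0  # counts the work actually performed

    def _read(self, row_index, column):
        """Decompress a single cell and return it as text."""
        self.decompress_calls += 1
        blob = self.cells[row_index][self.columns.index(column)]
        return zlib.decompress(blob).decode("utf-8")

    def select(self, wanted, where_column, where_value):
        """Return the wanted columns for rows where where_column matches."""
        results = []
        for row_index in range(len(self.cells)):
            # Only the predicate column is decompressed for every row.
            if self._read(row_index, where_column) != where_value:
                continue
            # Requested columns are decompressed only for matching rows.
            results.append(tuple(self._read(row_index, c) for c in wanted))
        return results


table = CompressedTable(
    ["account", "age_group", "symbol", "shares"],
    [
        ("A1", "18-29", "XYZ", 10),
        ("A2", "30-44", "XYZ", 5),
        ("A3", "18-29", "QRS", 7),
    ],
)
result = table.select(["symbol", "shares"], "age_group", "18-29")
print(result)
print(table.decompress_calls, "cells decompressed out of", 3 * 4)
```

In this sketch the query touches 7 of the 12 stored cells; the "account" column is never decompressed. Values are returned as text because the sketch stores every cell as an encoded string.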
There are many different compression algorithms used to encode or compress the data stored in relational databases, such as the Huffman algorithm and the Lempel-Ziv algorithm. These compression algorithms focus on maximizing the amount of compression, that is, on maximizing the amount of data stored in a given amount of disk space. However, data compressed using these algorithms requires extensive system resources (disk access time and instruction cycle time) to decompress. That is, the time required for a user to retrieve the requested data in its original form from the relational database may be extensive when such high-compression algorithms are used.
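The cost described above can be measured rather than assumed. The following minimal sketch uses Python's standard `zlib` module, whose DEFLATE format combines Lempel-Ziv (LZ77) matching with Huffman coding, to report compressed size together with compression and decompression time at several compression levels. The function `measure` and the synthetic sample data are hypothetical, and the timings depend on the machine and the data, so no particular result is claimed here.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed size, compress seconds, decompress seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decompress_seconds = time.perf_counter() - start

    # Decompression must restore the original bytes exactly.
    assert restored == data
    return len(packed), compress_seconds, decompress_seconds


# Synthetic, repetitive records standing in for analytic data.
sample = b"".join(
    b"account=%d;age_group=18-29;symbol=XYZ;shares=%d\n" % (i, i % 50)
    for i in range(20000)
)

report = {level: measure(sample, level) for level in (1, 6, 9)}
for level, (size, c_sec, d_sec) in report.items():
    print(f"level {level}: {size} bytes, "
          f"compress {c_sec:.4f}s, decompress {d_sec:.4f}s")
```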
Hence, there is an inverse relationship between compression efficiency and access performance (the amount of system resources needed to decompress the compressed data). If a balance could be achieved between compression efficiency and access performance, then disk space could be saved while, at the same time, access performance is improved. That is, if data could be compressed in a manner that comes close to the storage density achieved by these high-compression algorithms but requires far fewer system resources to decompress, then a better balance may be realized between compression efficiency and access performance. Currently, there are no products that attempt to provide such a balance.
Therefore, there is a need in the art for compressing analytic data in a manner that comes close to the storage density achieved by these high-compression algorithms but requires far fewer system resources to decompress the compressed data.