A database may refer to a collection of related records that is created and managed by what is commonly referred to as a database management system. One type of database is a “relational database.” A relational database may refer to a database that maintains a set of separate, related files or tables, but combines data elements from the tables for queries and reports when required.
The present invention is directed to a relational database that stores a particular type of data, referred to herein as “analytic data.” Analytic data may refer to data that is analyzed. For example, stock transaction data may be analyzed for trends such as the age group of the individuals engaged in stock transactions. In another example, insurance data may be analyzed to determine whether it is profitable to maintain particular individuals as customers. In another example, data may be analyzed for fraud.
Often the analytic data stored in a relational database is “compressed” in order to maximize the amount of data stored in a given amount of disk space. Data compression may refer to the process of using specific encoding schemes to encode information in fewer bits than an unencoded representation (the original format of the data) would use. For example, an article could be encoded with fewer bits if we accept the convention that the word “compression” be encoded as “comp.” Once the analytic data is compressed, the compressed analytic data may be “read-only.” Read-only may refer to data that will not change after it is compressed. As used herein, “compressed data” refers to “compressed analytic data.”
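By way of illustration only, the substitution convention in the example above may be sketched as follows. The table name `ABBREVIATIONS` and the functions `compress` and `decompress` are hypothetical names introduced for this sketch; only the pairing of “compression” with “comp.” comes from the text. The sketch assumes that the abbreviation never occurs in the original text, since otherwise the round trip would not be exact.

```python
# Hypothetical abbreviation table. Only the "compression" -> "comp."
# pair is taken from the text; the names here are illustrative.
ABBREVIATIONS = {"compression": "comp."}


def compress(text, table=ABBREVIATIONS):
    """Replace each long word with its shorter agreed-upon code."""
    for word, code in table.items():
        text = text.replace(word, code)
    return text


def decompress(text, table=ABBREVIATIONS):
    """Reverse the substitution to restore the original text.

    Assumes no code (e.g. "comp.") appeared in the original text.
    """
    for word, code in table.items():
        text = text.replace(code, word)
    return text


original = "data compression saves space"
packed = compress(original)      # shorter encoded form
restored = decompress(packed)    # identical to the original
```

Because the same table is used in both directions, the encoded form is shorter than the original and can be restored exactly under the stated assumption.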
When a user desires to access the compressed data in the relational database, the compressed data needs to be “decompressed.” Decompression may refer to the act of reversing the effects of data compression, thereby restoring the data to the form it had prior to being compressed. In this manner, the user is able to retrieve the requested data in its original form.
The present invention is directed to a decompression approach that does not decompress all of the rows of compressed data in a relational database table at a single time. Instead, the approach selectively decompresses column data in relational database tables, for example by decompressing the compressed data row by row, and then column by column within each row, as each row is needed by the relational query processor.
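By way of illustration only, decompressing rows as they are needed may be sketched with a generator that decodes one row at a time. The decoders, the sample table, and the names `iter_decompressed` and `stats` are assumptions made for this sketch and are not taken from any particular implementation.

```python
from itertools import islice

# Hypothetical per-column decoders: column 0 is an index into a small
# dictionary, column 1 is a year stored as an offset from 1900.
DECODERS = [
    lambda v: ["NY", "CA", "TX"][v],
    lambda v: v + 1900,
]

# Hypothetical compressed table: one tuple of encoded values per row.
COMPRESSED_TABLE = [(0, 70), (1, 85), (2, 99)]


def iter_decompressed(compressed_rows, decoders, stats):
    """Yield decompressed rows one at a time, only as they are requested."""
    for compressed_row in compressed_rows:
        row = []
        # Column by column within the current row.
        for decode, value in zip(decoders, compressed_row):
            row.append(decode(value))
        stats["rows_decompressed"] += 1
        yield row


stats = {"rows_decompressed": 0}
# A consumer (standing in for the query processor) asks for two rows only.
first_two = list(islice(iter_decompressed(COMPRESSED_TABLE, DECODERS, stats), 2))
```

In this sketch the third row is never decoded, because the consumer stops after requesting two rows.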
One such method (commonly referred to as the “control block method”) used in such a decompression approach involves a decompression program reading information from a data structure, commonly referred to as a “control block,” associated with a particular table of the relational database. The control block may store algorithms and parameters used to identify the particular subroutines to call to decompress the data in the table. The decompression program may read column by column within a row. After reading a column, the decompression program may read the information in the control block to call the appropriate subroutine to decompress the data for that column. The same process is repeated for the other columns in the row. The control block method uses a small amount of code to decompress the compressed data. However, a drawback of the control block method is that it uses an excessive number of computer cycles to decompress the compressed data, resulting in poor performance for query programs that access large amounts of data.
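By way of illustration only, the control block method described above may be sketched as follows. The layout of `CONTROL_BLOCK`, the algorithm names, and the decoding subroutines are hypothetical and chosen only to show the per-column lookup and subroutine call; an actual control block may be organized differently.

```python
# Hypothetical decompression subroutines, one per encoding scheme.
def decode_dictionary(value, params):
    return params["dictionary"][value]


def decode_offset(value, params):
    return value + params["base"]


def decode_raw(value, params):
    return value


SUBROUTINES = {
    "dictionary": decode_dictionary,
    "offset": decode_offset,
    "raw": decode_raw,
}

# Hypothetical control block for one table: one entry per column,
# naming the algorithm and the parameters its subroutine needs.
CONTROL_BLOCK = [
    {"algorithm": "dictionary", "params": {"dictionary": ["NY", "CA", "TX"]}},
    {"algorithm": "offset", "params": {"base": 1900}},
    {"algorithm": "raw", "params": {}},
]


def decompress_row(compressed_row, control_block):
    """Decompress one row, consulting the control block for every column."""
    row = []
    for value, entry in zip(compressed_row, control_block):
        # Look up the subroutine for this column, then call it.
        subroutine = SUBROUTINES[entry["algorithm"]]
        row.append(subroutine(value, entry["params"]))
    return row


example_row = decompress_row((1, 85, "Smith"), CONTROL_BLOCK)
```

The driver itself is short, but every column of every row incurs a control block lookup and a subroutine call, which is the overhead the text attributes to this method.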
Another method (the “i-code method”) for decompressing compressed data in the decompression approach discussed above is to create a string of commands, stored in a list (an “i-code list”), that performs the same function as the parameters in the control block. This string of commands is created for a particular table of the relational database and is used by the decompression program to decompress the data for each row of the table, row by row and column by column within the row. The string of commands may be referred to herein as “i-code” or “p-code,” which is pseudo-code, i.e., not machine-executable code. The i-code may be built by an “i-code builder” and executed by an “i-code interpreter.” The i-code interpreter interprets each command, one at a time, and then executes the corresponding code in-line, thereby avoiding the subroutine calls. In comparison to the control block method, the i-code method eliminates many of the machine cycles used to decompress data. It does, however, require more programming effort, as an i-code builder and an i-code interpreter have to be built. Further, while the i-code method does reduce the number of machine cycles used to decompress compressed data, there is still improvement to be made. By further reducing the number of machine cycles used to decompress compressed data, query programs will be able to access large amounts of data more quickly.
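By way of illustration only, the i-code method described above may be sketched as a builder that emits one command per column and an interpreter that executes each command in-line. The opcodes, the `TABLE_SPEC` layout, and the function names `build_icode` and `interpret` are hypothetical. The sketch shows only the structure of the method; it is not intended to demonstrate the machine-cycle savings, which depend on the implementation language and platform.

```python
# Hypothetical opcodes for the i-code commands.
OP_DICT, OP_OFFSET, OP_COPY = 0, 1, 2

# Hypothetical description of one table's columns.
TABLE_SPEC = [
    {"algorithm": "dictionary", "params": {"dictionary": ["NY", "CA", "TX"]}},
    {"algorithm": "offset", "params": {"base": 1900}},
    {"algorithm": "raw", "params": {}},
]


def build_icode(table_spec):
    """I-code builder: emit one (opcode, operand) command per column."""
    icode = []
    for spec in table_spec:
        if spec["algorithm"] == "dictionary":
            icode.append((OP_DICT, spec["params"]["dictionary"]))
        elif spec["algorithm"] == "offset":
            icode.append((OP_OFFSET, spec["params"]["base"]))
        else:
            icode.append((OP_COPY, None))
    return icode


def interpret(icode, compressed_row):
    """I-code interpreter: run each command in-line, with no per-column
    call to a separate decompression subroutine."""
    row = []
    for (opcode, operand), value in zip(icode, compressed_row):
        if opcode == OP_DICT:
            row.append(operand[value])      # dictionary lookup
        elif opcode == OP_OFFSET:
            row.append(value + operand)     # add the stored base
        else:
            row.append(value)               # copy the value unchanged
    return row


# The i-code list is built once per table and reused for every row.
ICODE = build_icode(TABLE_SPEC)
rows = [interpret(ICODE, r) for r in [(1, 85, "Smith"), (2, 99, "Lee")]]
```

Building the command list once per table moves the per-column decision about which algorithm applies out of the per-row loop, at the cost of writing both the builder and the interpreter.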
Therefore, there is a need in the art for further reducing the system resources (e.g., machine cycles) used to decompress read-only analytic data in a relational database table.