The present invention is related to the field of database management systems. More particularly, the present invention is directed to a method and mechanism of improving performance of database query language statements.
The quantity of data that must be stored in databases and computer systems is increasing as many modern businesses and organizations increase their need to access greater amounts of information. A significant portion of the expense for storing a large quantity of information is related to the costs of purchasing and maintaining data storage systems. Given this expense, approaches have been suggested to reduce the amount of space that is needed to store a given quantity of data.
Data compression is a technique used in many modern computer systems to reduce the storage costs for data. A common approach for implementing compression is to compress data at the granularity of the file. For example, traditional compression approaches such as the Unix-based gzip or DOS-based zip compress an entire file into a more compact version of that file. A drawback with this type of approach is that if an entire file is compressed, all or a large part of the file must be decompressed before any part of it can be used, even if only a small part of the file is actually needed by a user. This problem is particularly acute when compressing files in database systems, in which a single database file may contain large quantities of database records, but only a small portion of the individual records may be needed at any moment in time. Thus, the granularity at which data is compressed or decompressed may not match the granularity at which data is actually used and accessed in the system.
However, compression at other granularities may result in storage inefficiencies. For example, certain page-at-a-time compression approaches may produce compressed pages of different sizes that are inefficiently mapped onto physical pages. In addition, many traditional compression techniques cannot even guarantee that the compressed data will be no larger than the original data.
Moreover, the acts of compressing and decompressing data can themselves impose excessive overhead. This overhead typically depends on the specific compression algorithm being used as well as on the quantity of data being compressed or decompressed, and it can add significant latency when storing, retrieving, or updating information in a database system. For example, some compression techniques store compression information separately from the compressed data. Hence, even a simple read access may require accessing multiple locations in the database and performing expensive decompression operations.
Given these latency problems, as well as uncertain compression gains, the trade-off between time and space for compression is not always attractive in a database or other type of computing system. Hence, there is a need for a compression technique that not only reduces the disk space used but also has no negative impact on the performance of database query language statements against compressed data.
The present invention provides a method and mechanism of storing and retrieving data in a compressed format. In one embodiment, data compression is performed on stored data by reducing or eliminating duplicate data values in a data block or other storage unit. In another embodiment, information describing data duplication within the data block is maintained. The present invention also provides a method and mechanism of improving performance of database query language statements. In one embodiment, the data duplication information is maintained and used to reduce the number of predicate evaluations. In another embodiment, the data duplication information is used to reduce the amount of data accessed by a database query language statement.
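The ideas summarized above can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical block layout: the class `CompressedBlock`, the per-column symbol tables, and the per-value occurrence counts are illustrative choices, not the invention's actual on-disk format. Each distinct value is stored once per block, each row holds small integer indices into the symbol table, and the occurrence counts play the role of the data duplication information. A predicate is then evaluated once per distinct value rather than once per row, and a count query is answered from the counts alone, without reading the encoded rows.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class CompressedBlock:
    """Hypothetical block layout (illustration only): each column keeps a
    symbol table of its distinct values plus an occurrence count per value,
    and every row stores integer indices into those tables."""

    def __init__(self, rows: Sequence[Tuple[Any, ...]]) -> None:
        ncols = len(rows[0]) if rows else 0
        self.symbols: List[List[Any]] = [[] for _ in range(ncols)]
        self.counts: List[List[int]] = [[] for _ in range(ncols)]
        self.rows: List[Tuple[int, ...]] = []
        lookup: List[Dict[Any, int]] = [{} for _ in range(ncols)]
        for row in rows:
            encoded = []
            for col, value in enumerate(row):
                sym = lookup[col].get(value)
                if sym is None:  # first occurrence: store the value only once
                    sym = len(self.symbols[col])
                    lookup[col][value] = sym
                    self.symbols[col].append(value)
                    self.counts[col].append(0)
                self.counts[col][sym] += 1  # duplication information
                encoded.append(sym)
            self.rows.append(tuple(encoded))
        self.predicate_calls = 0  # instrumentation for the example only

    def _matching_symbols(self, col: int, pred: Callable[[Any], bool]) -> List[bool]:
        # Evaluate the predicate once per distinct value, not once per row.
        result = []
        for value in self.symbols[col]:
            self.predicate_calls += 1
            result.append(pred(value))
        return result

    def filter_rows(self, col: int, pred: Callable[[Any], bool]) -> List[Tuple[Any, ...]]:
        # Rows are tested by a cheap index lookup instead of re-running pred.
        hits = self._matching_symbols(col, pred)
        return [
            tuple(self.symbols[c][s] for c, s in enumerate(row))
            for row in self.rows
            if hits[row[col]]
        ]

    def count_matching(self, col: int, pred: Callable[[Any], bool]) -> int:
        # Answered from the symbol table and counts; encoded rows are never read.
        hits = self._matching_symbols(col, pred)
        return sum(n for n, hit in zip(self.counts[col], hits) if hit)


rows = [("CA", 10), ("NY", 20), ("CA", 30), ("CA", 10), ("NY", 10)]
block = CompressedBlock(rows)
print(block.symbols)                                  # [['CA', 'NY'], [10, 20, 30]]
print(block.count_matching(0, lambda s: s == "CA"))   # 3
print(block.filter_rows(1, lambda v: v > 15))         # [('NY', 20), ('CA', 30)]
print(block.predicate_calls)                          # 5
```

In this example the five rows hold only two distinct values in the first column and three in the second, so the two queries together make five predicate calls instead of ten. Keeping a separate symbol table per column is a simplifying assumption that makes the per-value counts directly usable for a single-column predicate; a block-wide symbol table shared across columns would save more space but would need additional bookkeeping to answer column-specific counts.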
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.