Database management systems such as, for example, DB2 UDB (universal database) use data compression techniques to reduce storage requirements for a given database. One advantage of compression is that input/output (I/O) is reduced considerably, since fewer pages need to be read into the page buffers. Existing approaches such as, for example, the implementation in DB2 UDB for Windows/Unix/Linux compress the entire table regardless of how frequently certain records are used. When a certain row needs to be accessed for a read or update, the data page is fetched into the page buffer and the corresponding row is decompressed for use. After use, the row is compressed again and the data page is written back to permanent storage.
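The access cycle described above can be illustrated with a minimal sketch. It is not the DB2 implementation: the class name `CompressedTable`, its counters, and the use of Python's `zlib` module are assumptions chosen only to show that, when every row is stored compressed, each read pays for a decompression and each update pays for a decompression followed by a recompression.

```python
import zlib


class CompressedTable:
    """Toy table (hypothetical) in which every row is stored compressed."""

    def __init__(self, rows):
        # rows: dict mapping a row id to its raw bytes payload.
        # Every row is compressed up front, regardless of how often it is used.
        self.storage = {rid: zlib.compress(data) for rid, data in rows.items()}
        self.decompress_calls = 0
        self.compress_calls = 0

    def read(self, rid):
        # Each access decompresses the stored row.
        self.decompress_calls += 1
        return zlib.decompress(self.storage[rid])

    def update(self, rid, new_data):
        # An update first decompresses the row, then compresses the new value
        # before it is written back to storage.
        self.read(rid)
        self.compress_calls += 1
        self.storage[rid] = zlib.compress(new_data)


table = CompressedTable({1: b"frequently used row", 2: b"rarely used row"})
table.read(1)
table.update(1, b"frequently used row, changed")
print(table.decompress_calls, table.compress_calls)
```

In this sketch the counters grow with every access, including accesses to rows that are used constantly, which is the cost the following paragraph discusses.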
However, there is considerable central processing unit (CPU) overhead associated with compressing and decompressing data. In many cases, the database manager might end up compressing data that is used very often, and therefore spend many CPU cycles decompressing and recompressing that data every time it is accessed. No existing approach allows predicate-based compression to be specified, so that only data that is not frequently used by database applications and/or users, as identified by a user-specified predicate, is compressed. As such, existing approaches provide no way for old and/or obsolete data to be compressed and stored on secondary storage while current, frequently accessed data remains uncompressed, which would spare the database manager the overhead of decompressing that data every time it is accessed.
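To make the notion of predicate-based compression concrete, the following is a hedged sketch of the general idea only. It does not correspond to any existing database feature or SQL syntax; the helper names `store_rows` and `load_row`, the sample rows, and the `year < 2010` predicate are all hypothetical, and `zlib` with JSON encoding stands in for whatever row format and compression scheme a real system would use.

```python
import json
import zlib


def store_rows(rows, should_compress):
    """Return (is_compressed, payload) pairs, one per row.

    `should_compress` is a hypothetical stand-in for a user-specified
    predicate that selects the infrequently used rows.
    """
    stored = []
    for row in rows:
        raw = json.dumps(row, sort_keys=True).encode("utf-8")
        if should_compress(row):
            stored.append((True, zlib.compress(raw)))
        else:
            # Rows that fail the predicate stay uncompressed.
            stored.append((False, raw))
    return stored


def load_row(entry):
    """Decode one stored entry, decompressing only if it was compressed."""
    is_compressed, payload = entry
    raw = zlib.decompress(payload) if is_compressed else payload
    return json.loads(raw.decode("utf-8"))


rows = [
    {"order_id": 1, "year": 2001, "status": "closed"},  # old data
    {"order_id": 2, "year": 2024, "status": "open"},    # current data
]
stored = store_rows(rows, lambda r: r["year"] < 2010)
print([is_compressed for is_compressed, _ in stored])
```

Under these assumptions, only the row matching the predicate incurs compression and decompression work; the current row is read back without any decompression step.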
As noted above, existing approaches disclose no solutions to this problem. The existing implementation for data compression in DB2 UDB for Windows/Unix/Linux, for example, will compress the entire table regardless of how frequently certain records might be used.
Database administrators (DBAs) can be advised to partition their database and compress only the partition that contains the less frequently used rows. While this is a work-around, it is not an advantageous solution to the problem. Partitions require their own indexes for access and, moreover, the current compression implementation in DB2, for example, requires each partition to have a separate compression dictionary.
Existing approaches also have further disadvantages. For example, some existing approaches do not include the ability to compress data based on a structured query language (SQL) type predicate. Others do not include a predicate at all, and consequently user specification of a predicate clause is unavailable. Additionally, some existing approaches do not introduce new SQL statements related to data compression on rows. Other approaches compress all of the data in an information unit, or compress based on data filter stages. Further, some existing approaches allow only the compression method to be selected based on the data type that is being compressed.