It is well known in the art for a computer 10 (FIG. 1A) to be programmed with a database management system (DBMS) 11 that provides support for OLTP systems. OLTP systems typically require real time processing of statements that are expressed in a structured query language (SQL), which has been standardized by American National Standards Institute (ANSI). SQL statements to access and change data may use one of the four keywords SELECT, UPDATE, DELETE, and INSERT, which together form a Data Manipulation Language (DML) part of SQL.
On receipt of such statements, it is necessary for computer 10 to execute the requested transaction on the database very quickly, and respond back to the OLTP system with appropriate results. Examples of OLTP transactions include processing of real time sales, e.g. at a cash register in a grocery store, a database of inventory is updated in real time, as items are scanned by the clerk. A prior art DBMS 11 is typically implemented in computer 10 to include a parser, an optimizer, a query plan generator and an execution engine (not labeled; see FIG. 1A). The execution engine operates on data brought from a disk 18 to a region in main memory 17, which region is known as a buffer cache, or simply cache.
During normal operation of DBMS 11, data is retrieved from disk 18 in fixed size pieces, called “blocks”, “pages”, “data blocks”, “logical blocks” etc, an example of which is block 13 (FIG. 1A). A block is the smallest unit of data that is stored into or retrieved from a database on disk 18. The block typically includes a header region 14, data in region 16, and an empty region 15 in to which additional data can be written. Contiguous blocks that are allocated for storing a specific type of information constitute another level of data granularity, and are together called an “extent.” A set of extents which are stored in the same table space constitute a segment in a table. For more information on storage structures used to access data in databases, see Chapter 2 of the publication “Oracle Database” Concepts, 10gRelease 2 (10.2)”, publication B14220-02, published in October 2005.
Referring to FIG. 1A, block 13 is an in-memory copy of a block B1 from disk 18 (which disk holds the database managed by DBMS 11). Initially, for a given table in the database, DBMS 11 starts with an empty block B1 at time t1, and as a number of OLTP transactions take place, block B1 to eventually fills up, as illustrated in the time sequence t1-t5. When a new row can no longer be written to block B1, DBMS 11 allocates a new block B2, which is then again progressively filled as shown by the time sequence t6-t9. When block B1 cannot hold a new row, DBMS 11 again allocates a new block B3 (at time t10). Note that all data in blocks B1-B3 is uncompressed, i.e. prior art DBMS 11 does not support compression of blocks, B1-B3. Instead, data that is received from an OLTP application is stored directly, i.e. uncompressed in database 18.
One reason for storing database data uncompressed is that transactions for an OLTP application must be executed in real time, whereas compression can delay the response. However, compression for certain data warehouse transactions that are executed as batch jobs, such as bulk loads into the database, is supported in certain prior art DBMS systems, such as ORACLE 10gR2 sold by ORACLE CORPORATION, Redwood Shores, Calif. Specifically, stored data on disk can be compressed by reducing or eliminating duplicate values in a block of data. For example, as illustrated in FIG. 1B, duplicate values (“Meyer”, “11 Homestead Rd”, “1.99”, “McGryen” and “3 Main Street”) are replaced with references to a symbol table which holds a single copy of each duplicate value. The smallest granularity of values in a symbol table is at the level of a column. However, a value in a symbol table can occur as a sequence of column values, and this is called multi-column compression.
One prior art method used to support bulk load operations in a relational database management system (RDBMS) is described in a paper by Miekel and Poess and Dmitry Potapov entitled “Data Compression in Oracle” published in the Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003, which paper is incorporated by reference herein in its entirety as background.
One reason compression works well in bulk load operations is that all of the data to be written to a database block is immediately available within the prior art DBMS, for use in computing an appropriate symbol table that is applicable to the entire block. For the same reason, compression is normally not available for data management language (DML) statements of the type generated during on-line transaction processing (OLTP). Specifically, due to their nature, OLTP transactions within the computer may access different tables, and even if the same table is accessed they may access different blocks. Accordingly, only a small fraction of the data of a block may be supplied by each OLTP transaction, which makes compression of such data (and therefore compression of the block) difficult.