The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Consistent Read is characterized by two rules. First, every statement executed by a transaction sees only (1) changes that were committed to the database by a particular set of committed transactions (the “snapshot set”) and (2) changes made by the transaction itself. Consistent Read requires that queries see “snapshots” of the database. Changes made by excluded transactions may be committed to the database before a query of a serializable transaction has completed. When such changes occur, the database no longer reflects the data as it existed in the snapshot. Consequently, database servers typically provide a mechanism for determining how the database appeared in the snapshot. One such mechanism is described in U.S. Pat. No. 5,870,758, entitled “Method and Apparatus for Providing Isolation Levels in a Database System”, the entire contents of which is incorporated herein.
To implement Consistent Read, database servers typically assign a snapshot number to a transaction when the transaction commits to indicate to other transactions the transaction set(s) to which the transaction belongs. To establish a snapshot set for a query, a logical time stamp (SCN) is assigned to the query, and the database server only permits the query to see changes made by transactions that were assigned snapshot numbers that are lower than the SCN assigned to the query. Consistent Read becomes more complicated in systems that store multiple sets of the same data, where updates cause one set of the data to be updated and the other set of the data to be invalidated.
Specifically, database systems typically store database objects (e.g. tables, indexes, etc.) on disk, and load data items from those database objects into volatile memory on an as-needed basis. Once loaded into volatile memory, the data items may remain cached in volatile memory so that subsequent accesses to the same data items will not incur the overhead of accessing a disk. Those data items may be replaced in cache, for example, to make room in volatile memory to store other data items that have been requested.
Rather than load individual data items on a per-item basis, entire database objects, or portions thereof, may be pre-loaded into volatile memory. Various approaches for loading entire database objects, or selected portions thereof, into volatile memory to speed up database operation processing are described in U.S. patent application Ser. No. 14/337,179, entitled “Mirroring, In Memory, Data From Disk To Improve Database operation Performance”, filed Jul. 21, 2014, referred to herein as the “Mirroring” application, the contents of which is incorporated herein in its entirety.
According to the approaches described in the Mirroring application, database objects, or portions thereof, are stored in volatile memory in a different format than the format that those same objects have on disk. For example, the in-memory copies of the objects may be stored in a column-major format, while the on-disk copies are stored in a row-major format. An in-memory version or copy of an object (or selected portions thereof), is referred to herein as an In-Memory-Columnar Unit (IMCU).
On-disk data is stored in “blocks”. For disk blocks that store data for a table of a database, a single block may contain items from a plurality of rows of the table. When a particular node wishes to make a change to one or more items that are stored in a particular block, the particular node obtains a write lock that covers the particular block, loads a copy of the block into a buffer cache in volatile memory, and makes the change to the cached copy of the block. When the transaction that made the change commits, the changes made by the transaction become permanent.
When a node commits a transaction to the database which changes one or more items, the copies of those data items that reside in the node's IMCU becomes stale. For example, assume that block A has items X, Y and Z, and that a particular node executes a transaction that updates item X. Assume further that copies of items X, Y and Z are in an IMCU in the particular node. After the update to item X made in the particular node, the copy of item X in the IMCU in the particular node is stale because the particular node does not update data in the IMCU directly. To prevent the use of stale data, the particular node stores invalidity data which identifies, as of the last transactional state of the database, which items in the mirrored data are not transactionally consistent with corresponding items in the database. The last transactional state of the database may include a current transactional state of the database.
When the node receives a query for data mirrored in the IMCU, the node can check the current invalidity data to determine which rows in the IMCU are invalid, i.e. not transactionally consistent with corresponding rows in the database. Thus, in response to the query, the node can return items Y and Z from the IMCU and obtain versions of item X from other sources, such as an in-memory journal, buffer cache, or disk. While the invalidity data on its own generally allows for efficient use of data mirrored in the IMCU, the invalidity data is restricted to the last transactional state of the database.
As mentioned above, to implement consistent read, each query is assigned an SCN. The SCN assigned to a query identifies a transactional state of the database at a particular time. If transactions that made changes to items in the IMCU commit after a query is assigned an SCN but before the query accesses the IMCU, then (a) the invalidity data will indicate that some items that can be used by the query are invalid, and (b) the SCN of the query will be less than the SCN associated with the invalidity data. As an example, if changes were made to item X at a first SCN and changes were made to item Y at a second SCN, the invalidity data would identify both item X and item Y as invalid. If a query is processed with a query SCN between the first SCN and the second SCN, then the invalidity data would still identify both item X and item Y as invalid, even though only item X was invalid as of the query SCN. Queries that have been assigned SCNs that are older than the SCN of the invalidity data of an IMCU are referred to herein as past-state consistent-read queries.
One way to better take advantage of IMCUs when processing past-state consistent-read queries is to construct a past-state delete vector, as described in U.S. Pat. No. 9,128,972, entitled “Multi-Version Concurrency Control On In-Memory Snapshot Store Of Oracle In-Memory Database”, the contents of which are incorporated herein. A past-state delete vector includes one bit per row of an IMCU, where the bit for a row has a value that indicates whether the data in the IMCU for that row was valid as of the snapshot time associated with the delete vector. Unfortunately constructing a past-state delete vector every time a past-state consistent-read query is processed may incur an unacceptable amount of overhead.
One way to reduce the overhead of processing a past-state consistent-read query using data from an IMCU involves the use of temporal clones, as described in U.S. patent application Ser. No. 14/506,613, entitled “Temporal Clones To Identify Valid Items From A Set Of Items”, filed Oct. 4, 2014, the contents of which are incorporated herein by reference. A temporal clone is a cached copy of a bitmap vector. Similar to past-state delete vectors, a temporal clone includes one bit per row, and indicates which rows in an IMCU are valid as of a “clone time” associated with the temporal clone.
Unfortunately, even when using temporal clones to determine which data items within an IMCU can be used by a past-state consistent-read query, the conversion of the past-state delete vectors into the row identifiers of the actual rows that are invalid can incur a significant amount of overhead. Further, the larger the bitmap, the greater the overhead.