It is common for database systems to store data persistently in disk blocks. Typically, within each disk block, data is arranged in row-major format. That is, the values of all columns of one row are followed by the values of all columns for the next row. To speed up performance, some of the disk blocks may be cached in a buffer cache within volatile memory. Accessing the data from volatile memory is significantly faster than accessing the data from disk. However, even within the volatile memory, the data is still in the format of row-major disk blocks, which is not optimal for certain types of database operations.
In contrast to row-major disk blocks, columnar formats have many attractive advantages for query processing in memory, such as cache locality and compression. Consequently, some database servers now employ new table types for persistently storing data in column-major formats. When column-major format data is read into volatile memory, it can be used to process certain queries more efficiently than would be possible if the data were stored in row-major disk blocks.
Rather than load individual data items on a per-item basis, entire database objects, or portions thereof, may be pre-loaded into volatile memory. Various approaches for loading entire database objects, or selected portions thereof, into volatile memory to speed up query processing are described in U.S. patent application Ser. No. 14/377,179, entitled “Mirroring, In Memory, Data From Disk To Improve Query Performance”, filed Jul. 21, 2014, (the “Mirroring Application), the contents of which are incorporated herein in its entirety. This application describes storing database objects, or portions thereof, in volatile memory in a different format than the format that those same objects have on disk. For example, the in-memory copies of the objects may be stored in a column-major format, while the persistent copies are stored in a row-major format. Queries that are most efficiently processed using row-major data may use the persistent copies, and queries that are most efficiently processed using column-major data may use the in-memory copies.
When multiple copies of a data set are maintained and used to respond to a database request, the database server must ensure that transactional consistency is maintained. For example, a when a copy of a particular item is updated in the persistent data set as part of a database transaction, the database server may maintain transactional consistency by also updating the particular item in the in-memory data as part of the same transaction. When the in-memory data and the persistent data are transactionally synchronized, the result set of a query will be the same regardless of whether the query was processed using data items obtained from the in-memory data or the persistent data.
Unfortunately, this technique does not work well due to the lag that occurs between replicated systems. Specifically, at any given point in time, some changes made at one of the replicas will not yet have been applied to the other replica. Consequently, the lag inherent in the replication mechanism may result in unpredictable artifacts and, possibly, incorrect results. Furthermore, it may be expensive to keep the in-memory data up to date with all changes being made to the persistent data. For example, the in-memory data may be in a compressed format that would require an entire set of columnar data to be decompressed to perform any corresponding update.
Another technique, used in Oracle Database In-Memory (DBIM), stores in-memory columnar units (IMCU). IMCUs are snapshots of the underlying data as of a point in time and their accesses are most suited for analytic queries. Any DML activity (e.g. insert, update or delete of row/s) gets recorded as an invalidation row within the unit, thereby avoiding the processing overhead and memory consumption of synchronization. A query scanning the IMCU retrieves valid data from the IMCU and gets rows corresponding to the invalidation records from a source other than the IMCU. The other source may be, for example, an in-memory journal for the IMCU, blocks cached in the buffer cache, or blocks retrieved from disk. As the number of invalid rows within the IMCU increases, the more frequently the database must retrieve items from sources other than the IMCU. Such out-of-IMCU item retrievals increase the scan latency of the IMCU. Eventually, the IMCU may contain so many invalid items that it is no longer efficient to perform scans using the IMCU. To avoid this condition, IMCUs may be rebuilt or repopulated with valid data when the number of invalidations crosses a threshold.
After an IMCU is rebuilt, all the data is valid as of a second point in time. However, IMCU repopulation is a time-intensive process during which the IMCU is unavailable. That is, an existing IMCU is first marked offline for queries before its repopulation is commenced. The new IMCU is made available to queries only when the repopulation procedure completes. The unavailability of the IMCU during repopulation means that all items contained therein (whether or not invalid) must be retrieved from another source, such as persistent data blocks, even if accessing the other source is less efficient. This results in regression of scan latency over a non-trivial window of time. The regression worsens when multiple queries are concurrently issued on the same table, as each of these queries gets affected by the unavailability of the IMCU. In an intensive Online Transaction Analytical Processing (OLTAP) environment, heavy DML activity causing updates to the underlying data may require a corresponding IMCU to undergo frequent repopulations. This will result in unavailability of an IMCU for a substantial number of queries, thereby affecting performance.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.