Typically, in any system that stores data, there is a mechanism for keeping track of what portions of storage are currently available to store new data, and what portions of available storage are not. The portion of available storage that is currently available for storing new data is generally referred to as “free space”. Within a database system, the processes responsible for keeping track of the free space are collectively referred to as the “space management layer”.
The smallest unit of space that can be independently allocated for use is referred to herein as a “data block” (or simply “block”). In some systems, the size of the data block is 8 kilobytes. For ease of discussion and for illustrative purposes only, it shall be assumed herein that each data block is 8 kilobytes (8K). It follows, again for illustrative purposes only, that the minimum input/output (I/O) unit is one 8K block or, more generally, one block of whatever the minimum block size is.
Information about whether a data block on disk is currently available (“free”) or not currently available (“used”) is maintained in a metadata structure. In one system, such metadata structures take the form of “shared-block-usage maps”, each of which resides on an 8K disk block. Table A hereinbelow shows an example in schematic form of the relevant informational data in one shared-block-usage map. The shared-block-usage map in Table A has a column that stores data block addresses and a corresponding column showing the status of the block, whether the block is currently free or used.
TABLE A

DATA BLOCK ADDRESS (dba)    STATUS
dba-1                       used
dba-2                       used
. . .                       . . .
dba-128                     used
dba-129                     free
. . .                       . . .
It should be appreciated that the shared-block-usage map is read from disk every time a transaction, from any user of the corresponding database, requests block usage for any of the blocks represented in the shared-block-usage map. Reading the shared-block-usage map from disk results in a logical I/O being performed. The logical I/O involves reading the 8K shared-block-usage map from disk into a buffer cache in volatile memory or, if a copy of the 8K shared-block-usage map is already in a buffer cache of a remote database server instance, from the remote buffer cache to a local buffer cache.
However, before the shared-block-usage map is read from disk, an exclusive lock is placed on the shared-block-usage map so that no other process can access the shared-block-usage map and modify the data. The result is that the shared-block-usage map can only be used by one process at a time. For example, assume that a transaction requests to insert a row in a table. Insertion of the row requires usage of a disk block. Therefore, a process at the space management layer receives a request for a disk block. In response to the request, the process obtains a lock on a shared-block-usage map, and then reads the shared-block-usage map from disk. The shared-block-usage map contains an entry that indicates that a particular block is currently free. The free block is provided to the transaction to allow the transaction to perform the insert operation. A flag in the shared-block-usage map corresponding to the block is set to “used”. The lock on the shared-block-usage map is released, thereby making the shared-block-usage map available for other space requests. Multiple transactions requesting to lock and search the same shared-block-usage map must wait until the shared-block-usage map is available.
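The lock, search, mark, and release sequence described above can be sketched as follows. This is a minimal in-memory illustration only, not the implementation of any particular database system: the class name `SharedBlockUsageMap`, the method `allocate_block`, and the use of a Python `threading.Lock` in place of a database lock are all assumptions made for the example, and the disk read is omitted.

```python
import threading


class SharedBlockUsageMap:
    """Hypothetical in-memory stand-in for one shared-block-usage map."""

    def __init__(self, block_addresses):
        # Each entry maps a data block address to "free" or "used".
        self.status = {dba: "free" for dba in block_addresses}
        # Exclusive lock: only one process may use the map at a time.
        self.lock = threading.Lock()

    def allocate_block(self):
        """Return the address of a free block after marking it used.

        Returns None when the map has no free block left.
        """
        with self.lock:  # obtain the lock before searching the map
            for dba, state in self.status.items():
                if state == "free":
                    self.status[dba] = "used"  # set the flag to "used"
                    return dba
            return None
        # Leaving the "with" block releases the lock for other requests.


usage_map = SharedBlockUsageMap(["dba-1", "dba-2", "dba-3"])
first = usage_map.allocate_block()
second = usage_map.allocate_block()
print(first, second)
```

Any other caller of `allocate_block` blocks on the lock until the current caller has finished, which mirrors the waiting behavior described for transactions that request the same map.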
In transaction-based systems, such as database management systems, operations that use storage or release storage are performed atomically with the corresponding updates to the free space information. For example, if a database transaction involves operations that use five blocks, then the update to the free space information to indicate that the five blocks are used is performed as part of the transaction. If the transaction fails, then the change to the free space information is rolled back so that the free space information will still show the five blocks to be free. Also, such systems durably store the free space information so that the system can continue to work properly after crashes that cause the loss of information stored in volatile memory.
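The five-block example can be illustrated with a small sketch. It assumes a plain dictionary as the free space information and a hypothetical `Transaction` class that remembers which blocks it has claimed; neither is taken from an actual system, and durability is not modeled.

```python
class Transaction:
    """Hypothetical transaction that tracks the blocks it marks as used."""

    def __init__(self, free_space):
        self.free_space = free_space  # maps block address -> "free" / "used"
        self.claimed = []             # blocks this transaction marked used

    def use_block(self, dba):
        if self.free_space[dba] != "free":
            raise ValueError(f"{dba} is not free")
        self.free_space[dba] = "used"
        self.claimed.append(dba)

    def commit(self):
        # The blocks stay "used"; nothing is left to undo.
        self.claimed.clear()

    def rollback(self):
        # Undo this transaction's changes to the free space information.
        for dba in self.claimed:
            self.free_space[dba] = "free"
        self.claimed.clear()


free_space = {f"dba-{n}": "free" for n in range(1, 6)}
txn = Transaction(free_space)
for dba in list(free_space):
    txn.use_block(dba)
used_before_failure = sum(s == "used" for s in free_space.values())
txn.rollback()  # the transaction fails, so its changes are rolled back
free_after_rollback = sum(s == "free" for s in free_space.values())
print(used_before_failure, free_after_rollback)
```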
In the context of shared-block-usage maps, durability is achieved by generating redo information every time a shared-block-usage map is updated in volatile memory. Specifically, each block request involves obtaining a lock on a shared-block-usage map, changing the shared-block-usage map, generating redo for the change, and then releasing the lock. The lock on the shared-block-usage map is released as soon as possible, rather than when the transaction that is making the change commits, to allow greater concurrency within the system. Otherwise, the free blocks represented in the shared-block-usage map would not be available to any other transactions until the transaction committed.
When a transaction commits, the redo generated for changes made to shared-block-usage maps by that transaction is flushed to disk. Consequently, even if the contents of volatile memory are lost after a transaction is committed, the disk blocks used by the transaction will continue to be treated as “used”.
As explained above, each request for free space that is made during a transaction results in generation of redo information. Further, even if a transaction requires one megabyte of space, the transaction requests space one 8 k block at a time to support concurrency among multiple instances that have access to the same database resources or objects, such as tables, indexes and the like. Thus, a transaction requesting to store one megabyte of data may issue 128 block usage requests (for 128 8K blocks). The 128 block usage requests would result in 128 changes to shared-block-usage maps, which in turn would cause the generation of 128 redo records. When the transaction commits, the 128 redo records are flushed to disk, along with any other redo generated by the transaction.
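The arithmetic behind the 128 redo records can be checked with a short sketch. The function names `blocks_needed` and `run_transaction` are hypothetical, redo records are represented as plain strings, and the flush at commit is modeled as copying the buffer; the sketch only counts records under the 8K block size assumed above.

```python
BLOCK_SIZE = 8 * 1024  # 8K data block, as assumed for illustration


def blocks_needed(num_bytes, block_size=BLOCK_SIZE):
    """Number of block usage requests needed to store num_bytes."""
    return (num_bytes + block_size - 1) // block_size  # round up


def run_transaction(num_bytes):
    """Request space one block at a time, generating one redo record each."""
    redo_buffer = []
    for n in range(blocks_needed(num_bytes)):
        # One map change per block request, hence one redo record.
        redo_buffer.append(f"block request {n + 1}: free -> used")
    # At commit, all redo generated by the transaction is flushed.
    flushed = list(redo_buffer)
    return flushed


one_megabyte = 1024 * 1024
records = run_transaction(one_megabyte)
print(len(records))  # 128
```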
The amount of overhead that results from generating redo for each disk block that transitions from free-to-used, or from used-to-free, increases in proportion to the amount of space used or freed by transactions. In the example given above, the storage of one megabyte results in the generation of 128 redo records. Real-life examples of large files, however, include video, x-ray pictures, and high-dimensional content, which may exceed 100 megabytes. With 8K blocks, storing a 100-megabyte file would generate 12,800 redo records.
An example showing three transaction requests executed in three database server instances, where each database server instance has access to a shared-block-usage map, can be described with reference to FIG. 1. Referring to FIG. 1, a first transaction 102 is executed by the first instance 104, a second transaction 106 is executed by the second instance 108, and a third transaction 110 is executed by the third instance 112. Suppose, in this example, that each of transactions 102, 106, and 110 requests to add rows to the same table 114 of a database 116. To add the rows, each of instances 104, 108, and 112 must read and update shared-block-usage map 118.
For example, transaction 102 requests to add a row to table 114. Instance 104 requests an exclusive lock on shared-block-usage map 118, searches for a data block that is free, performs transaction 102, updates shared-block-usage map 118 to indicate that the data block is used, generates redo information for the update, and unlocks shared-block-usage map 118.
While instance 104 has an exclusive lock on shared-block-usage map 118, instances 108 and 112 cannot perform their respective transactions 106 and 110 because instance 108 and instance 112 cannot access shared-block-usage map 118 to search for free space. One way of supporting concurrency is by ensuring frequent accessibility to the shared-block-usage map. In this example, the transaction, through the instance, obtains a lock on shared-block-usage map 118, writes one block's worth of data, updates the shared-block-usage map to indicate that the block is used, generates redo information, and then releases the lock on shared-block-usage map 118. The cycle repeats itself for each of the other two transactions 106 and 110. To write the second block of data from the one megabyte, the transaction obtains a lock on shared-block-usage map 118 a second time, writes the second block's worth of data to a second free block, marks the second free block as used, generates redo information a second time, and then releases the lock, and so on.
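The per-block lock cycle shared by the three transactions can be sketched with threads standing in for the three instances. This is an illustration under simplifying assumptions: a Python `threading.Lock` replaces the exclusive lock on shared-block-usage map 118, a list replaces the map, the labels `txn-102`, `txn-106`, and `txn-110` are only tags borrowed from the reference numerals, and no data is actually written.

```python
import threading

map_lock = threading.Lock()      # stands in for the lock on the shared map
block_status = ["free"] * 12     # stands in for the map entries
redo_log = []                    # one redo record per map update


def run_transaction(name, blocks_wanted, granted):
    """Claim blocks one at a time, locking the map once per block."""
    for _ in range(blocks_wanted):
        with map_lock:                        # obtain the lock
            index = block_status.index("free")  # search for a free block
            block_status[index] = "used"        # mark the block as used
            redo_log.append((name, index))      # generate redo
        # The lock is released here, so another transaction may proceed.
        granted.append(index)


grants = {name: [] for name in ("txn-102", "txn-106", "txn-110")}
threads = [
    threading.Thread(target=run_transaction, args=(name, 4, grants[name]))
    for name in grants
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

all_granted = sorted(i for blocks in grants.values() for i in blocks)
print(len(redo_log), all_granted)
```

Because the lock is taken and released once per block, the order in which the three transactions receive blocks may vary from run to run, but no block is ever granted twice and each grant produces exactly one redo record.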