This invention relates to data storage in a computerized storage unit. More particularly, the present invention relates to improved management of directory structures for block-level computer storage using a write-ahead log for I/O (input/output) access concurrency.
"Backup" and "snapshot" data storage in a computer system or computer network typically involve "block-level" copying and storage of data from an original volume of data to a backup or snapshot volume. The original volume is used for active data storage for software that is operating in the computer. The blocks of data in the backup and snapshot volumes, on the other hand, are typically preserved for a time and can be restored to the original volume if the original volume ever becomes corrupted or invalid. The block-level data storage typically involves "physical" blocks of data allocated in the computer's main memory and mass storage device(s) (e.g. hard drives, tape drives, compact disk drives, etc.) and is not necessarily concerned with how the data is "logically" distributed within the data volumes.
Database management, on the other hand, typically involves "record-level" storage of data. The record-level data storage manages each "record" of data, which is a logical arrangement of the data within the data volume, which may be unrelated to the physical distribution of the data blocks within the storage devices. The record-level data storage, therefore, is typically logically "abstracted" from the physical storage thereof, so access to the records (e.g. data reads and writes) must pass through a conversion from a logical description of the desired data in the records to a physical description of the actual data in the storage devices. Thus, block-level and record-level storage techniques have very different applications, uses and purposes.
Database management, unlike block-level storage systems, commonly employs well-known write-ahead logging techniques to enhance the efficiency of updating the records in the database. Ordinarily, when performing a write access to a record in the logical volume, information is written to the logical volume in the computer's main memory and then to the computer's storage device(s). Afterward, the application that requested the write is informed that the write has completed. With a write-ahead log, however, the information is not written immediately to the logical volume in the storage devices. Instead, the write request is "logged" into the write-ahead log, which is then stored in the storage devices, in a manner that is typically much faster than performing the entire write to the record in the logical volume. Thus, the write-ahead log keeps track of the changes that have been committed to the database, without completing the changes. After several write requests have been logged, they are processed (i.e. the write-ahead log is "flushed") in a single "batch" that quickly saves all of the information to the desired records in the logical volumes in the storage devices. The overall time for logging the write requests and subsequently flushing the write-ahead log is shorter than the time it would take if each write request were completed to the storage devices as it occurred.
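The log-then-flush behavior described above can be illustrated with a minimal sketch. The class name `WriteAheadLog`, its methods, and the in-memory dictionaries standing in for the storage devices are all hypothetical and are not taken from any particular database system.

```python
class WriteAheadLog:
    """Toy write-ahead log: record each request cheaply, apply it later."""

    def __init__(self):
        self.records = {}  # stands in for records in the logical volume on storage
        self.log = []      # stands in for the write-ahead log on storage

    def write(self, key, value):
        # Only the log entry is stored before the requester is told
        # that the write has completed.
        self.log.append((key, value))
        return "acknowledged"

    def flush(self):
        # Apply every logged write to the records in a single batch,
        # preserving the order in which the requests arrived.
        applied = len(self.log)
        for key, value in self.log:
            self.records[key] = value
        self.log.clear()
        return applied
```

In this sketch the records remain untouched until `flush()` is called, which mirrors the statement that the log tracks committed changes without completing them.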
A procedure for managing block-level storage for backup or snapshot applications is illustrated in FIGS. 1, 2 and 3. In this example, the volume 100 includes a directory hierarchy having three levels (root and level 2 and 3) of directories that keep track of the data blocks. An initial state of the volume 100 is shown in FIG. 1, in which the volume 100 includes a root directory 102, three level 2 directories 104, 106 and 108, three level 3 directories 110, 112 and 114 and five data blocks 116, 118, 120, 122 and 124. A final state of the volume 100 is shown in FIG. 2 after two new data blocks 126 and 128 are written ("write 1" and "write 2," respectively) to the volume 100, along with a new level 3 directory 130, according to a procedure 132 shown in FIG. 3.
In this example, the write 1 is started (at step 134) in the procedure 132 some time before the write 2, but completes after the write 2 is started (at step 136). For write 1, copies of the root and level 2 and 3 directories 102, 106 and 112 are made and modified (steps 138, 140 and 142, respectively) according to a shadow directory technique described in the second aforementioned patent application.
For write 2, since the root directory 102 has already been copied, the copy of the root directory 102 only has to be modified (step 144). A copy of the level 2 directory 108 has to be made and modified (step 146) according to the shadow directory technique described in the second aforementioned patent application. A new directory cluster has to be allocated (step 148), so the new level 3 directory 130 can be made (step 150) therein.
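The copy-once behavior in the two preceding paragraphs (write 1 copies the root directory, write 2 only modifies the existing copy) can be sketched as follows. The shadow directory technique itself is defined in a separate application that is not reproduced here, so this is only an assumed simplification; the class `ShadowDirectories`, its methods, and the dictionary layout are hypothetical.

```python
import copy


class ShadowDirectories:
    """Toy model: directories are copied on first modification, then reused."""

    def __init__(self, directories):
        self.active = directories  # directory name -> entries, as currently live
        self.shadow = {}           # copies being modified, not yet live

    def modify(self, name, entry, target):
        # Copy the directory the first time any write touches it. A directory
        # that does not exist yet starts as an empty, newly made directory.
        copied = name not in self.shadow
        if copied:
            self.shadow[name] = copy.deepcopy(self.active.get(name, {}))
        self.shadow[name][entry] = target
        return copied

    def activate(self):
        # Stand-in for writing the root directory: all copied and new
        # directories become live together.
        self.active.update(self.shadow)
        self.shadow = {}
```

Here `modify` returns `True` when a copy had to be made and `False` when an existing copy was reused, which corresponds to the difference between steps 138 and 144.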
For write 1, a new data cluster (physical block of memory space) is allocated (step 152) for the new data block 126. The data is copied (step 154) to the new data cluster from a base volume (not shown), according to a copy-on-write procedure described in the first aforementioned patent application, and the copying completes (step 156). Likewise, for write 2, another new data cluster is allocated (step 158) for the new data block 128, the data is copied (step 160) to the new data cluster from the base volume and the copying completes (step 162).
Since both writes 1 and 2 must update a portion of the directories 102, 106, 108, 112 and 130 for the volume 100, the write 1 experiences some "dead" time 164, during which the write 1 must wait for the write 2 to finish copying (step 162) the data to the new data cluster. The level 2 and 3 directories 106, 108, 112 and 130 are written (step 166) to the storage devices for both writes 1 and 2 at the same time. The procedure 132 waits (step 168) for the level 2 and 3 directories 106, 108, 112 and 130 and the data clusters for the data blocks 126 and 128 to be written to the storage devices. The root directory 102 is written (step 170) to the storage devices, which effectively "activates" all of the copied and new directories 102, 106, 108, 112 and 130 according to the shadow directory technique described in the second aforementioned patent application. The procedure 132 waits (step 172) for the root directory 102 to be written and then passes (step 174) both writes 1 and 2 at the same time to the base volume, according to the first aforementioned patent application. In this manner, the write 1 is delayed for the dead time 164 while the write 2 "catches up" and then the writes 1 and 2 complete together.
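The effect of the dead time 164 can be made concrete with a small timing sketch. The function name `shared_flush_timeline` and all durations are hypothetical time units chosen for illustration; the source gives no actual timings.

```python
def shared_flush_timeline(copy_done, dir_write, root_write):
    """Return (completion_time, dead_times) when writes share one directory flush.

    copy_done maps each write's name to the time its data copy finishes.
    dir_write and root_write are the durations of the level 2/3 directory
    write and the root directory write. All values are hypothetical units.
    """
    # The directory write cannot start until the slowest data copy is done.
    barrier = max(copy_done.values())
    dead_times = {name: barrier - done for name, done in copy_done.items()}
    # Every write completes together, after the directories and the root.
    completion = barrier + dir_write + root_write
    return completion, dead_times
```

For example, `shared_flush_timeline({"write 1": 5, "write 2": 8}, 2, 1)` returns `(11, {"write 1": 3, "write 2": 0})`: write 1 sits idle for 3 units while write 2 catches up, and both complete at time 11.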
It is with respect to these and other background considerations that the present invention has evolved.
An improvement of the present invention is that write requests directed to the same volume do not have to wait for each other to complete. Thus, enhanced concurrency of write requests is achieved. Write-ahead logging features are incorporated into block-level storage, so that directory updates for each write request can be logged and the write request can be allowed to complete independently of other write requests. Therefore, the present invention involves greater efficiency for block-level storage for backup and snapshot applications in computerized storage systems.
These and other improvements are achieved by managing block-level storage in a computerized storage system by recording into a write-ahead log a description of block-level updates made to the data in the volume in a main memory and in a storage device of the computerized storage system, preferably in response to write requests received from a request source, such as a storage host device. The block-level updates are made to data clusters in the volume in the main memory and in the storage device. Updates are also made to directory clusters in the main memory. The updates to the directory clusters in the storage device are preferably not made until after a plurality of the updates have been made to the directory clusters in the main memory. Then the updated directory clusters are copied from the main memory to the storage device all at once. In the meantime, before the updated directory clusters are copied to the storage device, the entries in the write-ahead log are copied to the storage device, so that the updated directory clusters can be reconstructed from the write-ahead log and the non-updated directory clusters in the storage device if the volume in the main memory is ever lost or corrupted. Upon copying the descriptions of the block-level updates in the write-ahead log from the main memory to the storage device, the write requests are preferably deemed to have completed, even though the updated directory clusters have not yet been written to the storage device. Therefore, the write requests can complete without having to quiesce the volume with each write request to write the directory clusters to the storage device.
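The summary above can be sketched in a few lines, under several simplifying assumptions: directories are flattened to a single block-to-cluster mapping, the log entry is written directly to storage rather than first to main memory, and the log is cleared once the directories have been flushed (the summary does not say when log entries are retired). The class `Volume` and its fields are hypothetical names.

```python
class Volume:
    """Toy block-level volume with a write-ahead log for directory updates."""

    def __init__(self):
        self.disk_data = {}  # data clusters on the storage device
        self.disk_dirs = {}  # directory entries on the storage device
        self.disk_log = []   # write-ahead log entries on the storage device
        self.mem_dirs = {}   # directory entries in main memory

    def write(self, block, cluster, data):
        self.disk_data[cluster] = data          # update the data cluster
        self.mem_dirs[block] = cluster          # update the directory in memory only
        self.disk_log.append((block, cluster))  # store a description of the update
        return True  # the request is deemed complete without a directory flush

    def flush_directories(self):
        # Copy all updated directory entries to storage at once.
        self.disk_dirs = dict(self.mem_dirs)
        self.disk_log.clear()  # assumption: flushed entries are no longer needed

    def recover(self):
        # Rebuild the in-memory directories from the non-updated directories
        # on storage plus the logged descriptions of later updates.
        self.mem_dirs = dict(self.disk_dirs)
        for block, cluster in self.disk_log:
            self.mem_dirs[block] = cluster
```

Because each `write` returns as soon as its log entry is stored, one write never waits on another write's data copy, which is the concurrency gain the summary claims over the shared directory flush of the earlier procedure.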
A more complete appreciation of the present invention and its scope, and the manner in which it achieves the above noted improvements, can be obtained by reference to the following detailed description of presently preferred embodiments of the invention taken in connection with the accompanying drawings, which are briefly summarized below, and the appended claims.