1. Field of the Invention
The present invention relates to a data access scheme for making accesses to data stored in a storage device.
2. Description of the Background Art
Conventionally, there have been various types of data access schemes, such as the database (abbreviated as DB hereafter) and the file system, used for the purpose of enabling a plurality of applications to utilize data stored on the disk. Usually, the application executes a plurality of data read/write operations as one processing unit called a transaction. In the transaction processing, when a plurality of update operations are to be executed, all the updatings are validated only when all the update operations are successful, and all the updatings are invalidated when any one of the update operations is unsuccessful due to a conflict with another transaction or other causes.
To this end, in the data access scheme, it is necessary to support the DB manipulation to validate all the updatings made by the transaction when it is committed (i.e., finished normally), and to invalidate all the updatings made by the transaction when it is aborted (i.e., finished abnormally). Usually, such a DB manipulation is achieved by the following processings.
(1) The manipulation of data is carried out at a buffer in a memory to which data are read out from the disk.
(2) The updated results in the buffer are written into the disk at a time of the commit.
(3) The updatings made in the buffer are invalidated at a time of the abort.
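The three processings above can be sketched as follows. This is a minimal illustration, not part of the disclosed scheme: the class and method names are illustrative, and a Python dictionary stands in for the disk.

```python
class SimpleStore:
    """Illustrative sketch: stable storage (the "disk") plus an
    in-memory buffer holding the running transaction's updates."""

    def __init__(self):
        self.disk = {}      # committed data: key -> value
        self.buffer = {}    # uncommitted updates of the running transaction

    def read(self, key):
        # (1) data manipulation is carried out at the buffer;
        # data not yet buffered is read out from the "disk"
        if key not in self.buffer:
            self.buffer[key] = self.disk.get(key)
        return self.buffer[key]

    def write(self, key, value):
        self.buffer[key] = value

    def commit(self):
        # (2) the updated results in the buffer are written to the disk
        self.disk.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        # (3) the updatings made in the buffer are invalidated
        self.buffer.clear()
```

An aborted transaction thus leaves the disk untouched, while a committed one makes all of its updates durable at once.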
In this type of conventional data access scheme, as indicated in FIG. 1, the positions of the data in the disk 100 are fixedly determined at a time of inserting these data into the disk 100, and the subsequently updated results of the data are overwritten at the positions determined at a time of the insertion. As a consequence, in a case of updating a plurality of data simultaneously, as these data to be updated are not necessarily arranged continuously on the disk, there has been a problem that extra seek time is required at a time of the writing into the disk.
On the other hand, in the case of carrying out the data manipulation on the buffer, if the data are to be directly updated on the buffer to which the data are read out from the disk, there is a possibility for the data on the buffer to be written back to the disk by the OS at an arbitrary timing. In order to cope with such a possibility, there is a need to take the following provisions as indicated in FIG. 2.
(1) When the data to be updated does not exist on the buffer 102, the data to be updated is read out from the disk 100 to a first position on the buffer 102.
(2) At a time of updating, the updated data is written into the buffer 102 at a second position different from the position at which the data to be updated is read out from the disk 100 to the buffer 102.
(3) At a time of commit, the updated data is written into the first position from the second position, and if necessary, written back into the disk 100 from the first position.
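The provisions (1) through (3) above can be sketched as follows; the class name `TwoPositionBuffer` and its members are illustrative assumptions, with dictionaries standing in for the two buffer positions and the disk.

```python
class TwoPositionBuffer:
    """Illustrative sketch of keeping updated data at a second buffer
    position, separate from the position holding the data read from disk."""

    def __init__(self, disk):
        self.disk = disk    # key -> value
        self.first = {}     # first position: data as read out from the disk
        self.second = {}    # second position: updated data

    def read(self, key):
        # (1) read the data from the disk into the first position if absent
        if key not in self.first:
            self.first[key] = self.disk.get(key)
        return self.second.get(key, self.first[key])

    def update(self, key, value):
        # (2) the updated data is written at a second, different position
        self.second[key] = value

    def commit(self, write_back=True):
        # (3) copy the updated data into the first position, and if
        # necessary write it back into the disk from the first position
        self.first.update(self.second)
        if write_back:
            self.disk.update(self.first)
        self.second.clear()
```

The sketch makes the drawback visible: a second buffer region must be maintained, and every update and commit must shuttle data between the two positions.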
Thus, the conventional data access scheme has been associated with the problems that a separate buffer region is required for the purpose of the data updating, and that a buffer manipulation is complicated in the data updating and the commit operation.
Now, as a data access scheme for resolving the above noted problems by reducing the seek time at a time of the writing into the disk, there is a scheme called log-structured file system (abbreviated as LSF hereafter) which is disclosed in M. Rosenblum and J. K. Ousterhout, "The Design and Implementation of a Log-Structured File System", Proceedings of 13th ACM symposium on Operating System Principles, pp. 1-15, 1991.
In this scheme, as indicated in FIG. 3, the extra seek time is reduced by collectively writing the updated data to a separate continuous region. Namely, in the LSF, the data are managed in units called blocks, and the writing into the disk is carried out in a unit called a segment which comprises a plurality of blocks. In a case of updating a certain data, the block containing that certain data is stored into a buffer region in the memory corresponding to a segment to be written next, and the writing into the disk is carried out when that buffer region becomes full. In this manner, in the LSF, the seek time at the disk can be reduced as the writing into the disk is carried out with respect to a continuous region.
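The segment buffering just described can be sketched as follows; the segment size and all names are illustrative assumptions, and an append-only list stands in for the continuous disk region.

```python
SEGMENT_BLOCKS = 4  # illustrative segment size, in blocks

class LogStructuredDisk:
    """Illustrative sketch of LSF-style segment buffering: updated blocks
    accumulate in a memory buffer and are written out one whole segment
    at a time to a continuous region."""

    def __init__(self):
        self.disk = []          # append-only sequence of written blocks
        self.segment_buffer = []
        self.writes = 0         # number of contiguous disk writes performed

    def update_block(self, block):
        # the block containing the updated data goes into the buffer
        # region corresponding to the segment to be written next
        self.segment_buffer.append(block)
        if len(self.segment_buffer) == SEGMENT_BLOCKS:
            self.flush_segment()

    def flush_segment(self):
        # one contiguous write covers the whole segment, so a single seek
        # suffices regardless of which files the buffered blocks belong to
        self.disk.extend(self.segment_buffer)
        self.segment_buffer.clear()
        self.writes += 1
```

Eight block updates thus cost only two contiguous disk writes in this sketch, instead of up to eight scattered overwrites.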
Moreover, in the LSF, the positions of data are managed by meta data and super blocks as shown in FIG. 4. That is, each meta data has pointers to all the blocks belonging to each file, while the super block has pointers to all the meta data for all the files. Thus, in a part (a) of FIG. 4, the meta data 4 for the file 1 has pointers to the blocks 2 and 3 belonging to the file 1 and the meta data 8 for the file 2 has pointers to the blocks 5, 6, and 7 belonging to the file 2, while the super block 9 has pointers to these meta data 4 and 8. In a case of manipulating a certain file, the meta data corresponding to that certain file is searched out according to the pointers of the super block, and the appropriate block belonging to that file is manipulated.
Here, the creation and updating of the files are carried out as indicated in a part (b) of FIG. 4. Namely, a part (b) of FIG. 4 shows a state in which data are added to the file 1, a content of the file 2 is updated, and a new file 3 is created from the state shown in a part (a) of FIG. 4. In this exemplary case, the operation procedure is as follows.
(1) As the block 7 of the file 2 is updated, the updated result is placed at the block 10, and a new meta data for the file 2 pointing to the blocks 5, 6, and 10 is placed at the block 11.
(2) As a new file 3 is created, the data of this file 3 are placed at the blocks 12 and 13, and a meta data for this file 3 pointing to the blocks 12 and 13 is placed at the block 14.
(3) As a data is added to the file 1, the added data of the file 1 are placed at the blocks 15 and 16, and a new meta data for the file 1 pointing to the blocks 2, 3, 15, and 16 is placed at the block 17.
(4) The super block is usually maintained in the memory, so that a new super block pointing to the meta data 11, 14, and 17 is placed at the block 18, and written into the disk at a time of the check point.
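The operation procedure (1) through (4) above can be sketched as follows, reproducing the block numbers of FIG. 4; the dictionaries and the `alloc` helper are illustrative devices of this sketch, not part of the LSF itself.

```python
# Illustrative sketch: block number -> content, where a meta data block
# holds a list of pointers (block numbers) and a data block holds a string.
blocks = {}
next_block = 10   # blocks 1-9 are already laid out as in part (a) of FIG. 4

def alloc(content):
    """Place content at the next free block; no block is ever overwritten."""
    global next_block
    n = next_block
    blocks[n] = content
    next_block += 1
    return n

# initial state of part (a): data blocks, meta data, and the super block
blocks.update({2: "f1-a", 3: "f1-b", 5: "f2-a", 6: "f2-b", 7: "f2-c"})
blocks[4] = [2, 3]           # meta data for the file 1
blocks[8] = [5, 6, 7]        # meta data for the file 2
super_block = {1: 4, 2: 8}   # super block: file number -> meta data block

# (1) the block 7 of the file 2 is updated: new data at the block 10,
#     new meta data for the file 2 at the block 11
b10 = alloc("f2-c'")
super_block[2] = alloc([5, 6, b10])
# (2) the new file 3 is placed at the blocks 12 and 13, its meta data at 14
b12, b13 = alloc("f3-a"), alloc("f3-b")
super_block[3] = alloc([b12, b13])
# (3) data added to the file 1 goes to the blocks 15 and 16,
#     new meta data for the file 1 at the block 17
b15, b16 = alloc("f1-c"), alloc("f1-d")
super_block[1] = alloc(blocks[4] + [b15, b16])
```

After these steps the super block points to the new meta data at the blocks 17, 11, and 14, while the old blocks 4, 7, and 8 remain on disk as garbage, which is the situation the next subsection addresses.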
However, because of the collective writing of the updated data into a separate continuous region, this LSF is associated with the following problems.
(1) Garbage collection
In the LSF, old blocks which become unnecessary as a result of updating (such as the blocks 4, 7, and 8 in FIG. 4) are produced discontinuously, so that there is a need for an operation called garbage collection (or segment cleaning) which secures a continuous free space by copying the valid blocks out of the partially invalidated segments, and this additional operation gives an additional overhead.
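A segment cleaning pass can be sketched as follows, under the simplifying assumptions (made for this sketch only) that every segment holds the same number of blocks and that the set of still-valid block numbers is known.

```python
def clean_segments(segments, live):
    """Illustrative sketch of segment cleaning: copy the still-valid blocks
    out of partially invalidated segments into as few new segments as
    possible, so that whole segments become free again.

    segments -- list of segments, each a list of block ids
    live     -- set of block ids that are still valid
    Returns (compacted segments, number of segments freed)."""
    survivors = [b for seg in segments for b in seg if b in live]
    seg_size = len(segments[0])
    compacted = [survivors[i:i + seg_size]
                 for i in range(0, len(survivors), seg_size)]
    return compacted, len(segments) - len(compacted)
```

The overhead mentioned above is visible here: every surviving block must be read and rewritten once more, purely to recover contiguous free space.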
(2) Sequential ordering of writing into the buffer
In a case where a plurality of threads are to carry out the update operations in parallel, there is a need to write all these updatings collectively into the disk. To this end, when the buffer corresponding to the segment of the LSF is to be shared by the threads, the writing into the buffer must be carried out in a sequential order, and this requires an extra processing for the concurrency control, which in turn gives an additional overhead.
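The concurrency control overhead described above can be sketched as follows; the lock-based serialization shown here is one illustrative way to keep the shared segment buffer consistent, not the scheme of any particular LSF implementation.

```python
import threading

class SharedSegmentBuffer:
    """Illustrative sketch: when a plurality of threads share the buffer
    corresponding to the next segment, every writing into the buffer must
    be serialized, here by a mutual-exclusion lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.buffer = []

    def append_block(self, block):
        # the extra processing for the concurrency control: only one
        # thread at a time may extend the shared segment buffer
        with self._lock:
            self.buffer.append(block)
```

Every parallel update thus pays the cost of acquiring the lock, which is the additional overhead noted above.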
In addition to these two problems, as the LSF is basically to be used as the file system of the OS, when this LSF is to be used for the DB manipulation, the following problems also arise.
(3) Index
In order to make accesses to the DB, various types of indexes have been developed, and a typical example is the tree structured index called the B tree. Unlike the LSF which has only two hierarchical levels, the B tree usually has many hierarchical levels and the part corresponding to the meta data of the LSF itself has a hierarchical structure, so that there is a need to write many new meta data in a case of rewriting the data. In this B tree, the data and the index are mixedly present on a single disk just like the LSF, but in the modified index called the B+ tree, the data and the index are separated such that the index arranged on the disk can be searched by a reduced number of disk accesses. However, in order to support this type of index, a data position management scheme different from that of the LSF is necessary.
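The separation of index and data in the B+ tree can be sketched with a minimal two-level example; the keys, the fan-out, and the flat-list representation are illustrative simplifications for this sketch only.

```python
import bisect

# Illustrative two-level B+ tree sketch: the internal node holds only
# separator keys, while all data resides in the leaf nodes, so a lookup
# traverses the index first and touches the data only at the last step.
index_keys = [10, 20, 30]              # separator keys of the internal node
leaves = [                             # leaf nodes: sorted (key, data) pairs
    [(1, "a"), (5, "b")],              # keys below 10
    [(10, "c"), (15, "d")],            # keys in [10, 20)
    [(20, "e"), (25, "f")],            # keys in [20, 30)
    [(30, "g"), (35, "h")],            # keys 30 and above
]

def lookup(key):
    """Descend the index to the leaf covering the key, then scan the leaf."""
    leaf = leaves[bisect.bisect_right(index_keys, key)]
    for k, v in leaf:
        if k == key:
            return v
    return None
```

Because the internal levels hold no data, many more separator keys fit into each index block, which is what allows the on-disk index to be searched with a reduced number of disk accesses.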
(4) Extra data writing
In the usual DB, a plurality of data are present in a single block. In such a case, according to the LSF, the entire block must be newly written even in a case of updating only one data in a certain block, so that the extra data writing will be required.
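The extra data writing can be sketched as follows; the list-of-records representation of a block is an illustrative assumption.

```python
def update_record(block, index, value):
    """Illustrative sketch: under the LSF, updating a single record still
    produces a new copy of the entire block containing it; the old block
    becomes garbage on the disk."""
    new_block = list(block)   # every record in the block is copied...
    new_block[index] = value  # ...just to change this one record
    return new_block
```

When a block holds many records, this whole-block copy is the extra data writing noted above.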