1. Field of the Invention
The present invention relates to a file system with a file management function including transaction management, and more particularly to a file system with a file management function for updating a file in a temporary writing area, and a file management method for use in the system.
2. Description of the Related Art
In general, transaction management requires four basic conditions, the so-called ACID properties: atomicity, consistency, isolation and durability. In particular, atomicity (A), consistency (C) and durability (D) are realized by commit/roll-back and recovery. Methods by which a resource manager realizes ACD include logging and the side-file technique.
<Logging>
Logging is a technique in which, when data is updated, the state of the data before the update (UNDO) and the state confirmed by commitment (REDO) are held in a log. In this technique, when a system terminates abnormally, data that was updated but not yet committed is returned to its pre-update state using the UNDO log. In contrast, data that was committed but not yet updated is brought to its committed state using the REDO log. Logging includes various techniques, such as a non-steal technique in which data is not written before commitment, a technique for acquiring a log physically, and a technique for acquiring a log logically.
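The recovery behavior described above can be illustrated with a minimal sketch. It assumes a dictionary standing in for the disk and a flat in-memory list standing in for the log; the names `LogRecord` and `recover` are hypothetical and are not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LogRecord:
    txn: int    # transaction that made the update
    key: str    # data item that was updated
    undo: Any   # state before the update (UNDO)
    redo: Any   # state confirmed by commitment (REDO)


def recover(disk: dict, log: list, committed: set) -> dict:
    """Bring `disk` to a consistent state after an abnormal termination."""
    # REDO pass (forward): reapply updates of committed transactions
    # that may not have reached the disk yet.
    for rec in log:
        if rec.txn in committed:
            disk[rec.key] = rec.redo
    # UNDO pass (backward): restore the pre-update state of data
    # written by transactions that never committed.
    for rec in reversed(log):
        if rec.txn not in committed:
            disk[rec.key] = rec.undo
    return disk


# Transaction 1 committed, but its update of "a" never reached the disk.
# Transaction 2 did not commit, but its update of "b" was already written.
log = [LogRecord(1, "a", 1, 10), LogRecord(2, "b", 2, 20)]
state = recover({"a": 1, "b": 20}, log, committed={1})
```

In this sketch the time needed for recovery grows with the length of the log that must be scanned.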
<Side-File Technique>
The side-file technique is described in, for example, Jim Gray and Andreas Reuter, trans. Masaru Kitsuregawa, TRANSACTION PROCESSING: CONCEPTS AND TECHNIQUES, Nikkei Business Publications, Inc., Vol. 2, pp. 860-870, 2001, (Jim Gray and Andreas Reuter, TRANSACTION PROCESSING: CONCEPTS AND TECHNIQUES, Morgan Kaufmann Publishers, Inc., 1993). In the side-file technique, data (an object) is not updated at “the original location”; instead, a new data value is written “to the side”, i.e., at “another location”. In this technique, “the original location” and said “another location” correspond to the same location in the logical space. In light of this, the side-file technique provides a mechanism for managing how different images are held in time series. The shadow page technique is known as a typical method for realizing the side-file technique. The shadow page technique is characterized in that update is performed completely atomically, so that failure recovery is not necessary when a system malfunctions.
In the shadow page technique, two sets, each consisting of a page table and a bit map, are utilized. Each page table has one entry per file block (page). Each entry in each page table holds the number of the slot that stores the current image of the corresponding block. On the other hand, each bit map has one bit per slot, and indicates whether each slot currently holds a block image. Further, a directory indicates which combination of page table and bit map is currently effective. In the shadow page technique, atomicity is guaranteed by the update of the directory.
The shadow page technique is characterized in that the update of data is performed at “another location”, and a plurality of (e.g., a pair of) page tables are utilized. By virtue of this, even if a malfunction occurs at any time between a data update process and a directory update process, such update can be cancelled without any recovery process.
It is known that logging is superior in performance to the shadow page technique. File management systems required to provide high availability must resume services in a short time in case of a failure. Therefore, they often utilize logging to realize high reliability and throughput. However, in logging, it is necessary, in case of a failure, to recover data not yet updated from the log by rolling forward. This recovery may require a long time (e.g., about several minutes).
On the other hand, the shadow page technique is superior to logging in ease of implementation and in the speed of recovery from a failure. As mentioned above, in the shadow page technique, no recovery process is necessary, and hence services can be resumed within a short time (e.g., within several seconds). However, the conventional shadow page technique has the following problems in performance:
Page tables of a large size are needed; fragmentation easily occurs; and commitment is costly.
Thus, the shadow page technique is not practical. In particular, in the case of using a database of a large scale, the shadow page technique increases the disk access cost and hence conspicuously degrades the performance of the system. The above problems will now be described in detail.
Firstly, in the conventional shadow page technique, when a database of a large scale is used, it is not guaranteed that the two page tables can be held entirely in the main memory. For instance, if the page size is 2 kilobytes (KB) and the database size is 1 terabyte (TB), the number of pages is about 500 million (500 M). In this case, assuming that one entry is of 4 bytes (B), the two page tables together occupy as much as 500 M × 2 × 4 B = 4 GB. Thus, page tables of a large size are needed. If the table size is enormous, the buffer hit rate of each page table may fall below, for example, 90%. Because the page tables are located away from the data on the disk, a page-table access that misses the buffer and requires a read from the disk sharply increases random access between the page tables and the database. In this case, the access performance is significantly degraded. Further, even if the buffer hit rate is almost 100%, the access to the buffer is increased, which inevitably degrades the access performance.
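The size estimate above can be reproduced with a short calculation. The function name `page_table_bytes` is hypothetical, and the sketch assumes decimal units (1 KB = 10^3 B, 1 TB = 10^12 B), which matches the rounded figures in the text.

```python
def page_table_bytes(db_bytes, page_bytes, entry_bytes=4, tables=2):
    """Return (number of pages, total bytes used by all page tables)."""
    pages = db_bytes // page_bytes          # one entry per page
    return pages, pages * entry_bytes * tables


pages, total = page_table_bytes(db_bytes=10**12, page_bytes=2 * 10**3)
print(pages, total)
```

With binary units (2^40 B and 2^11 B) the same calculation gives 2^29 pages and 2^32 B, i.e. the same order of magnitude.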
In addition, if blocks (pages) are reallocated as a result of data update, fragmentation occurs, thereby increasing random block input/output. Accordingly, the data transfer rate during data access is reduced. Further, during commitment, it is necessary to simultaneously write data to the page tables and the disk. In particular, writing of data blocks becomes random. This increases the running cost during commitment. Moreover, to keep consistency in the system, it is necessary to exclude simultaneous writing to the disk (commitment processing). Therefore, update cannot be executed during backup processing. In other words, backup processing cannot be performed during update.
Isolation of transactions is one of the basic conditions for transactions. Specifically, it is necessary to prevent data from being read by another transaction while the data is being updated in a certain transaction, and to guarantee the consistency of data as of the start of the certain transaction even if read processing requires a long time. In many transaction processing systems, transactions are isolated from each other by locking. In this case, while data is being updated, any read of (reference to) the data must wait.
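The waiting behavior under locking can be illustrated with a minimal sketch. It assumes a single exclusive lock per data item, which is a simplification of the lock modes used in real transaction processing systems; the class `LockedRecord` and its methods are hypothetical.

```python
import threading


class LockedRecord:
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()   # one exclusive lock per record

    def begin_update(self):
        # The updating transaction holds the lock until it commits.
        self._lock.acquire()

    def commit_update(self, value):
        self._value = value
        self._lock.release()

    def read(self, timeout=-1):
        # A reader must wait for the lock; timeout=-1 waits indefinitely.
        if not self._lock.acquire(timeout=timeout):
            return False, None          # gave up while an update was pending
        try:
            return True, self._value
        finally:
            self._lock.release()


rec = LockedRecord(1)
rec.begin_update()
blocked = rec.read(timeout=0.05)   # the update is in progress, so the read waits
rec.commit_update(2)
after = rec.read()
```

In this sketch a reader never observes a half-updated value, but it pays for that guarantee by waiting for the duration of the update.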