The present invention relates to file systems. More particularly, the invention relates to a method for managing directories of a large-scale file system by applying a modified Extendible Hashing technique, based on fixed-length extent-based allocation, to the directory, which is one of the important objects of a file system.
In general, a file system is a key part of operating systems (OS) such as Windows (Microsoft), LINUX (open source) and UNIX. Examples of this subsystem, which allows a user's data to be stored in an easily understandable format, include EXT2 (Extended File System 2) and GFS (Global File System).
All objects managed by the above file systems can be expressed as inodes and exist in various forms, including files, directories and links, in the data storage structure of a device. The blocks of the data storage structure can be classified as root blocks, which have a global depth, and leaf blocks, which have a local depth. A directory is an object of the data storage structure that allows files to be stored systematically.
Also, the file system manages a directory by inserting directory entries into blocks and searching for them through the Extendible Hashing technique and a block mapping method that uses a hash function.
As shown in FIG. 1, the data storage structure of EXT2, a standard file system in the LINUX environment, manages a directory in the form of an unsorted linear list once one block is completely filled with directory entries.
The management method is explained in more detail below. In the data storage structure of such a file system, the structural information that internally represents a directory appearing in the operating system is stored in root blocks and leaf blocks; more specifically, a plurality of directories, each of which possesses one directory entry, is stored. Likewise, a plurality of directories, each possessing one directory entry, is stored in a single EXT2 block.
As shown in FIG. 1, a directory entry 10 comprises inode 11, which holds a unique object ID equal to the block number at which the object can be stored; rec_len 12, which holds the total length of the entry itself; name_len 13, which holds the length of the directory name; file_type 14, which holds information on the directory type; and name 15, which holds the actual directory name. For instance, if a directory such as “C:\WINDOWS” exists, the name of the directory is “WINDOWS” and the length of the directory name is “7”.
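The layout of directory entry 10 can be illustrated with a minimal Python sketch. The field widths (32-bit inode, 16-bit rec_len, 8-bit name_len and file_type), the 4-byte padding, the type code `FT_DIR` and the helper names `pack_entry` and `unpack_entry` are assumptions made for illustration; the text above names the fields but does not specify their sizes.

```python
import struct

# Assumed header layout: inode (u32), rec_len (u16), name_len (u8), file_type (u8).
HEADER = struct.Struct("<IHBB")
FT_DIR = 2  # assumed type code meaning "directory"; illustrative only


def pack_entry(inode, name, file_type=FT_DIR):
    """Serialize one directory entry, padded to a 4-byte boundary (assumption)."""
    raw = name.encode("ascii")
    rec_len = (HEADER.size + len(raw) + 3) & ~3  # total length of the entry itself
    body = HEADER.pack(inode, rec_len, len(raw), file_type) + raw
    return body.ljust(rec_len, b"\0")


def unpack_entry(buf):
    """Read the fields back from the start of a byte buffer."""
    inode, rec_len, name_len, file_type = HEADER.unpack_from(buf)
    name = buf[HEADER.size:HEADER.size + name_len].decode("ascii")
    return {"inode": inode, "rec_len": rec_len, "name_len": name_len,
            "file_type": file_type, "name": name}


# The "C:\WINDOWS" example: the stored name is "WINDOWS", so name_len is 7.
entry = unpack_entry(pack_entry(12, "WINDOWS"))
```

Because rec_len records the full padded length, a reader can step from one entry to the next inside a block without knowing the name length in advance.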
The directory entries 10 are stored sequentially and contiguously in one root block 20. As shown in FIG. 1, when root block 20 is completely filled, the information on the next block 21 (i.e., its block number) is stored in the header area of the previous block 20 using the linked list structure of EXT2, and further directory entries are assigned sequentially to the new block 21. For reference, blocks 21 and 22 correspond to leaf blocks of root block 20; block 22 is the block following block 21, and block 21 is the block preceding block 22.
However, a sequential directory management method like that of EXT2 is inappropriate for a large-scale file system, since the linked list of blocks becomes longer and longer as the number of directory entries grows sharply. In particular, as the linked list becomes longer, the time taken to insert, search for and delete a directory entry increases accordingly.
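The cost of the sequential method can be sketched as follows. This is a simplified model, not EXT2 code: the `Block` class, the per-block capacity of 4 entries and the comparison counter are assumptions introduced only to show how search time follows the length of the chain.

```python
class Block:
    """One directory block; `next` stands in for the next-block number in the header."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []   # unsorted directory names
        self.next = None


def insert(root, name):
    """Append to the first block with free space, chaining a new block when full."""
    block = root
    while len(block.entries) >= block.capacity:
        if block.next is None:
            block.next = Block(block.capacity)
        block = block.next
    block.entries.append(name)


def search(root, name):
    """Linear scan over the chain; returns (found, number_of_comparisons)."""
    comparisons = 0
    block = root
    while block is not None:
        for entry in block.entries:
            comparisons += 1
            if entry == name:
                return True, comparisons
        block = block.next
    return False, comparisons


root = Block(capacity=4)            # plays the role of root block 20
for i in range(1, 11):
    insert(root, str(i))
found, cost = search(root, "10")    # the last entry requires scanning all 10
```

In this model a lookup for the most recently inserted name, or for a name that does not exist, touches every entry in every block of the chain.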
As shown in FIG. 2, the Global File System (GFS), which was developed to resolve the problem of the sequential directory management method of EXT2, uses the Extendible Hashing technique for managing directories. The management method is explained in more detail below on the basis of the data storage structure.
In GFS, extended directory entries 10a, in which a hash value field 16 storing the bit string used by the Extendible Hashing technique is added to the conventional directory entry 10 of FIG. 1, are stored sequentially and contiguously in root block 30.
Here, the value obtained by applying a certain hash function to the name of each directory entry 10a is used as the hash value. If root block 30 is completely filled with directory entries 10a, the directory entries 10a are split up and stored in leaf blocks 31 and 32.
In practice, if the global depth of root block 30 is assumed to be “1”, then first of all the number of index values, which represent the location information selected by the bits of the hash value that are referenced according to the global depth of root block 30, is calculated, i.e., 2^1 = 2 (0 or 1).
Next, if the bit pointer for a directory entry 10a in root block 30 indicates “0”, the corresponding directory entry 10a is stored in leaf block 31, shown in the upper part of FIG. 2. If the bit pointer indicates “1”, the corresponding directory entry 10a is stored in leaf block 32, shown in the lower part of FIG. 2.
For instance, if 10 directories named “1, 2, 3, 4, 5, 6, 7, 8, 9, 10” exist under the “C:\” directory in root block 30 and a modulo operation such as “%2” is used as the hash function, the directories “2, 4, 6, 8, 10” are stored in block 31 and the directories “1, 3, 5, 7, 9” are stored in block 32. When a directory stored in this way is searched for, the amount of searching is reduced to ½ of that required by the conventional sequential directory management method.
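The example above can be reproduced with a short Python sketch. The function names `hash_value`, `split` and `lookup` are hypothetical, and treating the numeric directory name itself as the hash input is an assumption that matches only this toy example; an actual file system would hash arbitrary name strings.

```python
def hash_value(name):
    """Toy hash for this example only: the numeric directory name itself."""
    return int(name)


def split(entries, global_depth=1):
    """Distribute entries over 2**global_depth leaves using the low-order bits."""
    n_leaves = 2 ** global_depth          # 2^1 = 2 index values: 0 and 1
    leaves = {i: [] for i in range(n_leaves)}
    for name in entries:
        leaves[hash_value(name) % n_leaves].append(name)
    return leaves


def lookup(leaves, name):
    """Search only the leaf selected by the hash; returns (found, comparisons)."""
    leaf = leaves[hash_value(name) % len(leaves)]
    for count, entry in enumerate(leaf, start=1):
        if entry == name:
            return True, count
    return False, len(leaf)


names = [str(i) for i in range(1, 11)]
leaves = split(names)
block_31, block_32 = leaves[0], leaves[1]   # index 0 -> block 31, index 1 -> block 32
```

A lookup now scans at most 5 of the 10 entries, which is the halving of search effort described above.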
Also, once the directory entries 10a have been distributed to the two leaf blocks 31 and 32, the directory entries 10a in root block 30 are deleted and the block numbers of leaf blocks 31 and 32 are stored there sequentially.
At this point, if not all of the block numbers can be stored in root block 30, indirect blocks 30a and 30b are created and the block numbers of the indirect blocks are stored in root block 30. Afterwards, the block numbers pointing to leaf blocks 31, 32 and 31a, which were stored in the existing root block, are stored in the newly assigned indirect blocks.
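The move from direct to indirect addressing can be sketched as follows. This is a simplified model under stated assumptions: a block holds only `PTRS_PER_BLOCK = 4` block numbers, the leaf block numbers 31 to 36 are invented for the example, indirect blocks are identified by their position in a list, and only one level of indirection is modeled.

```python
PTRS_PER_BLOCK = 4  # assumed, deliberately tiny so that the root overflows quickly


def build_directory(leaf_numbers):
    """Return (root, indirect_blocks) for a list of leaf block numbers.

    If the numbers fit, the root stores them directly. Otherwise they are
    moved into indirect blocks and the root stores the indirect block ids.
    """
    if len(leaf_numbers) <= PTRS_PER_BLOCK:
        return {"indirect": False, "ptrs": list(leaf_numbers)}, []
    indirect = [leaf_numbers[i:i + PTRS_PER_BLOCK]
                for i in range(0, len(leaf_numbers), PTRS_PER_BLOCK)]
    if len(indirect) > PTRS_PER_BLOCK:
        raise ValueError("a second level of indirection is not modeled here")
    return {"indirect": True, "ptrs": list(range(len(indirect)))}, indirect


def leaf_for(root, indirect, index):
    """Resolve a hash index to a leaf block number."""
    if not root["indirect"]:
        return root["ptrs"][index]
    block_id = root["ptrs"][index // PTRS_PER_BLOCK]
    return indirect[block_id][index % PTRS_PER_BLOCK]


# Six leaf pointers no longer fit in the root, so two indirect blocks
# (playing the roles of 30a and 30b) are created.
root, indirect = build_directory([31, 32, 33, 34, 35, 36])
```

Every lookup in the indirect case reads one additional block before reaching the leaf, and the whole pointer table is rewritten when the switch happens.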
However, in the directory management method using the Extendible Hashing technique of GFS, indirect blocks 30a and 30b with a fully flat structure have to be created whenever the block numbers can no longer be stored in root block 30. Also, since the block size of leaf blocks 31, 32 and 31a is limited, the data storage structure consisting of the blocks to which the Extendible Hashing technique is applied has to be modified and extended frequently.
Both the sequential directory management method of EXT2 and the directory management method using the Extendible Hashing technique of GFS use a block-based storage structure, as shown in FIG. 4, in which each area of root block 40 maps 1:1 to a leaf block, so that each area can store only the block number of one leaf block 41 or 42.
However, when the above storage structure is applied to a large-scale file system in which directory entries are inserted frequently, the storage structure also has to be extended frequently. As a result, the structure is transformed, using the block assignment technique of extent-based allocation shown in FIG. 5, into one in which each storage space of the root block stores the first block number of a run of successive blocks that are assigned sequentially.
Although the block assignment technique based on standard extent-based allocation, as shown in FIG. 5, can store a run of successive leaf blocks of variable length in one address storage space of root block 40, it is unsuitable for extent assignment, extent extension and extent reduction under Extendible Hashing, since the length of each extent has to be stored separately and the result calculated from an equation cannot be used to search variable-length extents.
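The difficulty with variable-length extents can be illustrated by contrasting the two lookups below. This is a sketch, not the claimed method: the function names, the block numbers and the extent length of 4 are assumptions, and the fixed-length formula is shown only to make clear what kind of "calculated result from an equation" is unavailable when extent lengths vary.

```python
def locate_variable(extents, index):
    """Variable-length extents: each slot stores (first_block, length).

    The length must be stored per extent, and the extents have to be
    walked one by one because no closed-form address calculation exists.
    """
    for first, length in extents:
        if index < length:
            return first + index
        index -= length
    raise IndexError("logical block out of range")


def locate_fixed(first_blocks, extent_len, index):
    """Fixed-length extents: each slot stores only the first block number.

    The slot and the offset follow directly from one division.
    """
    slot, offset = divmod(index, extent_len)
    return first_blocks[slot] + offset


variable_extents = [(100, 3), (200, 1), (300, 4)]   # hypothetical block numbers
fixed_extents = [100, 200]                          # every extent is 4 blocks long

a = locate_variable(variable_extents, 4)   # walks past two extents first
b = locate_fixed(fixed_extents, 4, 5)      # slot 1, offset 1
```

With variable lengths, growing or shrinking one extent changes the position of every logical block after it, which is awkward when hash indices are expected to map to blocks by calculation.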