1. Field of the Invention
This invention relates to indirect addressing of files in a file system. More particularly, this invention relates to modifying the indirect addressing with additional controlled information.
2. Description of the Related Art
To a user or an application program, a file appears as a contiguous region of disk space addressed as bytes 0 through the size of the file minus one. In reality, such a file is stored as various physical blocks of data scattered throughout the disk. Accordingly, some address translation method is required to convert, or translate, the file offsets provided by the application to physical addresses in the data storage device.
A very common translation method uses a tree of fixed-size indirect address blocks. The Unix File System (UFS) is exemplary of this translation method. An indirect address block is a block of metadata containing an array of block pointers. These block pointers point to other, lower-level indirect address blocks. At the lowest level of metadata, the indirect address blocks point to fixed-size blocks of data. The control block for the file, which in Unix is the I-node, points to the top-level indirect address block.
The translation method begins by using the most significant bits of the offset as an index into the root indirect address block. At the root indirect address block, the pointer to the second-level indirect address block is retrieved. Then, the next significant bits of the offset are used as the index into the second-level indirect address block to fetch a pointer to the third-level indirect address block. Again, the next significant bits of the offset are used to find the pointer in the third-level indirect address block. That pointer may point to yet another indirect address block, or may point to the data block. The least-significant set of offset bits is used as the byte index, together with the pointer from the last indirect address block, to find the data in the data block.
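The walk described above can be sketched in a few lines of Python. This is only an illustrative model, not the code of any particular file system: the names split_offset and translate are hypothetical, and each indirect address block is modeled as a plain Python mapping from index to the next block rather than as an on-disk array of pointers.

```python
def split_offset(offset, levels, index_bits, block_bits):
    """Split a file offset into one index per tree level plus a byte index."""
    # The least-significant bits select the byte inside the data block.
    byte_index = offset & ((1 << block_bits) - 1)
    remaining = offset >> block_bits
    indices = []
    for _ in range(levels):
        indices.append(remaining & ((1 << index_bits) - 1))
        remaining >>= index_bits
    # Collected from the lowest level upward; the root uses the most
    # significant bits, so reverse the list.
    indices.reverse()
    return indices, byte_index


def translate(root, offset, levels, index_bits, block_bits):
    """Follow one pointer per level from the root down to the data block."""
    indices, byte_index = split_offset(offset, levels, index_bits, block_bits)
    block = root
    for index in indices:
        block = block[index]  # hypothetical in-memory stand-in for a disk read
    return block, byte_index
```

For a toy tree with two levels, two index bits per level, and four byte-index bits, translate({1: {2: "data"}}, 101, 2, 2, 4) follows index 1 at the root and index 2 at the second level, and reports byte 5 within the data block.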
The tree of indirect address blocks, culminating in a data block, has a fixed depth. For example, supporting a 32-bit (4 gigabyte) file system with four-kilobyte indirect address blocks and four-kilobyte data blocks (a traditional UNIX File System) requires two levels of indirect address blocks. Each level uses ten bits (1024 four-byte block pointers), and the data block uses twelve bits. For a file system with 63-bit files using eight-kilobyte indirect address blocks with eight-kilobyte data blocks, there would be five levels of indirect addressing; each level would use ten bits, and the data block would use thirteen bits.
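The depth figures in the two examples above follow from simple bit arithmetic, which the sketch below reproduces. The function name tree_shape is hypothetical, block and pointer sizes are assumed to be powers of two, and the eight-byte pointer size used for the 63-bit case is inferred from the stated ten bits per level with eight-kilobyte blocks.

```python
def tree_shape(offset_bits, block_size, pointer_size):
    """Return (levels, index_bits, block_bits) for a fixed-depth tree.

    Assumes block_size and pointer_size are powers of two.
    """
    # Bits needed to address a byte inside one data block.
    block_bits = (block_size - 1).bit_length()
    # Bits needed to select one pointer inside an indirect address block.
    index_bits = (block_size // pointer_size - 1).bit_length()
    # Remaining offset bits are divided among the levels (ceiling division).
    levels = -(-(offset_bits - block_bits) // index_bits)
    return levels, index_bits, block_bits


print(tree_shape(32, 4096, 4))  # 32-bit offsets, 4 KB blocks, 4-byte pointers
print(tree_shape(63, 8192, 8))  # 63-bit offsets, 8 KB blocks, 8-byte pointers
```

Under these assumptions the first call yields two levels of ten bits with twelve byte-index bits, and the second yields five levels of ten bits with thirteen byte-index bits, matching the figures in the text.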
The problems with this indirect addressing approach are that it uses a very large amount of disk space and memory for metadata, and that it takes a significant amount of time to process the metadata in order to reach the file data. For example, associated with every data block (usually eight kilobytes in Unix) is an eight-byte entry in the lowest-level indirect address block. A one-terabyte file therefore requires slightly more than one gigabyte of indirect address block storage. This is a very large amount of metadata to manage. Much of this indirect addressing metadata is wasted space, because many of the files stored in large file systems are contiguous files.
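The one-gigabyte figure can be checked with a short calculation. The sketch below is illustrative only: the function name indirect_metadata_bytes is hypothetical, one terabyte is taken as 2^40 bytes, and the count stops at the first level where a single indirect address block covers the whole file, so a fixed-depth tree would add one further block for each remaining upper level.

```python
def indirect_metadata_bytes(file_size, block_size, pointer_size):
    """Bytes of indirect address blocks needed to map a file of file_size bytes."""
    pointers_per_block = block_size // pointer_size
    # Entries needed at the lowest level: one per data block (ceiling division).
    entries = -(-file_size // block_size)
    total = 0
    while True:
        blocks = -(-entries // pointers_per_block)
        total += blocks * block_size
        if blocks <= 1:
            break
        # The next level up needs one entry per block at this level.
        entries = blocks
    return total


one_terabyte = 2 ** 40
print(indirect_metadata_bytes(one_terabyte, 8192, 8))
```

With eight-kilobyte blocks and eight-byte entries, the lowest level alone accounts for 2^30 bytes (one gigabyte); the two levels above it add 2^20 and 2^13 bytes, which is the "slightly more" in the text.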
The indirect addressing file system assumes that file data blocks are scattered throughout a file system, or at least throughout a large number of small contiguous regions. However, many application programs, for efficiency and speed of operation, make an effort to store required files in contiguous space in the file system. As a result, it often happens that a block of indirect addresses has a sequential arrangement of pointers, meaning that each pointer in the indirect address block points to the data block adjacent to the one addressed by the preceding pointer. One solution in such a situation is to make the data blocks larger. However, if the data blocks are made larger and the files are small, then a large amount of storage space is wasted because the data blocks are not filled by the files. Even if a file is large, if file sizes are randomly distributed, then on average half of the last block is left unused.
One attempt to solve the problem of trading off large blocks of metadata against wasted data block space in a file system makes use of "extents." In an "extents" file system, each file is defined by a list of physical addresses, with a length for the run of blocks at each physical address. For example, a file with noncontiguous blocks might be defined by the following list of "physical address, length" pairs: first entry, 1000,50; second entry, 4000,800; third and last entry, 3500,20. In the "extents" file system, this file first has 50 blocks of data beginning at physical address 1000, i.e., physical blocks 1000 through 1049. The file continues with 800 blocks of data starting at physical address 4000, i.e., blocks 4000 through 4799. Finally, the file is completed with 20 blocks beginning at physical address 3500, i.e., blocks 3500 through 3519.
A number of problems exist with the "extents" type of file system. For example, if the file requires large blocks of contiguous space, and the only space available consists of many small blocks in noncontiguous locations, then the "extents" file system will have a very long list of "physical address, length" pairs to specify the file. A second problem with the "extents" type of file system is that, to retrieve data within a file at a specified offset, the software must search through the "extents" list to find the entry containing the data block sought. Again, if the list is long, the search to find the correct "physical address, length" pair can be time consuming. Finally, in an "extents" file system, if file storage is being rearranged to improve contiguousness, the "extents" list must be completely collapsed and a new "extents" list built. This requires a large amount of copying.
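The search cost noted above can be seen in a minimal sketch of an "extents" lookup, using the three-entry list from the earlier example. The function name find_block is hypothetical and the offset is expressed in blocks rather than bytes for simplicity; the point is only that each lookup scans the list from the front, so its cost grows with the number of "physical address, length" pairs.

```python
def find_block(extents, file_block):
    """Map a block offset within the file to a physical block address.

    extents is a list of (physical_address, length) pairs in file order.
    """
    if file_block < 0:
        raise IndexError("block offset must be non-negative")
    start = 0  # file-relative block number where the current extent begins
    for physical_address, length in extents:
        if file_block < start + length:
            return physical_address + (file_block - start)
        start += length
    raise IndexError("block offset is past the end of the file")


# The example file: 50 blocks at 1000, 800 blocks at 4000, 20 blocks at 3500.
extents = [(1000, 50), (4000, 800), (3500, 20)]
print(find_block(extents, 49))   # last block of the first extent
print(find_block(extents, 850))  # first block of the third extent
```

In this sketch, block 49 of the file resolves to physical block 1049 after examining one entry, while block 850 resolves to physical block 3500 only after all three entries have been examined.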
What is needed is an indirect addressing arrangement, modified to reduce both the search time through the indirect address blocks and the size of the indirect address metadata.