1. Field of the Invention
The present invention generally relates to data processing systems. More particularly, the present invention pertains to a file system of a computer operating system.
2. Description of the Related Art
A typical data processing system comprises a central processing unit, memory, and various peripheral devices such as mass storage units. A user's data is usually stored in long-term non-volatile mass storage devices, such as hard disks, that use magnetic or optical media. One of the main purposes of many data processing systems such as personal computers is to create, manipulate, store, and retrieve data. An operating system, or a file system in particular, provides the machinery to support these tasks. File systems of modern computers, such as HFS or HFS+ of the Apple Macintosh® operating system, are integral parts of their operating systems and provide a way to organize, store, retrieve, describe, and manage information on a permanent or semi-permanent storage medium.
The unit of storage in modern block devices such as a hard disk is a so-called block. For example, storage areas of modern hard disks are typically divided into tracks and sectors, which form blocks, or physical blocks. A physical block is an area in the storage medium that can be read and written as a unit, and it is the smallest unit that can be manipulated by the storage device. The typical block size of many modern hard disks is 512 bytes.
Many file systems also manage data on block storage media using blocks, or logical blocks. A logical block is often mapped to one or more physical blocks on a hard disk, and its size is the same as, or an integer multiple of, the disk block size. Data of a file is generally stored in one or more logical blocks. File systems also store data about a file, or its metadata, in one or more logical blocks. For example, many file system implementations use special or regular blocks on a storage medium to store files' metadata. In most modern file systems, a data structure generally called an inode is used to store a file's metadata. For instance, a file's creation time, last access time, permission settings, and the like are typically stored in the inode associated with the file. An inode often occupies one logical block, but it can occupy two or more blocks if the inode grows beyond the size of one logical block.
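By way of illustration only, the relationship between logical and physical blocks described above might be sketched as follows. The 4096-byte default logical block size, the linear layout with no partition offset, and the function names are assumptions made for this sketch; they are not taken from any particular file system.

```python
PHYSICAL_BLOCK_SIZE = 512  # bytes; the typical disk block size cited in the text


def physical_blocks(logical_block, logical_block_size=4096):
    """Return the physical block numbers backing one logical block.

    Assumes a simple linear layout starting at physical block 0
    (hypothetical; real devices and partitions add offsets and remapping).
    """
    if logical_block_size % PHYSICAL_BLOCK_SIZE != 0:
        raise ValueError("logical block size must be an integer multiple "
                         "of the physical block size")
    per_logical = logical_block_size // PHYSICAL_BLOCK_SIZE
    first = logical_block * per_logical
    return list(range(first, first + per_logical))


def logical_blocks_needed(byte_count, logical_block_size=4096):
    """Number of logical blocks needed to hold byte_count bytes, rounded up."""
    return -(-byte_count // logical_block_size)
```

Under these assumptions, a 4096-byte logical block covers eight 512-byte physical blocks, and a file or inode that exceeds one logical block by even a single byte requires a second logical block.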
It is a well-recognized fact in the art that accessing devices like hard disks is orders of magnitude slower than other operations in typical data processing systems. For example, central processing units typically found in personal computers have a clock frequency of 1 GHz or higher, which translates into more than one instruction executed every few nanoseconds, whereas the typical seek time for data stored on a hard disk is on the order of 10 milliseconds. The bandwidth of the internal bus, for example, the bus associated with the system memory, is on the order of hundreds of megabytes to gigabytes per second, or often much higher, whereas the typical bandwidth of IDE or SCSI hard drives is around 20 to 50 megabytes per second. The primary reason for this discrepancy in speed is that hard disks have mechanical parts. That is, to access data, the disk needs to be spun, and the heads need to be moved to reach the target blocks. The seek time to move the disk heads from one part of a disk to another is considerable. Any operation that requires mechanical movement is usually much slower than one that requires only electrical switching.
For this reason, accessing data in a way that requires less mechanical motion in a hard disk provides much faster access and much higher bandwidth. For example, reading or writing contiguous blocks on a disk is much faster than having to seek to access different blocks spread over different areas of the disk. Likewise, accessing blocks from the same cylinder group is substantially faster than accessing blocks from different cylinder groups, because reading successive blocks in a cylinder group only involves switching heads. Switching disk heads is an electrical operation and thus significantly faster than a mechanical operation such as moving the heads.
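The magnitudes quoted above can be combined into a rough cost estimate. The following is a minimal sketch, assuming a fixed 10 millisecond cost per seek, a 20 megabyte per second transfer rate (the low end of the quoted range), 512-byte blocks, and no rotational latency or caching; the function names are hypothetical and the model is deliberately simplistic.

```python
SEEK_TIME_S = 0.010            # about 10 ms per seek, as quoted in the text
BANDWIDTH_BPS = 20 * 10**6     # 20 megabytes per second, low end of the range
CLOCK_HZ = 10**9               # 1 GHz processor clock
BLOCK_SIZE = 512               # bytes per physical block


def cycles_per_seek():
    """Processor clock cycles that elapse during one seek."""
    return round(SEEK_TIME_S * CLOCK_HZ)


def read_time(num_blocks, num_seeks):
    """Estimated seconds to read num_blocks blocks using num_seeks seeks.

    Simplified model: rotational latency and controller caching are ignored.
    """
    transfer = num_blocks * BLOCK_SIZE / BANDWIDTH_BPS
    return num_seeks * SEEK_TIME_S + transfer


contiguous = read_time(1000, num_seeks=1)     # one seek, then a sequential read
scattered = read_time(1000, num_seeks=1000)   # one seek for every block
```

In this model a single seek costs about ten million clock cycles, and reading 1000 scattered blocks is dominated almost entirely by seek time, whereas the same blocks read contiguously cost one seek plus the transfer time.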
Modern hard disks hide much of their physical geometry and their internal operations, and much of the low-level optimization is done at the drive controller level. File systems rely on the drive controllers for many tasks. In many file systems, the block storage device is abstracted into an array of (logical) blocks. File systems then manage the block array, and the device controllers do the actual work, including mapping the logical blocks to the corresponding physical blocks. In this disclosure, we will often use this level of abstraction for the sake of clarity. However, as will be apparent to people of ordinary skill in the art, the present invention can be understood, and practiced, at many different levels.
As an illustration, a schematic drawing of a logical structure of a storage medium such as a hard disk or a compact disc (CD) is shown in FIG. 1A. The medium 102 is divided into multiple blocks, which are not explicitly shown in the figure. Each block in this example can be viewed either as a logical allocation block or as a set of contiguous such blocks, possibly representing file or directory content or metadata. Certain regions of the medium in the figure are marked with hatched rectangles, 104-112. These rectangular regions represent blocks storing files or directories, or their metadata, i.e., inodes. The figure shows five such regions labeled from A to E.
In order to illustrate the file access times and their dependence on the file arrangement on the storage medium, two exemplary file access scenarios are shown in FIGS. 1B and 1C. In this example, it is assumed, for the purposes of illustration, that the file access or seek time is simply proportional to the sum of the distances between the locations in the block array of consecutively accessed files. In the scenario of FIG. 1B, files are accessed “sequentially” based on the arrangement of the files on the medium. Their total access or seek time equals 10.0, in an arbitrary unit.
FIG. 1C shows another exemplary access pattern and the corresponding seek times. Note that the files are accessed in a more or less random order in this scenario. More specifically, the access order is A, E, B, D, and C. The total seek time in this scenario is 32.0, much larger than the 10.0 of FIG. 1B. The example illustrated in FIG. 1 thus demonstrates the effects of file and directory access patterns, and of their arrangement in a storage medium, on the access or seek times of the needed data.
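The simplified seek model used in this example, in which the cost is the sum of the distances between consecutively accessed files, can be sketched as follows. The block positions assigned to regions A through E are hypothetical: they are not read from FIG. 1 and were chosen only so that the model reproduces the totals quoted in the text.

```python
def total_seek(positions, order):
    """Sum of distances between consecutively accessed locations.

    positions maps a file label to its location in the block array;
    order is the sequence in which the files are accessed.
    """
    return sum(abs(positions[nxt] - positions[cur])
               for cur, nxt in zip(order, order[1:]))


# Hypothetical positions, NOT taken from the figure.
positions = {"A": 0, "B": 1, "C": 4, "D": 9, "E": 10}

sequential = total_seek(positions, ["A", "B", "C", "D", "E"])
scattered = total_seek(positions, ["A", "E", "B", "D", "C"])
```

With these assumed positions, the sequential order yields a total of 10 and the order A, E, B, D, C yields 32, because the latter repeatedly crosses most of the block array.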
In the prior art, files and directories are stored without regard to this consideration. As an example, FIGS. 2A and 2B show typical mappings between a file or directory hierarchy and the corresponding file arrangement in a storage medium, or in a logical block array. The top portion of FIG. 2A illustrates an exemplary directory structure in a hierarchical file system, such as HFS of the Apple Macintosh® operating system. The tree 132 in the figure has nine nodes, labeled A through I. Four of them, A, B, C, and F, represent directories, whereas the remaining leaf nodes represent files. The drawing at the bottom of the figure illustrates a logical representation of the block array 134 in a file system for the nodes in tree 132. The files and directories shown in the file tree are arranged in a particular order in this block array. For simplicity, we assume that they are inode blocks of the corresponding files and directories and that each inode occupies one block 136. The array also shows a region of empty blocks 138. The particular ordering shown in the figure is based on a depth-first arrangement, and it is not in any way optimized in the sense illustrated with regard to FIG. 1. Note that the particular arrangement shown in FIG. 2A might be viewed as an example of the state after installation of a new operating system on a new computer, or after installation of new software (e.g., a new Web browser) on an existing, already used computer.
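A depth-first arrangement of this kind can be sketched as follows. The parent and child relationships in the tree below are hypothetical: the text states only that there are nine nodes A through I and that A, B, C, and F are directories, so the particular shape used here is an assumption and does not reproduce tree 132.

```python
# Hypothetical hierarchy: each directory maps to its children, in order.
# Only A, B, C, and F are directories; all other nodes are files.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "F": ["H", "I"],
}


def depth_first_layout(tree, root):
    """Return node labels in depth-first (pre-order) order.

    Each label stands for one inode block, so the returned list is the
    order in which the inodes would be placed in the block array.
    """
    layout = []
    stack = [root]
    while stack:
        node = stack.pop()
        layout.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return layout


layout = depth_first_layout(TREE, "A")
```

For the assumed tree, the resulting order is A, B, D, E, C, F, H, I, G. The order follows the shape of the hierarchy only and takes no account of which files tend to be accessed together.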
Once the system is used, however, the file arrangement on the storage medium changes. For example, some existing files and directories may be deleted and some new files and directories may be added. Furthermore, certain files and directories may be moved to different locations. FIG. 2B shows an exemplary directory structure, based on that of FIG. 2A, after some time of use. As is apparent from this pair of figures, FIGS. 2A and 2B, certain files have been deleted and some new files have been added during the intervening time period. More specifically, directories D and F and files H and I, from the tree 132 of FIG. 2A, have been deleted, and new directories J and K and new files L and M have been added. The new tree structure 162 reflects these changes. The bottom drawing in FIG. 2B illustrates a logical representation of the new block array 164, which reflects the way the data is stored on a physical medium. The updated list of files and directories from the file hierarchy 162 is shown in this block array representation, with the same labels. Note that, in this case, the occupied blocks 166 are fragmented and interspersed with the empty block regions 168. Therefore, at least for the reasons given with respect to FIG. 1, a typical access or seek time in FIG. 2B will usually be much larger than that of the block array shown in FIG. 2A for a typical file access pattern. This often translates into slower application launches and longer response times in user interaction. Degradation of performance after some time of use is typical in the implementations of the prior art. It should be noted that file data or file metadata may not be physically removed when the corresponding file is deleted from the file hierarchy. In some implementations of file systems, “deleted” files and directories may remain on the storage medium and may simply be made inaccessible.
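The way deletions and additions fragment a block array can be sketched as follows. The initial layout, the policy of placing new inodes immediately after the last used block, and all function names are assumptions made for illustration; they are not taken from FIG. 2B or from any particular file system.

```python
FREE = None  # marker for an empty block


def delete(blocks, names):
    """Return a copy of the block array with the named entries freed."""
    return [FREE if b in names else b for b in blocks]


def append_after_last_used(blocks, names):
    """Place new entries right after the last used block (hypothetical policy)."""
    blocks = list(blocks)
    used = [i for i, b in enumerate(blocks) if b is not FREE]
    start = used[-1] + 1 if used else 0
    if start + len(names) > len(blocks):
        raise ValueError("not enough free blocks at the end of the array")
    for offset, name in enumerate(names):
        blocks[start + offset] = name
    return blocks


def occupied_runs(blocks):
    """Split the array into maximal runs of contiguous occupied blocks."""
    runs, current = [], []
    for b in blocks:
        if b is FREE:
            if current:
                runs.append(current)
                current = []
        else:
            current.append(b)
    if current:
        runs.append(current)
    return runs


# Hypothetical initial layout: nine inode blocks followed by five empty blocks.
initial = list("ABDECFHIG") + [FREE] * 5
after = append_after_last_used(delete(initial, {"D", "F", "H", "I"}),
                               ["J", "K", "L", "M"])
```

Under these assumptions, the initial array holds a single contiguous run of occupied blocks, whereas the array after the changes holds three runs separated by empty blocks.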
There has been much effort to reduce file access times for block storage media. There are, for example, prior art applications that “defragment” file allocations in the storage medium, which are widely available on some popular platforms, such as the NTFS file system of the Microsoft Windows operating system. However, they are limited to defragmenting file contents stored in multiple regions; that is, defragmentation in the prior art attempts to gather the blocks storing the content of a single file into a single contiguous region.
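The per-file gathering described above can be sketched as follows. This is a simplified illustration, not the algorithm of any actual defragmentation utility: each block is labeled with the hypothetical name of the file that owns it, and the blocks of each file are gathered into one contiguous run, with the files kept in order of first appearance.

```python
def defragment(blocks):
    """Gather each file's blocks into one contiguous run.

    blocks is a list in which each entry names the owning file, or is
    None for an empty block. Free space is moved to the end.
    """
    order = []    # files in order of first appearance
    counts = {}   # number of blocks owned by each file
    for owner in blocks:
        if owner is None:
            continue
        if owner not in counts:
            order.append(owner)
            counts[owner] = 0
        counts[owner] += 1

    result = []
    for owner in order:
        result.extend([owner] * counts[owner])
    result.extend([None] * (len(blocks) - len(result)))
    return result


compacted = defragment(["f1", "f2", None, "f1", "f2", "f1"])
```

Note that the sketch makes the content of each file contiguous but places the files relative to one another without regard to how they are accessed.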
In some cases, file contents are cached or pre-loaded into memory to speed up the application launch. However, this type of implementation does not directly address the issue of block arrangement in storage media and its effect on the application or file access times.