The present invention relates generally to computer file systems and their organization and more particularly to file systems for disk data storage.
A file system is a hierarchical structure (file tree) of files and directories. File systems are utilized in order to maintain, manage and organize the large amounts of data ordinarily stored in a computer system, whether on a physical disk drive, in volatile or non-volatile memory, or on any other such storage medium. Depending on the selected storage medium, file system maintenance, management and organization can become very important to the efficient transfer of data.
For example, in disk drive systems, conventional file systems typically store information as blocks of data, which are units of storage allocation. A block of data typically has a size corresponding to some integer power of two bytes. Thus, conventional file systems which employ a block file structure vary in the amount of data that can be stored in a single block. For example, the UNIX System V file system and most MS-DOS floppy disk file systems are capable of storing blocks of data in block sizes of 2^9, or 512, bytes. File systems such as SVR3 (an extension of the UNIX System V file system) and CP/M (Control Program for Microcomputers, developed by Digital Research Corporation) were capable of storing blocks of data in sizes of 2^10, or 1,024, bytes. Similarly, the MS-DOS floppy disk file system can also store blocks of data in this block size. SVR3 also provides an alternate option for storing data in block sizes of 2^11, or 2,048, bytes. Moreover, some BSD (Berkeley Software Distribution, a version of the UNIX operating system developed and distributed at the University of California at Berkeley) systems and some MS-DOS hard disk file systems provide for data storage in block sizes of 2^12, or 4,096, bytes, while most modern UNIX file systems utilize block storage sizes of 2^13, or 8,192, bytes. Some MS-DOS systems are capable of incorporating even larger block sizes.
Unfortunately, due to the physical constraints of the disk drive (i.e., track length, density, etc.), inefficiencies in data transfer generally result once a significant amount of data is located on the disk drive. The size of a data block has a number of implications for the transfer and storage efficiency of a disk drive. Specifically, a large data block suggests that more disk space will be wasted, since multiple files are not normally written to a block, even if space is available. For example, a small 100 byte data file will consume one entire data block, regardless of whether the block size is 512 bytes or 32 K bytes. Although only a portion of the data block includes the 100 byte file, the remaining space in the data block is unusable and therefore wasted.
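The internal fragmentation described above can be worked out directly. The following is an illustrative sketch (the function name and block sizes are assumptions for the example, not part of the file system described here):

```python
# Illustrative: internal fragmentation when a small file occupies one
# fixed-size block. Block sizes below are example values only.
def wasted_bytes(file_size: int, block_size: int) -> int:
    """Bytes left unusable in the blocks allocated to the file."""
    blocks_needed = -(-file_size // block_size)  # ceiling division
    return blocks_needed * block_size - file_size

# A 100-byte file wastes most of its block regardless of block size:
small_waste = wasted_bytes(100, 512)        # 412 bytes wasted
large_waste = wasted_bytes(100, 32 * 1024)  # 32,668 bytes wasted
```

As the example shows, the waste grows in direct proportion to the block size for any file smaller than one block.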
In addition, a large data block suggests that more bits are transferred to and from the disk during a single read/write operation. Generally, the time required to perform a read or a write operation involves several parameters: (1) the service time for an operation in the disk queue to be performed, (2) the disk latency time, and (3) the data transfer time for the required data to be transferred (typically, doubling the transfer unit size doubles the data transfer time).
In conventional multi-tasking systems it is common for the disk drive to be busy while the computer is performing an operation other than a read or a write operation. Generally, a disk drive can be expected to perform one operation per every few disk rotations. Thus, given a large operation demand, the waiting time for an operation in the disk queue to be performed can be predicted as the product of the service time and the queue length. As such, delays of 100 milliseconds are not uncommon on conventional systems during operation demand peaks. These delays can become significant and can increase the time for a disk drive to perform an operation, such as a read or a write operation.
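The queue-delay prediction above amounts to a simple product. A minimal sketch, with illustrative numbers (the 16.7 ms service time assumes one operation per rotation at 3,600 RPM, per the discussion of rotation times below):

```python
def expected_queue_wait(service_time_ms: float, queue_length: int) -> float:
    """Predicted wait for an operation at the back of the disk queue:
    the product of the per-operation service time and the queue length."""
    return service_time_ms * queue_length

# At ~16.7 ms per operation, a queue of only six outstanding requests
# already reaches the 100 ms delays noted above:
wait = expected_queue_wait(16.7, 6)  # ~100.2 ms
```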
In addition, due to the mechanical nature of a disk drive, disk latency delays are typically associated with the time for the disk drive read/write head to be positioned over the proper cylinder of the disk and the time for the desired data on that cylinder to rotate under the disk drive read/write head. Conventional disk drives also typically have latency delays for selecting a particular track of the disk within the cylinder. For example, most disk drives have a rotation time of about 16.7 milliseconds (3,600 RPM) and average seek times greater than 13 milliseconds. Some current disk drives rotate in 11, 8.3 or 5.5 milliseconds (5,400, 7,200, or 10,800 RPM, respectively) and seek in about the same time. That is, disk drives can be viewed as rotating faster than they seek. Long seek delays (latency) can thus affect the transfer efficiency of the disk. It becomes extremely difficult to achieve an average of approximately one disk operation per rotation. Thus, minimizing seek delays, such as by optimally placing data on the disk to reduce seek distance and by sorting outstanding disk drive read/write requests in the queue to minimize the seek time between requests, becomes important.
The complexity of metadata (the information on the disk about the files stored on it) for tracking the information about the data on the disk is also affected by the size of the data block. A file system contains information that is required to allow data to be accessed. This information takes the form of both user-visible and user-invisible data structures, such as the various directories which provide a mapping between file names and file numbers. There are also additional characteristics to maintain, such as disk drive free space, which data blocks are within which files, and the order of the data blocks within the files, along with any data gaps within the files. Most file systems also track information concerning creation, access and/or modification times for each file along with security and permission information of some type.
Large data blocks also suggest that fewer data blocks are available for a given amount of storage and that fewer data blocks are available per file on average. This allows the metadata structures of the files to be smaller and less sophisticated. The MS-DOS FAT (file allocation table) is a classic example of a simple metadata structure that does not scale well to large storage architectures. While it supports up to 64 K blocks per file system, a 2 gigabyte file system therefore requires blocks of 32 K bytes each. This characteristic results in a large amount of wasted disk space and a degradation in transfer efficiency of the disk.
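The FAT scaling claim above can be checked arithmetically (variable names are illustrative):

```python
# With at most 64 K addressable blocks per file system, covering a
# 2 gigabyte file system forces each block up to 32 K bytes:
max_blocks = 64 * 1024       # 65,536 addressable blocks
fs_bytes = 2 * 1024 ** 3     # 2 gigabyte file system
block_size = fs_bytes // max_blocks  # 32 K bytes per block
```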
Physical disk geometry is also affected by the data block size. Traditional disk drives have a known number of sectors per track, a known number of heads per cylinder and a known number of cylinders per device. While these parameters may vary from device to device, these three values are generally sufficient for the system software to select the placement of on-disk data structures so that disk latency is minimized.
For example, many file systems, such as UFS (Unix File System), take this disk geometry into account at a very basic level. UFS is designed around the concept of a "cylinder group," which is simply a collection of adjacent cylinders on the disk drive. This cylinder group contains its own allocation metadata and is managed by the system as autonomously as possible. That is, files tend to be allocated within the same cylinder group as their parent directory in an attempt to minimize disk latency.
Unfortunately, disk seeking typically involves a non-linear seek process. Disk head movement is classically described by the equation delay = (number of tracks to be moved) * (per-track delay) + (settling time). To help reduce disk latency, microcontrollers are often utilized to accelerate disk head movement during the beginning of a seek and decelerate disk head movement toward the end of the seek. This provides dramatically faster performance for longer seeks and renders the above equation incomplete.
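The classic linear model in the equation above can be sketched as follows (parameter values are illustrative assumptions; as noted, real drives accelerate and decelerate the head, so this model overestimates long seeks):

```python
def linear_seek_delay_ms(tracks: int, per_track_ms: float,
                         settle_ms: float) -> float:
    """Classic linear seek model:
    delay = (tracks to move) * (per-track delay) + (settling time).
    Real microcontroller-driven seeks beat this for long distances."""
    return tracks * per_track_ms + settle_ms

# e.g., 100 tracks at 0.05 ms/track plus 3 ms settling:
delay = linear_seek_delay_ms(100, 0.05, 3.0)  # ~8 ms
```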
Two common methods to rate the seek time of a disk involve measuring either: (1) the average of all seek times from all cylinders to all other cylinders; or (2) the time to seek ⅓ of the distance across the disk. Both of these ratings are heavily influenced by long seek delays. Therefore, a technique that has the effect of reducing the maximum seek times by at least 10% would be extremely valuable.
To help minimize seek times of the disk, over the past few years, disk densities have been improved by placing more sectors on tracks at the outside of a disk than at the inside of the disk. This technique is referred to as "multi-zoning" and is common in modern SCSI (Small Computer System Interface) and IDE (Integrated Drive Electronics) disk drives. Even CD-ROMs utilize a similar technique, referred to as "constant linear velocity," that uses a spiral pattern for storing data instead of the track and sector architecture generally utilized by hard disks. Unfortunately, the spiral pattern makes random access of the CD-ROM difficult. Moreover, multi-zoning does not address the long seek times of the disk in reading/writing data at the inside of the disk. As such, multi-zoning does not adequately address disk latency and transfer efficiency.
Finally, on-disk data compression is another aspect of a disk that is affected by the file block size. Relatively few file systems have attempted to use compression techniques to increase the effective storage of a disk. The introduction of compression into the file system adds complexity, increases memory demands on the system, and imposes a CPU burden which can be prohibitive to computational resources. Furthermore, most compression algorithms used for disk storage are adaptive, which results in improved compression ratios for larger units of data. For example, LZW (e.g., GIF compression) and LZ77 (a dictionary-based compression method) style algorithms can actually expand small blocks of data storage. Furthermore, once the data is compressed, the resulting bits are no longer the same size as the disk sectors, and it is difficult to predict how many disk sectors will be required to store a given uncompressed data block once it is compressed.
Thus, an optimal data transfer scheme is needed to help enhance the transfer efficiency of a disk. In addition to enhancing data transfer efficiency of the disk, a system for providing enhanced data recovery across a disk array is needed. Furthermore, enhancing the storage density and increasing the bandwidth efficiency of the disk is also needed. It is to these ends that the present invention is directed.
In accordance with the invention, a system and method for further enhancing the transfer efficiency of the disk is provided. In an aspect of the invention, a hierarchical file structure for storing a portion of data in a data file is provided. The file structure includes a granule storage unit that is configured to store individual bits of data, a variable sized component data unit that is configured to group a sequence of granules as a related data unit, and an extent transfer unit that is configured to group at least one component data unit as a data file. The granule storage unit may include at least one header granule and at least one data granule. The header granule includes metadata information relating to the component data unit.
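The three-level hierarchy described above can be sketched as follows. This is a minimal illustrative model; the field names and layout are assumptions for exposition, not the exact on-disk format of the invention:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Granule:
    """Granule storage unit: the smallest unit, storing individual bits of data."""
    data: bytes

@dataclass
class Component:
    """Variable-sized component data unit grouping a sequence of granules.
    The header granule carries metadata relating to the component; a second
    header granule, when present, carries supplemental metadata."""
    header: Granule
    granules: List[Granule] = field(default_factory=list)
    supplemental_header: Granule = None

@dataclass
class Extent:
    """Extent transfer unit grouping at least one component data unit as a data file."""
    components: List[Component] = field(default_factory=list)

# Assemble a one-component data file:
g = Granule(b"payload")
c = Component(header=Granule(b"meta"), granules=[g])
f = Extent(components=[c])
```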
In another aspect of the invention, the granule storage unit may include a second header granule. The second header granule includes supplemental metadata information relating to the component data unit.
In another aspect of the invention, a system for maintaining metadata information about data stored on a disk is provided. The system includes a plurality of data files stored on the disk, each data file including a hierarchical data structure for storing a portion of data, such as that described above. In addition, a like plurality of meta-component data units are provided, a respective meta-component data unit allocated for each data file. Each meta-component data unit includes a first data structure for maintaining operating system file information and a second data structure for maintaining a component data table.
In yet another aspect of the invention, a method for organizing hierarchical data structures on a disk by a file system is provided. The method comprises the steps of designating each extent transfer unit within the file system with an extent number so that a particular extent can be located by the file system, designating each component data unit within the extent with a file number identifying the location of the component within the extent so that a particular component can be located by the file system, allocating a disk address to each component within the extent so that the data in the extent can be accessed in a proper order, storing each component within the extent at the allocated disk address, and periodically relocating each component within the extent to a different location of the disk so that efficiency of the disk is maintained.
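The addressing steps above can be sketched with a simple catalog. This is a hedged illustration only: the class, its methods, and the integer disk addresses are assumptions introduced for exposition, not structures defined by the invention:

```python
class ExtentCatalog:
    """Maps (extent number, file number) pairs to disk addresses, mirroring
    the designation, allocation, and relocation steps described above."""

    def __init__(self):
        self._addresses = {}  # (extent_no, file_no) -> disk address

    def store(self, extent_no: int, file_no: int, disk_address: int) -> None:
        """Allocate a disk address to a component within an extent."""
        self._addresses[(extent_no, file_no)] = disk_address

    def locate(self, extent_no: int, file_no: int) -> int:
        """Locate a particular component by its extent and file numbers."""
        return self._addresses[(extent_no, file_no)]

    def relocate(self, extent_no: int, file_no: int, new_address: int) -> None:
        """Periodic relocation step: move a component to a new disk location."""
        self._addresses[(extent_no, file_no)] = new_address

cat = ExtentCatalog()
cat.store(extent_no=7, file_no=2, disk_address=4096)
cat.relocate(7, 2, new_address=8192)
```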
Accordingly, in an aspect of the invention, frequently accessed components are relocated toward the inner portion of the disk. Furthermore, components are stored on the disk in an uncompressed format and are compressed in accordance with a specific compression ratio as they become dormant.
Advantageously, in an aspect of the invention, the compression ratio of the component can be improved by iteratively processing the component with different compression algorithms and comparing each compression ratio result with the currently applied compression ratio to determine whether the compression ratio result is superior to the currently applied compression ratio. If the compression ratio result is superior to the currently applied compression ratio, the data in the component is processed by the compression algorithm associated with the superior compression ratio. Furthermore, this compression processing may occur during CPU idle time, when additional CPU processing cycles are free to process the different compression algorithms.
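The iterative selection described above can be sketched as follows, using standard-library compressors purely as stand-ins for whatever algorithms a file system might actually try:

```python
import bz2
import lzma
import zlib

def best_compression(data: bytes):
    """Try each candidate algorithm and keep the one yielding the best
    (smallest) result; fall back to the uncompressed data if none wins."""
    candidates = {"zlib": zlib.compress, "bz2": bz2.compress,
                  "lzma": lzma.compress}
    best_name, best_blob = None, data
    for name, compress in candidates.items():
        blob = compress(data)
        if len(blob) < len(best_blob):  # superior compression ratio?
            best_name, best_blob = name, blob
    return best_name, best_blob

# Highly repetitive data compresses well under any of the candidates:
name, blob = best_compression(b"abc" * 10_000)
```

In a real system this loop would run as a background task during CPU idle time, as the passage above suggests.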
In yet another aspect of the invention, a system for organizing data files across a redundant array of independent disks is provided. The system includes an array of independent disks configured to store a plurality of data files thereon. A plurality of data files are stored on the disk array, each data file including a hierarchical data structure for storing a portion of data, as described above. A like plurality of meta-component data units are also stored on the disk array, a respective meta-component data unit allocated for each data file as described above. The plurality of data files may be stored on the disk array such that at least a portion of some of the data files are stored on multiple disks in the array.
In an aspect of the invention, a parity region is provided in the extent transfer unit for maintaining parity information about the data file so that the data file can be maintained across the entire disk array such that complete data recovery can be accomplished in the event of a failure of one of the disks in the array in accordance with the parity information.
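Single-disk recovery from a parity region of this kind conventionally relies on XOR parity. A minimal sketch under that assumption (the two-byte blocks and helper function are illustrative, not the invention's actual parity layout):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR corresponding bytes across a set of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Parity is the XOR of the corresponding blocks on each data disk:
data_disks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data_disks)

# Simulate losing disk 1; the survivors plus parity reconstruct it,
# since XOR-ing everything else cancels out all but the lost block:
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)  # equals data_disks[1]
```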
Moreover, for providing improved disk efficiency and data management of data files across an ultra-wide redundant array of independent disks, a second plurality of meta-component data units may be stored on the disk array, a respective second meta-component data unit allocated for each data file. The second meta-component data units may maintain secondary parity information about each of the associated data files, such as by including an orthogonal parity scheme such that a lost data file can be recovered in the event of a secondary failure of another disk in the array.
Finally, in still another aspect of the invention, a method for enhancing the data transfer efficiency of a disk is provided. The method comprises the steps of assigning a predetermined fraction of a data track of the disk as a data transfer unit and, in response to a disk operation to read or write data from an indicated data track of the disk, moving a data transfer read/write head of the disk to the indicated data track and carrying out the disk operation, such that the disk operation is completed after a fraction of a single rotation of the disk substantially equal to the data transfer unit. A next disk operation to be performed at a next indicated data track of the disk is then determined and, within the remaining fraction of the single rotation of the disk, the data transfer read/write head is moved to the next indicated data track of the disk. The next disk operation to read or write data from the next indicated data track is then carried out, such that at least one disk operation is performed per single rotation of the disk.
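The timing budget behind the method above can be sketched arithmetically. The numbers here are illustrative assumptions (a 3,600 RPM rotation time and a half-track transfer unit), not parameters fixed by the invention:

```python
def seek_budget_ms(rotation_ms: float, track_fraction: float) -> float:
    """Time remaining in one rotation after transferring `track_fraction`
    of a track; this remainder is available to seek to the next request."""
    return rotation_ms * (1.0 - track_fraction)

# At 3,600 RPM (about 16.7 ms per rotation) with a half-track transfer
# unit, roughly 8.35 ms remain in the rotation to reach the next track,
# sustaining at least one disk operation per rotation:
budget = seek_budget_ms(16.7, 0.5)
```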