Disk-drive-based storage systems are widely used for storing digital data of all kinds, such as computer data and multimedia data (voice, video, and image). In such storage systems, one or more disk drives are configured to provide storage of data content. The disk drives can be read from or written to. Data is stored and modified on the disk drives through the host computer's bus controllers. Before data is stored on a storage system, it must be organized into logical units, called data files, such as video files, image files, and database files. Such organization is typically handled by the operating system of the computer system that hosts the data files.
In general, a disk storage system cannot hold data files without a file system; by itself it holds only physical arrays of unspecified data blocks. A file system is a data structure layer implemented as part of the operating system. It defines how the computer interfaces with the attached disk storage, whether directly attached or attached through a network interface cable. The file system defines how the data is organized and located on the disk drives, and records file ownership and quotas, dates of creation and change, and any recovery information associated with each file. The file system is the critical link between the logical data files and the physical disk drive storage system. It not only manages the data files but also maps the files to the disk drive storage system.
To write a file to the disk storage, a sufficient number of data sectors must first be allocated. This operation is performed by the file system and lower-level device driver programs. Preferably, all of the data of a file is stored on a single track and in consecutive data sectors. In this way, the data can be accessed continuously without moving the disk actuator.
However, as host computers add and delete data files, allocated data blocks are freed and then reallocated to different files, possibly of different sizes. The data of a single file then ends up spread over several tracks, and the disk actuator must be moved from track to track to access it. The time required to move the disk actuator between two tracks on the disk drive platter, in order to gain access to data on a different track, is called random seek latency. No data access is possible during this latency; it is therefore preferable to keep each seek as short as possible and the number of seeks as low as possible.
As the file system stores and updates files on the disk drive, available contiguous data blocks become difficult to locate. If a file is larger than the available contiguous data blocks, the file is fragmented and stored wherever the available blocks can accommodate it. In order to read or write the entire file, the disk drive actuator is moved to the track containing the first file segment. When the transfer of the first file segment is completed, a seek operation is performed to reach the track holding the second file segment. The seek operation can take anywhere from 1 millisecond (ms) to 20 ms. A standard 3.5″ disk drive platter rotating at 7200 RPM makes one revolution in approximately 8.33 ms (60 seconds divided by 7200). The shortest seek therefore lasts about 0.12 of a revolution, and the longest about 2.4 revolutions, before the actuator even reaches the track that contains the requested data. The average seek time on a 3.5″ disk drive is approximately 8 ms, meaning the disk drive platter completes nearly one full revolution (about 0.96), on average, every time the disk drive actuator is moved.
For example, a file fragmented into five segments requires a seek operation between each pair of consecutive segments, four seeks in all. To read this file, approximately 32 ms, or nearly four revolutions of the disk drive platter, are spent in seek operations with no data retrieved. This can devastate the performance of the disk drive storage system. The fragmentation of the files on the disk drive impacts the performance of the host computer as processes await the requested file.
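The revolution and seek arithmetic can be checked with a short sketch. It assumes only that one revolution takes 60/RPM seconds and that a file stored in n segments needs n - 1 seeks between them. The function names are illustrative and do not come from the text.

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def revolutions_during(seek_ms: float, rpm: float) -> float:
    """Number of platter revolutions that elapse during one seek."""
    return seek_ms / revolution_time_ms(rpm)


def seek_overhead_ms(segments: int, avg_seek_ms: float) -> float:
    """Total time spent seeking between consecutive segments of one file."""
    return max(segments - 1, 0) * avg_seek_ms


if __name__ == "__main__":
    rpm = 7200
    print(f"one revolution: {revolution_time_ms(rpm):.2f} ms")
    for seek in (1.0, 8.0, 20.0):  # shortest, average, longest seek from the text
        print(f"{seek:4.1f} ms seek = {revolutions_during(seek, rpm):.2f} revolutions")
    print(f"five segments: {seek_overhead_ms(5, 8.0):.0f} ms of seeking")
```

The sketch counts seek time only; the rotational delay that follows each seek while the target sector comes under the head is left out.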
As the host computer repeatedly performs the above file operations over long periods of time, and as the file system starts to get full, i.e., as most of the data sectors on the disk drive platters become allocated to stored files, allocating new data blocks becomes increasingly difficult. Specifically, a file system may need to allocate data sectors in physically scattered locations in order to store the data of a single file. Since the host computer accesses a file as a single data entity, the file system must gather all of the data blocks associated with a specific file when a file access request is issued by the host computer. This requires that the disk drive actuator move to read data sectors from different locations on the disk drive platter, resulting in random seek latencies. The problem of a single data file being stored in scattered locations is called file system fragmentation. A fragmented file system causes an excessive number of random seeks when accessing the file space allocated for writes and reads.
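How repeated allocation and deletion scatter a file can be illustrated with a toy model. The following is a minimal sketch, assuming the disk is a flat list of blocks and the allocator simply takes the first free blocks it finds. Real file systems use more elaborate allocation policies, and all names here are hypothetical.

```python
from typing import List, Optional

Disk = List[Optional[str]]  # one entry per block: owning file name, or None if free


def allocate(disk: Disk, name: str, size: int) -> bool:
    """Give `name` the first `size` free blocks, contiguous or not."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < size:
        return False  # not enough free space anywhere on the disk
    for i in free[:size]:
        disk[i] = name
    return True


def delete(disk: Disk, name: str) -> None:
    """Free every block owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(disk: Disk, name: str) -> int:
    """Count the contiguous runs of blocks owned by `name`."""
    runs, prev = 0, None
    for owner in disk:
        if owner == name and prev != name:
            runs += 1
        prev = owner
    return runs


if __name__ == "__main__":
    disk: Disk = [None] * 12
    for f in ("A", "B", "C", "D"):
        allocate(disk, f, 3)   # disk is now full, every file contiguous
    delete(disk, "A")
    delete(disk, "C")          # leaves two separate 3-block holes
    allocate(disk, "E", 5)     # E is larger than either hole
    print(disk)
    print("fragments of E:", fragments(disk, "E"))
```

In this run, file E cannot fit in either hole and ends up in two runs, so reading it requires at least one seek that a contiguous layout would have avoided.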
For real-time applications, the storage I/O access throughput requirements, i.e., the required transfer rates between the disk drive storage device and the other devices connected to it, are usually high. These requirements also rise as applications become more advanced. For example, a high-definition television signal in uncompressed format can require access throughput as high as 186 megabytes per second (MBps). For digital film production, the storage I/O throughput can be as high as 1.2 gigabytes per second (GBps). More importantly, the data transfer to the storage device must be performed isochronously, i.e., within timing constraints. This poses a substantial challenge to disk drive technology.
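The text does not say which video formats lie behind these figures. As a hedged illustration only, the sketch below computes the raw data rate of an uncompressed stream from frame size, bytes per pixel, and frame rate. The two parameter sets are assumptions chosen because they land close to the quoted numbers, not values taken from the source.

```python
def video_rate_bytes_per_s(width: int, height: int,
                           bytes_per_pixel: int, fps: int) -> int:
    """Raw data rate of an uncompressed video stream, in bytes per second."""
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    # Assumed HD format: 1920x1080, 3 bytes per pixel, 30 frames per second.
    hd = video_rate_bytes_per_s(1920, 1080, 3, 30)
    # Assumed film scan: 4096x3112, 4 bytes per pixel, 24 frames per second.
    film = video_rate_bytes_per_s(4096, 3112, 4, 24)
    print(f"HD:   {hd / 1e6:.1f} MB/s")
    print(f"film: {film / 1e9:.2f} GB/s")
```

Because the transfer is isochronous, these are rates the storage system must sustain continuously, not averages it may reach over a long interval.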
Disk drives can deliver very high data throughput as long as random seeks are kept to a minimum. With fragmented file systems, a substantial amount of random seeking and latency may be introduced. This can significantly reduce the data throughput, often to a level below what is required to support real-time applications such as high-definition video and digital film production. In general, by storing all of the data blocks associated with a given file consecutively on a disk platter, fragmentation could be reduced or eliminated. This goal cannot be achieved without taking up additional host computer resources. Specifically, as the file system gets full, or as it performs an increasing number of file write and delete operations, the fragmentation reaches a level at which the host computer will typically initiate a processing task called defragmentation.
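The effect of seeks on throughput can be estimated with a simple model. The sketch below assumes that total access time is the pure transfer time plus the number of seeks multiplied by the average seek time, and it ignores rotational latency and caching. The sustained transfer rate of 50 MB/s is a hypothetical figure for illustration and does not come from the text.

```python
def effective_throughput_mbps(file_mb: float, sustained_mbps: float,
                              seeks: int, avg_seek_ms: float) -> float:
    """Effective throughput when `seeks` seeks interrupt the transfer."""
    transfer_s = file_mb / sustained_mbps   # time spent actually moving data
    seek_s = seeks * avg_seek_ms / 1000.0   # time spent moving the actuator
    return file_mb / (transfer_s + seek_s)


if __name__ == "__main__":
    # 100 MB file, hypothetical 50 MB/s sustained rate, 8 ms average seek.
    for seeks in (0, 10, 100, 1000):
        rate = effective_throughput_mbps(100, 50, seeks, 8.0)
        print(f"{seeks:5d} seeks: {rate:.1f} MB/s")
```

Under these assumptions a contiguous file is read at the full sustained rate, while the same file split into about a thousand pieces is read at roughly one fifth of it.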
In a defragmentation process, the data blocks are rearranged on the disk drives so that the data blocks of the same file can be accessed with the fewest random seeks. The defragmentation program moves the data blocks of a given file onto the same data track or into the same general region. However, the defragmentation process requires extra processing and data transfers. The host computer must decide, via the file system, which data is moved to which location to minimize the resulting fragmentation. The objective is to reduce or eliminate the fragmentation of files. Many of today's operating systems have defragmentation programs, and some run the defragmentation in the background at a lower priority than application file accesses. During this process, some free data blocks may be used as a temporary holding place for fragmented data blocks. If the disk drive device is almost full with user data blocks, there may not be enough free data blocks available to hold the defragmented files. In that case, certain fragmented data blocks may need to be copied multiple times within the disk drive before they reach their desired, defragmented locations. As a result, the overall data access throughput is further reduced by the defragmentation process itself, defeating its original purpose, which is to improve the storage system performance.
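The extra data movement that defragmentation causes can be made concrete with a toy model. The following is a minimal sketch, assuming the disk is a flat list of blocks and that the target layout simply packs each file contiguously in order of first appearance. It counts only the blocks that must be written to a new position, so it gives a lower bound: the temporary copies needed on a nearly full disk are not modeled. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Disk = List[Optional[str]]  # one entry per block: owning file name, or None if free


def defragment(disk: Disk) -> Tuple[Disk, int]:
    """Return a layout with every file contiguous, plus the number of block
    positions that must be rewritten to reach it."""
    order: List[str] = []  # files in order of first appearance on the disk
    for owner in disk:
        if owner is not None and owner not in order:
            order.append(owner)

    packed: Disk = []
    for name in order:
        packed.extend([name] * disk.count(name))
    packed.extend([None] * (len(disk) - len(packed)))  # free space at the end

    # A position needs a write if it holds data in the new layout and that
    # data differs from what is stored there now.
    moved = sum(1 for old, new in zip(disk, packed)
                if new is not None and old != new)
    return packed, moved


if __name__ == "__main__":
    disk: Disk = ["E", "E", "E", "B", "B", "B", "E", "E", None, "D", "D", "D"]
    packed, moved = defragment(disk)
    print(packed)
    print("blocks rewritten:", moved)
```

In this example, making one two-fragment file contiguous requires rewriting five of the eleven occupied blocks, including blocks of files that were not fragmented to begin with.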
Thus, a need still remains for a system to manage the amount of fragmentation a file system is subjected to. In view of the throughput demand generated by new applications, it is increasingly critical that answers be found to these problems. Solutions to these problems have been long sought but prior developments have not taught or suggested any solutions and, thus, solutions to these problems have long eluded those skilled in the art.