The present invention relates to a method of minimizing disk fragmentation in real time.
Current computer systems typically include a hard disk or some other type of computer readable medium (hereinafter collectively referred to as a "disk"). The storage space in a disk is segmented into fixed size data areas known as sectors. The file system of the computer system typically groups the disk sectors into fixed or variable size areas known as clusters. A cluster is a group of one or more disk sectors. Clusters of a disk on computers running Microsoft Windows 95® may be, for example, 8 or 16 kilobytes in size. Each cluster is assigned a unique identifier (hereinafter referred to as a "cluster ID"). The cluster IDs are typically assigned sequentially starting with the innermost cluster of a disk.
Each file and directory stored on the disk is contained in one or more clusters. Unallocated clusters represent the free space on the disk. Contiguous unallocated clusters represent a single block of free space (an "available block"). Unallocated clusters which are surrounded by allocated clusters represent isolated blocks of free space on the disk.
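The notion of an available block can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of booleans indexed by cluster ID (True meaning allocated); the function name and the (start, size) tuple representation are illustrative only and are not taken from any particular file system.

```python
def find_available_blocks(allocated):
    """Return a list of (start_cluster_id, size) for each run of free clusters.

    `allocated` is a hypothetical list of booleans indexed by cluster ID,
    where True means the cluster is in use.
    """
    blocks = []
    start = None  # cluster ID where the current free run began, if any
    for cluster_id, in_use in enumerate(allocated):
        if not in_use and start is None:
            start = cluster_id                       # a free run begins here
        elif in_use and start is not None:
            blocks.append((start, cluster_id - start))  # the run just ended
            start = None
    if start is not None:                            # free run reaches end of disk
        blocks.append((start, len(allocated) - start))
    return blocks


# Clusters 1-2, 4 and 7-9 are free; cluster 4 is an isolated block of size 1.
disk = [True, False, False, True, False, True, True, False, False, False]
print(find_available_blocks(disk))  # [(1, 2), (4, 1), (7, 3)]
```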
The locations of files stored on a disk are typically recorded in an allocation table or the like. An allocation table generally includes at least one file identifier and a corresponding list of the cluster IDs in which each file is stored. The operating system running on the computer is responsible for creating and updating the allocation table. For instance, when a file is created, the operating system updates the allocation table to specify the clusters which are allocated to that file. When a file is deleted, the allocation table is updated to indicate that the clusters previously allocated to that file are now available.
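As a rough illustration of this bookkeeping, the following sketch models an allocation table as a mapping from file identifier to a list of cluster IDs. It is a simplified model under stated assumptions, not the on-disk layout of any real file system; the class name `AllocationTable` and its methods are hypothetical.

```python
class AllocationTable:
    """Simplified, hypothetical allocation table: file ID -> list of cluster IDs."""

    def __init__(self, total_clusters):
        self.total_clusters = total_clusters
        self.files = {}

    def allocated_clusters(self):
        """Set of every cluster ID currently assigned to some file."""
        return {c for clusters in self.files.values() for c in clusters}

    def create(self, file_id, cluster_ids):
        """Record that `cluster_ids` now belong to `file_id`."""
        in_use = self.allocated_clusters()
        for c in cluster_ids:
            if not 0 <= c < self.total_clusters or c in in_use:
                raise ValueError(f"cluster {c} is not available")
        self.files[file_id] = list(cluster_ids)

    def delete(self, file_id):
        """Remove the file; its clusters become available again."""
        return self.files.pop(file_id)

    def free_clusters(self):
        """Cluster IDs not assigned to any file, in ascending order."""
        in_use = self.allocated_clusters()
        return [c for c in range(self.total_clusters) if c not in in_use]


table = AllocationTable(total_clusters=8)
table.create("a.txt", [0, 1])
table.create("b.txt", [2, 3, 4])
table.delete("a.txt")           # clusters 0 and 1 are released
print(table.free_clusters())    # [0, 1, 5, 6, 7]
```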
Operating systems, including Microsoft Windows 95®, Windows 98®, and Windows ME® (Millennium Edition), allocate space for new files beginning at the first available cluster, i.e., the available cluster with the lowest cluster ID. When a file is being extended and the last cluster allocated to the file has cluster ID N, space is allocated at the first available cluster after cluster ID N. While this method of allocating space is fast, it often results in fragmented files and many small isolated blocks, because it does not attempt to ensure that the clusters to be allocated to a file are contiguous. However, if the total size of the disk is substantially larger than the files it contains, there is a large amount of free space on the disk, which is frequently consolidated into a single large block of free space.
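The first-available policy described above can be sketched as follows. This is an illustrative model only, assuming a hypothetical list of booleans indexed by cluster ID; it is not the actual routine of any operating system, and the function names are invented for the example.

```python
def first_available_allocate(allocated, count, after=-1):
    """Take the `count` lowest-numbered free clusters with ID greater than `after`.

    `allocated` is a hypothetical list of booleans (True = in use) and is
    updated in place. Use `after=N` to model extending a file whose last
    cluster is N.
    """
    chosen = []
    for cluster_id in range(after + 1, len(allocated)):
        if len(chosen) == count:
            break
        if not allocated[cluster_id]:
            allocated[cluster_id] = True
            chosen.append(cluster_id)
    if len(chosen) < count:
        for cluster_id in chosen:          # undo the partial allocation
            allocated[cluster_id] = False
        raise ValueError("not enough free clusters")
    return chosen


def count_fragments(cluster_ids):
    """Number of contiguous runs in an ascending list of cluster IDs."""
    return sum(1 for i, c in enumerate(cluster_ids)
               if i == 0 or c != cluster_ids[i - 1] + 1)


disk = [True, False, True, False, False, True, False, False, False, False]
new_file = first_available_allocate(disk, 3)
print(new_file, count_fragments(new_file))   # [1, 3, 4] 2
```

In this example a contiguous run of four free clusters (6 through 9) exists, yet the three-cluster file is split into two fragments because the lowest free cluster IDs are taken first.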
Fragmentation occurs when a file is stored in noncontiguous clusters. In general, it is desirable to avoid fragmentation of a file. First, reading and writing fragmented files involves more discrete input/output operations than for similar contiguous (unfragmented) files. Second, an input/output operation to a contiguous file on a disk is usually much faster than any number of smaller discrete input/output operations to a fragmented file. Third, if the discontiguous clusters are physically far apart, the seek time to find each fragmented cluster may be substantial. Fourth, fragmented files limit the effectiveness of read ahead caching. Fifth, fragmentation often causes more fragmentation of the disk as new files are written.
While there are many defragmentation programs currently on the market, they only defragment the disk when the program is run and are not real-time. Since most users only sporadically run such defragmentation programs, there is a continuing need for real-time methods for minimizing fragmentation.
The present invention is a method of allocating clusters of a disk or other computer readable medium containing a plurality of clusters to minimize fragmentation.
The method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size, location, or both of at least one identified block; (c) receiving a request to allocate one or more clusters to a file; (d) determining a starting cluster location based on the size of at least one of the identified available blocks, the location of at least one of the identified available blocks, or both; and (e) allocating the clusters requested beginning at the starting cluster location. Generally, the starting cluster location is not determined from only the location of one available block. Steps (a)-(d) may be performed in any order and may even be performed concurrently. Steps (a) and (b) are preferably performed concurrently and more preferably before step (d). Steps (a) and (b) are also preferably performed before step (c). According to a preferred embodiment, step (d) includes (i) determining or estimating the size, type, name, location (if the file already exists), or combination thereof of the file; and (ii) determining a starting cluster location based on the estimated size, type, name, location, or combination thereof of the file and the size, location, or both of at least one block. The method can also include the step of (f) identifying the allocated clusters as being unavailable.
By determining or estimating the size of the file for which storage space is to be allocated, a block of contiguous clusters close in size to the file may be chosen. This results in minimal fragmentation of the file. In contrast, in the prior art method, the first available cluster with the lowest cluster ID would be chosen, even if the following cluster was not available.
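One way to choose a block "close in size to the file" is a best-fit search, sketched below. The best-fit rule, the fallback to the largest block, and the (start, size) tuple representation are assumptions made for illustration; the text above does not prescribe a specific selection rule.

```python
def choose_block_by_size(blocks, clusters_needed):
    """Pick the smallest available block that still holds `clusters_needed`.

    `blocks` is a hypothetical list of (start_cluster_id, size) tuples.
    If no block is large enough, fall back to the largest block so that
    the file is split into as few pieces as possible (an assumption).
    Ties are broken in favor of the lower starting cluster ID.
    """
    if not blocks:
        return None
    fitting = [b for b in blocks if b[1] >= clusters_needed]
    if fitting:
        return min(fitting, key=lambda b: (b[1], b[0]))
    return max(blocks, key=lambda b: (b[1], -b[0]))


blocks = [(1, 2), (4, 1), (7, 3), (20, 8)]
print(choose_block_by_size(blocks, 3))   # (7, 3): exact fit, file stays contiguous
print(choose_block_by_size(blocks, 5))   # (20, 8): only block large enough
```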
The name and type of the file can be used to determine or estimate the longevity of the file. Short lived files can be placed in remote unused areas of the computer readable medium in order to minimize fragmentation.
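A minimal sketch of this idea follows. The set of extensions treated as short lived, the "~" prefix rule, and the interpretation of a "remote" area as the available block with the highest starting cluster ID are all assumptions introduced for illustration; none of them is specified in the text above.

```python
import os

# Hypothetical heuristic: extensions assumed to indicate temporary files.
SHORT_LIVED_EXTENSIONS = {".tmp", ".temp", ".bak"}


def is_probably_short_lived(filename):
    """Guess longevity from the file's name and type (illustrative only)."""
    _, ext = os.path.splitext(filename.lower())
    return ext in SHORT_LIVED_EXTENSIONS or filename.startswith("~")


def choose_block_by_longevity(blocks, filename):
    """Send short-lived files to the block with the highest starting cluster ID.

    `blocks` is a hypothetical list of (start_cluster_id, size) tuples.
    Other files go to the block with the lowest starting cluster ID.
    """
    if not blocks:
        return None
    if is_probably_short_lived(filename):
        return max(blocks, key=lambda b: b[0])
    return min(blocks, key=lambda b: b[0])


blocks = [(1, 2), (7, 3), (500, 64)]
print(choose_block_by_longevity(blocks, "SCRATCH.TMP"))  # (500, 64)
print(choose_block_by_longevity(blocks, "report.doc"))   # (1, 2)
```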
Alternatively, an available block near a specific location on the computer readable medium may be chosen in step (d) in order to minimize the seek time required to retrieve information. For example, if a request to extend a preexisting file is received, a starting cluster location may be chosen which is physically closest to the clusters allocated to the preexisting file. Unlike the prior art method, which only searches for the first available cluster after the last cluster allocated to a file, the method of the present invention can determine the closest available cluster before or after the clusters allocated to the preexisting file. The starting cluster location may also be determined based on the closest available block which has a size greater than or equal to an estimated size of the file. If a new file is being created, a starting cluster location closest to the current physical position of the drive head can be chosen.
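The location-based selection can be sketched as a nearest-block search that looks both before and after a reference cluster. The sketch assumes that the difference between cluster IDs is an acceptable proxy for physical distance, which may not hold on every medium; the function names and the (start, size) tuples are hypothetical.

```python
def distance_to_block(block, location):
    """Distance in cluster IDs from `location` to the nearest cluster of `block`."""
    start, size = block
    end = start + size - 1
    if location < start:
        return start - location
    if location > end:
        return location - end
    return 0


def choose_nearest_block(blocks, location, min_size=1):
    """Pick the available block closest to `location`, before or after it.

    `blocks` is a hypothetical list of (start_cluster_id, size) tuples.
    `min_size` optionally restricts the search to blocks that can hold an
    estimated file size. Ties go to the lower starting cluster ID.
    """
    candidates = [b for b in blocks if b[1] >= min_size]
    if not candidates:
        return None
    return min(candidates, key=lambda b: (distance_to_block(b, location), b[0]))


blocks = [(1, 2), (7, 3), (20, 8)]
last_cluster_of_file = 12
print(choose_nearest_block(blocks, last_cluster_of_file))              # (7, 3)
print(choose_nearest_block(blocks, last_cluster_of_file, min_size=4))  # (20, 8)
```

In the first call the chosen block lies before the existing file, whereas a search limited to clusters after cluster 12 would have selected the block starting at cluster 20.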
According to a preferred embodiment, the method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size, location, or both of at least one of the identified available blocks; (c) receiving a request to allocate one or more clusters to a file; (d) identifying the largest available block on the computer readable medium; (e) selecting a starting cluster location within the largest available block; and (f) allocating the clusters requested beginning at the starting cluster location.
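Steps (d) and (e) of this embodiment can be sketched as below. Starting at the first cluster of the largest block is one possible choice of "a starting cluster location within the largest available block"; that choice, the tie-breaking rule, and the (start, size) tuples are assumptions for illustration.

```python
def largest_block_start(blocks):
    """Return the first cluster ID of the largest available block.

    `blocks` is a hypothetical list of (start_cluster_id, size) tuples.
    When two blocks are equally large, the one with the lower starting
    cluster ID is chosen (an arbitrary tie-break).
    """
    if not blocks:
        return None
    start, _size = max(blocks, key=lambda b: (b[1], -b[0]))
    return start


print(largest_block_start([(1, 2), (7, 3), (20, 8)]))  # 20
```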
According to another preferred embodiment, the method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size, location, or both of at least one of the identified available blocks; (c) receiving a request to allocate one or more clusters to a file; (d) determining the type, name, location, or combination thereof of the file; (e) determining a starting cluster location based on (i) the type, name, location, or combination thereof of the file and (ii) the size of at least one of the identified available blocks, the location of at least one of the identified available blocks, or both; and (f) allocating the clusters requested beginning at the starting cluster location.
According to yet another preferred embodiment, the method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size and location of each identified available block; (c) intercepting or receiving a request to allocate one or more clusters to a file; (d) determining or estimating the size of the file; (e) determining a starting cluster location based on the estimated size of the file and the size of at least one available block; and (f) allocating the clusters requested beginning at the starting cluster location. According to one embodiment, the size of the block represents the number of contiguous available clusters.
In one embodiment, the size of the file is estimated by assuming it is a predetermined size, such as 1 or 4 megabytes, or the sum of the predetermined size and the size of the space requested (in step (c)).
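The size estimate can be worked through in a short sketch. The 1 megabyte default and the 8 kilobyte cluster size are taken from the examples in the text; the function names and the `add_request` flag are hypothetical.

```python
MEGABYTE = 1024 * 1024


def estimate_file_size(requested_bytes, predetermined_bytes=1 * MEGABYTE,
                       add_request=True):
    """Estimate the eventual file size in bytes.

    Either the predetermined size alone, or the predetermined size plus
    the space requested in the allocation call.
    """
    if add_request:
        return predetermined_bytes + requested_bytes
    return predetermined_bytes


def clusters_needed(size_bytes, cluster_bytes=8 * 1024):
    """Round a byte count up to whole clusters."""
    return -(-size_bytes // cluster_bytes)   # ceiling division


estimate = estimate_file_size(64 * 1024)     # 1 MB plus a 64 KB request
print(estimate, clusters_needed(estimate))   # 1114112 136
```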
According to yet another preferred embodiment, the method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size, location, or both of at least one of the identified available blocks; (c) receiving a request to allocate one or more clusters to a file; (d) identifying a location on the computer readable medium; (e) determining a starting cluster location based on (i) the identified location and (ii) the location of at least one of the identified available blocks; and (f) allocating the clusters requested beginning at the starting cluster location.
According to yet another preferred embodiment, the method includes the steps of (a) identifying at least one available block in the computer readable medium, each block including one or more contiguous available clusters; (b) determining the size, location, or both of at least one of the identified available blocks; (c) intercepting a request to a Windows operating system routine to allocate one or more clusters to a file; (d) determining a starting cluster location based on the size of at least one of the identified available blocks, the location of at least one of the identified available blocks, or both; and (e) sending a request to the Windows operating system routine to allocate one or more clusters to the file beginning at the starting cluster location. The file system of the computer readable medium is preferably FAT or FAT32.
Yet another embodiment is a computer readable medium containing a software program that includes instructions for performing any of the methods of the present invention.
The invention also includes a computer system comprising (a) a computer readable medium having a plurality of clusters; and (b) a processor in communication with the computer readable medium. The processor is configured to perform any of the methods of the present invention.