1. Field of the Invention
This invention relates generally to the field of mass storage devices and more particularly to systems and methods for defragmenting mass storage devices.
2. Background
Advances in mass storage technology have considerably increased the amount of data that can be stored on individual random access magnetic disk devices used with computers. As users add files to a disk, and then update and change those files, a phenomenon known as fragmentation tends to occur in most disk storage systems, whether the disks are attached to mainframe computers, workstations, or personal computers.
When a file is first created, many operating systems or file systems will cause the file to be allocated to a contiguous area, such as a series of tracks or cylinders on the disk in mainframe systems, or blocks in open systems, if a contiguous area is available. For illustration, as shown in the allocation map of FIG. 4a, if a simplified hypothetical disk has 80 tracks, the first file might be placed on tracks 0-30. However, if a second file is created on that disk it may be allocated at tracks 30-40. When the user wants to add data to the first file, most file systems and/or operating systems will accomplish this by allocating some additional space at another physical location on the disk for the addition, and updating pointers in the directory or volume table of contents on that disk to indicate that the first part of the file is still at tracks 0-30, but a second part has been added at tracks 40-50. When the second file needs to expand, it may be allocated space from tracks 50-80.
If the second file in this example is subsequently deleted, tracks 30-40 and 50-80 may be freed up, but the first file will still have two parts located at different sets of tracks--0-30 and 40-50. Thus, the first file has become and remains fragmented. (In this example, the second file was fragmented as well.)
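The sequence of allocations in this example can be sketched in a few lines of Python. This is only an illustrative model, not part of any file system: the names `allocate`, `delete` and `is_fragmented` are hypothetical, and each range such as "tracks 0-30" is assumed to be a half-open extent (0, 30) so that adjacent ranges do not overlap.

```python
# Illustrative model of the hypothetical 80-track disk described above.
# Assumption: "tracks 0-30" is modeled as the half-open extent (0, 30).
DISK_TRACKS = 80


def allocate(files, name, start, end):
    """Record one more extent for a file, in allocation order."""
    files.setdefault(name, []).append((start, end))


def delete(files, name):
    """Remove a file from the directory, freeing all of its extents."""
    files.pop(name, None)


def is_fragmented(files, name):
    """Treat a file as fragmented when it occupies more than one extent.

    Simplification: extents that happen to be adjacent are not merged.
    """
    return len(files[name]) > 1


files = {}
allocate(files, "first", 0, 30)    # first file is created
allocate(files, "second", 30, 40)  # second file is created
allocate(files, "first", 40, 50)   # first file grows
allocate(files, "second", 50, 80)  # second file grows
second_was_fragmented = is_fragmented(files, "second")

delete(files, "second")            # second file is deleted

print(files["first"])                 # [(0, 30), (40, 50)]
print(is_fragmented(files, "first"))  # True
```

Deleting the second file frees its extents but leaves the first file in two pieces, which is the condition the text describes.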
Fragmentation tends to build up over time, as more files are added, deleted and modified on the mass storage system. Two problems occur as disks and files get more and more fragmented. The first, and usually more significant one, is that performance degrades when fragmented files on a fragmented disk are accessed. When a file is located in one contiguous area, finding and reading or writing its data usually requires fewer seeks, that is, fewer movements and repositionings of the read/write heads on the disk. As the file becomes fragmented and thus more spread out over the disk, more seeks will be needed to find a number of additional small sections or fragmented extents of the file. These additional seeks for many small fragments usually add to the time it takes to read or write or update the file. Now that disks can be not only many megabytes, but even gigabytes or greater in size, this can literally add minutes or sometimes hours to processing time, especially if all or a substantial portion of the file is being accessed.
A second problem arising from disk fragmentation is that space on the disk tends to be wasted. In the hypothetical example given above, after the second file has been deleted, there are two "holes" or empty areas free on the disk, from tracks 30-40 and from 50-80--a total of about 40 tracks. However, depending on the operating system and file system in use, a new file that needs 40 tracks might not be allocated to this disk, even though 40 tracks are free, simply because they are not contiguous. Many file systems also allow the user to indicate the disk to which a file should be allocated. If the user repeatedly selects an already fragmented disk, files may become fragmented that otherwise would not.
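The wasted-space problem can be checked with a short calculation. The following Python sketch is a hypothetical illustration only: the helpers `free_holes` and `fits_contiguously` are assumed names, and track ranges are again modeled as half-open extents. It shows that the total free space in the example is 40 tracks while the largest contiguous hole is only 30 tracks.

```python
def free_holes(used_extents, disk_tracks):
    """Return the free half-open ranges left between the used extents."""
    holes = []
    cursor = 0
    for start, end in sorted(used_extents):
        if start > cursor:
            holes.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < disk_tracks:
        holes.append((cursor, disk_tracks))
    return holes


def fits_contiguously(holes, needed):
    """True if any single hole is large enough for the request."""
    return any(end - start >= needed for start, end in holes)


# State of the example disk after the second file has been deleted.
used = [(0, 30), (40, 50)]
holes = free_holes(used, 80)

total_free = sum(end - start for start, end in holes)
largest_hole = max(end - start for start, end in holes)

print(holes)                         # [(30, 40), (50, 80)]
print(total_free, largest_hole)      # 40 30
print(fits_contiguously(holes, 40))  # False
```

A file system that insists on one contiguous area would therefore reject a 40-track request on this disk even though 40 tracks are free.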
One way to address the problem, at least in some filesystems, is to increase the initial allocation size for each file, for example by making all allocations at least 4 megabytes in size. Some filesystems will permit this, while others, such as some personal computer systems, may make this difficult or impossible for a user to do.
Software programs known as defragmentation programs have also been written to address these problems. In the flow diagram of FIG. 5, a prior art approach with a defragmentation program operating in a host computer is shown. As noted there, a conventional defragmentation program reads the directories or volume table of contents on a disk (at step 510) in a mass storage system connected to that host to locate files that have significant fragmentation. Generally speaking, a defragmentation program finds or frees up enough space on the same disk for those files, and then writes them back out from the host computer's memory to a contiguous area on the selected disk(s). (See steps 520-560.) Since each fragmented file is read from the disk into the host computer and then written back out to disk in the loop shown at steps 530 to 550, it can be seen that as the files and disks get larger, this may take longer and longer to accomplish. Also in FIG. 5, the darkened arrows I/O indicate the transfer of data between the host computer and a disk 15. As can be seen with this prior art approach, the loop at steps 530-550 can involve a significant number of I/O transfers between the host computer and the disk. Even in small personal computer systems, these I/O requests to read in and then rewrite all fragments may be slowed down by competing requests for other disks or devices on the same bus as the disk that is being defragmented.
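The cost of this host-based approach can be illustrated with a small simulation. The Python sketch below is not the program of FIG. 5; it is a simplified model under stated assumptions: the function `host_based_defragment` is a hypothetical name, the disk is a list with one entry per track, one track moved in either direction counts as one I/O transfer, and the whole file is buffered in host memory before anything is written back.

```python
def host_based_defragment(disk, extents, dest_start):
    """Move a file's extents to one contiguous area by way of host memory.

    Returns the new extent and the number of track transfers performed.
    Assumption: one track read or written equals one I/O transfer.
    """
    io_transfers = 0
    host_buffer = []

    # Read phase: every track of every fragment is transferred to the host.
    for start, end in extents:
        for track in range(start, end):
            host_buffer.append(disk[track])
            io_transfers += 1

    # Write phase: every track is transferred back to a contiguous area.
    for offset, data in enumerate(host_buffer):
        disk[dest_start + offset] = data
        io_transfers += 1

    # Directory update, modeled by freeing old tracks outside the new extent.
    new_start, new_end = dest_start, dest_start + len(host_buffer)
    for start, end in extents:
        for track in range(start, end):
            if not (new_start <= track < new_end):
                disk[track] = None

    return (new_start, new_end), io_transfers


# Build the example disk: the first file occupies (0, 30) and (40, 50).
disk = [None] * 80
old_extents = [(0, 30), (40, 50)]
logical = 0
for start, end in old_extents:
    for track in range(start, end):
        disk[track] = f"first:{logical}"
        logical += 1

new_extent, transfers = host_based_defragment(disk, old_extents, 0)
print(new_extent, transfers)  # (0, 40) 80
```

In this model a 40-track file costs 80 track transfers across the bus, 40 in and 40 out, so the transfer count grows in direct proportion to the size of the files being defragmented.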
Whether the defragmentation program handles a whole file or only a fragment at a time, it will read the file or fragment into the host computer over a bus, then write it out from the host computer over a bus to the disk. Thus, although defragmentation programs help return mass storage systems to optimal performance, they may also create risk. If the system crashes while the program is reading a file or fragment or writing it back out to disk, one or more files could be corrupted. Thus, the longer it takes for the defragmentation program to run, the greater the risk that unexpected occurrences may cause a system failure or crash. As the size of the files being defragmented increases, more time is spent reading the files into the host computer and writing them back out to the disk. Thus, defragmenting larger files takes more elapsed time due to the number of I/O transfers to and from the host and may also increase the risk of file corruption in the event of a system failure. Many defragmentation programs attempt to minimize this risk by not allocating the new, defragmented file until the whole operation is complete and error-free. While this tends to minimize the risk of corruption, if the system crashes before defragmentation is complete, minutes or hours of time could still be wasted by the conventional approach of defragmentation programs.
One attempt to avoid fragmentation is used by some database programs that allocate very large areas of contiguous space to a database file when the file is first created. While this tends to minimize the performance problem, it does waste space, and it is typically limited to sophisticated database application programs. Most other types of files, on most computer systems, whether large or small, tend to be subject to file fragmentation over time.
It is an object of this invention to minimize the need to transfer fragmented file data from a disk into a host computer and back to the disk again.
It is another object of the present invention to improve the performance of defragmentation operations.