A common problem with management of information on permanent storage devices is lost directories. Most computer systems structure information contained on a permanent storage device of the computer system in a hierarchical fashion. This hierarchy ("the directory tree") starts with the root directory and the root directory further contains subdirectories and files. When a directory contains a subdirectory, the directory is known as the parent directory and the subdirectory is known as a child directory. There may be many levels of subdirectories emanating from the root directory. A user utilizes subdirectories to organize files and other subdirectories. A lost directory is a directory that for some reason cannot be accessed either directly or indirectly through the root directory. The problem of lost directories typically occurs as a result of the corruption of the directory in which the lost directory is referenced. This corruption can occur from either a hardware failure or from a computer program that executes in an undesirable fashion. For example, a computer program executing in an undesirable fashion may overwrite a portion of the root directory which may render a subdirectory referenced from the root directory "lost."
Another common problem with management of information is cross-linked clusters. In some computer systems, files are allocated on a cluster-by-cluster basis. A cluster is a unit of storage on the permanent storage device and is defined in terms of sectors. A sector is a physical division of the permanent storage device and is the smallest unit of access to the permanent storage device. In computer systems using the MS-DOS® operating system developed by the Microsoft Corporation of Redmond, Wash., a sector typically contains 512 bytes of information. One cluster typically is equivalent to four sectors. Since files are stored in terms of clusters, a file that contains more information than can be stored in one cluster is stored in a chain of clusters. Cross-linked clusters occur when, due to a hardware or software failure, the cluster chains for two files have at least one cluster in common. Thus, when a user modifies or deletes one of the cross-linked files, the other cross-linked file is also changed without the user knowing.
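The cluster arithmetic and the cross-link condition described above can be illustrated with a minimal Python sketch. The function names, the file names, and the representation of each cluster chain as a list of cluster numbers are assumptions made for illustration only; they are not part of any prior art system.

```python
SECTOR_BYTES = 512         # typical MS-DOS sector size, per the text
SECTORS_PER_CLUSTER = 4    # typical cluster size, per the text
CLUSTER_BYTES = SECTOR_BYTES * SECTORS_PER_CLUSTER


def clusters_needed(file_size):
    """Number of clusters needed to store file_size bytes (ceiling division)."""
    return -(-file_size // CLUSTER_BYTES)


def find_cross_links(chains):
    """Return {cluster: [file names]} for every cluster shared by two or more chains.

    `chains` is a hypothetical mapping of file name -> list of cluster numbers.
    """
    owners = {}
    for name, chain in chains.items():
        for cluster in set(chain):
            owners.setdefault(cluster, []).append(name)
    return {c: sorted(names) for c, names in owners.items() if len(names) > 1}


# Two illustrative files whose chains both run through clusters 6 and 7.
chains = {"A.TXT": [5, 6, 7], "B.TXT": [9, 6, 7]}
cross_links = find_cross_links(chains)
```

In this example, clusters 6 and 7 belong to both chains, so a change made through one file silently alters the other.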
FIG. 1 depicts a typical prior art layout for a permanent storage device used by MS-DOS. The permanent storage device 102 contains a file allocation table (FAT) 104, a root directory 106, and data space 108. The FAT 104 is a table that references each cluster in each file stored in the data space 108. Thus, the FAT contains information about each cluster on the permanent storage device, and this information logically links clusters into a chain of clusters for each file. The root directory 106 is the highest directory in the directory tree on the permanent storage device 102. As such, the root directory 106 provides access to every subdirectory or file contained on the permanent storage device 102. The data space 108 is the area of the permanent storage device 102 in which all of the clusters that store subdirectories or files are contained.
FIG. 2 depicts a more detailed diagram of a prior art root directory 106. The structure of the root directory 106 is typical of all directories on the permanent storage device 102. The root directory 106 contains many entries, each of which refers to either a file or a directory. Each entry has the following fields: a file name 202, an extension 204, an attribute 206, reserved space 208, a time of update 210, a date of update 212, a beginning cluster 214, and a file size 216. The file name field 202 contains the actual file or directory name corresponding to the entry. The file name field 202 can be up to eight bytes long. The extension field 204 contains the file name extension or directory name extension for the entry. The extension field 204 is 3 bytes long. An example of an extension to a file name is "EXE" when the entry refers to an executable file. The attribute field 206 contains the attribute for the directory or file corresponding to the entry. The attribute indicates both the type of the file or directory represented by the entry and the accessibility of the file or directory. For instance, the entry may refer to a subdirectory or system file. In addition, the accessibility of the file or directory may be read only or hidden. The reserved field 208 is 10 bytes long and is a field reserved for future use. The time of update field 210 is 2 bytes long and indicates the time that the file or directory was last modified. The date of update field 212 is 2 bytes long and indicates the date on which the last modification to the file or directory occurred. The beginning cluster field 214 is 2 bytes long and contains the cluster number of the first cluster used in storing the information contained in a file. The file size field 216 is a 4 byte field which contains the length of the file in terms of bytes. One entry contained in all directories except the root directory is the ".." entry which refers to the parent directory.
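The entry layout described above can be illustrated with a minimal Python sketch that unpacks one directory entry. The text does not state the width of the attribute field; the sketch assumes one byte, little-endian integers, space-padded names, and the attribute bit values of the conventional FAT layout. The function name parse_entry and the sample entry are hypothetical.

```python
import struct

# name(8) ext(3) attribute(1, assumed) reserved(10) time(2) date(2)
# beginning cluster(2) file size(4); "<" = little-endian, no padding (assumed).
ENTRY_FORMAT = "<8s3sB10sHHHI"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)

# Attribute bits as in the conventional FAT layout (not given in the text).
ATTR_READ_ONLY = 0x01
ATTR_HIDDEN = 0x02
ATTR_SYSTEM = 0x04
ATTR_DIRECTORY = 0x10


def parse_entry(raw):
    """Decode one raw directory entry into a dictionary of its fields."""
    (name, ext, attr, _reserved, time, date,
     first_cluster, size) = struct.unpack(ENTRY_FORMAT, raw)
    return {
        "name": name.decode("ascii").rstrip(),   # assumes space padding
        "ext": ext.decode("ascii").rstrip(),
        "read_only": bool(attr & ATTR_READ_ONLY),
        "hidden": bool(attr & ATTR_HIDDEN),
        "system": bool(attr & ATTR_SYSTEM),
        "is_directory": bool(attr & ATTR_DIRECTORY),
        "time": time,                            # left as raw 2-byte values
        "date": date,
        "first_cluster": first_cluster,
        "size": size,
    }


# A made-up read-only file entry, used only to exercise the parser.
raw = struct.pack(ENTRY_FORMAT, b"COMMAND ", b"EXE", ATTR_READ_ONLY,
                  bytes(10), 0, 0, 2, 4096)
entry = parse_entry(raw)
```

Under these assumptions the fields add up to 32 bytes per entry, and the directory bit of the attribute is what distinguishes a subdirectory entry from a file entry.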
FIG. 3 depicts a more detailed diagram of a prior art FAT 104. The FAT 104 contains an entry for each cluster in the data space 108. The FAT 104 is responsible for maintaining a logical link between each cluster used in storing information for a file. The entries in the FAT 302, 304, 306, 308 can contain a value of zero if the cluster is not allocated to a file, a valid cluster number if the cluster is allocated to a file or an end-of-file marker if the cluster indicated by the entry is the last cluster in a chain of clusters for a given file. When an entry in the FAT 104 contains a valid cluster number, the cluster number contained in the entry is that of the next cluster used in storing information for the given file. Thus, the FAT 104 links all clusters for a given file by having the entry for each cluster refer to the next cluster used for the file, with the entry for the last cluster containing an end of file marker.
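The linking role of the FAT can be illustrated with a minimal Python sketch that follows a chain from its first cluster. The FAT is modeled here as a dictionary from cluster number to entry value, and the EOF value 0xFFFF is a placeholder for whatever end-of-file marker the file system uses; both are assumptions for illustration, as is the function name cluster_chain.

```python
FREE = 0        # entry value for a cluster not allocated to a file, per the text
EOF = 0xFFFF    # placeholder end-of-file marker (assumed value)


def cluster_chain(fat, first_cluster):
    """Return the list of clusters in the chain that starts at first_cluster."""
    chain, seen = [], set()
    cluster = first_cluster
    while True:
        if cluster in seen:
            raise ValueError("corrupt FAT: chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        next_entry = fat[cluster]
        if next_entry == EOF:
            return chain                 # last cluster of the file reached
        if next_entry == FREE:
            raise ValueError("corrupt FAT: chain runs into a free cluster")
        cluster = next_entry             # entry holds the next cluster number


# Illustrative FAT: one file occupies clusters 2 -> 3 -> 5, another only 6.
fat = {2: 3, 3: 5, 4: FREE, 5: EOF, 6: EOF}
chain = cluster_chain(fat, 2)
```

The two error checks are additions beyond the description above; they only keep the sketch from looping forever on a damaged table.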
FIG. 4 depicts a flowchart of the typical steps used in the prior art to access information contained in a file. In order to access information, a calling computer program passes the file name of a desired file and a logical location of the requested information (in reference to the beginning of the file) to the computer program responsible for file access, usually the operating system. In step 402, the operating system searches the root directory or various subdirectories for the file name of the desired file. In step 404, the operating system examines the associated entry for the file name and obtains the cluster number of the first cluster which stores information for the file. In step 406, the operating system accesses the FAT with the first cluster number. The accessed entry in the FAT corresponding to the first cluster number contains either an end-of-file marker if the file is only one cluster in length or, otherwise, contains a valid cluster number which is the next cluster used in storing information for the file. The operating system chains through the entries in the FAT until reaching the FAT entry corresponding to the cluster that contains the requested information. In step 408, after chaining through the FAT entries to obtain the appropriate cluster number for the requested information, the operating system accesses the cluster in data space 108 to obtain the requested information. In step 410, the operating system returns the requested information to the calling computer program and the calling computer program can then use the returned information for the processing of the calling program.
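Steps 402 through 410 can be illustrated with a minimal Python sketch. The directory, the FAT, and the data space are modeled as dictionaries, the cluster size is deliberately reduced to 4 bytes so the example stays small, and the EOF value is a placeholder; all of these, along with the function name read_at, are assumptions for illustration and do not reflect how an operating system actually stores these structures.

```python
EOF = 0xFFFF        # placeholder end-of-file marker (assumed value)
CLUSTER_BYTES = 4   # deliberately tiny cluster size for the example


def read_at(directory, fat, data, name, offset, length):
    """Return up to `length` bytes of file `name`, starting at byte `offset`."""
    # Steps 402-404: find the entry and take its first cluster and file size.
    first_cluster, file_size = directory[name]
    if offset >= file_size:
        return b""
    end = min(offset + length, file_size)

    # Step 406: chain through the FAT to the cluster that holds `offset`.
    cluster = first_cluster
    for _ in range(offset // CLUSTER_BYTES):
        cluster = fat[cluster]
        if cluster == EOF:
            raise ValueError("chain ends before the recorded file size")

    # Step 408: read from the data space, moving along the chain as needed.
    out = bytearray()
    pos = offset
    while pos < end:
        start = pos % CLUSTER_BYTES
        take = min(CLUSTER_BYTES - start, end - pos)
        out += data[cluster][start:start + take]
        pos += take
        if pos < end:
            cluster = fat[cluster]

    # Step 410: hand the requested information back to the caller.
    return bytes(out)


# A 10-byte file stored in clusters 2 -> 5 -> 3 (not contiguous on purpose).
directory = {"HELLO.TXT": (2, 10)}
fat = {2: 5, 5: 3, 3: EOF}
data = {2: b"abcd", 5: b"efgh", 3: b"ij\x00\x00"}
```

With these sample structures, a request for four bytes at offset 3 starts in cluster 2 and continues into cluster 5, which shows why the FAT must be consulted for any read that crosses a cluster boundary.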
FIGS. 5A and 5B depict a flowchart of the steps performed by a prior art system (chkdsk) for repairing permanent storage devices. The chkdsk system is sold as part of the MS-DOS operating system by the Microsoft Corporation of Redmond, Wash. A user can run the chkdsk system either periodically for preventive maintenance or when the user notices a loss of a significant amount of information, that is, when the user can no longer utilize a directory. The chkdsk system searches the directory tree starting with the root directory and finds chains of clusters that are marked as "in use" by the FAT, but are inaccessible through the directory tree. (A cluster is "in use" if the corresponding FAT entry contains a non-zero value.) Such chains of clusters are considered "lost." After finding a lost chain of clusters, the chkdsk system enters the lost chain as a file entry in the root directory.
In step 502, the chkdsk system first walks the directory tree and marks one bit in a bitmap for each cluster encountered on the permanent storage device. Each bit in the bitmap corresponds to one cluster in the data space on the permanent storage device. Walking the directory tree will be discussed in more detail below. In steps 506 through 516, the chkdsk system selects each entry in the FAT and determines whether each entry corresponds to a lost cluster. After steps 506 through 516 are completed, the bitmap will have bits set only for clusters that were marked as in use by the FAT, but were inaccessible from the directory tree. In step 506, the chkdsk system selects the next entry in the FAT starting with the first. In step 508, the chkdsk system determines if the selected entry in the FAT is in use. If the selected entry in the FAT is not in use, processing continues to step 516 wherein the chkdsk system determines if there are more entries in the FAT. In step 510, if the selected entry in the FAT is in use, the chkdsk system toggles the corresponding bit in the bitmap. In step 516, the chkdsk system determines if there are more entries in the FAT to be examined. If there are more entries in the FAT to be examined, processing continues to step 506 and the next entry in the FAT is selected. However, if there are no more entries in the FAT, processing continues to step 518 in FIG. 5B.
In steps 518 through 522, the chkdsk system determines which bits in the bitmap correspond to the heads of lost cluster chains. In step 518, the chkdsk system selects the next set bit in the bitmap, starting with the first bit. In step 520, the chkdsk system uses the FAT to determine all subsequent clusters in the cluster chain indicated by the set bit and clears the bits in the bitmap that correspond to those subsequent clusters. Step 520 is performed so that, after each set bit in the bitmap is processed, only the bit for the head cluster in each chain of lost clusters remains set. In step 522, the chkdsk system determines if there are more set bits in the bitmap. If more bits are set in the bitmap, processing continues to step 518 wherein the next set bit in the bitmap is selected. However, if no more bits are set in the bitmap, processing continues to step 524. In step 524, for each set bit in the bitmap, the chkdsk system enters the corresponding cluster as a file entry in the root directory. Thus, the chkdsk system enters all lost chains of clusters, whether the chain was a file or a directory, as a file entry in the root directory. Since MS-DOS does not allow for converting files to directories, when utilizing the chkdsk system, lost directories are unrecoverable.
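The bitmap manipulation of steps 502 through 522 can be illustrated with a minimal Python sketch. This is not the chkdsk source code: the FAT is modeled as a dictionary, the bitmap as a dictionary of booleans, the set `reachable` stands in for the result of walking the directory tree in step 502, and the EOF value and the function name lost_chain_heads are placeholders. The sketch also assumes that every reachable cluster is marked in use by the FAT.

```python
FREE = 0        # FAT entry value for a cluster that is not in use, per the text
EOF = 0xFFFF    # placeholder end-of-file marker (assumed value)


def lost_chain_heads(fat, reachable):
    """Return the head cluster of every lost chain, in ascending order."""
    # Step 502 (result): one bit per cluster, set if the directory walk saw it.
    bitmap = {cluster: (cluster in reachable) for cluster in fat}

    # Steps 506-516: toggle the bit of every in-use FAT entry. Reachable
    # clusters flip back to clear; in-use but unreachable clusters become set.
    for cluster, entry in fat.items():
        if entry != FREE:
            bitmap[cluster] = not bitmap[cluster]

    # Steps 518-522: for each set bit, clear the bits of all later clusters
    # in its chain, so that only chain heads remain set.
    for cluster in sorted(fat):
        if not bitmap[cluster]:
            continue
        seen = {cluster}
        nxt = fat[cluster]
        while nxt not in (EOF, FREE) and nxt in fat and nxt not in seen:
            bitmap[nxt] = False
            seen.add(nxt)
            nxt = fat[nxt]

    return [cluster for cluster in sorted(fat) if bitmap[cluster]]


# Reachable file: 2 -> 3. Lost chains: 8 -> 6 -> 7 and the single cluster 5.
fat = {2: 3, 3: EOF, 4: FREE, 5: EOF, 6: 7, 7: EOF, 8: 6}
heads = lost_chain_heads(fat, reachable={2, 3})
```

The example places the head of one lost chain (cluster 8) after its successors in the bitmap, which shows that the order in which set bits are visited does not change the final result. Step 524 would then enter each returned head in the root directory as a file entry.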
FIG. 6 depicts a flowchart of the steps performed by the prior art walk directory tree routine. The walk directory tree routine performs a depth first, left-to-right traversal of the directory tree. For each cluster encountered during the traversal, the walk directory tree routine marks the bit in a bitmap that corresponds to the cluster encountered. The bitmap contains 1 bit for each cluster on the permanent storage device, and all bits are initially set to zero. Thus, when the walk directory tree routine has completed, the bitmap contains bits set for each cluster encountered during the traversal of the directory tree. The walk directory tree routine receives a directory as input. The walk directory tree routine is first invoked with the root directory. Although the walk directory tree routine is described as being a recursive algorithm, one skilled in the art will appreciate that the walk directory tree routine could be implemented using other methods such as an iterative method. In step 602, the walk directory tree routine selects the next entry in the directory received as a parameter ("the parameter directory") starting with the first entry. In step 604, the walk directory tree routine accesses each cluster for the selected entry using the FAT while marking a corresponding bit in a bitmap for each cluster accessed. In step 606, the walk directory tree routine determines if the selected entry is a directory entry. The walk directory tree routine determines if the selected entry is a directory entry by examining the attribute field in the selected entry. In step 608, if the selected entry is a directory entry, the walk directory tree routine makes a recursive call with the selected entry. In step 610, the walk directory tree routine determines if there are more entries in the parameter directory. If there are more entries in the parameter directory, processing continues to step 602 wherein the next entry in the parameter directory is selected. However, if no more entries are contained within the parameter directory, the walk directory tree routine returns to higher level processing.
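Steps 602 through 610 can be illustrated with a minimal recursive Python sketch. Directory entries are modeled as (name, first cluster, is-directory) tuples, the contents of each subdirectory are looked up by its first cluster in a hypothetical `dir_contents` mapping, and the EOF value is a placeholder. The sketch assumes well-formed cluster chains and omits the ".." entries, which an actual traversal would have to skip in order to avoid recursing back into the parent directory.

```python
EOF = 0xFFFF    # placeholder end-of-file marker (assumed value)


def walk_directory_tree(entries, fat, dir_contents, bitmap):
    """Mark in `bitmap` every cluster reachable from the given directory."""
    for name, first_cluster, is_directory in entries:    # steps 602 and 610
        # Step 604: follow the entry's chain in the FAT, marking each cluster.
        cluster = first_cluster
        while cluster != EOF:
            bitmap[cluster] = True
            cluster = fat[cluster]
        # Steps 606-608: recurse into the entry if it is a directory.
        if is_directory:
            walk_directory_tree(dir_contents[first_cluster], fat,
                                dir_contents, bitmap)


# Root holds a two-cluster file and a subdirectory; cluster 7 is in use
# in the FAT but is not referenced by any directory entry.
root_entries = [("README", 2, False), ("DOCS", 4, True)]
dir_contents = {4: [("NOTES", 5, False)]}
fat = {2: 3, 3: EOF, 4: EOF, 5: 6, 6: EOF, 7: EOF}
bitmap = {cluster: False for cluster in fat}
walk_directory_tree(root_entries, fat, dir_contents, bitmap)
marked = sorted(cluster for cluster, bit in bitmap.items() if bit)
```

After the walk, every cluster belonging to a reachable file or directory is marked, while cluster 7 remains unmarked.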
In regard to the problem of cross-linked clusters, there are several methods for fixing cross-linked clusters which are unsatisfactory for the reasons described below. A first method for repairing cross-linked clusters is to manually copy the cross-linked files from the original permanent storage device to a second permanent storage device, delete the cross-linked files from the original permanent storage device and then copy the cross-linked files back to the original permanent storage device. Although this method does fix cross-linked clusters, this method requires a computer user to identify the files that are cross-linked and to manually make copies. This method is time consuming and difficult to perform for novice computer users.
The second method for fixing cross-linked clusters is a computer program that identifies cross-linked clusters and randomly truncates one of the links to the cross-linked cluster. Usually when a cross-link occurs, one of the files is correctly connected to the cross-linked cluster and the other file is incorrectly connected to the cross-linked cluster due to a hardware or software failure. Thus, randomly disconnecting one file from the cross-linked cluster may disconnect the correctly connected file and render both files unusable.