The main components of a general purpose, programmable, digital computer, as illustrated by a representative computer 10 shown in FIG. 1, are a central processing unit (CPU) or processor 12 for manipulating data according to a program of instructions, and a main or working memory 14 for temporarily storing data, including program instructions. (The term "data" will be used generically, unless the context otherwise indicates, to mean any information in digital form stored by a computer, including, without limitation, program instructions, text, business data and commands.)
Most computers also include one or more mass or high capacity data storage devices for long term storage of data in the form of files. There are many types of mass data storage devices available, including, for example, magnetic and optical tape, magnetic, magneto-optical and optical disks, and solid state memory (e.g. flash memory). Most, but not all, of these devices and media are non-volatile; i.e. they do not require power to maintain long-term memory. Many can be written to one or more times. In addition to the basic, physical properties of their respective storage media, they differ in the manner (sequential versus random) of access, speed of access, cost per unit of data storage and storage capacity, among other characteristics. The type of device selected often depends on the requirements for the particular computer. Unlike the main memory, which stores data in directly accessible small units, i.e. bytes, mass data storage devices are set up to receive and make data available to the CPU in comparatively large blocks. Mass data storage devices are treated as peripheral input/output (I/O) devices, meaning that they and/or their controllers are set up (in hardware, software or both) to transfer data in relatively large (e.g. 512 byte) blocks.
Most modern computers utilize one or more magnetic disk media for high-capacity storage, as such media currently offer a good combination of speed of access, capacity and cost. In the representative computer of FIG. 1 the mass data storage devices are a floppy disk drive 16 and its controller 18, and a hard disk drive 20 and its controller 22. Conventionally, the floppy disk drive receives a removable, flexible magnetic disk 17. The hard disk drive includes a stack of spatially-separated, stiff, magnetic platters, which are usually fixed, but may also be made removable. The controllers for the respective disk drives translate basic commands received from the CPU into the appropriate actions for that particular disk drive, and control the flow of data to and from the disk drives. Computers will often include other types of mass data storage devices, such as CD-ROM drives and tape drives.
In addition to mass data storage devices, the computer 10 also includes various other peripheral components or I/O devices with which the CPU communicates, including, for example, a keyboard 24, a video monitor 26 and its graphics adapter 28. In the simplest form of the computer 10, the CPU, main memory, the mass data storage devices and the I/O devices communicate over a single system bus, which is designated 30 in the figure. However, most computers use more complex types of bus arrangements for enabling communication between the CPU, the main memory, and the various I/O devices.
There is also a separate, non-volatile, solid state read-only memory 32 for storing what is referred to as the "BIOS" or "Basic Input/Output System," which is permanently resident software, separate from an operating system. The BIOS software routines, when executed by the CPU, translate certain "calls" from an executing program wanting to access an I/O device, whether it be an operating system or an application program, into a sequence of commands that are provided to, or stored in registers of, a particular I/O device or its controller for execution by the controller. By segregating hardware-dependent I/O device access routines from other programs running on the computer, higher level programs, such as operating systems and application programs, need not be written for specific computer hardware, allowing at least some level of compatibility among different hardware systems. The BIOS also includes software for handling certain types of errors which occur with the I/O devices, as well as instructions for testing various components of the computer when it is powered up and for loading an operating system from a disk drive.
Disk drives are physically addressed by the BIOS using a cylinder, head and sector number. A typical hard disk includes multiple platters rotating on a common axis. On each side of each platter are arranged concentric tracks. Tracks having the same diameter or radius lie within a "cylinder." Each side of each platter is read and written to by a separate read/write head which moves across the tracks. The head and cylinder numbers uniquely identify a track, and the sector number uniquely identifies one of the sectors within a track. The cylinder, head and sector ("C,H,S") address 0,0,1 is always occupied by a partition sector. Hard disks may be used by a computer to store more than one operating system, which means that more than one type of file system may be used to store files on a hard disk. A hard disk is therefore partitionable into multiple "drives" or "volumes." The partition sector stores a table specifying the start and end of each partition, or a link to the next partition, as well as some other basic information about the disk. Most operating systems address sectors using a logical block address rather than the C,H,S address. A logical block address (LBA) is a sequential numbering of the sectors within a partition or drive. It is one of the BIOS' functions to translate between the LBA and a physical C,H,S address.
As previously mentioned, data is organized for storage into files. Depending on the size of the file and the size of the sectors in the storage device in which it is stored, files may be stored over one or more sectors of the storage device. It is the job of an operating system, particularly its file system, to keep track of what files are stored, and where they are stored, in the storage device. Generally, this file information is also stored in the same device as the files. Some of the information is typically stored in designated sectors or areas set aside for that purpose.
Different operating systems use different file systems. The File Allocation Table (FAT) file system is the native file system for IBM-standard personal computers running the MS-DOS®, Windows™ 3.x and Windows™ 95 operating systems of Microsoft Corporation, and it is supported by Microsoft Corporation's Windows NT. The FAT file system was originally developed for small capacity floppy disks, but has been extended to be used for today's very large capacity disk drives. The FAT file system has several versions. The ones used by earlier versions of the MS-DOS and Windows 3.x operating systems are generally referred to as the FAT-12 and FAT-16 file systems. Microsoft Windows 95 supports FAT-12, FAT-16 and a 32 bit version called FAT-32. Microsoft Corporation's Windows NT™ operating system utilizes a native file system known as NTFS, or New Technology File System, and also supports the HPFS file system developed by IBM for the OS/2® operating system. These systems share, to varying degrees, a similar approach to managing files on the disks.
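By way of illustration only, the translation between a C,H,S address and a sequential sector number can be sketched as follows. The sketch uses the conventional formula for a disk with a fixed geometry and counts sectors from the start of the disk, so a partition-relative number would additionally require subtracting the partition's starting sector. The function names and the geometry used in the example (16 heads, 63 sectors per track) are assumptions chosen for the example and are not taken from any particular BIOS.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a C,H,S address to a zero-based sequential sector number.

    Cylinders and heads are numbered from 0; sectors are numbered from 1.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head number out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector number out of range (sectors start at 1)")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Convert a zero-based sequential sector number back to C,H,S."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


# Assumed geometry for the example: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))   # the partition sector at C,H,S = 0,0,1
print(lba_to_chs(63, 16, 63))        # the first sector under head 1
```

With this geometry, C,H,S address 0,0,1 corresponds to sector number 0, and sector number 63 corresponds to C,H,S address 0,1,1.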
FIGS. 2a and 2b illustrate, respectively, examples of how the FAT-16 and FAT-32 file systems organize data on a hard disk or other mass data storage device. Each has a partition sector 34a starting at C,H,S=0,0,1. (A floppy disk is generally not partitionable, and therefore has no partition sector.) Following each partition sector, there is a bootstrap sector, which starts at a fixed location (C,H,S=0,1,1) so that the BIOS always knows where to find it. In the FAT-16 system, there is a single boot sector 36. In the FAT-32 system, there are two, identical boot records, each labeled 38a, for redundancy. The boot records store basic information about the disk needed by the file system, as well as a program for loading the operating system from the disk.
The FAT file systems, like the file systems used by many operating systems, allocate clusters of sectors, rather than individual sectors, for file storage. The number of sectors per cluster within a partition is fixed during formatting of the storage device and stored in the boot record. The files are grouped into directories. The directories are organized hierarchically starting with a root directory 40a. The root directory in the FAT-16 system is in a fixed location in the storage device so that it can be found. However, the FAT-32 file system need not store the root directory at a fixed location. The remaining directories, which are set up by a user, are sub-directories of the root directory and can be located anywhere within the data area 44a, along with the files.
Each user directory includes an entry pointing to itself, identified by a "." and listing its starting cluster; an entry for its parent directory, if any, identified by a ".." and listing its parent's starting cluster; an entry for each first-order sub-directory, which includes its name, starting cluster and creation date/time (and a provision for long file names in Windows 95); and an entry for each file stored in that directory, which includes its name, length and starting cluster. Each directory is allocated at least one cluster for storing this information. Basically, each entry in a directory acts as a pointer to the starting cluster of a file, sub-directory or parent directory. If a file is allocated more than one cluster, the operating system must chain the additional clusters together by looking up a pointer to the next cluster in the file allocation table (FAT) 46a for the partition.
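The dependence of file locations on the sectors-per-cluster value can be illustrated with a short sketch. It assumes that the first sector of the data area is known and that the first cluster of the data area is numbered 2, as is the convention in FAT file systems; the function and parameter names are hypothetical.

```python
def cluster_to_sector(cluster, sectors_per_cluster, data_start_sector, first_cluster=2):
    """Return the first sector of a cluster within the data area.

    data_start_sector is the sector at which the data area begins;
    first_cluster is the number given to the first cluster of that area
    (2 by FAT convention, stated here as an assumption of the sketch).
    """
    if cluster < first_cluster:
        raise ValueError("cluster number precedes the data area")
    return data_start_sector + (cluster - first_cluster) * sectors_per_cluster


# The same cluster number resolves to different sectors when the
# sectors-per-cluster value changes.
print(cluster_to_sector(5, 8, 1000))
print(cluster_to_sector(5, 4, 1000))
```

Because every cluster number is resolved through the sectors-per-cluster value, a wrong or missing value shifts the computed location of every cluster after the first.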
The FAT 46a is stored between the boot records and the root directory. Because of its importance, there are two copies stored. It has one entry for each cluster in a partition. The entry will indicate that the cluster is available, being used or is bad. If it is one of the clusters in a chain of clusters making up a file, it will include the cluster number for the next in the chain or a special, predefined character or value for indicating that it is the last cluster in the chain. The FAT for the FAT-32 file system also stores the starting cluster for the root directory. In the FAT-16 table each entry is a 16 bit cluster address and in FAT-32 each entry is a 32 bit cluster address.
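The chaining of clusters through the FAT can be sketched as follows, assuming the FAT has already been read into memory as a list of integers indexed by cluster number. The marker values shown are the conventional FAT-16 ones (0x0000 for an available cluster, 0xFFF7 for a bad cluster, and 0xFFF8 through 0xFFFF for the last cluster in a chain); the function name and the sample table are hypothetical.

```python
FREE = 0x0000      # cluster is available
BAD = 0xFFF7       # cluster is marked bad
EOC_MIN = 0xFFF8   # values 0xFFF8..0xFFFF mark the last cluster in a chain


def follow_chain(fat, start_cluster):
    """Return the list of cluster numbers making up one file.

    fat is a list of FAT-16 entries indexed by cluster number;
    start_cluster comes from the file's directory entry.
    """
    chain = []
    seen = set()
    cluster = start_cluster
    while True:
        if not 2 <= cluster < len(fat):
            raise ValueError(f"cluster {cluster} is outside the table")
        if cluster in seen:
            raise ValueError(f"chain loops back to cluster {cluster}")
        seen.add(cluster)
        chain.append(cluster)
        entry = fat[cluster]
        if entry >= EOC_MIN:
            return chain
        if entry in (FREE, BAD):
            raise ValueError(f"chain runs into a free or bad cluster after {cluster}")
        cluster = entry


# Hypothetical table: a file starting at cluster 2 occupies clusters 2, 3 and 5.
sample_fat = [0xFFF8, 0xFFFF, 3, 5, FREE, 0xFFFF, FREE]
print(follow_chain(sample_fat, 2))
```

The checks for loops, out-of-range values, and free or bad entries are not part of the chaining itself; they are included because a damaged table can otherwise send the traversal into an endless or meaningless walk.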
The storage devices illustrated by FIGS. 2a and 2b have been partitioned by second partition sectors 34b. The second partitions also include boot records, 36b and 38b, respectively, root directories 40b, data areas 44b and FAT tables 46b.
Referring now to FIG. 3a, in the New Technology File System ("NTFS"), a storage device or disk formatted for the NTFS includes a master partition sector 34a and a boot sector 36a at fixed, predetermined addresses. However, following the boot sector, there is allocated space for, in order, a master file table (MFT) 50a, a partial copy 52a of the MFT, and other NTFS metadata files 54a. The remaining unallocated area of sectors 56a, up to the extended partition sector 58, is used for storing user files and index buffers, which can be thought of as a form of a directory. Following the extended partition sector is another volume, or partition, with its own boot sector 36b, MFT 50b, partial MFT copy 52b and NTFS metadata files 54b for that volume, and an area of user files and directories 56b. The storage device may have, if desired, additional extended partitions, defining additional volumes. NTFS can be considered an extension of OS/2's HPFS and includes file security features.
All data stored on an NTFS volume is stored in a file, including the NTFS data structures used to locate and retrieve files, the bootstrap data file (which is stored in the predefined boot sector) and a bitmap file which records the allocation state of each cluster on the volume. NTFS data structures are referred to as metadata files, and also include a log file, volume file, attribute definition table, root directory and bad cluster file, among others.
Like the FAT file systems, space in the volume is allocated in clusters of sectors. The number of sectors in each cluster is fixed within a volume. The clusters are numbered sequentially from the beginning to the end of the volume. These numbers are called logical cluster numbers (LCN).
Each file, including the MFT, boot file and other metadata files, has an entry in the MFT. Each is treated as a "file," as are the user directories and files. Each "file" in the MFT is defined by a row of attributes. These attributes include things like the names of the file (more than one is possible), time stamps for creation and modification dates, its MS-DOS attributes and security descriptors. There is a "data" attribute for user files, which may be used to store actual file data for small user files. A file may have additional "named attributes." For a directory, the data area is used to store attributes for a sorted index pointing to the files that are grouped in the directory. The index for a file includes the file's name and reference number, which is a pointer to the file's entry in the MFT.
The MFT is a fixed size. In the event a file attribute cannot fit within the entry area allocated to the file, the attribute is stored outside the MFT, in which case it is called a non-resident attribute. For a user file (as opposed to a directory), a group of consecutive clusters where a non-resident attribute is stored is called a data run. If an attribute's value is non-resident, its header, which always remains resident, indicates that it is non-resident, and it is followed by a pointer to the LCNs of the clusters where the attribute is actually stored. This is accomplished by recording the starting virtual cluster number (VCN), which is a sequential numbering of clusters within the file, for each run, with the LCN where the run begins, as well as the number of clusters in the run. In essence, data runs are the same as the files in the FAT file systems, and the VCN to LCN mapping is similar to the starting cluster information for the file in the directory of a FAT file system. However, unlike the FAT system, the number of clusters or length in a run is available in NTFS in the same entry as the starting cluster number, as are the starting clusters and lengths of other runs storing the file. Thus, in NTFS, there is no need for a separate FAT to store cluster allocation information. A separate bitmap file, which is one of the NTFS metadata files, indicates whether a cluster is available for allocation or is already allocated within the volume.
For a directory, a group of clusters storing non-resident file index information is called an index buffer. The index attribute in a directory's entry in the MFT includes an index root segment, an index allocation segment and a VCN allocation bitmap. The root index contains the next higher order file number for each index buffer. For example, if files 1, 2 and 3 are stored in a cluster run constituting a first index buffer for a directory, file 4 (or the next highest ordered file number in the directory) is stored as the root. For each root, there is a VCN-LCN mapping in the index allocation segment, which includes the starting VCN and LCN of the index buffer and the number of clusters in the buffer. The bitmap segment tracks which of the VCNs in the allocated cluster runs are free for storing additional indexes within the directory entry. The index buffers are thus, in many respects, like a directory stored in the user data area in the FAT systems.
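The VCN to LCN mapping described above can be illustrated with a minimal sketch. It represents each data run in memory as a starting VCN, a starting LCN and a length in clusters; the on-disk encoding of these values is not addressed, and the type name, function name and sample cluster numbers are hypothetical.

```python
from typing import NamedTuple


class DataRun(NamedTuple):
    start_vcn: int   # first cluster of the run, numbered within the file
    start_lcn: int   # first cluster of the run, numbered within the volume
    length: int      # number of consecutive clusters in the run


def vcn_to_lcn(runs, vcn):
    """Map a cluster number within a file to a cluster number on the volume."""
    for run in runs:
        if run.start_vcn <= vcn < run.start_vcn + run.length:
            return run.start_lcn + (vcn - run.start_vcn)
    raise LookupError(f"VCN {vcn} is not covered by any data run")


# Hypothetical file stored in two runs of four clusters each.
runs = [DataRun(0, 1355, 4), DataRun(4, 1588, 4)]
print(vcn_to_lcn(runs, 5))   # second cluster of the second run
```

Because each run carries its own length, the location of every cluster of the file follows from the run list alone, without consulting a separate allocation table.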
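The role of the root entries in selecting an index buffer can be sketched as follows. The sketch is a simplification that follows the numbered-file example above: it treats the root as a sorted list of integer keys, each being the next higher file number after the contents of one index buffer, and it does not model the index allocation segment or the bitmap. The function name and return convention are hypothetical.

```python
from bisect import bisect_left


def locate_in_index(root_keys, key):
    """Decide where a key belongs relative to the sorted root entries.

    Returns ("root", i) if the key is itself the i-th root entry, or
    ("buffer", i) if the key falls in the index buffer whose contents
    all sort below root entry i.
    """
    i = bisect_left(root_keys, key)
    if i < len(root_keys) and root_keys[i] == key:
        return ("root", i)
    if i < len(root_keys):
        return ("buffer", i)
    raise LookupError(f"key {key} sorts after every root entry in this sketch")


# Files 1-3 sit in the first buffer with file 4 as its root entry;
# files 5-7 sit in the second buffer with file 8 as its root entry.
root_keys = [4, 8]
print(locate_in_index(root_keys, 2))
print(locate_in_index(root_keys, 4))
print(locate_in_index(root_keys, 6))
```

Once the buffer is selected, the VCN-LCN mapping recorded for the corresponding root entry gives the clusters in which that buffer is stored.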
Because data for the file system for a particular storage device is stored on the device itself, a computer crash, hardware malfunction or programming glitch can destroy critical data necessary for retrieving files. Magnetic disk drives, in particular, are susceptible to data corruption, though it can happen to any media to which data is written, or on which it is stored. For example, file system information can be corrupted by damage to the disk media caused by physical shock or, in conventional hard disk drives, a crash of a read/write head should the disks suddenly stop rotating. A hardware malfunction in the device's controller, a bad memory chip or poorly written software can also corrupt file system data. A power outage can strike before caching software has written all of its cached data to a file on the device. Improper powering-down of a computer can leave critical file system information stored in memory, before it is written to the device.
The FAT file system is particularly at risk. Most file system corruption occurs near the beginning of a disk partition, where the most critical FAT file system information resides. There are computer viruses that specifically target a FAT partition table or boot record, which can wipe out critical information necessary to retrieve files. There is a "wrapped around" effect which may cause data in the FAT file system to be overwritten when a large capacity disk is replaced in an older computer that does not have an EIDE disk controller, and the LBA mode disk access or the large capacity disk software driver is improperly installed.
Furthermore, if information on the number of sectors per cluster (SPC) is lost, any file system information that records the location of files and directories in terms of clusters becomes useless. If information on where extended partitions exist is lost, all file system information for an entire partition is effectively lost.
Prior art utility programs for recovering files stored in FAT file systems, such as Norton Utilities, usually try to "fix" corrupted file system information by writing new file system data to the disk so that the operating system can then access the files from the device. However, these fixes are often ineffective, and may cause valuable data to be overwritten in the process.