1. Field of the Invention
The present invention relates to a file system characterized by a control structure such as a file allocation table and a directory tree and, more particularly, to a device and method for recovering data in a file system when the computer on which the file system is maintained is restarted due to errors. Specific examples presented herein apply particularly to file systems characterized by file allocation tables. However, persons of skill in the computer arts will understand that the present invention encompasses data recovery in a wide variety of modern computer systems and file systems used therein.
2. Description of the Related Art
A file allocation table (FAT) and associated directories are used for the convenience of users of a computer system, such as a communications switching system (e.g., a PBX or keyphone system), that employs a file management system to manage large-capacity files. When a fatal error occurs in the course of processing files, the computer system must be restarted by means of a software or hardware operation (by pressing a power-on switch twice or pressing a reset button, for example). A restart operation that occurs when the FAT or a directory is being changed may lead to an anomalous situation: the data processed before the restart operation remains available, but consistency is not assured for the data related to the FAT or the directory being changed.
When the computer system is restarted, the file system may fall out of consistency with respect to the storage state of management information maintained for controlling the overall file system. This inconsistency may result in the loss of file data and, most undesirably, in an interruption of services of the computer system. Such an interruption may amount to only an inconvenience in some situations involving only general personal computers. However, it presents a major problem for mission-critical systems and particularly for switching systems that are required to provide continuous service with high reliability. System reliability is substantially reduced if service may be suspended after a restart operation due to the loss of essential programs and data required for system operation.
The critical problem of recovering file system control structure data upon restarting the system has generated three general types of solutions. U.S. Pat. No. 5,561,795 provides an example of the first type, wherein the file system maintains an ongoing log of file system transactions. This approach has seen success in a variety of contexts, but it has the drawback that with large file systems a long and complex procedure may be required to reconstruct the file system control structure. An even more serious limitation is that transaction logging may not ensure recoverability of corrupted control structure data. Both of these features make transaction logging of only limited usefulness for data recovery in mission-critical systems such as switching systems.
A second approach to data recovery is exemplified by U.S. Pat. No. 5,504,883, entitled "METHOD AND APPARATUS FOR INSURING RECOVERY OF FILE CONTROL INFORMATION FOR SECONDARY STORAGE SYSTEMS" and issued Apr. 2, 1996 to Coverston et al., the disclosure of which is incorporated herein by reference. Here a pair of disk drives is used to back up control information from cache memory on a periodic basis. Each of the disk drives writes a time control stamp immediately before and after writing the current control structure to disk. If the system is restarted, the recovery system compares the four control stamps (one before and one after the copy of the control structure on each of the disks) to identify an intact copy of the control structure. Two storage devices are needed in this approach to ensure that an intact copy of the control structure remains in storage even while the file system is updating the other copy.
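The stamp comparison described above can be illustrated with a minimal sketch. It is not the method claimed in the '883 patent. It assumes, for illustration only, that a copy is intact when its leading and trailing stamps match, and that the newest intact copy is preferred. The names StoredCopy and select_intact_copy are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StoredCopy:
    """One disk's copy of the control structure, bracketed by two stamps (hypothetical layout)."""
    stamp_before: int  # written immediately before the control structure
    stamp_after: int   # written immediately after the control structure
    payload: bytes     # the control structure itself

    def is_intact(self) -> bool:
        # Assumption: matching stamps mean the write ran to completion.
        return self.stamp_before == self.stamp_after


def select_intact_copy(copies: Sequence[StoredCopy]) -> Optional[StoredCopy]:
    """Return the newest copy whose stamps match, or None if no copy is intact."""
    intact = [c for c in copies if c.is_intact()]
    if not intact:
        return None
    return max(intact, key=lambda c: c.stamp_after)


# Disk B was interrupted mid-write: its leading stamp is newer than its trailing stamp.
disk_a = StoredCopy(stamp_before=100, stamp_after=100, payload=b"older but complete")
disk_b = StoredCopy(stamp_before=200, stamp_after=100, payload=b"torn write")
recovered = select_intact_copy([disk_a, disk_b])
```

In this sketch the interrupted write on the second disk is detected because its two stamps disagree, so recovery falls back to the first disk. This is why the approach needs two storage devices: one copy must always remain untouched while the other is being rewritten.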
The elegant solution provided by the '883 patent has certain features that unfortunately limit its usefulness for mission-critical, high performance systems such as switching systems. First, it requires redundant disk storage devices that increase the overall cost of the computer system in which it is implemented. Such storage devices, even with modern designs, have relatively slow access times and require a separate controller to control the interface between the storage device and the rest of the computer system. More seriously, recovery of control structure data from backup files entails an inherent performance tradeoff: more frequent backups drive up the system's management overhead, and fewer backups risk catastrophic data loss. Such compromises are desirably avoided in application environments that require both high reliability and high performance.
The use of flash memories (or FEPROMs) has been suggested as a way to avoid the disadvantages of dual mass storage devices while retaining the benefits of redundant storage of critical data. For example, U.S. Pat. No. 5,432,927, entitled "FAIL-SAFE EEPROM BASED REWRITABLE BOOT SYSTEM," issued Jul. 11, 1995 to Grote et al., the disclosure of which is incorporated herein by reference, shows a boot sequence reprogramming system using dual flash memories. Flash memory is a recently-developed EEPROM technology suitable for an expanded range of applications because it allows numerous rewrites. The '927 patent shows a successful application for redundant storage of an effective boot sequence routine while an updated boot sequence routine is loaded.
On the other hand, writing to a flash memory still requires elevated voltages, as with traditional EEPROM devices, and the design complications that those elevated voltages entail. Moreover, U.S. Pat. No. 5,392,427, entitled "SYSTEM FOR UPDATING DATA STORED ON A FLASH-ERASABLE, PROGRAMMABLE, READ-ONLY MEMORY (FEPROM) BASED UPON PREDETERMINED BIT VALUE OF INDICATING POINTERS" and issued Feb. 21, 1995 to Barrett et al., the disclosure of which is incorporated herein by reference, illustrates some of the complications that arise when using flash memories for frequently-updated data storage. Most seriously, mere replacement of mass storage devices with flash memory devices will not avoid the overhead-data loss tradeoff problem inherent to redundancy systems.
A third, promising approach to data recovery was proposed in U.S. Pat. No. 4,164,017, issued Aug. 7, 1979 to Randell et al. The disclosed apparatus runs a program that is divided into program blocks. Data that will be changed by the execution of a given block is backed up in memory prior to execution of the block. The basic idea of process segmentation provides a potential alternative to data recovery through storage redundancy, but the '017 patent does not explain how that idea might be implemented to overcome the problems associated with reliable recovery of file allocation data. In particular, it does not show how to restore a file control structure to consistency when consistency has been lost due to the occurrence of a restart event while the file control structure was being modified.
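The basic idea of backing up data before a program block executes can be illustrated with a minimal sketch. It is not the apparatus of the '017 patent. It assumes, for illustration only, that the state is a dictionary, that the caller names the entries a block will change, and that a raised exception stands in for an error. The name run_block is hypothetical.

```python
import copy
from typing import Any, Callable, Dict, Iterable

State = Dict[str, Any]


def run_block(state: State, keys: Iterable[str], block: Callable[[State], None]) -> bool:
    """Back up the entries named in `keys`, run `block`, and restore them if it fails.

    Returns True if the block completed, False if its changes were rolled back.
    """
    keys = list(keys)
    # Save deep copies of the entries the block is expected to change.
    saved = {k: copy.deepcopy(state[k]) for k in keys if k in state}
    # Remember which named entries did not exist yet, so they can be removed again.
    absent = [k for k in keys if k not in state]
    try:
        block(state)
        return True
    except Exception:
        state.update(saved)
        for k in absent:
            state.pop(k, None)
        return False


def failing_update(state: State) -> None:
    state["fat"].append(7)      # partial change to the allocation data
    state["dir"]["new"] = 7     # partial change to the directory
    raise RuntimeError("error in the middle of the block")


fs_state: State = {"fat": [2, 3, 0], "dir": {"a": 1}}
completed = run_block(fs_state, ["fat", "dir"], failing_update)
```

The backup in this sketch lives only in ordinary program memory, so it helps with an error inside a running program but would not survive a restart. That limitation is the gap the surrounding text identifies.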
The '017 patent also does not show how process segmentation might be used with advanced semiconductor devices to overcome the problems of control structure data recovery. Indeed, much of the currently available semiconductor technology, including flash memories and high-capacity (1 Mb or higher) SRAMs, did not exist when the '017 patent was granted. See generally Betty Prince, SEMICONDUCTOR MEMORIES: A HANDBOOK OF DESIGN, MANUFACTURE, AND APPLICATION (2d ed. 1991), the disclosure of which is incorporated herein by reference. Specific attention is directed to pages 537 and 398-90 of this reference.
An application of the idea presented in the '017 patent has been proposed in U.S. Pat. No. 5,564,011, entitled "SYSTEM AND METHOD FOR MAINTAINING FILE DATA ACCESS IN CASE OF DYNAMIC CRITICAL SECTOR FAILURE" and issued Oct. 8, 1996 to Yammine et al., the disclosure of which is incorporated herein by reference. This system protects against data loss from failure of certain critical disk sectors, which contain file management data, by storing enough information in main memory to allow at least partial recreation of the critical sector data. If the file system detects that a critical sector has failed, it can create at least a partial image of the sector in memory from data read at initialization and at updates.
The ability to recover the control structure state that existed just prior to the occurrence of an error would effectively address the overhead-data loss tradeoff problem of redundancy systems. But the system of the '011 patent unfortunately may only provide partial recovery of the data stored in an affected disk sector. Also, the disclosed system relies upon data stored in main memory to perform this recovery, and this data would be lost if power to the computer system failed. Redundancy systems at least provide the assurance that a usable data structure can be recovered. Data regeneration as provided by this patent therefore would not adequately address the data recovery needs that exist for file systems required to provide high performance and high reliability.
We have found, in fact, that a need exists for an efficient and reliable alternative to redundancy-based data recovery systems. Such an alternative would avoid the overhead and reliability drawbacks of redundancy systems. It should also reliably allow recovery of data after any of the full range of possible error conditions, including total interruptions of power to the computer system. Desirably, it would enable recovery of a faithful copy of the system's control structure as it existed just prior to the condition that necessitated restarting the system. Ideally, its data safekeeping operations would add only modestly to the operational overhead of the file system.