1. Field of the Invention
This invention relates generally to methods for restoring data from damaged or corrupted computer files, and specifically to a method for uniquely encoding database files and using that encoding to accomplish reliable error detection and restoration of the database files after they are corrupted.
2. Introduction
One of the primary uses for computers, whether personal desktop computers or mainframe computers, is the processing of database information. A database is a collection of information arranged in an organized, easily searchable manner. A typical database might include financial and accounting information, demographics and market survey data, bibliographic or archival data, personnel and organizational information, public governmental records, private business or customer data such as addresses and phone numbers, etc.
Database information is usually contained in computer files arranged in a pre-selected database format, and the data contents within them are maintained for convenient access on magnetic media, both for storage and for updating the file contents as needed.
A conventional computer database of the dBase type comprises a data record file having a plurality of fixed byte-length records, and an optional file for storing larger variable byte-length data, commonly known as the memo file. Both types of files typically include a file header with file structure information, and a data region that contains the useful data. The record file header may contain file structural information such as the byte-length of a record, the position of the first record in the file, and the number of records in the file. The memo file header typically contains the location of the next available memo position. Variations exist; for example, the header may include information indicating other formatting details or other characteristics of the file contents. Header corruption is not difficult to detect or repair. A more difficult task involves detecting errors and restoring the contents of data regions in either record or memo files, when such files become corrupted or damaged.
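The header fields named above can be illustrated with a minimal sketch. It assumes the commonly documented dBase III byte layout (record count at bytes 4-7, header length at bytes 8-9, record length at bytes 10-11, little-endian) and an optional trailing end-of-file byte; neither detail is stated in this description, and the names `RecordFileHeader`, `parse_record_header` and `header_is_consistent` are hypothetical.

```python
import struct
from typing import NamedTuple


class RecordFileHeader(NamedTuple):
    num_records: int       # number of records in the file
    first_record_pos: int  # byte position of the first record
    record_len: int        # byte-length of one record


def parse_record_header(raw: bytes) -> RecordFileHeader:
    """Read the three structural fields from the start of a record file.

    Assumed layout (dBase III convention, not taken from the text):
    byte 0 version, bytes 1-3 date, bytes 4-7 record count,
    bytes 8-9 header length, bytes 10-11 record length.
    """
    if len(raw) < 12:
        raise ValueError("header too short")
    num_records, header_len, record_len = struct.unpack_from("<IHH", raw, 4)
    return RecordFileHeader(num_records, header_len, record_len)


def header_is_consistent(hdr: RecordFileHeader, file_size: int) -> bool:
    """Check the header against the file size.

    Assumption: the file may end with one extra end-of-file marker byte.
    """
    expected = hdr.first_record_pos + hdr.num_records * hdr.record_len
    return hdr.record_len > 0 and file_size in (expected, expected + 1)
```

A size check of this kind suggests why header damage is comparatively easy to detect: the header values and the file size must agree arithmetically. No comparable cross-check exists for the contents of the data region.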
In a record file, the data region consists structurally of one or more equal length records positioned sequentially one after the other at fixed interval positions in the file structure. Each record is subdivided into one or more fields, with each field containing a single type of data such as binary, alphanumeric or other type of data. Each record has the same number of fields and the same field types. One or more of the fields may also be memo fields storing memo pointers. Memo pointers are numerical values which indicate the positions of "memos", the variable byte-size data units in the memo file. Rather than varying in size byte by byte, memos are sometimes made up of larger units called "memo blocks", each block of memo data having a convenient predetermined byte size, e.g. 512 bytes. In older database memo files, such as Borland dBaseIII, memos were only constructed to store text information. However, newer memo files, such as Microsoft FoxPro memo files, can also store graphics, multimedia, or any other digital or binary-coded information. For purposes of this invention, no distinction is made between these different memo file data unit types, and they are simply referred to as memos.
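The fixed-interval arrangement described above means that record and memo positions follow from simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it assumes that a memo pointer is a block number counted from the start of the memo file, with the 512-byte block size taken from the example in the text.

```python
def record_offset(index: int, first_record_pos: int, record_len: int) -> int:
    """Byte position of record `index` (0-based) in the record file."""
    return first_record_pos + index * record_len


def memo_offset(pointer: int, block_size: int = 512) -> int:
    """Byte position of the memo referenced by a memo pointer.

    Assumption: the pointer counts blocks from the start of the memo file.
    """
    return pointer * block_size


# Example: third record in a file whose records start at byte 65
# and are 20 bytes long, and the memo referenced by pointer 3.
third_record_pos = record_offset(2, first_record_pos=65, record_len=20)
memo_pos = memo_offset(3)
```

Because every position is derived from the same few numbers, a single extraneous byte in the data region shifts every record or memo that follows it away from its computed position.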
A database file may be corrupted (damaged) for a number of reasons, including application program bugs, the crashing of the database application program, the crashing of the operating system, and the rebooting of the computer while the program is running, among other reasons. Corruption may occur in the header and/or the data region. Corruption in the data region may include offsets that separate adjacent valid records or memos by extraneous bytes; incorrect memo pointers that reference wrong memos; illegal memo pointers that reference non-existent memos; and cross-linked memo pointers that reference the same memo. Offsets disrupt the structural positions of records or memos, i.e., they cause the records or memos to be arranged at incorrect byte-count positions within their respective files.
The description of the prior art and of the preferred embodiment focuses on dBase type databases having separate record and memo files. However, the concepts discussed can be applied to database files having records, memos and other information combined in a single file, such as Microsoft Access and similar database files.
3. Description of Prior Art
A variety of approaches have been suggested to users of computer database programs as means for ensuring data integrity and restoring corrupted database files. The error detection and restoration methods found in the prior art can be classified into three categories.
The first category is a simple periodic backup, in which duplicates of the files are made on either the same storage device, such as a hard disk, or a separate storage device, such as a backup diskette, tape cassette or other removable media. Errors are detected in such a system only when the system fails or the errors are grossly apparent, while small cumulative errors in data may go undetected for extended periods of time. Furthermore, data entered or generated since the last backup cannot be recovered after the database becomes corrupted. Also, many users fail to make periodic backups.
The second category provides automatic duplication of data or transaction logging on the same or another storage device. Examples of such and similar systems are disclosed in U.S. Pat. Nos. 5,280,611; 5,404,502; and 5,404,508. When data becomes corrupted, the same or secondary device still has the uncorrupted data available. However, such real-time recovery systems are only adaptable to large system installations, requiring expensive hardware and software support often beyond the budget of individuals and small businesses.
The third category includes file repair utilities (FRU's) known to the prior art which can provide some amount of protection from file damage and corruption, but are nonetheless limited in many respects. A first type of FRU, such as "Norton Utilities FileFix" published by Symantec, has limited capability since it repairs only header damage. It cannot detect errors in the data region, and cannot effect necessary repairs to prevent file damage in the data region. Furthermore, it must be activated by a user, and requires the user to stop using the application program while the FRU is in operation. A second type of FRU, such as "dSalvage", distributed by Hallogram Publishing, can be used to repair both header and data region damage, but requires manual user intervention and considerable user skills in its operation and furthermore has limited error detection capability. A third type of FRU, such as "FoxFix", developed by XiTechnics Corporation, is substantially automatic, and requires no user interaction. It is fast and efficient, but it can detect and repair only very limited data region corruption problems. For example, it cannot reliably detect or repair offset record damage, nor resolve incorrect, illegal or cross-linked memo pointers.
Error detection methods incorporated in prior art FRU's have attempted to ensure file integrity of database files by one of three types of approaches, each of them having serious limitations. The first type of approach involves detection of record offset errors in the record file. There is no reliable way to scan through the record file and determine that each record is where it is supposed to be. An example method attempts to scan the first byte of each proper record position for the traditional delete flag, which can contain either a space character, ` `, or a `*` deleted flag character. Anything else indicates a record error. However, normal data in the rest of the record may contain `*` characters, and spaces are frequent. Hence detecting one of those two characters in the first byte is not a reliable indication of correct record positions.
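The weakness of the delete-flag scan can be shown with a short sketch. The function name, the 11-byte record layout and the sample data are hypothetical; only the rule itself (the first byte of each record must be a space or `*`) comes from the description above.

```python
VALID_FLAGS = (0x20, 0x2A)  # space = active record, '*' = deleted record


def suspicious_record_indexes(data: bytes, first_record_pos: int,
                              record_len: int, num_records: int) -> list:
    """Return indexes of records whose first byte is not a valid flag."""
    bad = []
    for i in range(num_records):
        pos = first_record_pos + i * record_len
        if pos >= len(data) or data[pos] not in VALID_FLAGS:
            bad.append(i)
    return bad


# Three 11-byte records: one flag byte plus a 10-byte space-padded name.
records = [b" " + name.ljust(10) for name in (b"Smith", b"Jones", b"Brown")]
clean = b"".join(records)

# Corrupt the file by inserting one extraneous space after the first record,
# which shifts the second and third records by one byte.
shifted = records[0] + b" " + records[1] + records[2]

clean_result = suspicious_record_indexes(clean, 0, 11, 3)
shifted_result = suspicious_record_indexes(shifted, 0, 11, 3)
```

In this constructed case both results are empty lists: the scan reports no problem in the shifted file, because the bytes that land on the expected flag positions happen to be the inserted space and the padding of the preceding record.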
The second type of approach detects offset errors in the memo file. Both traditional dBase files and newer file variations lack unique characters to indicate the starting position of a memo in a memo file. One exception is FoxPro file memos, which use a 0001 (hex) and 0002 (hex) byte value sequence at the start of memos to distinguish text memos from graphical memo types. However, this is not a reliable means for locating memos, since 0 byte values are very common in memo files and can be followed by a random 1 or 2 byte value.
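A naive signature scan illustrates why these byte values are unreliable markers. The sketch reads the 0001 (hex) and 0002 (hex) sequences above literally as two-byte patterns, which is an assumption about the on-disk encoding; the function name and sample bytes are hypothetical.

```python
SIGNATURES = (b"\x00\x01", b"\x00\x02")  # assumed two-byte start markers


def candidate_memo_starts(data: bytes) -> list:
    """Return every position where a start-of-memo pattern appears."""
    hits = []
    for i in range(len(data) - 1):
        if data[i:i + 2] in SIGNATURES:
            hits.append(i)
    return hits


# One memo: a start marker followed by binary content that happens to
# contain a zero byte followed by a 2.
memo = b"\x00\x01" + bytes([7, 0, 2, 9])
hits = candidate_memo_starts(memo)
```

Here the scan reports two candidate starts for a single memo, one genuine and one produced by ordinary data bytes, and nothing in the file distinguishes between them.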
A third type is detection of memo pointer errors. The only two such errors that can be detected by prior art FRU's are illegal pointers, those which point beyond the end of the file, and, in some cases, invalid pointers, those which do not point to a memo start position or where a plurality of pointers point to the same memo. A plurality of memo pointers pointing to the same memo are sometimes referred to as cross-linked pointers. No prior art FRU software has been found that provides a method for determining which of the cross-linked memo pointers is correct, or a method for correcting the other memos. Furthermore, the available prior art cannot in fact determine that a pointer is correct. A pointer can be the only pointer to reference a valid memo position, yet it may reference an incorrect memo.
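The two detectable pointer errors can be sketched as follows. This is an illustration, not the method of any named FRU; the function name is hypothetical, and it assumes pointers are block numbers, that block 0 holds the memo file header, and that blocks are 512 bytes.

```python
from collections import defaultdict


def classify_memo_pointers(pointers: dict, memo_file_size: int,
                           block_size: int = 512):
    """Split memo pointers into illegal and cross-linked groups.

    `pointers` maps a record index to the memo block number it references.
    Assumption: block 0 is the header, so valid pointers start at 1.
    """
    illegal = []
    owners = defaultdict(list)
    for record_index, ptr in pointers.items():
        if ptr < 1 or ptr * block_size >= memo_file_size:
            illegal.append(record_index)  # points outside the memo data
        else:
            owners[ptr].append(record_index)
    cross_linked = {ptr: recs for ptr, recs in owners.items() if len(recs) > 1}
    return sorted(illegal), cross_linked


# Records 1 and 2 both reference block 2; record 3 points past the file end.
illegal, cross_linked = classify_memo_pointers(
    {0: 1, 1: 2, 2: 2, 3: 40}, memo_file_size=5 * 512)
```

The check can report that records 1 and 2 are cross-linked, but it has no information with which to decide which of them owns the memo, and it accepts record 0 without any evidence that block 1 is really the memo that record was meant to reference.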
Reliable error detection is important not only in cases when data corruption becomes obvious to the program user, but also in cases when no errors are apparent. Undetected errors that accumulate in business and government operations can have serious consequences.