1. The Field of the Invention
The present invention relates to identifying files that are to be associated with corresponding file operations. More specifically, the present invention relates to systems, methods, and computer program products for selecting file identifiers that have an increased likelihood of being able to appropriately identify a file during a forward pass through a log.
2. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of operations (e.g., database management, scheduling, and word processing) that prior to the advent of the computer system were performed manually. Many operations performed at a computer system include the manipulation of files, such as, for example, creating files, deleting files, modifying the contents of files, and modifying the attributes of files (hereinafter collectively referred to as “file operations”).
Some computer systems are communicatively coupled to a number of different storage locations where files (as well as other data) can persist even when power to the computer system is removed. For example, a computer can be communicatively coupled to a number of magnetic hard disks or a number of logically distinct portions of the same magnetic hard disk (e.g., a number of different volumes or partitions). A computer system coupled to a number of storage locations may perform file operations that result in creating, deleting, or modifying files at each of the number of storage locations. When a system fault occurs (e.g., a power surge or component malfunction), recent changes in one file space (e.g., the effects of file creations, file deletions, and file modifications) may be lost while recent changes in other file spaces are unaffected. For example, a malfunctioning disk head used to access a first magnetic hard disk would typically not affect files at other magnetic hard disks.
To protect against loss of data, a computer system can maintain a log of the file operations that are performed on files at a storage location. When a computer system is communicatively coupled to a number of storage locations, the computer system can maintain a log for each storage location. In some cases, a computer system will maintain a log of file operations that are performed on some files but not other files or will maintain separate logs for different groups of files stored at the same storage location.
A log can be stored at a storage location where the files affected by file operations recorded in the log are also stored. For example, files and a corresponding log may be stored on the same disk partition. On the other hand, a log can also be stored at a storage location that is different from the storage location where the files affected by file operations recorded in the log are stored. For example, files may be stored at a first magnetic hard disk and a corresponding log stored at a second magnetic hard disk.
When a request to perform a file operation is received, data associated with the request can be loaded into system memory included in a computer system. System memory is typically Random Access Memory (“RAM”), the contents of which are lost if power to the RAM is removed. If a file operation is to be logged, some portion of the loaded data is eventually flushed to a log entry (hereinafter referred to as “log data”) and some portion of the loaded data is eventually flushed to a file (hereinafter referred to as “file data”). Log data flushed to a log entry is typically sufficient to recreate the file operation indicated in the log entry. For example, a log entry for a file creation operation typically includes sufficient log data to recreate the file creation operation. In many logging mechanisms, log data is flushed to log entries before corresponding file data is flushed to files. Thus, there is a reduced chance that data could be lost due to system fault or power failure occurring after file data was flushed but before corresponding log data was flushed. That is, there is a reduced chance that a file operation could be performed without a corresponding log entry, which includes sufficient log data to recreate the file operation, being recorded to the log.
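By way of illustration only, the ordering described above, in which log data is flushed before the corresponding file data, could be sketched as follows. The function name logged_write, the JSON entry layout, and the file names are assumptions made for this sketch and are not taken from any particular logging mechanism.

```python
import json
import os
import tempfile


def logged_write(log_path, file_path, data):
    """Flush a log entry describing the write before flushing the file data."""
    entry = {"op": "write", "file": file_path, "data": data}

    # Step 1: append the log entry and force it to persistent storage.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())

    # Step 2: only after the log entry is durable, flush the file data.
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


# Hypothetical usage in a temporary directory.
workdir = tempfile.mkdtemp()
wal_log_path = os.path.join(workdir, "ops.log")
wal_file_path = os.path.join(workdir, "resume.txt")
logged_write(wal_log_path, wal_file_path, "hello")

with open(wal_log_path, encoding="utf-8") as log:
    wal_entries = [json.loads(line) for line in log]
```

If a fault occurred between the two steps, the log entry would already hold enough data to recreate the write, which is the property the paragraph above relies on.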
If at some time a storage location transitions into an inconsistent state as a result of a system fault or power failure (e.g., data in memory is not flushed to persistent storage), entries from the log can be processed to transition the storage location out of the inconsistent state. An inconsistent state can result, for example, when a transaction is open at the time of a system fault or power failure. To restore integrity, log entries between a more recent point in time and a more distant point in time can be processed to roll back any uncommitted transactions.
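As a minimal sketch of such a rollback, the following processes log entries from the most recent to the most distant and undoes the operations of any transaction that has no commit entry. The entry fields (txn, op, name, old, new) and the use of a dictionary to stand in for a storage location are assumptions made for illustration only.

```python
def roll_back(log_entries, storage):
    """Undo, newest first, the operations of transactions that never committed."""
    committed = {e["txn"] for e in log_entries if e["op"] == "commit"}
    for entry in reversed(log_entries):
        if entry["op"] == "commit" or entry["txn"] in committed:
            continue  # committed work is left in place
        if entry["op"] == "create":
            storage.pop(entry["name"], None)  # undo the creation
        elif entry["op"] == "write":
            storage[entry["name"]] = entry["old"]  # restore prior contents
    return storage


# Transaction 1 committed; transaction 2 was open when the fault occurred.
txn_log = [
    {"txn": 1, "op": "create", "name": "a.txt"},
    {"txn": 1, "op": "write", "name": "a.txt", "old": "", "new": "done"},
    {"txn": 1, "op": "commit"},
    {"txn": 2, "op": "write", "name": "a.txt", "old": "done", "new": "partial"},
    {"txn": 2, "op": "create", "name": "b.txt"},
]
inconsistent_state = {"a.txt": "partial", "b.txt": ""}
restored_state = roll_back(txn_log, inconsistent_state)
```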
In some cases, log entries between a more distant point in time and a more recent point in time are processed sequentially (hereinafter referred to as “rolling forward”) to transition a storage location out of an inconsistent state. For example, during a roll-forward recovery, log entries can be processed sequentially to recreate and reapply previously performed file operations. However, to appropriately recreate and reapply file operations, some mechanism must be provided to identify files that are associated with the file operations indicated in log entries.
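A roll-forward pass of this kind might be sketched as follows. The operation names (create, append, delete), the entry fields, and the dictionary standing in for a storage location are hypothetical and chosen only to keep the example small. Note that each entry must carry something, here a name, that identifies the file the operation applies to.

```python
def roll_forward(log_entries, storage):
    """Reapply logged file operations in order, oldest entry first."""
    for entry in log_entries:
        op = entry["op"]
        if op == "create":
            storage[entry["name"]] = ""
        elif op == "append":
            storage[entry["name"]] += entry["data"]
        elif op == "delete":
            storage.pop(entry["name"], None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return storage


forward_log = [
    {"op": "create", "name": "a.txt"},
    {"op": "append", "name": "a.txt", "data": "hello"},
    {"op": "create", "name": "b.txt"},
    {"op": "delete", "name": "b.txt"},
]
recovered = roll_forward(forward_log, {})
```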
One mechanism for identifying a file during a roll-forward recovery is to include the filename of a file in a log entry for a corresponding file operation. For example, if a file creation operation created a file with a filename of “resume.txt,” the filename “resume.txt” could be included in a log entry for the file creation operation. Then, if the log entry was subsequently processed to recreate the file creation operation, a file could again be created with the filename “resume.txt.” However, identifying files by including filenames in log entries can lead to processing inefficiencies.
For example, a first file may be created with a first file name, a substantial amount of data added to the body of the first file, and the first file subsequently renamed to a second file name. Log entries containing sufficient log data to recreate each of these three file operations (i.e., file creation operation, file content modification operation, and file attribute modification operation) may be sequentially written to a log.
Unfortunately, during a roll-forward recovery, there are limited mechanisms for determining if the first file is intact at the time the log entry for the file creation operation is processed. When the log entry for the file creation operation is accessed, a check may be performed to determine if a file with the first file name exists. If there is no file with the first file name, the accessed log entry can be processed to create a new file with the first file name.
Since the first file has been renamed (and thus a file with the first file name does not exist), the check will determine that the log entry should be processed to create a second file with the first file name. Further, corresponding checks may determine (since a file with the first file name now exists) that the file content modification operation and file attribute modification operation should also be processed. However, processing each of the three log entries is redundant because, whether or not the log entries are processed, the resulting state of the storage location is essentially the same. That is, while the first file exists, there would be little, if any, benefit to creating a second file with the first file name, adding the substantial amount of data to the second file, and then renaming the second file to the second file name. When the second file is renamed to the second file name, it may be that the data contained in the first file is overwritten with substantially the same data. In some cases, it may also be determined that renaming the second file is unnecessary due to the presence of the first file, and the second file is therefore deleted. However, in either case, redundant data is written to the second file, leading to processing inefficiencies.
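The redundancy described above can be made concrete with a small simulation. In this sketch, which assumes a dictionary keyed by filename as the storage location and hypothetical names first.txt and second.txt, the renamed file is intact before recovery, yet replaying by name still recreates it and rewrites all of its data.

```python
def replay_by_name(log_entries, storage):
    """Replay entries keyed by filename; return the number of characters rewritten."""
    rewritten = 0
    for entry in log_entries:
        if entry["op"] == "create":
            # The only available check: does a file with this name exist?
            if entry["name"] not in storage:
                storage[entry["name"]] = ""
        elif entry["op"] == "append":
            if entry["name"] in storage:
                storage[entry["name"]] += entry["data"]
                rewritten += len(entry["data"])
        elif entry["op"] == "rename":
            if entry["old"] in storage:
                # Overwrites any file already stored under the new name.
                storage[entry["new"]] = storage.pop(entry["old"])
    return rewritten


payload = "x" * 1000
name_log = [
    {"op": "create", "name": "first.txt"},
    {"op": "append", "name": "first.txt", "data": payload},
    {"op": "rename", "old": "first.txt", "new": "second.txt"},
]
name_storage = {"second.txt": payload}  # the renamed file survived the fault
before = dict(name_storage)
chars_rewritten = replay_by_name(name_log, name_storage)
```

The storage location ends in the same state it started in, but only after the full payload has been written a second time.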
Another mechanism for identifying a file during a roll-forward recovery is to include the file system identifier of the file in a log entry for a corresponding file operation. Unique file system identifiers are typically assigned to files by an operating system at the time the files are created. In some instances, an entry (including a file system identifier) for a created file is inserted in a file table and the index of that entry can then be used to identify the file for as long as the file exists. Users and application programs are typically prevented from modifying file system identifiers and file tables. As a result, there is a reduced chance that a file system identifier assigned to a file could be changed after the file is created.
Typically, separate file tables are maintained for each partition. Thus, as files are created and deleted on different partitions, entries can be inserted into and removed from the corresponding file tables. In logging mechanisms that flush log data before flushing file data, this can be problematic. For example, when a file creation operation is to be performed, a first file system identifier can be allocated for assignment to the created file and can be loaded into system memory along with other log data and file data. Log data, including the first file system identifier, is then flushed from system memory to a first log entry.
However, before file data is flushed from system memory, and the first file system identifier actually assigned to a created file, a system fault affecting the storage location (e.g., a disk drive partition) where the file was to be created may occur. This can be problematic as most operating systems assign file system identifiers in a non-deterministic manner. That is, there is no guarantee that the first file system identifier flushed to the first log entry will in fact be available when the first log entry is subsequently accessed. Since the file creation operation was not in fact performed, the allocated first file system identifier can be released and subsequently non-deterministically assigned to another file. During a rollback, the first log entry can be processed to attempt to reverse any inconsistencies resulting from the system fault. This can include reversing inconsistencies resulting from the failed file creation operation.
However, it may be difficult to appropriately identify inconsistencies as the first file system identifier in the first log entry may now be assigned to another file. Further, even if the first file system identifier is available, there is no guarantee that an operating system will appropriately re-assign the first file system identifier. Consequently, errors can occur when attempting to reverse inconsistencies resulting from the failed file creation operation. Further, this can potentially cause a rollback to abort. Thus, the affected storage location may not be able to be transitioned out of an inconsistent state.
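The identifier-reuse hazard described above can be illustrated with a toy file table. The FileTable class and its lowest-free-index allocation policy are assumptions made for this sketch; an actual operating system may assign identifiers in any manner, which is precisely why the logged identifier cannot be relied upon.

```python
class FileTable:
    """Toy file table mapping file system identifiers to file names."""

    def __init__(self):
        self.entries = {}

    def allocate(self, name):
        # Assumed policy for this sketch: hand out the lowest free identifier.
        file_id = 0
        while file_id in self.entries:
            file_id += 1
        self.entries[file_id] = name
        return file_id

    def release(self, file_id):
        self.entries.pop(file_id, None)


table = FileTable()

# An identifier is allocated and flushed to the log before the file data.
logged_id = table.allocate("resume.txt")
first_log_entry = {"op": "create", "file_id": logged_id, "name": "resume.txt"}

# A fault prevents the creation; the identifier is released and then reused.
table.release(logged_id)
other_id = table.allocate("notes.txt")

# During a rollback, the logged identifier now refers to a different file.
found_name = table.entries.get(first_log_entry["file_id"])
identifier_mismatch = found_name != first_log_entry["name"]
```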
Therefore, systems, methods, and computer program products for appropriately identifying a file during a pass through a log would be advantageous.