1. Field of the Invention
The present invention relates to tracking changes to files in a file system.
2. Description of the Related Art
A file system is a set of computer programs that takes ownership of the storage space of a volume or hard disk and uses the storage space to store files, directories, and other file system objects. Files and disk volumes are storage objects in that data in a file or volume persist after a corresponding power supply is deactivated. However, files are more flexible than volumes, and files are often more convenient for storing application data than raw volumes or disks. A file can contain both user data and internal data, called metadata, which are used by the file system to manage the file.
A file system can access file system objects by making a system call through the operating system of the computer system on which the file system resides. Most file systems support a standard set of system calls, such as open, read, and write, that can be used to perform operations on a file.
A file system maintains at least one namespace for file system objects. A namespace is a set or group of names that is defined according to some naming convention.
A flat namespace uses a single, unique name for every device. For example, a small Windows (NetBIOS) network requires a different name to be assigned to each computer and printer. By contrast, the Internet uses a hierarchical namespace that partitions the names into categories known as top level domains, such as .com and .net, which are at the top of the hierarchy.
Microsoft's NTFS is a recoverable journaling file system that guarantees the consistency of the volume by using standard transaction logging and recovery techniques. NTFS records changes to file system structures in a transaction log. In the event of a disk corruption, NTFS runs a recovery procedure that accesses information stored in the transaction log file. The NTFS recovery procedure guarantees that the volume is restored to a consistent state.
A Change Journal was added as a new feature of NTFS in Windows 2000. The Change Journal provides a persistent log of changes made to files on a volume. NTFS uses the Change Journal to track information about added, deleted, and modified files for each volume. The Change Journal describes the nature of changes to files on the volume. When any file or folder is created, modified, or deleted, NTFS adds a record to the Change Journal for that volume.
The Change Journal is more efficient than time stamps or file notifications for determining changes in a particular namespace. Applications that normally need to scan an entire volume to determine changes can scan the volume once and subsequently refer to the Change Journal. The input/output (I/O) cost depends on the number of files that have changed, not on the number of files that exist on the volume.
Each record in the Change Journal occupies approximately 80 to 100 bytes. A maximum size can be configured for the Change Journal so that it does not grow without bound. When the maximum size is reached, the oldest records in the Change Journal are discarded.
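The discard-oldest behavior can be illustrated with a minimal sketch. The class name BoundedJournal, its fields, and the per-record trimming are hypothetical and chosen only for illustration; the granularity at which NTFS actually trims the Change Journal is not described here.

```python
from collections import deque


class BoundedJournal:
    """Toy change log that drops its oldest records once a byte budget is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.records = deque()  # (record_id, size_bytes), oldest first

    def append(self, record_id, size_bytes):
        self.records.append((record_id, size_bytes))
        self.used_bytes += size_bytes
        # Trim from the oldest end until the log fits within the budget again.
        while self.used_bytes > self.max_bytes and self.records:
            _, old_size = self.records.popleft()
            self.used_bytes -= old_size


journal = BoundedJournal(max_bytes=300)
for record_id in range(5):
    journal.append(record_id, 100)  # each record is assumed to take 100 bytes

remaining_ids = [rid for rid, _ in journal.records]
```

A reader that was still positioned at one of the discarded records would no longer find it in the log, which is why an application cannot assume that every change it has not yet read is still available.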
The full pathname of a file or directory is not stored in a Change Journal record. Instead, a File Reference Number (FRN) for the parent directory is stored. Each application using the Change Journal is expected to keep an internal database of all directories (and their FRNs) for the file system in order to resolve the parent directory's FRN to a pathname. An application using the Change Journal initially builds a mapping of directories and file names to FRNs, which is a time-consuming operation. The application then keeps the database current by applying the changes described in the Change Journal.
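The bookkeeping this imposes on each application can be sketched as follows. The dictionary layout, the function name resolve_path, and the FRN values are hypothetical; the sketch only assumes that each directory entry records its own name and its parent's FRN.

```python
def resolve_path(file_name, parent_frn, directories, root_frn):
    """Build a full pathname by walking parent FRNs up to the root directory.

    directories maps a directory FRN to (directory_name, parent_frn).
    """
    parts = [file_name]
    frn = parent_frn
    while frn != root_frn:
        name, frn = directories[frn]  # step to the parent directory
        parts.append(name)
    return "\\" + "\\".join(reversed(parts))


# Hypothetical application-side database of directories, keyed by FRN.
ROOT_FRN = 1
directories = {
    10: ("Users", ROOT_FRN),
    11: ("alice", 10),
}

# A journal record carries only the file name and the parent directory's FRN.
path = resolve_path("notes.txt", 11, directories, ROOT_FRN)
top_level = resolve_path("boot.ini", ROOT_FRN, directories, ROOT_FRN)
```

Every directory creation, rename, or deletion reported by the journal must also be applied to this database, or later lookups will produce stale pathnames.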
Each record in the Change Journal is identified by an Update Sequence Number (USN), which increases monotonically and serves as a logical offset into the Change Journal file. Writes to the Change Journal occur in 4 KB blocks (according to a USN_PAGE_SIZE variable). Records cannot span a page boundary, so padding is used when the next record to be added requires more than the remaining space on the page. The Change Journal can be read from the first available record (StartUsn=0), an existing record (StartUsn=USN), or the next record that will be written to the Change Journal (StartUsn=NextUsn).
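The padding rule can be made concrete with a small worked sketch. The function name place_record is hypothetical, and the sketch assumes that a USN is simply the byte offset at which a record starts and that a page is 4096 bytes.

```python
USN_PAGE_SIZE = 4096  # 4 KB page, as described above


def place_record(next_usn, record_length, page_size=USN_PAGE_SIZE):
    """Return (usn_of_record, next_usn_after_record) for a record of the given length.

    If the record does not fit in the space left on the current page, the
    remainder of the page is skipped as padding and the record starts on the
    next page boundary.
    """
    if record_length > page_size:
        raise ValueError("a record cannot be larger than one page")
    remaining = page_size - (next_usn % page_size)
    if record_length > remaining:
        next_usn += remaining  # pad out the rest of the current page
    return next_usn, next_usn + record_length


fits = place_record(4000, 96)      # exactly fills the 96 bytes left on the page
padded = place_record(4000, 100)   # needs 100 bytes, only 96 remain, so it moves
```

In the second call the record is placed at offset 4096, and the 96 bytes at offsets 4000 through 4095 are left as padding, so consecutive USNs are not necessarily contiguous.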
One Change Journal exists for each NTFS volume (i.e., for the entire file system). The NTFS Master File Table (MFT) marks the entry for each file or directory with the last USN used for that file or directory (LastUsn). The Change Journal is a sparse file, so records at the beginning of the file can be deleted without a significant detrimental effect on performance.
The Change Journal can be in one of the following states: Disabled, Activating, Active, or Disabling.
The Change Journal can be enabled (placed in the active state) or disabled at any time. The default state of the Change Journal is disabled. An application using the Change Journal can enable or disable the Change Journal according to that application's own needs. However, this feature can be problematic if one application disables the Change Journal when another application expects the Change Journal to be enabled. The current implementation of the Change Journal cannot be locked for exclusive use.
Because multiple applications can manipulate the Change Journal, each application using the Change Journal must be capable of handling a change in the Change Journal's state at any time, including the deletion of the Change Journal.
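One way an application might structure this handling is sketched below. The enumeration mirrors the four states named above, but the function name next_action and the policy it encodes (wait while the journal is activating, fall back to a full scan when it is disabled or being disabled) are assumptions made for illustration, not behavior prescribed by NTFS.

```python
from enum import Enum


class JournalState(Enum):
    DISABLED = "disabled"
    ACTIVATING = "activating"
    ACTIVE = "active"
    DISABLING = "disabling"


def next_action(state):
    """Decide what a journal consumer should do for the observed state (assumed policy)."""
    if state is JournalState.ACTIVE:
        return "read_journal"
    if state is JournalState.ACTIVATING:
        return "wait"
    # Disabled or disabling: the records are gone or about to be purged,
    # so incremental tracking can no longer be trusted.
    return "full_rescan"


actions = {state.name: next_action(state) for state in JournalState}
```

Because another application can change the state between any two reads, a consumer would need to re-check the state on every pass rather than only at startup.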
When the Change Journal is disabled, all records are purged to prevent applications from reading unreliable records, and the Change Journal file itself is deleted. Disabling (deleting) the Change Journal also resets every LastUsn value in the Master File Table to zero, which is a time-consuming operation.
The Master File Table can be read to show the LastUsn for files in a range that may have been purged from the Change Journal. Data in the MFT indicate to the application that something happened to these specific files, although the records for those changes are gone. Note that reading the MFT does not work for deleted files, because deleted files are no longer in the MFT. In addition, the MFT does not indicate intermediate changes made in the USN range.
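The MFT fallback can be sketched as a simple filter. The function name, the tuple layout, and the sample values are hypothetical; the sketch assumes the application remembers the USN at which it intended to resume reading and can learn the USN of the first record still present in the journal.

```python
def changed_in_purged_range(mft_entries, resume_usn, first_available_usn):
    """Return names of files whose LastUsn falls in the purged part of the journal.

    mft_entries is an iterable of (name, last_usn) pairs. Records with USNs in
    [resume_usn, first_available_usn) were discarded before the application
    could read them.
    """
    return [
        name
        for name, last_usn in mft_entries
        if resume_usn <= last_usn < first_available_usn
    ]


# Hypothetical snapshot of MFT entries: (file name, LastUsn).
mft_entries = [
    ("a.txt", 120),  # last changed inside the purged range
    ("b.txt", 480),  # last changed inside the purged range
    ("c.txt", 900),  # latest change is still recorded in the journal
    ("d.txt", 0),    # no change recorded since the application's checkpoint
]

missed = changed_in_purged_range(mft_entries, resume_usn=100, first_available_usn=500)
```

The result names the files that changed but says nothing about what the changes were. A file such as c.txt may also have had earlier changes inside the purged range that its single LastUsn value cannot reveal, and files deleted in that range do not appear at all.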
While the Change Journal is an improvement over scanning an entire file system for changes, improvements can be made. For example, the Change Journal is a hidden system file that can be accessed only by using a file system-specific Application Program Interface (API). Each application using the Change Journal is pre-configured by adding the appropriate file system-specific API calls. Furthermore, each application using the Change Journal must keep track of File Reference Numbers for each file and directory in the file system so that a full pathname for the file can be specified to access the file or directory. These requirements are burdensome for application programs and may prevent access by remote applications running on different computer systems, also referred to as nodes in a distributed environment. Those nodes may be running different file systems and/or operating systems than the node maintaining the Change Journal. In addition, one application can disable or enable the Change Journal without regard to other applications' need to use the Change Journal.
A solution is needed to enable multiple applications to access a log of changes to files in a file system without the need to scan the entire file system repeatedly. Preferably, enablement and disablement of the log are not controlled by application programs, but by an independent process. In addition, the solution should be accessible from nodes running different file systems than the node maintaining the Change Journal. The solution should also be accessible by applications without using a file system-specific API. The solution should furthermore not burden applications with the task of maintaining their own copies of file system metadata, such as File Reference Numbers, to use the log. The solution should not significantly affect performance of the file system or the operating system.