This application relates in general to computer data storage and more specifically to systems and methods for ascertaining or documenting changes made to a file system.
For the purposes of the present discussion, a file system may be any organization of files and accompanying data. An example file system represents a special-purpose database for the storage, organization, manipulation, and retrieval of data. A file system may also refer to the software and/or methods used to organize and/or maintain the files in accordance with predetermined rules. Changes to a file system may include changes to data within a file, changes to metadata associated with the file, changes in file locations (e.g., path changes) within the file system, changes to folder content and location, changes in user access rights or other security information associated with a given file, folder, or associated directory structure, and so on.
Systems for documenting file system changes are employed in various demanding applications, including Secure Enterprise Search (SES), disk-space utilization software, Web-searching applications, software for repairing broken hyperlinks in large websites with multiple pages and hyperlinks, and so on. Such applications demand versatile systems and methods for quickly and accurately ascertaining changes made to a file system.
Such systems and methods are particularly important in data-search applications, such as SES, where accompanying search indexes must be periodically updated with file system changes to enable accurate search results.
In an example SES application, file system documents, such as Hypertext Markup Language (HTML) web pages, are indexed to facilitate searches. The SES application may employ the index to facilitate rapid searches of file system documents for desired content.
To enable accurate searches, the search index is periodically updated to reflect file system changes. For example, the search index is updated when documents or the content therein are changed, deleted, added, renamed, and/or moved; when document access rights change; and so on.
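For illustration only, the index maintenance described above can be sketched as a small in-memory inverted index. The class name SearchIndex and its methods are hypothetical and are not part of SES; a rename or move can be modeled as a removal of the old path followed by an insertion at the new path.

```python
from collections import defaultdict


class SearchIndex:
    """Hypothetical in-memory inverted index keyed by document path."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document paths
        self._terms_by_doc = {}            # document path -> set of terms

    def upsert(self, path, text):
        """Add a new document, or re-index a document whose content changed."""
        self.remove(path)  # drop stale terms before re-indexing
        terms = set(text.lower().split())
        self._terms_by_doc[path] = terms
        for term in terms:
            self._postings[term].add(path)

    def remove(self, path):
        """Remove a deleted document; a no-op if the path is not indexed."""
        for term in self._terms_by_doc.pop(path, set()):
            self._postings[term].discard(path)
            if not self._postings[term]:
                del self._postings[term]

    def search(self, term):
        """Return the sorted paths of documents containing the term."""
        return sorted(self._postings.get(term.lower(), ()))
```

In this sketch, a search returns stale results until upsert or remove is called for each changed document, which is why the index must track file system changes.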
Conventionally, a process called crawling is used to analyze files in a file system and then update the search index accordingly. To reduce the time required to update the search index, crawling software may first implement a file system scan to determine which files and folders have changed since the last crawling operation. Subsequent crawling, called incremental crawling, is then limited to only those components of the file system that have changed.
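As a minimal sketch of such a pre-crawl scan, assuming that a modification timestamp is the only change signal consulted, the following walks a directory tree and reports entries modified after the previous crawl. The function name find_changed_paths and its parameters are hypothetical.

```python
import os


def find_changed_paths(root, last_crawl_time):
    """Return sorted paths under root modified after last_crawl_time.

    last_crawl_time is a POSIX timestamp in seconds (an assumed convention).
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if mtime > last_crawl_time:
                changed.append(path)
    return sorted(changed)
```

Note that the scan itself still visits every file and folder; only the subsequent crawling is restricted to the returned paths.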
Crawling software may be implemented, for example, via a Windows New Technology File System (NTFS) connector. The NTFS connector may be implemented via Oracle SES. The connector collects content and Access Control List (ACL) data associated with all files and folders in an accompanying NTFS file system. Each file and folder in the NTFS file system is associated with a LastModifiedDate attribute, which is updated when a file changes but not when user access rights thereto change. Consequently, to ascertain file and folder changes, including changes to user access rights for particular files or folders, the connector fetches both the LastModifiedDate attribute and the ACL for each file and folder in the file system. Unfortunately, fetching the LastModifiedDate attributes and the ACLs in large enterprise applications is often undesirably time-consuming, resulting in lengthy incremental crawling operations.
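The per-item comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the connector's actual implementation: the permission bits returned by os.stat stand in for a full ACL, and the names snapshot and diff_snapshots are hypothetical. A real NTFS connector would read ACLs through platform security interfaces, which are not shown here.

```python
import os
import stat


def snapshot(root):
    """Map each path under root to (modification time in ns, permission bits)."""
    snap = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            # Permission bits are an assumed stand-in for the item's ACL.
            snap[path] = (st.st_mtime_ns, stat.S_IMODE(st.st_mode))
    return snap


def diff_snapshots(old, new):
    """Classify each differing path as added, deleted, content, or access."""
    changes = {}
    for path, (mtime, mode) in new.items():
        if path not in old:
            changes[path] = "added"
        elif old[path][0] != mtime:
            changes[path] = "content"
        elif old[path][1] != mode:
            # Timestamp unchanged, so a timestamp-only scan would miss this.
            changes[path] = "access"
    for path in old:
        if path not in new:
            changes[path] = "deleted"
    return changes
```

Because both attributes are fetched for every file and folder, the cost of each scan grows with the total size of the file system rather than with the number of changes.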
In general, conventional methods for ascertaining file system changes since the last crawling operation are undesirably slow. In an enterprise file system with terabytes of data, a given crawling operation may take weeks, depending on available computing resources. This may be particularly problematic in situations where substantial file system changes occur before crawling is complete.