A storage server is a computer system that is used to store and retrieve data on behalf of one or more clients on a network. A storage server typically stores and manages data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. In conventional network storage systems, the mass storage devices may be organized into one or more groups of drives (e.g., a redundant array of inexpensive disks (RAID)).
A storage server may be configured to service file-level requests from clients, as in the case of file servers used in a Network Attached Storage (NAS) environment. Alternatively, a storage server may be configured to service block-level requests from clients, as done by storage servers used in a Storage Area Network (SAN) environment. Further, some storage servers are capable of servicing both file-level and block-level requests, as done by certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif.
To facilitate data access, a file system can be implemented to logically organize the data stored in a storage server's underlying NAS or SAN environment. The contents of the file system are often indexed for searching or monitoring purposes. To provide a complete and up-to-date overview of the contents of the file system, a crawler processor periodically “crawls”, or scans, the file system. During crawling, various information about the contents of the file system can be acquired. However, the speed of a file system crawler is often inadequate, especially when indexing a large-scale storage system that can contain millions of files with a total storage size measured in petabytes.
Further, the performance cost of periodically crawling an entire file system is often disproportionate to the rate of change of the file system. Even if only one file has been updated since the prior crawl, the entire file system must be crawled in order to determine which file has changed. One approach to avoiding a complete re-crawl is to require all applications to notify the file system whenever they update a file. However, on a large enterprise file system, the overhead of keeping track of all such file-update notifications can be overwhelming. Such an approach also requires every file-updating application to be modified to provide notification functionality. Another approach is to modify the file system to generate a log entry each time the file system detects a file-changing event. The log is then used by an indexing service to determine which files need to be re-indexed. However, such an approach may require updating the file system software to accommodate this functionality. To further complicate matters, the above approaches can incur a significant amount of latency and therefore significantly reduce a crawler's performance.
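To illustrate why a full crawl is needed even when very little has changed, the following minimal Python sketch shows one common, assumed way a crawler can detect changes without help from applications or the file system: it walks the whole tree, records each entry's size and modification time, and compares the result against the snapshot from the previous crawl. The function names `crawl` and `diff`, and the use of size plus modification time as the change signal, are hypothetical illustrations and not the technique described in this document.

```python
import os
from typing import Dict, Set, Tuple

# Metadata recorded per entry during a crawl: (size in bytes, modification time in ns).
# Using size + mtime as the change signal is an assumption made for this sketch.
FileMeta = Tuple[int, int]


def crawl(root: str) -> Dict[str, FileMeta]:
    """Hypothetical full crawl: walk the entire tree under `root` and record
    metadata for every non-directory entry, however few have changed."""
    snapshot: Dict[str, FileMeta] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except FileNotFoundError:
                # Entry was removed between listing and stat; skip it.
                continue
            snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot


def diff(old: Dict[str, FileMeta],
         new: Dict[str, FileMeta]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Hypothetical comparison of two crawl snapshots.

    Returns (added, removed, modified) sets of paths."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    # A path counts as modified if its recorded size or mtime differs.
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, removed, modified
```

Note that `crawl` calls `os.stat` on every entry regardless of how many files were updated, so its cost grows with the size of the file system rather than with the number of changes, which is the disproportion described above.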