Computer system administrators want to monitor the content that is stored on the computer systems for which they are responsible. This is necessary for various reasons, including understanding how existing storage is utilized, defining and implementing storage policies, and adjusting those policies as usage grows.
Administrators use storage reports to monitor storage on file servers and other servers, anticipate storage needs, analyze emergency situations, and take preventive and/or corrective actions. For example, an administrator may want a list of all files larger than one hundred megabytes on a given namespace, sorted by size, with summary totals. Another report may provide summary information for each file type (e.g., “Media Files”) on a given namespace, including the one hundred largest files in each file type category. Storage reports thus help an administrator identify inefficient use of storage, implement mechanisms to prevent future misuse, and monitor usage patterns and utilization levels in general.
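The two example reports can be sketched as follows, assuming the file enumeration has already produced a list of records, each carrying a path, a size in bytes, and a file type category. The names `FileRecord`, `large_files_report`, and `file_type_summary`, and the choice of 2**20 bytes per megabyte, are illustrative assumptions rather than part of any particular reporting product.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: one megabyte is taken to be 2**20 bytes


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical record produced by a file enumeration pass."""
    path: str
    size: int       # size in bytes
    category: str   # file type category, e.g. "Media Files"


def large_files_report(records, min_size=100 * MB):
    """Files strictly larger than min_size, largest first, plus summary totals."""
    large = sorted((r for r in records if r.size > min_size),
                   key=lambda r: r.size, reverse=True)
    return {
        "files": large,
        "total_count": len(large),
        "total_bytes": sum(r.size for r in large),
    }


def file_type_summary(records, top_n=100):
    """Per-category file count, total bytes, and the top_n largest files."""
    by_category = defaultdict(list)
    for r in records:
        by_category[r.category].append(r)
    summary = {}
    for category, files in by_category.items():
        summary[category] = {
            "count": len(files),
            "total_bytes": sum(f.size for f in files),
            # nlargest avoids fully sorting very large categories
            "largest": heapq.nlargest(top_n, files, key=lambda f: f.size),
        }
    return summary


records = [
    FileRecord("share/media/a.mp4", 300 * MB, "Media Files"),
    FileRecord("share/images/b.iso", 150 * MB, "Disk Images"),
    FileRecord("share/docs/c.txt", 2 * MB, "Text Files"),
    FileRecord("share/media/d.mp3", 5 * MB, "Media Files"),
]
report = large_files_report(records)
summary = file_type_summary(records, top_n=1)
```

Both functions assume the input list is an accurate picture of the namespace; if the enumeration that produced it was inconsistent, the totals and rankings inherit that inconsistency.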
While storage reports provide valuable functionality, generating them poses a number of challenges. In general, generating a storage report requires traversing the file system mounted on a storage volume, which may be accomplished by enumerating the files, e.g., via a “find-first, find-next” traversal of the volume's directories. The traversal may not be complete, however, because some files may be open for exclusive access. Further, the volumes on which storage reports are typically run (e.g., file servers) hold very large amounts of data, so the scan takes a significant amount of time. Because the scan takes so long, files may be changed while it is in progress, and those changes may be numerous and varied. For example, a file that is moved during the scan may appear twice, while another file may not be found at all. Scanning by traversing the file-system metadata instead (for example, the Master File Table for the Microsoft® NTFS file system, or some other database-like structure) is almost impossible because this metadata keeps changing during the scan.
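The duplicate and missing file effects can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the volume is modeled as a hypothetical in-memory mapping from directory name to file sizes, directories are visited one at a time in a fixed order (as a find-first, find-next walk would on a real system, e.g. through the Win32 `FindFirstFile`/`FindNextFile` calls), and a hypothetical `move_after` hook stands in for a user moving a file between two directory listings.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory model of a volume: directory name -> {file name: size}.
Volume = Dict[str, Dict[str, int]]
Hook = Callable[[Volume, str], None]


def scan(volume: Volume, between_dirs: Hook) -> List[Tuple[str, str, int]]:
    """Enumerate files one directory at a time, non-atomically.

    between_dirs is called after each directory is listed and stands in for
    live user activity that modifies the volume while the scan is running.
    """
    seen = []
    for directory in sorted(volume):  # visiting order fixed when the scan starts
        for name, size in sorted(volume[directory].items()):
            seen.append((directory, name, size))
        between_dirs(volume, directory)
    return seen


def move_after(after_dir: str, src: str, dst: str, name: str) -> Hook:
    """Hypothetical hook: move `name` from src to dst once after_dir is listed."""
    def hook(volume: Volume, directory: str) -> None:
        if directory == after_dir:
            volume[dst][name] = volume[src].pop(name)
    return hook


# Moved from an already-listed directory to a not-yet-listed one: seen twice.
dup = scan({"a": {"report.docx": 10}, "b": {}},
           move_after("a", "a", "b", "report.docx"))

# Moved from a not-yet-listed directory to an already-listed one: never seen.
missed = scan({"a": {}, "b": {"notes.txt": 5}},
              move_after("a", "b", "a", "notes.txt"))
```

In the first case `dup` contains two entries for a single 10-byte file, so a size total computed from the scan would report 20 bytes; in the second case `missed` is empty even though the volume always held one file. Neither result matches the state of the volume at any single point in time.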
As a result, the storage reports may contain significant inconsistencies and inaccuracies that may mislead the administrator. Ordinarily, the more live user activity there is and/or the more data being scanned, the more inconsistencies and inaccuracies there will be, and the problem is compounded when multiple volumes are scanned to generate a report. Moreover, generating a storage report can heavily burden a computer system's processing and I/O resources, degrading the system's performance to an undesirable level.
What is needed is a better way to generate storage reports that provides an administrator with consistent and accurate information. That consistency and accuracy should not depend on the amount of live activity, the amount of data, or the number of volumes being scanned, and it should be possible to mitigate any adverse impact that report generation has on system performance.