A network storage server is a processing system that is used to store and retrieve data on behalf of one or more hosts on a network. A storage server is a storage controller that operates on behalf of one or more clients to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks or tapes. Some storage servers are designed to service file-level requests from clients, as is commonly the case with file servers used in a network attached storage (NAS) environment. Other storage servers are designed to service block-level requests from clients, as with storage servers used in a storage area network (SAN) environment or virtual tape library (VTL) environment. Still other storage servers are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp, Inc. of Sunnyvale, Calif.
The storage server may incorporate various features, such as generation of certain kinds of storage images. Image generation may, for example, include mirroring, in which a mirror copy of certain data at one location is maintained at another location, as well as the creation of snapshots and/or clones of storage volumes. Mirroring of data may be done for various purposes. For instance, mirroring provides a mechanism for ensuring data availability and minimizing downtime, and may be used to provide disaster recovery. In addition, snapshots provide point-in-time images of data, and clones generally provide a writeable image of data, which may be used for various data operations.
As a result of such imaging operations, the same data may be duplicated a number of times in a storage system. In many large-scale storage systems, storage controllers have the ability to “deduplicate” data, by which any duplicate copies of a given data block are deleted, and any references (e.g., pointers) to those duplicate blocks are modified to refer to the one remaining instance of that data block. A result of this process is that a given data block may end up being shared by two or more logical data containers in different images of the storage system. Alternatively, in some instances, only the logical references to data blocks (i.e., the logical data containers) are duplicated when an image of a file system is generated. The underlying data blocks are not duplicated for generation of the images. Consequently, a given data block may end up being shared by two or more logical data containers from different images. In one example, the given data block may be shared by a data container in an active file system and another data container in a snapshot image of the file system.
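The deduplication behavior described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than any particular storage controller's implementation: the function name `deduplicate`, the dictionary layout, and the block and container names are all hypothetical, and duplicates are detected by comparing block contents directly, which is a simplification chosen for brevity.

```python
def deduplicate(blocks, containers):
    """Remove duplicate blocks and repoint references to the survivor.

    blocks:     dict mapping block id -> block contents (bytes)
    containers: dict mapping logical container name -> list of block ids
    Both layouts are hypothetical and chosen only for illustration.
    """
    canonical = {}  # block contents -> id of the one remaining instance
    remap = {}      # every original block id -> surviving block id
    kept = {}       # blocks that remain after duplicates are deleted

    for block_id, data in blocks.items():
        if data not in canonical:
            # First time this content is seen: this block survives.
            canonical[data] = block_id
            kept[block_id] = data
        remap[block_id] = canonical[data]

    # Modify every reference so it points at the remaining instance.
    remapped = {
        name: [remap[block_id] for block_id in refs]
        for name, refs in containers.items()
    }
    return kept, remapped


# Blocks 1 and 3 hold identical data; block 3 is referenced by a snapshot.
blocks = {1: b"A", 2: b"B", 3: b"A"}
containers = {"active_file": [1, 2], "snapshot_file": [3, 2]}

kept, remapped = deduplicate(blocks, containers)
```

After the call, block 3 is gone and both containers reference block 1, so a single data block is shared by a container in the active file system and a container in a snapshot.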
In the storage system, it is common for a set of special system files, namely metadata files, to be maintained in addition to the stored data. Metadata files are files that contain information about the stored data. In some instances, the storage system maintains separate metadata files for different types of stored data. For example, the storage system may maintain separate metadata files for the storage volume, for each logical container of data (e.g., files, virtual block references, etc.) in the storage volume, and for each physical data block referenced by the logical containers of data. Such metadata files are useful in various operations of the storage system. In one example, the storage system uses information from the metadata files for various image generation processes.
In some instances, metadata files are created and/or updated based on demand, i.e., when data is created, deleted, and/or modified in the storage system. In some instances, a partial or complete reconstruction of metadata files of a storage system may be warranted. Such instances include, for example, corruption of a file system of the storage system, upgrade of the storage system, recovery of the storage system from a disaster event, etc. To perform this reconstruction, the storage system triggers a metadata scanner that scans logical data containers in both the active version of the file system and in all images of the file system. The metadata scanner, while scanning each logical data container, triggers metadata generation for each physical data block referenced by the logical data container.
When metadata generation is triggered for a given data block, the metadata file (or simply, “metadata”) is created for that data block from scratch. A problem with such an approach is that metadata is created for the given data block from scratch even if metadata was previously generated for that data block (e.g., in a prior image referencing the given data block). Consider the scenario where a data block is shared by a first logical data container from a first snapshot of the file system and a second logical data container from a second snapshot of the file system. In such a scenario, the metadata scanner would generate metadata for the data block from scratch a first time when the first logical data container is scanned during a scan of the first snapshot. The metadata scanner would generate metadata for the data block from scratch again, for a second time, when the second logical data container is scanned during a scan of the second snapshot. Consequently, the overall metadata creation/update/reconstruction may take longer than necessary due to unnecessary and redundant repetition of metadata generation for shared data blocks. Additionally, the redundant repetition of metadata generation results in inefficient and wasteful usage of the processing capacity of the storage system.
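The redundancy in the two-snapshot scenario above can be made concrete with a short Python sketch. It models only the problem as described, under assumed names: `generate_block_metadata`, `scan_images`, the image layout, and the block ids are hypothetical, and the metadata contents are a placeholder for whatever a real system would compute.

```python
from collections import Counter


def generate_block_metadata(block_id):
    # Hypothetical stand-in for from-scratch metadata generation,
    # which in a real system would be the expensive step.
    return {"block": block_id}


def scan_images(images):
    """Scan every logical container in every image, as described above.

    images: dict mapping image name -> dict mapping container name ->
            list of referenced block ids (an assumed layout).
    Returns the generated metadata and how often each block was processed.
    """
    metadata = {}
    generation_counts = Counter()
    for containers in images.values():
        for block_ids in containers.values():
            for block_id in block_ids:
                # Metadata is regenerated even if it already exists.
                metadata[block_id] = generate_block_metadata(block_id)
                generation_counts[block_id] += 1
    return metadata, generation_counts


# Block 100 is shared by containers in two different snapshots.
images = {
    "snapshot_1": {"container_a": [100, 101]},
    "snapshot_2": {"container_b": [100, 102]},
}

metadata, counts = scan_images(images)
redundant = sum(counts.values()) - len(counts)
```

Here metadata for shared block 100 is generated twice, so one of the four generation passes is redundant. The wasted work grows with the number of images that share each block.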