Tiered storage solutions generally include multiple levels of storage systems, each providing a different level of data storage service. Some storage systems are expensive, providing fast, feature-rich service options for data storage, while other, less expensive storage systems provide fewer features at reduced performance. The components included in a particular customer's storage solution should correlate the cost spent on storage with the perceived value of the stored customer information. Thus, customers such as financial institutions, which have larger amounts of ‘critical’ data, may include a larger number of expensive systems than customers with less critical data.
However, selecting the appropriate storage systems to use in the storage solution does little to ensure that the storage is used appropriately. During operation, as a customer accesses data file objects, the objects are transferred between the different tiers of the storage solution. As time passes, objects are displaced from their allocated devices, resulting in inappropriate use of storage. To remedy this problem, storage solutions often include an Information Manager (IM). The Information Manager is a host device that stores at least a subset of the file system meta-data. The meta-data includes attribute information for each object in the file system. The IM analyzes the file-system meta-data to identify objects that should be moved to a different storage tier, and moves objects between tiers to maintain the alignment between object value and storage device service level.
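The tiering pass described above can be sketched as follows. This is a minimal illustration only: the object attributes, the value heuristic in `choose_tier`, and the tier names are assumptions made for the example, not any actual Information Manager's policy.

```python
# Hypothetical sketch of an Information Manager tiering pass.
# The attributes, tier names, and value heuristic below are
# illustrative assumptions, not a vendor's actual policy.
from dataclasses import dataclass

@dataclass
class ObjectMeta:
    path: str
    size: int        # bytes
    atime: float     # last access, seconds since epoch
    critical: bool   # e.g. flagged by business policy

def choose_tier(meta: ObjectMeta, now: float) -> str:
    """Map an object's perceived value to a storage tier."""
    idle_days = (now - meta.atime) / 86400
    if meta.critical or idle_days < 7:
        return "tier1"   # fast, feature-rich storage
    if idle_days < 90:
        return "tier2"
    return "tier3"       # inexpensive, lower-performance storage

def migration_plan(db, placement, now):
    """Yield (path, current_tier, target_tier) for misplaced objects."""
    for path, meta in db.items():
        target = choose_tier(meta, now)
        if placement.get(path) != target:
            yield path, placement.get(path), target
```

The key point is that `migration_plan` consults only the IM's meta-data database (`db`), not the primary storage itself, which is why the accuracy of that database matters so much.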
Because object migration is based on the meta-data stored by the IM, it is critical that the IM database store a complete and current version of the object attributes that are used when determining an object's value. The accuracy of this database is crucial for effective object management; however, populating and maintaining the database is time consuming and resource intensive. This is because, to populate the meta-data database or to retrieve appropriate meta-data for on-demand processing, the Information Manager must scan all files on primary storage using a series of Network Attached Storage (NAS) protocol operations. The process of scanning all of the files on the primary storage is referred to as a “NAS crawl.”
During the NAS crawl, each object is located, and all attribute information associated with the object is collected. The retrieval of all attribute information necessitates multiple NAS operations because different NAS protocols (such as Network File System (NFS) and Common Internet File System (CIFS)) associate different attributes with each object. For example, at a minimum, three NAS operations are required to collect the attribute data: a directory lookup, an NFS attribute retrieval, and a CIFS attribute retrieval. Additional primary server access operations may be required to retrieve optional extended attributes. Each operation generates network and CPU processing load associated with Transmission Control Protocol (TCP) and NAS protocol stack processing, on both the Information Manager host and on the server. Even if multi-threading techniques are applied to reduce the latency associated with attribute retrieval, the overhead associated with populating the IM database becomes prohibitively time and compute intensive as the file system grows large.
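A back-of-envelope model makes the scaling problem concrete. Assuming, as above, at least three NAS operations per object (directory lookup, NFS attribute retrieval, CIFS attribute retrieval) plus any extended-attribute fetches, the total protocol round trips grow linearly with file count:

```python
# Rough cost model for a NAS crawl. The three-operation minimum comes
# from the text above; per-object extended-attribute counts are an
# assumed input, not a measured figure.
def crawl_operations(num_objects: int, extended_attr_ops: int = 0) -> int:
    """Total NAS protocol operations for one full crawl."""
    ops_per_object = 3 + extended_attr_ops  # lookup + NFS + CIFS (+ extras)
    return num_objects * ops_per_object
```

For a file system of ten million objects, even the bare minimum implies thirty million NAS operations per crawl, each carrying TCP and protocol-stack overhead on both endpoints.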
Once the database is populated, it may be used to identify files that should be migrated to different storage tiers. However, as the objects are used over time, the IM database may fall out of sync with the actual file system. To ensure the accuracy of file migration operations, the Information Manager must periodically synchronize its meta-data database with the current contents of the primary storage. There is generally a limited time window afforded to the database update operation in order to minimize its impact on the performance of the storage system.
Several different methods may be used to synchronize the meta-data of the IM database with the file system. For example, a NAS crawl may be performed to identify changed files. However, as described above, a NAS crawl of the primary storage file system becomes prohibitively time and compute intensive as the file system grows large. Alternatively, event notifications may be issued by the NAS server to inform the Information Manager whenever a change in the file system meta-data occurs. The event notification approach suffers from the performance overhead incurred on the NAS server to generate and send the events. In addition, in periods of heavy change, the IM may not be able to adequately handle the event stream, causing events to be ‘missed’ and the accuracy of the database to be compromised.
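The failure mode of the event-notification approach can be illustrated with a bounded event buffer. This is an assumed, simplified model (no real NAS server's notification API is shown): when the server produces change events faster than the IM consumes them, a finite buffer must drop events, and every dropped event is an update the IM database never sees.

```python
# Simplified model of an event-notification channel between a NAS
# server (producer) and an Information Manager (consumer). The bounded
# buffer and drop policy are assumptions for illustration.
from collections import deque

class EventChannel:
    def __init__(self, capacity: int):
        self.buf = deque()
        self.capacity = capacity
        self.dropped = 0   # events 'missed' by the IM

    def publish(self, event):
        """NAS server side: emit a meta-data change event."""
        if len(self.buf) >= self.capacity:
            self.dropped += 1   # buffer full: IM database drifts
        else:
            self.buf.append(event)

    def consume(self):
        """Information Manager side: drain one event, if any."""
        return self.buf.popleft() if self.buf else None
```

In a period of heavy change, five published events against a two-slot buffer lose three updates, regardless of how diligently the IM drains the channel afterward.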
Another method for synchronizing the database is to generate attribute update logs. The logs may be periodically scanned to identify files having updated attributes. However, such an approach degrades the performance of the NAS server, which spends valuable compute cycles generating log information, and may also incur significant storage overhead to maintain the logs.
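A periodic log scan of the kind described might look like the following sketch. The log format here (one "timestamp path attribute" record per line) is purely an assumption for illustration; real servers define their own log formats.

```python
# Hypothetical scan of an attribute update log to find files changed
# since the last synchronization pass. The record layout
# ("<timestamp> <path> <attribute>") is an assumed format.
def changed_paths(log_lines, since: float):
    """Return the set of paths with attribute updates after 'since'."""
    changed = set()
    for line in log_lines:
        ts, path, _attr = line.split(maxsplit=2)
        if float(ts) > since:
            changed.add(path)
    return changed
```

Note that the log grows with every attribute change, which is the source of the storage overhead mentioned above: the server pays for the log whether or not the IM ever scans it.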
The performance issues caused by the Information Manager's maintenance of the IM database may tend to outweigh the benefits provided by its services. IMs may seek to decrease the database population time by retrieving only basic attributes, but such a database optimization reduces the complexity of the values that may be attributed to objects, thereby concomitantly reducing the effectiveness of the file migration process. It would be desirable to identify a method that would permit complex analysis of file objects for file migration purposes, without adversely affecting storage system performance or overtaxing storage resources.