A networked data storage system can be used for a variety of purposes, such as providing multiple users access to shared data, or facilitating backups or data mirroring. A networked storage system may include a number of storage servers. A storage server may provide services related to accessing and organizing data on mass storage devices, such as disks. Some storage servers are commonly referred to as filers or file servers, as these storage servers provide file-level access to data. Some of these filers further provide clients with sub-file level access to data (e.g., block-level access). An example of such a storage server is any of the Filer products made by Network Appliance, Inc. in Sunnyvale, Calif. The storage server may be implemented with a special-purpose computer or a general-purpose computer programmed in a particular way. Depending on the application, various networked storage systems may include different numbers of storage servers.
Logical units of storage, such as files, directories, volumes, and logical unit numbers (LUNs), may be created and manipulated on storage servers. Such logical units are referred to as storage objects in this document. Creating a single storage object is typically fast and easy, but managing a storage object over time can be difficult. A storage administrator has to make numerous decisions, such as how to monitor the available space for the storage object, how to schedule and configure data backups, whether the data should be mirrored, where it should be mirrored, and so on. These decisions may be recorded in a data management policy. Once the policy is decided, the administrator needs to ensure that it is correctly implemented on all relevant storage objects, that the required space is available, that the data protection operations succeed, and so forth. If the administrator decides to change the policy (for example, by extending the amount of time that backups should be retained), the administrator has to find all the affected storage objects and then manually reconfigure all the relevant settings.
As the number of storage objects in the system grows, the administrator's job becomes more difficult and complex. It becomes increasingly likely that the administrator cannot readily determine which policy is supposed to apply to a given storage object, or why a given volume is mirrored. In addition, the administrator has to perform many tedious manual operations for each storage object, which is error-prone and unreliable. Thus, a storage administrator needs help tracking which storage objects exist in a storage system, how those storage objects relate to other objects, and which policy should be applied to each of them.
Other important challenges for storage administrators include deciding how to manage their storage infrastructure and ensuring that their storage systems are actually managed in the way they have decided. The first challenge is deciding on a data management policy. Storage administrators have a plethora of choices to make when deciding on a policy. They need to decide how often to back up data, how long to retain the backup copies, whether to use local snapshots to provide local backups, whether to mirror storage objects, and so forth. Typically, the appropriate way to manage data depends on what type of data it is and how important it is. For example, the data for a mission-critical product order database requires a different data management policy than the home directories of former employees.
Once an administrator has defined a data management policy, the policy has to be described or encoded in such a way that other administrators can understand and carry it out. This description often takes the form of an operations manual written in natural language, so that other members of the storage management staff can read and interpret it. Currently, there is no automated implementation of a policy written in this form. Furthermore, there is currently no way to automatically audit a data center and find storage objects that are not being managed in compliance with the policy. Conventionally, administrators write ad-hoc tools and/or use ad-hoc queries to monitor the storage environment and to look for discrepancies between the policy and the actual states of storage objects. If the administrators find a discrepancy, they have to manually decide what actions to take to correct the situation. This process is so time-consuming and error-prone that most large data centers have little confidence in whether they are managing their data in accordance with their data management policies.