A networked data storage system can be used for a variety of purposes, such as providing multiple users access to shared data, or facilitating backups or data mirroring. A networked data storage system may include a number of storage servers. A storage server provides services related to accessing and organizing data on mass storage devices, such as disks. Some storage servers are commonly referred to as file servers, as these storage servers provide clients with file-level access to data. Some storage servers provide clients with sub-file level access to data (e.g., block-level access). An example of a storage server is any of the file server products made by Network Appliance, Inc. in Sunnyvale, Calif. A storage server may be implemented with a special-purpose computer or a general-purpose computer programmed in a particular way. Depending on the application, various networked storage systems may include different numbers of storage servers.
Logical units of storage, such as files, directories, volumes, qtrees (each of which is a subset of a volume, optionally associated with a space usage quota), logical unit numbers (LUNs), etc., may be created and manipulated on storage servers. Such logical units are referred to as storage objects in this document. Creating a single storage object is typically fast and easy, but managing a storage object over time is more difficult. A storage administrator has to make numerous decisions, such as how to monitor the available space for the storage object, how to schedule data backups, how to configure backups, whether the data should be mirrored, where data should be mirrored, etc. The answers to these questions may be summarized in a data management policy. Once the data management policy is determined, the administrator works to ensure that the policy is correctly implemented on all relevant storage objects, that the required space is available, that the data protection operations succeed, and the like. If the administrator decides to change the policy (for example, extending the amount of time that backups should be retained), the administrator normally must find all of the affected storage objects and then manually re-configure all of the relevant settings.
As the number of storage objects grows in the system, the administrator's job becomes more difficult and complex. It becomes increasingly likely that the administrator may not readily determine what policy was supposed to apply to a given storage object, or why a given volume is mirrored (i.e., why its data is copied to a backup volume so that the data can be recovered). In addition, the administrator normally has to perform many tedious manual tasks for each storage object, which can be error-prone and unreliable. A large data center may have hundreds to over a thousand storage servers. Each storage server may manage hundreds of storage objects (e.g., volumes and thousands of qtrees). This leads to a total of tens to hundreds of thousands of storage objects to manage with a similar number of backup and mirror relationships. The number of objects typically grows faster than the number of administrators employed, so each administrator manages more and more objects over time. Eventually, the sheer number of objects makes it uneconomical, if not impossible, for an administrator to reliably implement data management policies and to accurately check for conformance to the data management policies.
A data management policy is used to describe how stored data is to be protected against data loss. The policy describes an intended behavior for data storage using storage objects. In some embodiments, a data management policy may be represented by a tree graph having a number of nodes and branches. FIG. 6 shows a tree graph of one embodiment of a data management policy. The tree graph 610 includes nodes 611-626 and branches 651-655. Each node represents a storage object and is coupled to another node via a branch, which describes the relationship between the two corresponding storage objects. For example, branch 653 is marked as a “backup” connection between nodes 612 and 614. Thus, the storage object represented by node 614 is a backup copy of the storage object represented by node 612. Backup copies of storage objects thus provide redundant storage. Another relationship that can be specified is a “snapshot” process in which the active file system (e.g., a file system to which data can be both written and read) at the storage site is captured and the “snapshot” is transmitted as a whole over a network to the remote storage site. A snapshot is a persistent point in time (PPT) image of the active file system that enables quick recovery of data after data has been corrupted, lost, or altered. Snapshots can be created by copying the data at each predetermined point in time to form a consistent image, or virtually, by using a pointer to form the image of the data. Accordingly, the graph 610 represents how the administrator intends to manage data in the data storage system.
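By way of illustration only, the tree graph described above may be sketched in Python as follows. The class name PolicyNode, the helper relationships, and the node labels are hypothetical and are not taken from any embodiment; only the “backup” branch between nodes 612 and 614 comes from the description of FIG. 6.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyNode:
    """Hypothetical stand-in for one node (storage object) in the policy graph."""
    name: str
    # Outgoing branches, each stored as (relationship label, child node).
    branches: list = field(default_factory=list)

    def connect(self, relationship, child):
        """Add a labeled branch from this node to a child node."""
        self.branches.append((relationship, child))
        return child


def relationships(root):
    """Walk the tree from root and list (parent, relationship, child) triples."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        for relationship, child in node.branches:
            found.append((node.name, relationship, child.name))
            stack.append(child)
    return found


# Branch 653 of FIG. 6: node 614 is a backup copy of node 612.
primary = PolicyNode("node_612")
primary.connect("backup", PolicyNode("node_614"))
edges = relationships(primary)
```

In this sketch, each branch carries only a relationship label; a fuller representation might also attach schedules or retention settings to a branch, which the text above does not specify.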
Conformance is determined by comparing the configuration of storage objects actually used to store a data set against a set of user-defined policies and configurations. If the policies are not being adhered to, then the system is out of conformance. Software applications have been written to help ease the burden of ensuring conformance with data management policies.
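As a minimal sketch of such a comparison, and assuming for illustration that both the policy and the actual configuration are expressed as sets of (source, relationship, destination) triples, a conformance check might look like the following Python. The function name find_nonconformities and the volume names are hypothetical.

```python
def find_nonconformities(policy_edges, actual_edges):
    """Compare required relationships against those actually configured.

    Both arguments are iterables of (source, relationship, destination)
    triples; this representation is an assumption made for the sketch.
    """
    required = set(policy_edges)
    actual = set(actual_edges)
    return {
        # Relationships the policy requires but the system lacks.
        "missing": sorted(required - actual),
        # Relationships configured on the system but not in the policy.
        "unexpected": sorted(actual - required),
    }


def is_conformant(policy_edges, actual_edges):
    """The system conforms only if nothing is missing or unexpected."""
    report = find_nonconformities(policy_edges, actual_edges)
    return not report["missing"] and not report["unexpected"]


# Hypothetical example: the policy requires a backup and a mirror,
# but only the backup has actually been configured.
policy = [("vol_a", "backup", "vol_b"), ("vol_a", "mirror", "vol_c")]
actual = [("vol_a", "backup", "vol_b")]
report = find_nonconformities(policy, actual)
```

Here the report would list the mirror relationship as missing, so the system is out of conformance until that relationship is configured.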
In conventional policy-driven management applications (such as IBM Tivoli, HP OpenView, CA Unicenter, and BMC Patrol), an administrator is normally expected to understand the state of the systems under management and to know what changes should be made to the system to accomplish a goal. A storage management application from Replicus allows a user to specify levels of redundancy but does not allow a user to see the consequences of the specification before applying it. Additionally, the conventional management applications do not scale well as the complexities of data storage systems grow in an exponential fashion. Furthermore, conventional approaches use an “if-then” approach for every state of a system and do not abstract away many of the technical details that administrators may not understand or care to deal with. Thus, calculating actions to perform for reconfiguring a system in response to a given policy change is of unbounded complexity as systems become more complex, and storage administrators desire help in identifying nonconformities, abstracting details, and determining the effects of bringing a system into conformance.