With the exponential growth in Internet communication powered by increasingly high-bandwidth applications, the need for digital information management has increased dramatically. Network storage systems, such as SANs (Storage Area Networks), are designed to meet the demands of information processing and the requirements of performance, availability, and scalability in such complex storage systems.
Among network storage systems, SANs are being deployed in enterprise environments at an increasing pace in order to gain performance advantages that yield business benefits. SANs are dedicated networks that interconnect storage devices (for example, disks and tapes) and servers so that they share a common storage infrastructure. The large scale and growth rate of SANs, driven by enterprise demands for Internet communication and high-bandwidth applications, lead to a rapid increase in the complexity of managing such network storage systems. Any change to such a large-scale SAN is usually a high-risk action that could cause unintended consequences. System administrators often have to carefully analyze the impact of a desired change before actually applying it to the SAN. This task is usually referred to as impact analysis, change analysis, or what-if analysis.
Due to the complexity of a SAN, impact analysis is very important, since a change to one resource attribute can significantly impact even seemingly unrelated resources. For example, increasing the transaction rate of a workload can violate the QoS (Quality of Service) requirements of a seldom-run workload due to contention at a common switch. Additionally, SANs are initially designed using various best-practice policies, such as a single host type in one zone and redundant paths between hosts and storage, but progressive changes to the SAN, such as adding hosts or workloads, make it increasingly difficult to adhere to those best practices.
Manually analyzing the impact of a particular change does not scale well as the SAN infrastructure grows in the number of devices, best-practice policies, and applications. Thus, when new applications are deployed, hosts and storage controllers can be down on the order of days or weeks because system administrators have to reactively try to correct the problems associated with the deployment.
Typically, change management tools have been reactive in scope: they keep snapshots of the previous state of the system, and system administrators either revert to a previous state or compare the current state with it after encountering a problem. Additionally, system administrators do not have a way of assessing the impact of their proposed changes with respect to a future state of the system. For example, a system administrator could allocate increased bandwidth to an application by taking only the current workload into account. However, this allocation could conflict with other scheduled jobs or with known trends in workload surges that will increase the load on the system in the future. Thus, it is important for system administrators to assess the impact of their actions not just with respect to the current state of the system but also with respect to future events.
With the recent autonomic computing initiative, policy-based management of storage resources is increasingly being adopted by industry. The SNIA (Storage Networking Industry Association) standardization body is developing a standard for describing policies associated with network-enabled storage systems. The policy definition uses 4-tuple rules with an “if” condition that specifies what needs to be evaluated, a “then” clause indicating the action that needs to be taken when the policy is triggered, a broad scope that identifies the resources that would impact the policy, and a priority that is used to break ties when multiple policies are triggered. Policy-enabled SANs are inherently more complex to analyze, since an operation can potentially impact hundreds of policies, each of which will have to be evaluated in connection with other policies. In addition, a policy violation can automatically trigger an action that can also contribute to the overall impact on the SAN. For example, a policy “if the transaction-rate of an application goes below a threshold value, then start a backup job” may be triggered and therefore result in the action of starting a backup job. That action impacts the SAN in much the same way as introducing a new workload, for example by causing switch contention, increased bandwidth utilization, and increased controller load.
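The 4-tuple policy form described above can be illustrated with a minimal Python sketch. It is not taken from the SNIA standard or from any particular product. The names `Policy` and `triggered`, the threshold values, the scope labels, and the convention that a larger priority number wins a tie are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# Hypothetical system state: metric name -> current value.
State = Dict[str, float]


@dataclass(frozen=True)
class Policy:
    """Illustrative 4-tuple policy: if / then / scope / priority."""
    name: str
    condition: Callable[[State], bool]  # the "if" part
    action: str                         # the "then" part (only a label here)
    scope: FrozenSet[str]               # resource classes that can affect the policy
    priority: int                       # assumption: larger value wins a tie


def triggered(policies: List[Policy], state: State, changed_scope: str) -> List[Policy]:
    """Return policies in the changed scope whose condition holds, highest priority first."""
    hits = [p for p in policies if changed_scope in p.scope and p.condition(state)]
    return sorted(hits, key=lambda p: p.priority, reverse=True)


# Two example policies; the thresholds 100.0 and 50.0 are made up.
policies = [
    Policy("start-backup",
           lambda s: s["transaction_rate"] < 100.0,
           "start backup job",
           frozenset({"application"}),
           priority=1),
    Policy("alert-admin",
           lambda s: s["transaction_rate"] < 50.0,
           "notify administrator",
           frozenset({"application", "switch"}),
           priority=5),
]

state = {"transaction_rate": 40.0}
ordered = triggered(policies, state, "application")
first_action = ordered[0].action if ordered else None
```

In this sketch both policies fire for the low transaction rate, and the priority field decides which action is considered first. A real analysis would also have to model the effect of the chosen action itself (for example, the extra load of the backup job) on the SAN.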
Several conventional approaches in the field of policy-based network storage systems have been proposed. One such conventional approach uses a predictive impact analysis for change management functionality. However, the impact analysis is performed only for a small set of policies, mainly related to security LUN (Logical Unit Number) masking. Furthermore, in addition to this narrow scope of policies, the approach supports only notification as the policy action and does not permit self-correcting and automatic actions that further impact the SAN. These limitations are an important shortcoming of this conventional approach, since system administrators typically specify policy actions in order to correct erroneous events and would be most interested in analyzing the impact of the triggered actions, which could cause a significant performance overhead.
Another conventional approach addresses a wider range of policies. However, its policy evaluation techniques use a coarse classification of scopes. In such a scheme, each policy is assigned a scope that denotes a class of entities, such as hosts, HBAs (Host Bus Adapters), etc. The motivation for such scope-based classification is to allow system administrators to check a select class of entities and policies in the SAN. This form of classification is not very efficient for impact analysis for the following reasons: (1) lack of granularity, whereby some policies have to be classified into many higher-level scopes, which causes inefficient evaluation; e.g., a policy that requires a Vendor-A host to be connected only to Vendor-S storage has to be classified into the “Hosts”, “Storage”, and “Network” scopes, since some changes to elements of any of the three scopes can require the policy to be evaluated, but this classification causes the policy to be evaluated for every event in the three scopes; (2) failure to identify the relevant SAN regions, so that, in order to provide a correct general solution, the path traversal for a policy evaluation can cover duplicate regions; and (3) failure to exploit the locality of data across policies, for example when two distinct policies are evaluated for the same action without an efficient method of caching the results from one for use in evaluating the other.
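The granularity problem in reason (1) can be made concrete with a small Python sketch. It is a hypothetical model of a coarse scope index, not the implementation of any conventional tool. The names `scope_index`, `register`, and `policies_to_evaluate`, the policy labels, and the choice to file the redundant-paths policy under “Network” are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical coarse scope index: scope name -> policies registered under it.
scope_index: Dict[str, List[str]] = defaultdict(list)


def register(policy_name: str, scopes: List[str]) -> None:
    """File a policy under every high-level scope that might affect it."""
    for scope in scopes:
        scope_index[scope].append(policy_name)


def policies_to_evaluate(event_scope: str) -> List[str]:
    """With coarse scopes, every policy filed under the event's scope is evaluated."""
    return list(scope_index[event_scope])


# The vendor policy must be filed under three scopes to stay correct.
register("vendorA-host-to-vendorS-storage", ["Hosts", "Storage", "Network"])
register("redundant-paths", ["Network"])

# Any host event, even one that does not involve a Vendor-A host,
# still selects the vendor policy for evaluation.
evaluated_on_host_event = policies_to_evaluate("Hosts")
evaluated_on_network_event = policies_to_evaluate("Network")
```

Because the index knows only the scope of an event and not which entity changed, it cannot skip the vendor policy for events that are irrelevant to it. This is the redundant evaluation described in reason (1).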
Yet other conventional approaches exclusively address performance policies called SLOs (Service Level Objectives). These approaches focus on a very limited subset of policies, and they fail to consider either the impact of user actions on these policies or the impact of the triggered actions on the SAN.
A further disadvantage of the foregoing conventional approaches lies in the fact that the impact analysis is done in a reactive mode with respect to the current state of the system, without proactively assessing the impact on the future state of the system.
In view of the inadequacy of the conventional methods for analyzing the impact of policy changes on policy-based storage area networks, there is still an unsatisfied need for an impact analysis system that can handle a wide range of policies and proactively assess the impact of the actions of these policies on a variety of system parameters before the changes are made.