A SAN, or storage area network, sometimes called a storage network environment, is a network dedicated to enabling multiple applications on multiple hosts to access, i.e., read and write, data stored in consolidated shared storage infrastructures. A SAN consists of interlinked SAN devices, for example, different types of switches, and is based on one of a number of possible transfer protocols such as Fibre Channel and iSCSI. Each server is connected to a SAN with one or more network cards, for example, a host bus adapter (HBA). Application data is stored as data objects on storage devices in storage units, e.g., LUNs. The storage device may be used to store data related to the applications on the host.
In storage network environments, it may be important to detect changes in the network infrastructure or changes in network components and determine the implications of these changes on the storage service levels provided to applications and hosts. Enterprises are increasingly deploying large-scale SANs to gain economies-of-scale business benefits, and are performing and planning massive business-critical migration processes to these new environments. These enterprise SANs may contain hundreds or thousands of servers and tens or hundreds of switches and storage devices of different types. Furthermore, these storage network environments undergo frequent change and growth as, for example, hosts are added to the storage network.
The large size and rate of growth of SANs lead to huge added complexity. The number of components and links which may be associated with the data transfer from each given application and one or more of its data units may increase exponentially with the size of the SAN. This complexity, which is compounded by the heterogeneity of the different SAN devices, leads to high risk and inefficiency. Changes to the SAN (which need to happen often due to the natural growth of the SAN) take groups of SAN managers or administrators a long time to complete, and are error-prone. For example, in many existing enterprises a routine change (such as adding a new server to a SAN) may take 1-2 weeks to complete, and a high percentage of these change processes (sometimes as high as 30-40%) include at least one error along the way. It is estimated that around 80% of enterprise SAN outage events are a result of some infrastructure change-related event.
The complexity of storage network environments has recently been further increased by the growing adoption of virtual servers or virtual machines (VMs) as hosts within storage network environments. In these environments, one or more of the physical servers or hosts may be further partitioned into one or more virtual servers. Such a virtualization of the physical servers, or virtualization of the storage network environment, allows for efficiency and performance gains to be realized. These gains may be realized in terms of service-level metrics or performance metrics, e.g., storage capacity utilization, server utilization, CPU utilization, data traffic flow, load balancing, etc. It is well known that the higher the number of VMs consolidated onto a physical server, the greater the savings. A major benefit of VMs is their ability to stop, shift and restart on different physical servers or hosts. For each physical server or host that is retired in favor of a virtual server, there is a corresponding reduction in power, space and cooling requirements. The numbers of network interface cards, network cables, switch ports, HBAs, Fibre Channel cables and Fibre Channel ports are all reduced. These cost reductions are significant, and when compounded with the performance and/or efficiency gains, allow for a much better-managed storage network environment. In general, the goal of SAN administrators is to maximize resource utilization while meeting application performance goals. Maximizing resource utilization means placing as many VMs per physical server as possible to increase CPU, network, memory, SAN and storage array utilization.
Recently, several market and technology trends have converged to create conditions suitable for virtual server adoption. First, server hardware performance continues to increase faster than the ability of most applications to use it. As a result, many organizations are barely getting above 20 percent server central processing unit (CPU) utilization, a large inefficiency in processor utilization which can be addressed by using virtual server, or virtual machine, technology within the physical servers or hosts.
Second, the market adoption of Microsoft™ Windows™ servers running on x86 CPUs has dramatically driven down the cost of computing. However, most system administrators will not run multiple applications on a single system image because they fear that conflicts, e.g., DLL conflicts and other incompatibilities, will cause systems to crash. The result has been one physical server with one operating system deployed for each application, leading to a proliferation of underutilized servers in the data center. This inefficiency can be addressed with virtual server technology, i.e., by running multiple virtual servers on one physical server.
Third, many companies are undergoing data center consolidation efforts to control the massive sprawl of underutilized server capacity that is consuming space and power in today's increasingly expensive data centers. Each operating system image and server also requires costly, labor-intensive operating system (OS) patch maintenance and updates to the corresponding physical infrastructure. As part of these consolidation efforts, data center teams must choose between re-hosting these applications on even more powerful virtual servers (exacerbating the underutilization problem) or leaving them on old, unsupported physical server hardware. This consolidation effort can thus be aided with virtual server technology.
In the recent past, companies have been adopting virtualization applications such as VMware™, Microsoft™ Virtual Server, and XEN™. These applications reduce underutilization by enabling data center teams to logically divide the physical servers, e.g., x86 servers or hosts, into one, two, four, eight or more independent, securely operating virtual server or virtual machine (VM) systems. As explained above, consolidating five, ten, twenty, or even forty server images onto one physical server has tremendous benefit.
Given the rapid rate of adoption of VMs (nearly 80 percent of VM production implementations today are connected to central storage within storage network environments), it seems that virtual server or virtual machine technology is here to stay. Moreover, the adoption of virtual machine technology is projected to grow over the coming years, making it even more important that systems for managing virtualized storage network environments work effectively.
In particular, virtualization of the physical servers or hosts in the storage network environment allows for the possibility of running multiple operating systems and applications on the same physical server at the same time, e.g., a single VMware ESX server may be “virtualized” into 1, 2, 4, 8, or more virtual servers, each running its own operating system, and each able to support one or more applications. This virtualization of the servers may be enabled using software such as VMware, e.g., VMware ESX, which allows the virtualization of hardware resources on a computer (including the processor, memory, hard disk and network controller) to create a virtual server with an independent operating system.
However, with the benefits of virtual machine technology in storage network environments come problems that need to be addressed. There is a challenge in determining how to effectively manage the large number of server images or virtual servers on each physical server in a storage network environment. For each VM that is created, a portion of allocated storage is used up. Unless the SAN administrator systematically goes back to delete these storage volumes, the storage space remains consumed. This type of VM sprawl has the potential to increase storage consumption by an order of magnitude, thereby reducing the benefit of having virtual servers deployed in storage network environments.
In addition, it is even more difficult to detect network state changes, which occur frequently, within a storage network environment with virtual servers than in a storage network environment with only physical servers. For instance, a failure of a storage area network switch may eliminate an access path between two components on the network, thereby disrupting the corresponding data flow to many virtual servers instead of just one physical server.
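The switch-failure scenario above can be illustrated with a minimal sketch. It assumes the SAN is modeled as an undirected graph of components and that an access path is simply graph reachability from a host to a storage device. The topology, the component names (host1, switch1, array1, vm1, and so on) and the helper functions are hypothetical and chosen for illustration only; they are not taken from any particular SAN management product.

```python
from collections import deque

def reachable(links, start, goal, failed=frozenset()):
    """Breadth-first search for a path from start to goal that avoids failed components."""
    if start in failed or goal in failed:
        return False
    # Build an undirected adjacency map from the list of links.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return False

def affected_vms(links, vms_on_host, storage, failed):
    """Return the VMs whose physical host had a path to storage before the failure but not after."""
    lost = []
    for host, vms in vms_on_host.items():
        had_path = reachable(links, host, storage)
        has_path = reachable(links, host, storage, failed)
        if had_path and not has_path:
            lost.extend(vms)
    return sorted(lost)

# Hypothetical topology: host1 has a single path, host2 has two redundant paths.
LINKS = [
    ("host1", "switch1"),
    ("host2", "switch1"),
    ("host2", "switch2"),
    ("switch1", "array1"),
    ("switch2", "array1"),
]
# Hypothetical placement of virtual servers on physical hosts.
VMS_ON_HOST = {"host1": ["vm1", "vm2", "vm3"], "host2": ["vm4"]}

print(affected_vms(LINKS, VMS_ON_HOST, "array1", {"switch1"}))
```

In this sketch, the failure of switch1 removes the only access path of host1, so all three virtual servers on that host lose access to storage at once, while the virtual server on host2 keeps its path through switch2.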
A storage network environment with virtual server technology has a potentially large number of components, including a potentially large number of virtual servers; it changes very frequently; each component holds a large amount of local state information; and correlating that information and analyzing the end-to-end access paths and attributes is complex. For these reasons, any approach to detecting state changes in the network environment needs to be very efficient in order to detect changes effectively.
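As a rough illustration of what state change detection involves, the following sketch compares two snapshots of per-component state and reports which components were added, removed or changed. The snapshot layout (a dictionary mapping a component identifier to a dictionary of attributes), the attribute names and the function name diff_states are assumptions made for this example; the text above does not prescribe any particular representation or method.

```python
def diff_states(old, new):
    """Compare two snapshots that map component id -> attribute dict."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # A component counts as changed if it is in both snapshots with different attributes.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical snapshots taken before and after a change to the environment.
before = {
    "switch1": {"status": "up"},
    "vm1": {"host": "host1"},
}
after = {
    "switch1": {"status": "down"},
    "vm1": {"host": "host1"},
    "vm2": {"host": "host1"},
}

print(diff_states(before, after))
```

A comparison of this kind touches every component in both snapshots, which hints at why efficiency matters when the environment contains many virtual servers and changes frequently.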
Currently, there are no adequate technological solutions to assist SAN or VM administrators in managing changing storage network environments with virtual machine components, e.g., virtual servers. In particular, SAN administrators cannot quickly and dynamically discover all relevant changes in SAN state, particularly in relation to application data requirements involving virtual servers. For instance, a server outage is always a serious event in a storage network environment. In a virtualized storage network environment, however, the impact is an order of magnitude higher simply because for each virtual server outage, many more applications are affected.
Until recently, no software or hardware applications were available to manage virtualized storage network environments. Current storage management solutions rely on host agents in hosts that contain virtual servers (hosts that have been “virtualized”) within the SAN to collect a partial set of information from these virtual servers. Using this partial set of information, SAN administrators then rely on manual methods, e.g., manual spreadsheet-based information entry, trial and error, etc., to manage change events in the virtualized storage network environment. Furthermore, host agents on a physical server are very difficult to manage and/or maintain, and are widely considered undesirable for large SANs in which scalability may be important.
Therefore, there is a need for a solution to the problem of efficiently discovering state change events and analyzing and monitoring service levels and performance metrics in virtualized storage network environments, and for the problem of mapping these changes to access paths and storage service levels or performance level requirements for applications and/or hosts.