A Storage Area Network (SAN) is a sub-network of shared storage devices. A SAN's architecture makes all storage devices available to all servers on a Local Area Network (LAN) or Wide Area Network (WAN). As more storage devices are added to a SAN, they become accessible from any server in the larger network. SANs are used by applications (e.g., business applications) to store large volumes of data.
Enterprise applications aim to provide reliable, scalable service with transparent fail-over. High availability has become a critical requirement for enterprises, and failures should be almost transparent to end users. In addition, mechanisms should be in place to allow system maintenance without disrupting the current workload.
Virtual servers may be used to make more efficient use of physical servers. A virtual frame or server management system may manage the virtual servers and, once deployed, becomes a critical component of the data center. If the virtual frame management system fails, any systems that use the virtual servers are likely to fail as well. To maintain availability of the virtual frame management system, factors such as runtime state, persistent state, data plane, control plane, and minimization of failover time need to be properly addressed.
The runtime state of a virtual frame management system is information that is not stored persistently, typically because of the nature of the data and/or the rate at which it changes, coupled with the cost of using persistent storage. The persistent state is information that is stored on media and can be recovered after a power cycle. The data plane and control plane refer to a relationship between the managed system and the manager. Minimization of failover time concerns the downtime incurred while the managing system handles and recovers from a failure of the virtual frame management system.
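The distinction between runtime state and persistent state can be illustrated with a minimal Python sketch. It is not taken from any particular virtual frame management system: the class name `ManagerState`, the JSON file format, and the example keys are all hypothetical, and a local temporary file stands in for the persistent media.

```python
import json
import os
import tempfile


class ManagerState:
    """Hypothetical manager that keeps runtime and persistent state apart."""

    def __init__(self, path):
        self.path = path
        self.runtime = {}      # in memory only; lost when the manager restarts
        self.persistent = {}   # mirrored to the file at `path`
        if os.path.exists(path):
            with open(path) as f:
                self.persistent = json.load(f)

    def set_runtime(self, key, value):
        # Fast-changing data is kept in memory and never written out.
        self.runtime[key] = value

    def set_persistent(self, key, value):
        # Write to a temporary file, then rename it over the old one,
        # so a crash mid-write does not leave a half-written state file.
        self.persistent[key] = value
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.persistent, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def demo():
    """Simulate a restart by building a second ManagerState on the same file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.json")
        manager = ManagerState(path)
        manager.set_persistent("vserver-1", "assigned to host-a")
        manager.set_runtime("vserver-1-heartbeat", 1234)
        restarted = ManagerState(path)
        return restarted.persistent, restarted.runtime
```

In this sketch, the restarted instance recovers the persistent entry from the file, while the runtime entry is gone and would have to be rebuilt, which is one of the costs a failover has to absorb.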
One typical approach to maintaining high availability of a virtual server system uses replicated state. This approach involves a multi-tiered architecture, where each tier is responsible for its own availability and for being able to detect and route around failures in any of the other tiers. In a real-time replicating system, any change in the state of one manager is propagated to the other manager (this assumes two managers exist for redundancy). The managers can run in an active-active or an active-standby configuration. With this approach, the system needs two data stores, each holding a copy of the data of both managers.
However, real-time replication involves a great deal of complexity, as transactions have to be distributed across multiple systems. For example, the recovery workflows needed to bring a failed manager back into synchronization with the rest of the system are themselves complex. Replication also carries a performance cost.
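The replicated-state approach, and the resynchronization problem it creates, can be sketched in a few lines of Python. This is an illustrative model only: the `Manager` class, the `alive` flag, and the in-memory dictionaries standing in for the two data stores are assumptions, and a real system would replicate over a network with transactional guarantees.

```python
class Manager:
    """Hypothetical manager with its own data store and one replication peer."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # stands in for this manager's data store
        self.peer = None
        self.alive = True

    def apply(self, key, value, replicated=False):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.store[key] = value
        # Propagate locally originated changes to the peer if it is reachable.
        # The `replicated` flag stops the peer from echoing the change back.
        if not replicated and self.peer is not None and self.peer.alive:
            self.peer.apply(key, value, replicated=True)


def make_pair():
    a, b = Manager("manager-a"), Manager("manager-b")
    a.peer, b.peer = b, a
    return a, b


def demo():
    a, b = make_pair()
    a.apply("vserver-1", "running")   # replicated to manager-b
    a.alive = False                   # simulate a failure of manager-a
    b.apply("vserver-2", "stopped")   # only manager-b records this change
    return a.store, b.store
```

After `demo()` runs, the surviving manager holds both entries while the failed manager holds only the first. Closing that gap when the failed manager returns is the kind of recovery workflow that makes real-time replication complex.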
Another approach uses clustered file systems. This approach utilizes a shared storage system, where the applications or file systems on the managers are capable of simultaneously accessing the same storage device. In this case, there is one copy of the data that is accessible from multiple machines.
Clustered file systems involve additional cost. For example, to support one specific manager, only one file system can be used, which might not agree with the file systems of other network components. In addition, database software that is compatible with the chosen file system must be obtained.