The widespread use of modern computer systems has made it a necessity to assure continual and immediate access to enormous amounts of information or data. The inability to provide access to the data, even for a short amount of time, can have catastrophic consequences. Consequently, modern computer systems utilize mass data storage configurations which provide redundancy to assure a high level of availability of and access to the data while simultaneously guarding against the loss of the data. The redundancy may take many different forms.
One form of redundancy generally involves making multiple copies of the data, which is sometimes referred to as mirroring. A copy of the data is available for use quickly if the primary copy of that data is corrupted or becomes inaccessible.
Redundancy may also be achieved by the use of mathematical techniques which enable the entire data to be defined mathematically without completely copying the data. Mathematical algorithms permit the data to be reconstructed if the copy of the complete data becomes unavailable. One of the principal types of data storage configurations which are widely used to assure duplicate copies of the data is any one of the well-known types of Redundant Array of Independent or Inexpensive Disks (RAID) mass storage configurations.
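As an illustration only, the mathematical form of redundancy can be sketched with byte-wise XOR parity, a technique commonly used in parity-based RAID levels. The function names and sample blocks below are hypothetical and are not taken from any particular storage system; the sketch assumes that all blocks have equal length and that only one block is lost.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild a single missing block from the survivors and the parity block.

    XOR-ing the parity with every surviving block cancels their contributions
    and leaves the missing block.
    """
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical data blocks, each stored on a different disk.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_parity(data)

# Assume the disk holding data[1] becomes unavailable.
recovered = reconstruct([data[0], data[2]], parity)
```

In this sketch the parity block costs one block of additional storage for three blocks of data, rather than the full second copy that mirroring would require.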
The redundancy to assure access to the data is typically achieved by using multiple copies of the components necessary to communicate read and write data management operations between data storage units and one or more data management computer devices. A data management computer device manages and controls the data communication operations to and from the data storage units with typical read and write operations or commands, as well as performing other data management and integrity functions invoked by executing data storage operating system software. An example of a data management computer device is a traditional file server, although a data management computer device is also capable of managing data communications with respect to blocks of data as well as files of data, as might occur in a storage attached network or a fiber attached network. Each such data management computer device is referred to herein as a “filer.” One technique of assuring multiple redundant communication pathways to mass storage units is a clustered failover configuration of filers and data storage units, which is described in greater detail in the above-identified US patents and applications.
In a clustered failover configuration, two or more filers are associated with one another in a principal and partner or backup configuration. Each of the filers has at least one and typically a multiplicity of data storage units connected to it in a manner which permits the filer to manage its normal read and write data operations with those principally associated data storage units. The data storage units are connected to each filer in a serial configuration or in a connection which establishes serial-like communication, typically by using serial connectivity links with and between serial interface adapters. Such a serial connectivity is desirable to implement a high volume data transfer protocol such as the well-known fibre channel protocol. In general, a serial connection permits a greater amount of data to be managed by the filer, as compared to a parallel or bus-type connection of the data storage units to the filer. Each filer is typically connected as a node of a data communication network, which allows data processors (referred to herein as “clients”) that form other nodes on the data communication network, to access each of the filers for the purpose of reading and writing data to and from the data storage units managed by each filer.
Should an unanticipated failure of a principal filer occur, the partner or backup filer assumes responsibility for managing the data storage units which are normally managed by the principal filer. Management by the partner filer is achieved through an alternative connection from the partner filer to the serially connected data storage units which are normally managed by the failed principal filer, thereby allowing the partner filer to commence managing the read and write operations to those data storage units normally managed by the principal filer. In addition, the partner filer continues managing the read and write operations to those data storage units principally associated with the partner filer itself.
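The takeover described above can be sketched, under simplifying assumptions, as a transfer of responsibility from one filer to the other. The `Filer` class, the `failover` function, and the unit names are hypothetical and serve only to illustrate the bookkeeping; the sketch does not model the alternative connection or the actual read and write traffic.

```python
class Filer:
    """Hypothetical model of a filer and the storage units it currently serves."""

    def __init__(self, name, own_units):
        self.name = name
        self.healthy = True
        self.own = set(own_units)      # units principally associated with this filer
        self.managed = set(own_units)  # units this filer is serving right now


def failover(principal, partner):
    """Move the failed principal's units to the partner; return True if a takeover occurred."""
    if principal.healthy:
        return False
    # The partner keeps serving its own units and adds those of the failed principal.
    partner.managed |= principal.managed
    principal.managed = set()
    return True


filer_a = Filer("A", ["a1", "a2"])
filer_b = Filer("B", ["b1"])

filer_a.healthy = False            # assume an unanticipated failure of the principal filer
took_over = failover(filer_a, filer_b)
```

After the call, `filer_b.managed` holds the units of both filers, which reflects the doubled workload carried by the partner filer during a failover.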
The event of the partner filer assuming responsibility for managing the data storage units normally associated with the failed principal filer is called a “failover,” indicating that the partner filer has taken over the serving operations of the failed principal filer. While a failover results in some reduction in performance, caused by the partner filer having to manage the read and write operations associated with the data storage units of two filers, redundancy is achieved because the data remains available and accessible due to the failover functionality performed by the partner filer. After the problem that caused the failover has been corrected, it is necessary to perform certain manual and software procedures to restore the now-functional principal filer to its normal operating status and to conform the data transactions handled during failover by the partner filer into a form which can be assumed and recognized by the restored principal filer. Thus, even though a failover in a clustered mass storage configuration preserves data availability and accessibility, it is still desirable to avoid a failover condition altogether, if possible, because of the performance-diminishing effects on the partner filer and the added effort required to restore the mass storage system to its normal operating status.
Even though the cluster failover configuration of multiple filers secures the advantages of redundancy in data availability and accessibility in the series-connected data storage units, complete communication pathway accessibility or connectivity to all of the data storage units has not been possible. The principal and partner filers are connected by one connection to the data storage units, and the availability of communications to the other serial-connected data storage units depends on maintaining the integrity of the cables which connect the storage units in the serial configuration. A broken or disconnected cable between two of the individual serially-connected data storage units, or a failure of a serial connection interface to one of the data storage units, or even a disconnected or failed disk drive device within one of the individual data storage units, can have the consequence of disabling one or more of the data storage units which are serially connected to either the principal or the partner filer.
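The dependence on cable integrity can be illustrated with a deliberately simplified model, assuming a single entry point into a linear chain of data storage units. The function name, the link numbering, and the unit names are assumptions made for illustration: link 0 is taken to connect the filer to the first unit, and link k to connect unit k-1 to unit k.

```python
def reachable_units(chain, broken_link=None):
    """Return the units a filer can still reach through a serial chain.

    chain       -- ordered list of unit names, nearest to the filer first
    broken_link -- index of the failed link, or None if every link is intact
    """
    if broken_link is None:
        return list(chain)
    # Every unit at or beyond the broken link is cut off from the filer.
    return list(chain[:broken_link])


chain = ["unit0", "unit1", "unit2", "unit3"]

all_ok = reachable_units(chain)
after_break = reachable_units(chain, broken_link=2)  # cable between unit1 and unit2 fails
```

In this model a single broken cable in the middle of the chain leaves only `unit0` and `unit1` reachable, even though `unit2` and `unit3` themselves remain operational.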
In those circumstances where completely redundant connectivity to each of the data storage units is required or desired, a fiber switch has been used to connect all of the data storage units in a selectable matrix-like configuration between both the principal and partner filers. The matrix-like switching capability of a fiber switch allows connectivity to be established with any of the data storage units. The fiber switch assures a direct connectivity path from the principal and partner filers to each one of the individual data storage units, should there be a failure in the normal, high-volume, serial-connectivity configuration between each of the data storage units in the cluster.
While the matrix-like connectivity available from a fiber switch assures reliable connectivity between each filer and each individual data storage unit, fiber switches are relatively expensive. In fact, the expense of fiber switches is so significant that some users may be deterred from obtaining the benefits of redundancy in connectivity. Moreover, because the data communication performance by use of a fiber switch is less than the data communication performance achievable by use of the serial connectivity using a fibre channel protocol, the fiber switch cannot be used as a substitute for the higher performance serial connectivity in high performance mass storage systems. Thus, both the serial channel connectivity and the fiber switch matrix connectivity must be employed for maximum redundancy, and the use of both connectivity configurations increases the cost of mass storage systems.