A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information. A directory, by contrast, may be implemented as a specially formatted file in which information about other files and directories is stored.
A filer may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server. In this model, the client may comprise an application, such as a database application, executing on a computer that connects to the filer over a computer network. This computer network could be a point-to-point link, a shared local area network (LAN), a wide area network (WAN), or a virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the file system on the filer by issuing file system protocol messages (typically in the form of packets) to the filer over the network.
The disk storage typically implemented has one or more storage “volumes” comprised of a cluster of physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is generally associated with its own file system. The disks within a volume/file system are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability and integrity of data storage through the redundant writing of data stripes across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a known file system and process, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID 4 group.
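The RAID 4 arrangement described above can be illustrated with a minimal sketch. It assumes that parity is the byte-wise XOR of the data blocks in a stripe, which is the conventional RAID 4 scheme; the function names (`xor_blocks`, `write_stripe`, `rebuild_block`) are hypothetical and do not come from any particular file system.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a non-empty list of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Model one stripe: data blocks go to the data disks and the
    XOR parity block goes to the dedicated parity disk."""
    return list(data_blocks), xor_blocks(data_blocks)


def rebuild_block(surviving_blocks, parity):
    """Recover the single missing data block of a stripe from the
    surviving data blocks and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data disks plus one parity disk (illustrative values only).
data, parity = write_stripe([b"\x01\x02", b"\x04\x08", b"\x10\x20"])
recovered = rebuild_block([data[0], data[2]], parity)  # disk 1 has failed
```

Because every parity block lives on the same disk, losing any one data disk leaves enough information on the remaining disks to recompute its contents.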
Each filer “owns” the disks that comprise the volumes that the filer services. This ownership means that the filer is responsible for servicing the data contained on the disks. If the disks are connected to a switching network, for example, a Fibre Channel switch, all of the filers connected to the switch are typically able to see, and read from, all of the disks connected to the switching network. However, only the filer that owns the disks can write to the disks. In effect, there is a “hard” partition between disks that are owned by separate filers that prevents a non-owner from writing to a disk.
In one known file system, this ownership information is stored in two locations. Each disk has a predetermined sector arbitrarily named sector S that contains the definitive ownership information. In one example, sector S is sector zero of the disk. The second source of this ownership information is through the use of Small Computer System Interface (SCSI) level 3 reservations. These SCSI-3 reservations are described in SCSI Primary Commands-3, by Committee T10 of the National Committee for Information Technology Standards, which is incorporated fully herein by reference. One technique for implementing disk ownership is described in commonly owned U.S. patent application, Ser. No. 10/027,457, filed on Dec. 21, 2001 entitled SYSTEM AND METHOD OF IMPLEMENTING DISK OWNERSHIP IN NETWORKED STORAGE by Susan M. Coatney, et al., which is hereby incorporated by reference.
The combination of sector S and SCSI-3 reservation ownership information is often represented by the following format <SECTORS, SCSI>, where SECTORS denotes the ownership information stored in sector S and SCSI is the current holder of the SCSI-3 reservation on that disk. Thus, as an example, if sector S and the SCSI-3 reservation of a disk both show that the disk is owned by a filer, arbitrarily termed “Green,” that disk's ownership information could be denoted <G,G>, where “G” denotes Green. If one of the ownership attributes shows that the disk is unowned, a U is (arbitrarily) used, e.g., <G,U> for a disk whose SCSI-3 reservations do not show any ownership.
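The <SECTORS, SCSI> notation can be modeled as a small record. The following is only a sketch: the class `DiskOwnership`, its field names, and the `fully_owned_by` check are hypothetical names introduced for illustration, and `None` is an assumed way of representing an unowned attribute.

```python
from dataclasses import dataclass
from typing import Optional

UNOWNED = "U"  # symbol used in the notation for an unowned attribute


@dataclass(frozen=True)
class DiskOwnership:
    sector_s: Optional[str]  # owner recorded in sector S, None if unowned
    scsi: Optional[str]      # holder of the SCSI-3 reservation, None if none

    def notation(self) -> str:
        """Render the pair in the <SECTORS,SCSI> form used in the text."""
        return f"<{self.sector_s or UNOWNED},{self.scsi or UNOWNED}>"

    def fully_owned_by(self, filer: str) -> bool:
        """True only when both ownership sources name the given filer."""
        return self.sector_s == filer and self.scsi == filer


green_disk = DiskOwnership(sector_s="G", scsi="G")
unreserved_disk = DiskOwnership(sector_s="G", scsi=None)
print(green_disk.notation())       # <G,G>
print(unreserved_disk.notation())  # <G,U>
```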
It is advantageous for the services and data provided by a storage system to be available for access to the greatest degree possible. Accordingly, some computer storage systems provide a plurality of file servers (or filers) in a cluster, with the property that when a first filer fails, a second filer is available to take over and provide the services and the data otherwise provided by the first filer. When the first filer fails, the second filer in the cluster should assume the task of processing and handling any data access requests normally processed by the first filer. Such cluster configurations are described in U.S. patent application Ser. No. 09/625,234, filed on Jul. 25, 2000, now issued as U.S. Pat. No. 6,728,897 on Apr. 27, 2004, entitled NEGOTIATING TAKEOVER IN HIGH AVAILABILITY CLUSTER by Samuel M. Cramer, et al.
In certain known file server cluster implementations, the transport medium is Ethernet cabling utilizing Transmission Control Protocol/Internet Protocol (TCP/IP) for transport of data. Various file service protocols can execute on top of the TCP/IP protocol. In known failover techniques involving clusters of file servers, network interface controllers (NICs) are capable of supporting multiple media access control (MAC) addresses. When one of the file servers in a cluster detects a failure of its partner filer, for example, by sensing that the partner filer is no longer emitting a heartbeat signal, this surviving filer proceeds to take over the partner's disks. This involves asserting SCSI reservations so that only the surviving filer can access those disks. The surviving filer then executes a failover script, which involves obtaining the IP address of the failed filer and determining each MAC address associated with the failed filer. Each NIC of the surviving filer is then assigned a MAC address that was normally associated with a NIC on the failed filer. Thus, transfers addressed to IP addresses that were mapped to certain MAC addresses of the failed filer are no longer routed to the failed filer, but instead are directed to the surviving partner filer.
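The takeover sequence described above (detect a missing heartbeat, assert reservations on the partner's disks, then adopt the partner's MAC addresses) can be sketched as an in-memory simulation. Everything here is an assumption made for illustration: the `Filer` class, the `partner_failed` and `take_over` functions, the five-second timeout, and the dictionary standing in for SCSI reservations. No real SCSI or network calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Filer:
    name: str
    last_heartbeat: float                               # time of last heartbeat
    nic_macs: List[str] = field(default_factory=list)   # MACs its NICs answer to


def partner_failed(partner: Filer, now: float, timeout: float = 5.0) -> bool:
    """Treat the partner as failed once its heartbeat is older than timeout."""
    return now - partner.last_heartbeat > timeout


def take_over(survivor: Filer, failed: Filer,
              reservations: Dict[str, str]) -> List[str]:
    """Simulate the failover steps and return the MAC addresses moved."""
    # Step 1: assert reservations so that only the survivor can access
    # the disks previously held by the failed filer.
    for disk in list(reservations):
        if reservations[disk] == failed.name:
            reservations[disk] = survivor.name
    # Step 2: assign the failed filer's MAC addresses to the survivor's
    # NICs, so traffic for the unchanged IP-to-MAC mappings reaches it.
    moved = list(failed.nic_macs)
    survivor.nic_macs.extend(moved)
    failed.nic_macs.clear()
    return moved


green = Filer("Green", last_heartbeat=100.0, nic_macs=["00:00:00:00:00:01"])
red = Filer("Red", last_heartbeat=90.0, nic_macs=["00:00:00:00:00:02"])
reservations = {"disk0": "Green", "disk1": "Red"}

if partner_failed(red, now=100.0):
    take_over(green, red, reservations)
```

Note that the clients' IP-to-MAC mappings are left untouched in this approach; only the machine answering for each MAC address changes.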
In alternate embodiments of the known implementations, instead of reassigning MAC addresses to the surviving partner, a new mapping from the IP address to a MAC address associated with the surviving partner is transmitted or broadcast over the network using the Address Resolution Protocol (ARP). ARP is further described in Request For Comments (RFC) 826: An Ethernet Address Resolution Protocol, published by the Internet Engineering Task Force (IETF), which is incorporated herein by reference.
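The ARP-based alternative can be contrasted with MAC reassignment in a short sketch. Here each host's ARP cache is modeled as a plain dictionary from IP address to MAC address, and the broadcast is modeled as an update applied to every cache; the function name `announce_takeover` and the example addresses are hypothetical, and no packets are actually sent.

```python
from typing import Dict, List

ArpCache = Dict[str, str]  # IP address -> MAC address


def announce_takeover(arp_caches: List[ArpCache],
                      failed_ip: str, survivor_mac: str) -> None:
    """Simulate an ARP broadcast: every host that hears it rebinds the
    failed filer's IP address to a MAC address of the survivor."""
    for cache in arp_caches:
        cache[failed_ip] = survivor_mac


# Two clients that initially map the failed filer's IP to its own MAC.
client_a = {"192.0.2.10": "00:00:00:00:00:02"}
client_b = {"192.0.2.10": "00:00:00:00:00:02"}

announce_takeover([client_a, client_b], "192.0.2.10", "00:00:00:00:00:01")
```

In this variant the MAC addresses stay with their original NICs and the IP-to-MAC mapping held by each client is what changes.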
A noted disadvantage of prior implementations of clustered failover arises when the underlying transport medium does not support the moving of transport addresses. By “transport address” it is meant any network address associated with a particular filer. In such cases, the routing techniques normally utilized to achieve the failover would not function. For example, the Fibre Channel transport medium does not support moving transport addresses. Fibre Channel is a set of specifications defining a transport medium for high-speed, efficient networks. The specifications for Fibre Channel are developed by Committee T11 of the International Committee for Information Technology Standards. Fibre Channel does not generally permit unsolicited packets to be broadcast, for example an ARP broadcast with updated routing information. Unsolicited packets may be sent over Fibre Channel, but the sending of unsolicited packets is often destructive in that the unsolicited packet breaks all open connections.
Additionally, virtual interface (VI) connections do not permit the use of unsolicited packets or the reassignment of transport addresses. Virtual interface is a standard for an architecture between high-performance network hardware and computer systems. The VI architecture is defined in Virtual Interface Architecture Specification, Version 1.0, published by a collaboration between Compaq Computer Corp., Intel Corp., and Microsoft Corp., which is hereby incorporated by reference.
The inability to fail over to a cluster partner when the underlying medium does not support moving the transport address is especially relevant when utilizing certain file systems that rely on such transport mechanisms, including, e.g., the Direct Access File System (DAFS). When using file systems that utilize transport mechanisms which do not support moving transport addresses or similar routing techniques, known failover procedures will not function. DAFS is a file system protocol which is defined in DAFS: Direct Access File System Protocol, Version 1.0 published by the DAFS Collaborative, which is hereby incorporated by reference. DAFS traditionally runs over a non-TCP/IP transport protocol such as a virtual interface (VI) or the InfiniBand Trade Association's InfiniBand™ connection utilizing Fibre Channel as a transport medium. Thus, known failover techniques typically would not function in a DAFS environment.
More generally, traditional clustered failover techniques will not function in networking environments that utilize transport protocols that do not support moving transport addresses among network nodes.