Growth in data-intensive applications such as e-business and multimedia systems has increased the demand for shared and highly available data. A Storage Area Network (SAN) is a switched network developed to deal with such demands and to provide scalable growth and system performance. A SAN typically comprises servers and storage devices connected via peripheral channels such as Fibre Channel (FC) and Small Computer System Interface (SCSI), providing fast and reliable access to data amongst the connected devices. FIG. 1 shows a simple example of a SAN (10) comprising two servers (Server A (20) and Server B (30)) connected by a Fibre Channel Arbitrated Loop (FC-AL) (40) to a series of disks (50) configured as a redundant array of independent disks (RAID). The SAN (10) is in turn connected through Server A (20) and Server B (30) to a series of client workstations (60) via a network (70) (e.g. Ethernet/Internet). Server A (20) and Server B (30) are themselves in further communication through a private connection (80) which is not accessible by the client workstations (60) and whose purpose is to facilitate server resetting.
Referring now to FIG. 2, the components of Server B 30 relevant to the present specification are shown in more detail. The server includes a PCI Bus 230 via which the main components of the server intercommunicate. A CPU 180 communicates with the PCI Bus 230 via a North Bridge controller 200, which also provides the CPU with access to system memory 190. A fibre channel interface chip 220 decodes incoming fibre channel information and communicates this across the PCI Bus, for example, by using direct memory access (DMA) to write information into system memory 190 via the North Bridge 200. Similarly, information is written to the chip 220 for encoding and transmission across the fibre channel 40. A network adaptor 160 allows the CPU to process requests received from clients 60 across the network 70, perhaps requiring the CPU 180 in turn to make fibre channel requests for data stored on the disks 50. In the present example, the server includes a dedicated reset controller and watchdog circuit 300, for example, a Dallas Semiconductor DS705. On the one hand, the reset controller 300 monitors the state of the CPU and, if it decides the CPU has hung, automatically resets the entire server by asserting a system-reset signal, which is connected to most of the major components of the server. On the other hand, the CPU 180 or, for example, a signal asserted by another server on the private connection 80 can actively reset the server by instructing the reset controller to assert the system-reset signal.
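By way of illustration only, the two reset paths of the reset controller and watchdog circuit 300 may be sketched as follows. The sketch assumes that the CPU signals liveness by periodically "kicking" the watchdog, which is a common arrangement but is not stated above; the class name, method names and timeout value are hypothetical and do not model the timing or pinout of any particular device.

```python
class ResetController:
    """Illustrative model of a combined watchdog and reset controller."""

    def __init__(self, timeout):
        self.timeout = timeout    # assumed watchdog period, in seconds
        self.last_kick = 0.0      # time of the most recent CPU kick
        self.resets = []          # (time, source) for each asserted reset

    def kick(self, now):
        """Assumed liveness signal: the CPU strobes the watchdog input."""
        self.last_kick = now

    def tick(self, now):
        """Passive path: reset the server if the CPU stopped kicking."""
        if now - self.last_kick > self.timeout:
            self._assert_system_reset(now, "watchdog timeout")
            return True
        return False

    def request_reset(self, now, source):
        """Active path: the CPU or the private connection requests a reset."""
        self._assert_system_reset(now, source)

    def _assert_system_reset(self, now, source):
        # Record the reset and restart the timer, as the server reboots.
        self.resets.append((now, source))
        self.last_kick = now


rc = ResetController(timeout=2.0)
rc.kick(1.0)
rc.tick(2.5)                                   # CPU still alive, no reset
rc.tick(3.5)                                   # kick overdue, reset asserted
rc.request_reset(4.0, "private connection 80")
```

Both paths end in the same internal routine, reflecting the single system-reset signal that is distributed to the major components of the server.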
Whilst a SAN with large amounts of cache and redundant power supplies ensures that data stored in the network is protected at all times, user access to the data can be disabled if a server fails. In a SAN context, server clustering is a process whereby servers are grouped together to share data from the storage devices, and wherein each server is available to client workstations. Since various servers have access to a common pool of data, the workstations have a choice of servers through which to access that data. This has the advantage of increasing the fault tolerance of the SAN by providing alternative routes to stored data should a server fail, thereby maintaining uninterrupted data and application availability.
Clusters may be classified as being failover or load-balancing. In a failover cluster a given server may be a hot-spare (or hot-standby) which behaves as a purely passive node in the cluster and only activates when another server fails. Servers in load-balancing clusters may be active at all times in the cluster. Such clusters can produce significant performance gains through the distribution of computational tasks between the servers.
Any highly available or failover cluster with multiple servers requires a method of forcing a malfunctioning server off the system, to prevent it disrupting normal SAN operation. This facility is conventionally provided by a feature known as STOMITH (Shoot the Other Machine in the Head).
Faulty server operation can be detected through heartbeat monitoring by hardware or software watchdog-type systems on individual servers. In this process, the FC-AL (or otherwise) connected servers each issue signals (or heartbeats) onto the FC-AL at regular intervals. The connected servers each have at least one watchdog whose purpose is to detect the heartbeats of the other servers. When the heartbeat of a given server is detected by the watchdogs of the other connected servers, it indicates to such servers that the issuing server is functioning correctly. If, however, the watchdogs fail to detect the heartbeat of a given server after a prescribed period (the watchdog timeout), the servers check that the FC-AL connections are functioning correctly. Further failed attempts to communicate indicate to the other connected servers that the issuing server is hung. In such circumstances, the private interconnection (80) between the servers enables one of the connected servers to reset or power down the hung server.
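The detection sequence described above (missed heartbeat, link check, further failed attempts) may be illustrated by the following minimal sketch. The names `HeartbeatWatchdog`, `PeerState`, `link_ok` and `retries_failed`, the intermediate "suspect" state and the timeout value are assumptions introduced for illustration; the link check and the retry attempts are represented simply as inputs rather than as real FC-AL operations.

```python
from dataclasses import dataclass, field
from enum import Enum


class PeerState(Enum):
    ALIVE = "alive"      # heartbeat seen within the watchdog timeout
    SUSPECT = "suspect"  # heartbeat overdue, but a hang is not yet established
    HUNG = "hung"        # heartbeat overdue, link good, retries failed


@dataclass
class HeartbeatWatchdog:
    timeout: float                                  # watchdog timeout, seconds
    last_seen: dict = field(default_factory=dict)   # peer name -> last heartbeat

    def record_heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def overdue(self, peer, now):
        return now - self.last_seen[peer] > self.timeout

    def classify(self, peer, now, link_ok, retries_failed):
        if not self.overdue(peer, now):
            return PeerState.ALIVE
        # A missed heartbeat alone is not conclusive: the loop connection
        # must be verified and further communication attempts must fail.
        if not link_ok or not retries_failed:
            return PeerState.SUSPECT
        return PeerState.HUNG


wd = HeartbeatWatchdog(timeout=5.0)
wd.record_heartbeat("Server B", now=0.0)
state = wd.classify("Server B", now=6.0, link_ok=True, retries_failed=True)
if state is PeerState.HUNG:
    print("Server B is hung; reset it over the private interconnection")
```

Only the final state would justify a reset or power-down of the peer; a faulty loop connection leaves the peer merely suspect, since the silence may be caused by the connection rather than by the server.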
It is acknowledged that in the case of a high level watchdog operating over the FC-AL, no additional cabling is required. However, for low level watchdogs with STOMITH capability, private interconnections with dedicated cabling are required, making it difficult to expand the SAN beyond a dedicated backplane. Such dedicated wiring requires extra printed wiring board (PWB) traces and extra cabling between processors, which is both expensive and contributes to system unreliability by providing another potential failure point. Further, since the private interconnections are generally not FC connections themselves, they do not allow servers so interconnected to be separated by the same distances as would be achievable with FC connections (in FC it is possible to have devices separated by up to 30 km), thereby eliminating one of the advantages of using an FC-AL to connect the SAN.
Where the private connection 80 of FIG. 2 is not available, an alternative approach to the problem of resetting hung servers, which avoids the need for the private interconnections described earlier, is to use the FC-AL connections themselves to deliver reset instructions between servers.
In the case of FIG. 2, the servers on the FC-AL (40) are known to co-operate in a “buddy system” wherein at system initialisation each server is twinned with another so that each server has only one buddy and is itself a buddy to that server. Each buddy uses heartbeat monitoring on the FC-AL (40) to assess the status of its buddy.
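The twinning performed at system initialisation may be illustrated by the following minimal sketch. The function name `pair_buddies` and the rule of pairing adjacent servers in list order are assumptions made for illustration; the treatment of an odd number of servers is not addressed above and is simply rejected here.

```python
def pair_buddies(servers):
    """Twin servers so that each has exactly one buddy, symmetrically."""
    if len(servers) % 2 != 0:
        # Assumption: an unpaired server is treated as a configuration error.
        raise ValueError("an even number of servers is required for twinning")
    buddies = {}
    # Pair adjacent entries: (1st, 2nd), (3rd, 4th), and so on.
    for first, second in zip(servers[0::2], servers[1::2]):
        buddies[first] = second
        buddies[second] = first
    return buddies


buddies = pair_buddies(["Server A", "Server B"])
print(buddies["Server A"])  # prints "Server B"
```

The resulting mapping is symmetric, so that the buddy of a server's buddy is always the server itself, and each server therefore monitors the heartbeat of exactly one other server.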
However, whilst heartbeat monitoring on the FC-AL (40) of the connected buddies enables a server to detect if its buddy has hung, the normal FC protocol and FC-AL topology do not enable a server to reset a hung buddy. For instance, in FIG. 2, without the connection 80, there is no way in which Server A (20) can access the reset controller and watchdog (300) of Server B (30) to reset Server B (30) if needed. Consequently, if Server A (20) detects that Server B (30) is malfunctioning, it can only send a message to Server B (30) alerting it to its hung state and advising Server B (30) to take the appropriate remedial action. However, if Server B (30) is so badly hung that it cannot alleviate its own situation, then Server B (30) will remain hung, because Server A (20) cannot reset it.