This invention relates to techniques for achieving resilience in a multi-computer system.
Such systems are often used to support a large number of users and to store very large databases. For example, a typical system may consist of 8 server computers supporting up to 50,000 users, and may store one or more 300-gigabyte databases.
It would be desirable to provide such a system based on standard server software, for example Microsoft Exchange running under Microsoft Windows NT. However, this raises the problem of providing resilience to failure of one of the computers. The use of cluster technology for a system of this scale would be too expensive. Also, Microsoft Exchange is not a cluster-aware application, and it is not permissible to have two instances of Exchange on the same server (even a 2-node cluster).
According to the invention, there is provided a method of operating a computer system comprising a plurality of computers, a plurality of system disk units, one for each of said computers, and a plurality of further disk units, one for each of said computers, the method comprising:
(a) designating a plurality of said computers as active computers and designating another of said computers as a standby computer;
(b) using the further disk units to provide a synchronised recovery copy of data held on the system disk units; and
(c) reconfiguring the system in the event of failure of one of the active computers, by causing the standby computer to pick up the further disk unit corresponding to the failed computer.
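By way of illustration only, steps (a) to (c) can be sketched as a small in-memory model. This is not the implementation of the invention: the class and method names (Computer, MultiComputerSystem, write, fail) are hypothetical, disk units are modelled as plain dictionaries, and synchronisation is assumed to be a simple write-through mirror performed on every write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Computer:
    """Hypothetical model of one computer and its system disk unit."""
    name: str
    role: str = "active"  # "active", "standby" or "failed"
    system_disk: Dict[str, str] = field(default_factory=dict)
    # Name of the computer whose further (recovery) disk unit this
    # computer has picked up, if any.
    attached_recovery_disk: Optional[str] = None


class MultiComputerSystem:
    def __init__(self, names: List[str], standby_name: str) -> None:
        if standby_name not in names:
            raise ValueError("standby must be one of the computers")
        # Step (a): designate one computer as standby, the rest as active.
        self.computers = {
            n: Computer(n, role="standby" if n == standby_name else "active")
            for n in names
        }
        # One further disk unit per computer (modelled as a dict).
        self.recovery_disks: Dict[str, Dict[str, str]] = {n: {} for n in names}

    def write(self, name: str, key: str, value: str) -> None:
        computer = self.computers[name]
        if computer.role != "active":
            raise RuntimeError(f"{name} is not active")
        computer.system_disk[key] = value
        # Step (b): keep the recovery copy in step with the system disk
        # (assumed here to be a write-through mirror).
        self.recovery_disks[name][key] = value

    def fail(self, failed_name: str) -> str:
        failed = self.computers[failed_name]
        if failed.role != "active":
            raise RuntimeError(f"{failed_name} is not active")
        standby = next(
            (c for c in self.computers.values() if c.role == "standby"), None
        )
        if standby is None:
            raise RuntimeError("no standby computer available")
        # Step (c): the standby picks up the further disk unit that
        # corresponds to the failed computer and becomes active.
        failed.role = "failed"
        standby.attached_recovery_disk = failed_name
        standby.role = "active"
        return standby.name

    def data_served_by(self, name: str) -> Dict[str, str]:
        """Data visible through a computer's picked-up recovery disk."""
        owner = self.computers[name].attached_recovery_disk
        return dict(self.recovery_disks[owner]) if owner else {}


system = MultiComputerSystem(["s1", "s2", "s3"], standby_name="s3")
system.write("s1", "mailbox", "hello")
replacement = system.fail("s1")
```

In this sketch, after s1 fails, the former standby s3 is active and reads the data of s1 from the recovery copy rather than from the system disk unit of s1. A second failure would raise an error, because the model assumes a single standby computer.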