In providing reliable computer services, especially for Internet applications such as HTTP, FTP, News, and email, a fundamental requirement is a platform for those services that is both scaleable and reliable.
There are two kinds of scaleability: vertical and horizontal. Vertical scaleability is best characterized by the IBM paradigm of the 1970s and 1980s, in which a company's growing need for computer services was met by replacing a less powerful computer in its entirety with a more powerful one. This paradigm has been substantially discredited for a number of reasons: it is limited by whatever is the most powerful hardware available, and the expense of such a single machine becomes prohibitive. Such machines are inherently not optimal in price/performance. The lack of reliability of a single machine is also a key limitation of vertical scaleability.
Horizontal scaleability, in contrast, adds more computer systems as load increases. Each of these computers is typically less powerful than the single machine of a vertically scaleable solution, but for many applications the combined power of multiple such systems frequently exceeds that of the vertical solution. The horizontal approach also lets the user maximize the return on prior investments (i.e., previously purchased computer systems that remain compatible). Horizontal scaleability therefore offers a number of key advantages over vertical scaleability.
Reliability is also best served by a horizontally scaleable solution. Reliability is measured by the availability of computer services when needed; since no single computer has ever proved to have 100% up-time, reliability requires more than one computer capable of providing the needed computer services.
As the need for continuously available computer services has grown, the need for increased scaleability and reliability has also grown. One of the key issues has been to ensure that a service provided by a first computer, normally termed a host, can be provided by another computer, or a backup, in the event the host becomes unavailable. This transfer of services is termed failover, and in current systems is typically handled by software.
Two failover schemes are well known in the prior art. One-to-one failover designates a host system as primary and a backup system as secondary; in the classic implementation of this approach, the secondary system is idle (that is, it provides no services) until the host fails. When the host becomes unavailable, the secondary system provides the services normally provided by the host. Symmetric one-to-one failover is a similar technique in which the "host" and "backup" systems each provide a distinct, useful set of services when both are available, and each is capable of providing the services normally provided by the other. Thus each system is both a primary and a secondary, but each machine can serve as a backup only to its single partner.
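To make the two one-to-one variants concrete, the following Python sketch models a two-host pair. It is a hypothetical illustration, not part of any system described here: the function name `pair_services` and the representation of services as sets of strings are assumptions. Each surviving host keeps its own services and takes over those of its peer if the peer is down; classic one-to-one failover is the special case in which the secondary's own service set is empty.

```python
from __future__ import annotations


def pair_services(own: dict[str, set[str]], failed: set[str]) -> set[str]:
    """Hypothetical model of (symmetric) one-to-one failover.

    own:    maps each of exactly two hosts to the services it normally provides
            (an empty set models the idle secondary of classic one-to-one failover).
    failed: hosts that are currently down.
    Returns the set of services still available on the network.
    """
    if len(own) != 2:
        raise ValueError("one-to-one failover pairs exactly two hosts")
    provided: set[str] = set()
    for host, services in own.items():
        if host not in failed:
            provided |= services  # a live host keeps serving its own services
        else:
            peer = next(h for h in own if h != host)
            if peer not in failed:
                provided |= services  # the live peer takes over the failed host's services
    return provided


# Symmetric pair: each host normally provides its own services.
pair = {"host": {"http"}, "backup": {"ftp"}}
print(sorted(pair_services(pair, {"host"})))            # ['ftp', 'http']: backup covers both
print(sorted(pair_services(pair, {"host", "backup"})))  # []: nothing is left
```

Because each host has exactly one partner, the pair tolerates one failure at most; a second failure removes every service the pair provides.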
The second failover scheme known in the prior art is many-to-one failover. In this approach there are many primary systems but only a single secondary, with each of the primaries providing a distinct set of services. The secondary or backup system is capable of performing any of the services provided by any or all of the primaries, but normally sits idle until a primary fails.
Each of these schemes is limited in that the network remains reliable only as long as only one system fails; network services become unavailable if more than one system becomes unavailable. In addition, these schemes do not provide good failover scaleability, because the secondary system typically must be identified at initial configuration and cannot thereafter be changed. Similarly, prior art systems do not allow failed hosts to be permanently deinstalled, nor do they allow new hosts to be added and configured without reconfiguring existing hosts. A further limitation of such prior art techniques is the inability to perform load balancing.
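The single-failure limit of many-to-one failover can be illustrated with a small model. The sketch below is hypothetical: the function name `many_to_one_services`, the fixed backup name `"backup"`, and the assumption that the backup can absorb the services of only the first primary to fail are all illustrative choices, not details taken from any particular prior art system. The hard-coded backup name also mirrors the point that the secondary is fixed at initial configuration.

```python
from __future__ import annotations


def many_to_one_services(primaries: dict[str, str], failed: list[str],
                         backup: str = "backup") -> set[str]:
    """Hypothetical model of many-to-one failover.

    primaries: maps each primary host to the service it provides.
    failed:    hosts that have failed, in the order they failed.
    backup:    the single secondary, fixed at configuration time.
    Assumption: the idle backup takes over only the first failed primary.
    """
    # Every primary that is still up keeps providing its own service.
    services = {svc for host, svc in primaries.items() if host not in failed}
    # The backup covers one failed primary, provided the backup itself is up.
    if backup not in failed:
        failed_primaries = [h for h in failed if h in primaries]
        if failed_primaries:
            services.add(primaries[failed_primaries[0]])
    return services


primaries = {"hostA": "http", "hostB": "ftp", "hostC": "mail"}
print(sorted(many_to_one_services(primaries, ["hostA"])))           # ['ftp', 'http', 'mail']
print(sorted(many_to_one_services(primaries, ["hostA", "hostB"])))  # ['http', 'mail']: ftp is lost
```

With one failure every service survives, but a second failure (of another primary, or of the backup itself) leaves at least one service unavailable.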
There has therefore been a need for an improved failover system in which computing services continue to be available over the network even when more than one host or primary system has failed, and in which hosts may be added or removed without reconfiguring the remainder of the systems forming the network.