Network planning and system recovery planning for computer systems have become increasingly important to corporations. Customers who use computing services expect reduced costs for the services and little or no downtime when using those services. Furthermore, corporations and governments are entrusted to provide critical services in the event of disasters such as terrorist activity, electrical outages, or natural disasters including hurricanes, tornadoes, and floods. As a result, these corporations and governments are expected to have an effective strategy in place to prevent or reduce computer system failures. Such a strategy typically relies on a failover (backup) system.
Failover systems sometimes include a clone of the original computer systems to provide the same level of service as the original computer systems. Since cost may be a significant factor in the implementation of failover systems, one must understand the revenue impact that may be incurred from implementing both the original computer system and the failover system. A very elaborate failover system might work well but may also be cost prohibitive.
In the current state of the art, many computer systems implement a georedundant configuration for a failover system. As used in this document, the term georedundant refers to duplicating and locating a computer system, and a georedundant configuration means a computer system that has been duplicated, with the duplicate deployed in the network and connected to the original computer system. For example, if two computer systems are deployed in a network, the georedundant configuration would require two additional computer systems (identical to the original two computer systems) to be deployed in the network. Both the original computer systems and the duplicate computer systems would operate in the network and be connected together.
Georedundant configurations may be implemented for individual computer systems in one of two modes: “hot stand-by” or “mirrored.” In the “hot stand-by” mode, the original computer system and the duplicate computer system operate together in an active/stand-by state. Under normal conditions in this state, computer traffic runs on only one computer system at a time. Furthermore, the computer systems can switch their active/stand-by states with each other at time intervals. In the “mirrored” mode, the original computer system and the duplicate computer system share the workload equally. In this mode, each computer system processes half of the traffic that would otherwise pass through a single computer system.
In summary, in the “hot stand-by” mode one computer system handles the full workload of the two connected systems while the other computer system waits in stand-by, whereas in the “mirrored” mode each of the two connected computer systems handles half of the workload.
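The division of traffic in the two modes, and the takeover on a failure, can be illustrated with a minimal sketch. The system labels "A" and "B" and the function name `traffic_shares` are hypothetical and chosen only for illustration; the sketch models shares of total traffic and is not an implementation of any particular failover product.

```python
def traffic_shares(mode, failed=None):
    """Return the fraction of total traffic handled by systems "A" and "B".

    mode   -- "hot stand-by" or "mirrored"
    failed -- None under normal conditions, or "A" / "B" for the failed system
    """
    if mode not in ("hot stand-by", "mirrored"):
        raise ValueError("unknown mode: %r" % (mode,))
    if failed not in (None, "A", "B"):
        raise ValueError("unknown system: %r" % (failed,))

    if failed is not None:
        # In either mode, the surviving system takes over all of the traffic.
        survivor = "B" if failed == "A" else "A"
        return {failed: 0.0, survivor: 1.0}

    if mode == "hot stand-by":
        # Assumption for illustration: "A" is currently active, "B" is idle.
        return {"A": 1.0, "B": 0.0}

    # Mirrored: the two systems share the workload equally.
    return {"A": 0.5, "B": 0.5}


if __name__ == "__main__":
    for mode in ("hot stand-by", "mirrored"):
        print(mode, "normal:", traffic_shares(mode))
        print(mode, "A failed:", traffic_shares(mode, failed="A"))
```

In this sketch the result after a failure is identical for both modes, which reflects the point that the surviving computer system must be able to carry the full traffic regardless of the mode in use.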
Fortunately, when a failure occurs in one computer system in the georedundant configuration, the other computer system can take over the computer traffic regardless of which mode is implemented. However, this presents some unique problems from a processing standpoint. In the georedundant configuration, a computer system may theoretically use only 50% of its processing capability. In actuality, this number is closer to 40%. These figures follow from the configuration of the computer systems: each computer system must be implemented so that it can handle not only its own processing traffic but also the processing traffic of a failed computer system. In the “hot stand-by” mode, if one computer system fails, the other one takes over all the processing of both systems. In the “mirrored” mode, the same thing occurs: a failure in one computer system results in the other computer system handling all of the traffic.
In our discussion here, the georedundant configuration uses two computer systems, which averages out to about 50% processing utilization for each computer system. That means that for every critical computer system needed in a network, a second computer system would have to be purchased, but only 50% processing utilization could be allocated for use, since the failover strategy for the georedundant configuration would have to be implemented. If a computer system failed, then theoretically the duplicate computer system could take over the failed computer system's 50% processing utilization, resulting in 100% processing utilization at the duplicate computer system. The 50% number is a theoretical estimate. The processing utilization is closer to 40% for each computer system, since no computer system could process computer traffic at 100% over a sustained period if a failure occurred in one of the systems. Some processing utilization must be reserved for the administration of the computer system itself. In other words, a computer system may have 80% of its processing capability allotted to computer traffic, with the remaining 20% reserved for administration activities. So, in the “hot stand-by” mode, one computer system processes at 80% capacity while the other waits idly, ready to take over the 80% capacity in the event of a failure. In the “mirrored” mode, both computer systems operate at 40% capacity, with 40% extra processing capacity waiting idly on each system to take over after a failure in the opposite computer system. Therefore, in a georedundant configuration with two computer systems, each computer system averages approximately 40% processing utilization, an expensive setup when the number of computer systems is increased to meet more business needs.
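The utilization figures above can be reproduced with a short calculation. The following is a minimal sketch that assumes the 20% administration reserve stated above; the function names and the `admin_reserve_pct` parameter are hypothetical and are introduced only for illustration.

```python
def normal_utilization(mode, admin_reserve_pct=20.0):
    """Return the (system 1, system 2) traffic utilization in percent
    under normal conditions for a two-system georedundant configuration."""
    if not 0.0 <= admin_reserve_pct < 100.0:
        raise ValueError("admin_reserve_pct must be in [0, 100)")
    # Capacity left for computer traffic after the administration reserve.
    usable = 100.0 - admin_reserve_pct
    if mode == "hot stand-by":
        # One system carries all traffic; the other waits idly.
        return (usable, 0.0)
    if mode == "mirrored":
        # Each system carries half, keeping the other half free for takeover.
        return (usable / 2.0, usable / 2.0)
    raise ValueError("unknown mode: %r" % (mode,))


def average_utilization(mode, admin_reserve_pct=20.0):
    """Average traffic utilization per system under normal conditions."""
    first, second = normal_utilization(mode, admin_reserve_pct)
    return (first + second) / 2.0


def utilization_after_failover(mode, admin_reserve_pct=20.0):
    """Traffic utilization of the surviving system after the other fails."""
    return sum(normal_utilization(mode, admin_reserve_pct))


if __name__ == "__main__":
    for mode in ("hot stand-by", "mirrored"):
        print(mode,
              normal_utilization(mode),
              average_utilization(mode),
              utilization_after_failover(mode))
```

With the default 20% reserve, both modes average 40% per computer system, and the surviving system runs at 80% after a failure. Setting `admin_reserve_pct=0.0` gives the theoretical 50% average and 100% utilization after a failure.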
For some time, it has been assumed that computer systems could not exceed 50% processing capacity for the central processing unit (CPU), memory, and disk because (as shown above) the computer system would be unable to sustain the additional load after a failure. One may see the problem of implementing a georedundant configuration for a failover system: the computer systems that must be implemented are not permitted to operate beyond a certain capacity. Business leaders may not take kindly to the idea that a duplicate computer system has to be purchased and that each computer system in the total system can only operate up to a certain capacity level.