Our society depends heavily upon computer systems in many of our everyday activities. Computer systems, which employ processors, control devices in our homes, in our business offices, in our manufacturing facilities, in our automobiles, and even in outer space aboard space shuttles and geosynchronous satellites. One can find processors in desktop and laptop computers, in mainframe computing systems, and in portable devices, such as mobile telephones and palm-held computers.
In addition to these existing applications, people are continually finding new applications for computer systems. Many of these applications demand improved processor performance and tax modern computer systems. Aspects of processor performance that computer designers are continually trying to improve include processor speed and data throughput. Examples of applications demanding improved processor performance are vision and speech recognition, climate or weather modeling, fluid turbulence modeling, human genome mapping, oil reservoir modeling, and ocean circulation modeling. All of these applications require enormous amounts of computational power due to the large number of mathematical computations involved.
To meet the demands of these applications, computer system designers have changed the architectures of processors, mostly microprocessors, tremendously. For example, computer systems of the 1980s and early 1990s generally had single central processing units that handled data in a linear or sequential fashion. Unfortunately, such sequential architectures provide only finite amounts of computational power, due to physical limitations of the microprocessors. Accordingly, computers today commonly employ multiple processors that crunch numbers simultaneously in various processor architectures, such as parallel architectures.
As stated, many computing systems today contain multiple processors. Along with increasing the number of processors in computers, designers creating these multi-processor systems also tend to employ various techniques and design methods to extract additional computing performance from these computer systems. Such techniques and design methods include pipelining, vector processing, and using superscalar architectures. Computer systems employing these techniques and design methods have generally followed Moore's Law, which, as commonly stated, holds that the number of transistors on a chip doubles approximately every eighteen months to two years. Today it is not uncommon to find advanced computer system chips that contain millions, even billions, of transistors.
Unfortunately, computer systems and computer chips that employ increasing numbers of transistors and other integrated circuit elements tend to fail more often than systems and devices with fewer elements. To combat these increasing failure rates, computer manufacturers employ various design techniques that tend to improve the uptime and reliability of these systems. For example, one technique currently used to improve uptime in computer systems with multiple processors involves disabling a processor that has an internal failure. Upon detecting that a processor has an internal failure, the system holds the processor in a reset state. Holding the failed processor in the reset state effectively tri-states the outputs of the failed processor, allowing other processors attached to the common buses to continue operating.
However, there is a significant problem in attempting to allow computer systems with multiple processors to operate using this technique. A processor can fail due to a problem within the processor itself, or the processor may fail due to a problem with a voltage regulator providing power to the processor. When the problem is internal to the processor, the technique of holding the processor in the reset state, as discussed above, may disable the processor and allow the multiple processor system to continue operating. However, if the processor fails due to a problem with a voltage regulator supplying power to the processor, simply attempting to hold the processor in the reset state may not allow the system to continue operating without the processor.
The technique of holding the processor in the reset state may not work when the problem is associated with a voltage regulator supplying power to the processor, because the processor requires power in order to correctly tri-state its inputs and outputs. The inputs and outputs of the failed processor need to be tri-stated because other processors are connected to the same data, control, and address buses. Without proper tri-stating, the inputs and outputs of the failed processor will hold signal lines in the buses in bad states, preventing the other processors from functioning properly.
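The difference between the two failure cases can be illustrated with a small behavioral model of one shared bus line. This is only a sketch under stated assumptions, not a description of any particular hardware: the names `Processor`, `core_powered`, `held_in_reset`, `resolve_line`, and `bus_line` are hypothetical, and an unpowered output is assumed, for illustration only, to clamp the line low rather than float.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

HIGH_Z = None      # tri-stated: the pin drives nothing onto the line
CONTENTION = "X"   # two drivers disagree, so the line state is invalid


@dataclass
class Processor:
    name: str
    core_powered: bool = True    # hypothetical flag: regulator supplies core voltage
    held_in_reset: bool = False  # hypothetical flag: system asserts the reset pin

    def pin(self, wants_to_drive: Optional[int]) -> Optional[int]:
        """Return what this processor actually puts on the shared line."""
        if not self.core_powered:
            # Assumption: without core voltage the output stage cannot
            # tri-state; modeled here as clamping the line low.
            return 0
        if self.held_in_reset:
            # A powered processor held in reset tri-states its outputs.
            return HIGH_Z
        return wants_to_drive


def resolve_line(pins: List[Optional[int]]) -> Union[int, str, None]:
    """Combine all pins on one line into the value the bus actually carries."""
    driven = {p for p in pins if p is not HIGH_Z}
    if not driven:
        return HIGH_Z          # nobody drives the line
    if len(driven) == 1:
        return driven.pop()    # all active drivers agree
    return CONTENTION          # conflicting drivers corrupt the line


def bus_line(master: Processor, failed: Processor, value: int):
    """Line value when `master` drives `value` and `failed` should be silent."""
    return resolve_line([master.pin(value), failed.pin(None)])
```

In this model, a failed processor that is powered and held in reset leaves the line free, so a healthy processor driving 1 is seen as 1. When the same processor has lost its core voltage, asserting reset changes nothing: the line reads as contention whenever a healthy processor tries to drive it high.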
The architectures of many multi-processor systems require that the core voltage for each processor be independent of the core voltages for the other processors during normal operation. Consequently, if the voltage supply or voltage regulator for a given processor fails, that processor cannot be properly tri-stated due to the lack of voltage. Most often, the core voltage plane is isolated from other voltage planes and no other source of core voltage is available. The end result is that a failure of a single voltage regulator in a computer with multiple processors will prevent the computer from operating. There is, therefore, a need for methods and apparatuses that allow a multiple-processor computer to operate when one of the voltage regulators fails.