The present invention relates to initialization processes for computers and, in particular, to an initialization process for a redundant system that boots deterministically.
Information systems are evolving to become the delivery mechanism that drives corporate revenues. In industries ranging from financial services to on-line shopping, the computer has become the business. Accordingly, protection of computer-based data is becoming of paramount importance to a corporation's financial well-being.
Fault-tolerant systems offer superior reliability characteristics through the use of redundant components and data paths that ensure uninterrupted delivery of service. Even so, such systems may still fail due to hardware or software errors. In such a situation, it is often difficult to troubleshoot a fault-tolerant system due to the multiplicity of hardware units provided. For example, since a redundant, fault-tolerant system may include multiple central processing units (CPUs), a single misbehaving CPU may participate in some boot attempts and not in others. The system may therefore sometimes boot properly, masking the error and making it irreproducible. In these cases, the system cannot be examined to determine the cause of the failure.
The present invention provides a method and apparatus for booting a computer system with redundant hardware and/or software components in a deterministic fashion. Individual hardware and/or software components are selected and a boot process is performed using those selected components. Booting in this manner allows application programs written for traditional machines to be used without modification. Further, this scheme requires few or no modifications to boot software. Moreover, booting individual processor/input-output controller pairs allows system faults to be isolated and detected in a deterministic fashion.
In one aspect, the present invention relates to a method for deterministically booting a fault-tolerant computer having a plurality of processors and one or more input-output controllers. A first processor/input-output controller pair is chosen and an attempt is made to boot the chosen pair. In the event that the attempt to boot the chosen pair fails, a new boot pair is selected.
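By way of illustration only, the method of this aspect may be sketched in Python as follows. The sketch assumes that candidate pairs are tried in a fixed, predetermined order and that a boot attempt reports success or failure. The names `boot_deterministically` and `try_boot`, and the identifiers `cpu0`, `cpu1`, `io0`, and `io1`, are hypothetical and do not appear in the specification.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

# A boot pair is (processor identifier, input-output controller identifier).
BootPair = Tuple[str, str]


def boot_deterministically(
    pairs: Iterable[BootPair],
    try_boot: Callable[[BootPair], bool],
) -> Optional[BootPair]:
    """Try each candidate pair in the given order; return the first that boots.

    Because the order is fixed, the same faulty pair is exercised on every
    run, so a failure is reproducible rather than masked by another pair.
    """
    for pair in pairs:
        if try_boot(pair):
            return pair  # boot succeeded with this pair
        # boot failed: fall through and select the next pair
    return None  # no candidate pair could be booted


# Example: two processors, two controllers, with cpu0 assumed faulty.
candidates = list(product(["cpu0", "cpu1"], ["io0", "io1"]))
faulty_processors = {"cpu0"}
booted = boot_deterministically(
    candidates, lambda pair: pair[0] not in faulty_processors
)
```

In this example both pairs containing `cpu0` are attempted and fail before the pair `("cpu1", "io0")` boots, and the same sequence of attempts occurs on every run.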
In another aspect, the present invention relates to a method for deterministically booting a fault-tolerant computer having a plurality of processor boards and one or more input-output controller boards. A first processor/input-output controller board pair is chosen and an attempt is made to boot the chosen board pair. In the event that the attempt to boot the chosen board pair fails, a new boot pair is selected.
In still another aspect, the present invention relates to an apparatus for deterministically booting a fault-tolerant system. The apparatus includes a plurality of processors, at least one input-output controller in communication with the processors, a memory element storing a list of processor/controller pairs, and a control module in communication with each element. The control module retrieves a first processor/controller pair identifier from the memory element and attempts to boot the processor/controller pair identified. In the event that the boot attempt fails, a second identifier is retrieved from the memory element and an attempt is made to boot the second boot pair identified.
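The interaction between the memory element and the control module described above may be modeled, purely as a sketch, by the following Python class. The class name `ControlModule`, the `boot` hook, the attempt log, and the identifier strings are assumptions introduced for illustration; the specification does not prescribe any particular data layout or interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ControlModule:
    # Models the memory element: pair identifiers stored in a fixed order.
    memory: List[str]
    # Hypothetical hook that attempts to boot the identified pair.
    boot: Callable[[str], bool]
    # Record of every attempt, useful when isolating a fault afterwards.
    log: List[str] = field(default_factory=list)

    def run(self) -> Optional[str]:
        """Retrieve identifiers in stored order until one pair boots."""
        for index, identifier in enumerate(self.memory):
            ok = self.boot(identifier)
            self.log.append(f"{index}:{identifier}:{'ok' if ok else 'fail'}")
            if ok:
                return identifier
        return None  # every stored pair failed to boot


# Example: the first stored pair is assumed to fail, the second to succeed.
module = ControlModule(
    memory=["cpu0/io0", "cpu1/io0"],
    boot=lambda identifier: identifier.startswith("cpu1"),
)
selected = module.run()
```

Keeping a log of attempts is a design choice of this sketch rather than a feature recited above; it shows how a fixed retrieval order lets a failing pair be identified after the fact.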
In yet another aspect, the present invention relates to an apparatus for deterministically booting a fault-tolerant system composed of individual hardware or software objects. A set of hardware and/or software components is selected and a boot process is performed using this set of components. In the event that the boot fails, a new boot set is selected.