1. Field of the Invention
The present invention relates to the field of computer systems. In particular, the present invention relates to live insertion of CPU and input/output (I/O) boards into a computer system.
2. Background Information
An increasing number of powerful server computer systems are now being offered in the marketplace by a number of vendors. More and more of these server computer systems are being used and depended upon to support complex and/or critical business applications. As this dependency increases, so does the purchasers' expectation of uninterrupted availability of these systems. As a result, a number of live insertion technologies have emerged to allow these systems to be field repaired or upgraded without interrupting their availability.
To ensure compatibility or interchangeability of parts, many of these server computer systems adhere to various industry standards. For example, the processor(s), memory subsystems and I/O subsystems are typically built around an industry bus standard, such as Multibus II. Thus, at least in some aspects, if not in totality, the live insertion technologies are bus standard specific.
Taking Multibus II as an example again, among other things, each CPU or I/O board is required to perform and support a certain testing protocol at power on or reset. FIG. 1 illustrates this prior art power on/reset testing protocol. As shown, at power on/reset, the CPU or I/O board is required to perform initialization self test (IST), step 14. At the end of IST, the CPU or I/O board is required to set an IST complete flag to denote test complete, step 16. Next, the CPU or I/O board determines if it passed the self test, step 18. If the CPU or I/O board failed the self test, it takes no further action. On the other hand, if the CPU or I/O board passed the self test, it is required to determine if it is a potential system test master, step 20.
If the CPU or I/O board is a potential system test master, it is further required to set a potential system test master (PSTM) flag to denote such potential, step 22. Then, upon setting the PSTM flag, the CPU or I/O board is required to iteratively check whether another CPU or I/O board at a lower slot has also indicated that it is a potential system test master, steps 24-28. If the result of the checking is negative, the CPU or I/O board has won the system test master arbitration, and is required to act as the system test master, step 30. Under the Multibus II standard, the system test master is required to clear the IST complete flag of each CPU or I/O board that it wants to participate in the system test.
However, if it is determined at step 20 that the CPU or I/O board is not a potential system test master, or at step 26 that a CPU or I/O board at a lower slot has also indicated that it is a potential system test master, the CPU or I/O board is required to "sleep" for a predetermined amount of time, step 32, typically long enough to allow the test master arbitration process to complete. Then, upon sleeping for the predetermined amount of time, the CPU or I/O board is required to check if its IST complete flag is still set, step 34. If the IST complete flag is still set, the CPU or I/O board is required to check its own program table and act accordingly in an application dependent manner, step 38. Otherwise, the CPU or I/O board is required to participate in the system test as a slave device, step 36.
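By way of illustration only, the power on/reset testing protocol of FIG. 1 may be sketched as a small simulation in which all boards are powered on together. The sketch is not taken from the Multibus II standard: the names Board, Outcome, simulate_power_on and invited_slots are hypothetical, the concurrent behavior of the boards is flattened into sequential passes, and the "sleep" of step 32 is modeled simply by evaluating the remaining boards after arbitration has finished.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_ACTION = "failed IST, takes no further action"
    TEST_MASTER = "acts as system test master"              # step 30
    TEST_SLAVE = "participates in system test as slave"     # step 36
    PROGRAM_TABLE = "consults its own program table"        # step 38


@dataclass
class Board:
    slot: int
    passes_ist: bool = True          # hypothetical input: result of the self test
    potential_master: bool = False   # hypothetical input: board capability
    ist_complete: bool = False       # IST complete flag
    pstm: bool = False               # potential system test master flag


def simulate_power_on(boards, invited_slots):
    """Return {slot: Outcome} for a simultaneous power on/reset.

    invited_slots (an assumption of this sketch) lists the slots whose
    IST complete flag the winning test master chooses to clear.
    """
    # Steps 14-22: run IST, set IST complete, raise PSTM if applicable.
    for board in boards:
        board.ist_complete = True
        board.pstm = board.passes_ist and board.potential_master

    outcomes = {}
    for board in boards:
        if not board.passes_ist:
            outcomes[board.slot] = Outcome.NO_ACTION
        # Steps 24-28: look for a PSTM flag at any lower slot.
        elif board.pstm and not any(
            other.pstm and other.slot < board.slot for other in boards
        ):
            outcomes[board.slot] = Outcome.TEST_MASTER
            # The master clears IST complete on the boards it invites.
            for other in boards:
                if other is not board and other.slot in invited_slots:
                    other.ist_complete = False

    # Steps 32-38: after the sleep, remaining boards check IST complete.
    for board in boards:
        if board.slot not in outcomes:
            outcomes[board.slot] = (
                Outcome.PROGRAM_TABLE if board.ist_complete else Outcome.TEST_SLAVE
            )
    return outcomes


if __name__ == "__main__":
    boards = [
        Board(0),
        Board(1, potential_master=True),
        Board(2, potential_master=True),
        Board(3, passes_ist=False),
    ]
    for slot, outcome in sorted(simulate_power_on(boards, {0}).items()):
        print(slot, outcome.name)
```

In this sketch the potential master at the lowest slot wins arbitration, an invited board becomes a slave device, and a passing board that was not invited falls back to its own program table.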
Since each CPU or I/O board is required to support the power on/reset testing protocol, if one of these CPU or I/O boards is inserted live, it will attempt to go through the protocol when it is given power. Since the other CPU or I/O boards in the system would have gone beyond the testing stage and would be functioning in a normal operating mode, the live inserted CPU or I/O board would win the system test master arbitration. The live inserted CPU or I/O board would then attempt to cause system testing to be performed, potentially causing unpredictable results in the system.
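The arbitration outcome described above may be illustrated with a minimal sketch. It assumes, consistent with the reasoning above but not stated by the standard as quoted here, that boards already in normal operating mode no longer present a set PSTM flag. The function wins_arbitration and the slot numbers are hypothetical.

```python
def wins_arbitration(slot, pstm_flags):
    """Return True if no board at a lower slot shows a set PSTM flag.

    pstm_flags is a hypothetical {slot: bool} view of the PSTM flags
    visible to the arbitrating board.
    """
    return not any(flag for other, flag in pstm_flags.items() if other < slot)


# Cold power on: the boards at slots 1, 2 and 5 all arbitrate together.
cold_start = {1: True, 2: True, 5: True}

# Live insertion at slot 5: the boards at slots 1 and 2 are assumed to be
# in normal operation and no longer indicate potential mastership.
running_system = {1: False, 2: False, 5: True}

print(wins_arbitration(5, cold_start))      # board 5 defers to slot 1
print(wins_arbitration(5, running_system))  # board 5 wins unopposed
```

Under these assumptions the board at slot 5 loses arbitration at a cold power on but wins when inserted live, regardless of its slot position.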
Various hardware and/or software approaches have been considered by the industry to resolve this problem. However, all the hardware and/or software approaches known to the inventors share one common disadvantage: they require modification to and/or additional support by the CPU or I/O boards. Since there are literally hundreds if not thousands of CPU or I/O board products in the industry, any solution requiring modification to and/or additional support by the CPU or I/O boards is less than desirable.
As will be disclosed in more detail below, the present invention provides a method and apparatus for managing the above-described power on/reset testing protocol without requiring any changes to or additional support by the CPU or I/O boards.