1. Field of the Invention
The present invention relates to a CPU suppression system and a CPU suppression method using a service processor, which detect a CPU showing signs of unstable operation before the operating system (OS) starts and prevent that CPU from being incorporated into the system.
2. Description of the Related Art
FIG. 1 is a diagram showing an overview of the configuration of a conventional CPU suppression system using a service processor. The Micro Program 11 of FIG. 1 is triggered by a system reset, such as power-on, rebooting, or resetting of the system, and functions as a module that performs diagnostics of the units constituting the system. The Micro Program 11 is executed by a CPU on the operational system end, namely CPU (A) 10 in FIG. 1. Although CPU (A) 10 on the operational system end is shown as a single CPU in FIG. 1, the operational system end in practice comprises a plurality of CPUs.
Control of the CPU on the operational system end is given to the Micro Program 11 only after the service processor end has performed an operation check of that CPU, as shown by arrow (1). At that point, therefore, the CPU on the operational system end is at least able to perform basic operations using its own hardware resources. If the operation check performed by the service processor CPU (B) 20 determines that a CPU on the operational system end is NG (No Good), the service processor CPU (B) 20 executes CPU stop processing so that control is not given to the Micro Program 11.
As shown by arrows (2), (3), and (4) in FIG. 1, the Micro Program 11 performs initial setting and diagnostics of the units constituting the system, such as the operational system CPU (A) 10, memory 12, and I/O unit 13, and exchanges information with the service processor end.
FIG. 2 is a sequence chart showing an overview of the processing performed by the Micro Program, the system CPU, and the service processor in the conventional CPU suppression system. The numbers in FIG. 2 correspond to those in the configuration diagram of FIG. 1, so the conventional system is explained below with reference to both figures. As shown in FIG. 2, the service processor end first powers on the system CPU (A1). Next, it performs an operation check of the system CPU (A) 10 ((1) of FIG. 1) (A2). If the CPU (A) 10 on the operational system end is able to perform basic operations using its own hardware resources, CPU control is given to the Micro Program 11 (A4). If the operation check returns a diagnosis of NG (No Good), the service processor executes CPU stop processing, and control is not given to the Micro Program 11 (A3).
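Purely as an illustration, steps A1 to A4 can be modelled as a per-CPU decision made by the service processor. The sketch below is a hypothetical Python model, not the firmware of the conventional system: the names `SystemCpu`, `CpuState`, and `service_processor_power_on`, and the boolean `basic_check_ok` standing in for the result of the operation check, are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class CpuState(Enum):
    STOPPED = "stopped"        # CPU stop processing by the service processor (A3)
    HANDED_OFF = "handed_off"  # control given to the Micro Program (A4)


@dataclass
class SystemCpu:
    cpu_id: int
    # Hypothetical stand-in for the outcome of the operation check (A2):
    # True means the CPU can perform basic operations with its own hardware.
    basic_check_ok: bool


def service_processor_power_on(cpus: list[SystemCpu]) -> dict[int, CpuState]:
    """Assumed model of A1-A4: after power-on (A1), check each system CPU (A2),
    stop any CPU judged NG (A3), and hand the others to the Micro Program (A4)."""
    states: dict[int, CpuState] = {}
    for cpu in cpus:
        if cpu.basic_check_ok:
            states[cpu.cpu_id] = CpuState.HANDED_OFF
        else:
            states[cpu.cpu_id] = CpuState.STOPPED
    return states
```

For instance, passing one CPU that passes the check and one that fails it yields a handoff for the first and a stop for the second; only handed-off CPUs ever reach the Micro Program diagnostics.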
Once given control by the service processor CPU (B) 20, the Micro Program 11 begins the initial setting and diagnostics of the units constituting the system, including CPU diagnostic processing ((2) of FIG. 1) (A5).
While the initial setting and diagnostics of the units constituting the system are in progress, the service processor CPU (B) 20 continuously monitors the system CPU (A) 10 for errors, but notifies the CPU on the system end of an error occurrence only immediately before the Micro Program shifts to the next control process (A6). From the time the Micro Program 11 performs initial setting and diagnostics of the units constituting the system (arrows (2), (3), and (4) of FIG. 1) until after the OS starts, the service processor CPU (B) 20 recognizes occurrences of CPU hardware errors (arrow (5) of FIG. 1). When the number of error occurrences exceeds a preset threshold, the CPU is regarded as “a CPU showing signs of unstable operation that may consequently cause critical errors” and is placed in a waiting status by the OS. In other words, in the conventional art, the CPU in question is not physically separated; it is separated only in software, in that no processes are assigned to it, and the Micro Program 11 suppresses it when triggered by the next reset. When the Micro Program 11 diagnoses the CPUs as OK, the normal CPUs continue program processing and the diagnostic processing ends (A8). If, on the other hand, the CPU diagnostic processing by the Micro Program indicates that a CPU on the system end is itself NG, the Micro Program 11 suppresses that CPU (A7); even when CPU suppression occurs, processing continues as long as the remaining normal CPUs can continue program processing, and the diagnostic processing by the Micro Program 11 ends (A8). Thus, although the OS does not assign processes to a CPU showing signs of unstable operation, that CPU remains in the system after the OS has started, and it is separated only at the next start of the Micro Program, for example upon rebooting (see Patent Document 1).
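The limitation of this conventional flow, namely that an unstable CPU is isolated only in software after the OS starts and is physically suppressed only at the next reset, can be sketched as a small state model. This is a hypothetical Python illustration, not the method of Patent Document 1: the class `ConventionalMonitor`, its per-CPU error counters, and the `record_error` and `next_reset` methods are assumptions, and the rule that a CPU is flagged once its count exceeds (rather than equals) the threshold follows the wording above.

```python
from dataclasses import dataclass, field


@dataclass
class ConventionalMonitor:
    """Assumed model of the conventional error handling described above.

    The service processor counts hardware errors per system CPU (arrow (5)).
    A CPU whose count exceeds the preset threshold is only put in a waiting
    status (no processes assigned); physical suppression is deferred until
    the Micro Program runs again at the next reset.
    """
    threshold: int
    error_counts: dict[int, int] = field(default_factory=dict)
    waiting: set[int] = field(default_factory=set)     # isolated in software only
    suppressed: set[int] = field(default_factory=set)  # suppressed by Micro Program

    def record_error(self, cpu_id: int) -> None:
        # Increment this CPU's error count as reported to the service processor.
        count = self.error_counts.get(cpu_id, 0) + 1
        self.error_counts[cpu_id] = count
        if count > self.threshold:
            # The OS stops assigning processes, but the CPU stays incorporated.
            self.waiting.add(cpu_id)

    def next_reset(self) -> None:
        # Only when the next reset restarts the Micro Program is the CPU suppressed.
        self.suppressed |= self.waiting
        self.waiting.clear()
```

With `threshold=2`, for example, the third error on a CPU moves it into `waiting`, yet it does not appear in `suppressed` until `next_reset()` is called, which mirrors the gap between recognition after OS start and suppression at the next reset.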
As described above, the conventional CPU suppression system using a service processor recognizes a CPU showing signs of unstable operation only after the OS starts and suppresses that CPU only at the next system reset. Consequently, such a CPU cannot be suppressed before the OS starts.
In computer manufacturing, variations in CPU quality arising from the production process are inevitable. It is therefore crucial to identify CPUs that show signs of unstable operation caused by such quality variation, to recognize the level of that unstable operation, and to keep the system operating stably by separating such CPUs from the units constituting the system before system operation begins.
Recognizing and suppressing CPUs that show signs of unstable operation before the OS starts, and thereby preventing them from being incorporated into the units constituting the system, is of great importance for improving the robustness of the operational system and for reducing both the work time needed to correct failures (“failure maintenance time” hereinafter) and the maintenance cost when a failure occurs after operation has begun. However, realizing these processes with hardware functions increases the amount of hardware to be implemented and the size of the system, and the resulting rise in system cost is an obstacle to resolving these issues.
Patent Document 1: Japanese Laid-Open Patent Application Publication No. H08-087341