1. Field of the Invention
The invention relates to multiprocessor computer systems, and more particularly, to start-up logic for assigning logical central processing unit (CPU) designations among multiple CPUs, with the booting CPU being reassigned based on the existence of certain prior errors on that CPU.
2. Description of the Related Art
Advancements in computer technology proceed at a tremendous rate. Modern microprocessors operate at frequencies so high that processors only a few years old seem sluggish and lethargic in comparison. On the other hand, modern applications have become so complex and versatile that even the high performance delivered by modern systems is taxed to the limit. Consequently, the effort to develop yet more powerful and effective computer systems continues.
One well-known method of improving computer performance is to provide multiple processors in a single system. Asymmetrical multiprocessor systems, in which one microprocessor is the master and another microprocessor performs specific functions as a slave of the master microprocessor, are common and well known. Although the master/slave relationship improves computer performance by dividing tasks, the computer does not operate at maximum capability. This is because the slave processor performs only specifically designated operations, and thus remains idle whenever a task not designated for it is performed. While such tasks are executed, the computer system is no more efficient than a single-processor system.
The computer system's efficiency may be further enhanced by making the multiple processors symmetrical. In a symmetrical system, any processor can perform any required function. Thus, all microprocessors can operate simultaneously, spending little or no time idle, and the computer system operates near its maximum efficiency. In addition, the system may be further improved by adding microprocessors as the workload increases. Adding microprocessors is particularly effective in file server systems, which have many independent functions to perform simultaneously.
Although symmetrical multiprocessor systems are efficient, they are difficult to design. One of the many obstacles in designing a symmetrical multiprocessor system is the potential presence of a non-functional processor. A simple method of booting a multiprocessor system is to power up one of the central processing units (CPUs), generally designated CPU0, and ignore the others. When the first CPU has booted, it turns on and tests the remaining CPUs and the various components of the system. If the first microprocessor does not function properly, however, it cannot turn on the remaining processors, and the entire system is left incapacitated. Consequently, the computer owner or operator has a computer system with one or more operational CPUs but, ironically, the system is useless until the repairman arrives.
In addition, for many DOS-based applications and for booting purposes, one of the CPUs must be designated as CPU0. CPU0 commonly performs various functions for the system, such as DRAM refresh operations, which make CPU0 unique even in a symmetrical multiprocessor system. Thus, most multiprocessor systems require one of the microprocessors to be designated as CPU0. In many systems, the CPU residing in a particular physical location is always designated as CPU0. If one fixed slot is always designated as containing CPU0, however, that slot may be empty or its CPU may fail, crippling the entire system.
One system, described in more detail below, addressed the problem of a failed CPU in the first physical slot by automatically rotating the CPU0 designation to the CPU in the next physical slot if the first CPU did not perform a selected operation within a given time period. This handled total failure of a CPU, because the failed CPU was marked bad and the CPU0 designation was rotated. However, it did not help with more marginal failures, such as parity errors, which are relatively infrequent but serious enough that the CPU cannot be considered dependable. The prior system would simply have continued operating until the CPU failed again, resulting in lost time and productivity and potentially lost information. Thus, it is desirable to rotate the CPU0 designation in response to causes other than total CPU failure.
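The difference between the two selection policies described above can be sketched in a few lines. This is a minimal, hypothetical Python model, not the prior system's firmware or the start-up logic of the invention. The names `SlotStatus` and `select_cpu0`, the four slot states, and the `rotate_on_prior_error` switch are assumptions introduced for illustration. With the switch off, the model approximates timeout-only rotation. With it on, the model also skips a CPU whose earlier errors (such as parity errors) were recorded.

```python
from enum import Enum, auto
from typing import Optional, Sequence


class SlotStatus(Enum):
    """Hypothetical per-slot state as seen by the start-up logic."""
    EMPTY = auto()         # no CPU installed in the slot
    BOOT_TIMEOUT = auto()  # CPU did not perform the selected operation in time
    PRIOR_ERROR = auto()   # CPU responds, but an earlier error (e.g. parity) was logged
    OK = auto()            # CPU present with no recorded problem


def select_cpu0(slots: Sequence[SlotStatus], start: int = 0,
                rotate_on_prior_error: bool = True) -> Optional[int]:
    """Return the physical slot to designate as CPU0, or None if none qualifies.

    Slots are examined in physical order beginning at `start` and wrapping
    around, so the CPU0 designation rotates to the next slot whenever a CPU
    is skipped.
    """
    n = len(slots)
    for offset in range(n):
        slot = (start + offset) % n
        status = slots[slot]
        if status is SlotStatus.OK:
            return slot
        if status is SlotStatus.PRIOR_ERROR and not rotate_on_prior_error:
            # Timeout-only policy: a CPU that boots is accepted even if errors were logged.
            return slot
        # EMPTY and BOOT_TIMEOUT slots (and PRIOR_ERROR when rotating) are skipped.
    return None
```

For example, with slots `[PRIOR_ERROR, EMPTY, OK, OK]`, the timeout-only policy keeps slot 0 as CPU0. Rotating on prior errors moves the designation to slot 2, the first slot holding a CPU with no recorded problem.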