Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program, or multiple programs, simultaneously. In general, this parallel computing executes computer programs faster than conventional single-processor computers, such as personal computers (PCs), that execute the parts of a program sequentially. The actual performance advantage depends on a number of factors, including the degree to which parts of a program can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
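The dependence of the performance advantage on the parallelizable portion of a program can be illustrated with Amdahl's law, a standard model that is not part of the original description. The sketch below assumes an idealized machine with no communication or memory-access overhead; the function name and parameters are illustrative only.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Idealized speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that can execute in parallel (0..1).
    processors: number of processors available (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unchanged; only the parallel part is divided
    # among the processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

Under this model, a program that is entirely sequential gains nothing from additional processors, while one that is entirely parallel speeds up in proportion to the processor count.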
Multiprocessor computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common memory address space that all processors can access. Processes within a program communicate through shared variables in memory, which allow them to read from or write to the same memory location in the computer. Message-passing multiprocessor computers, on the other hand, have a separate memory space for each processor. Processes communicate by sending messages to each other.
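The two communication styles can be contrasted in software. The following is a minimal sketch that uses threads on a single machine as stand-ins for processors; it is an analogy, not a model of any particular hardware. The function names are hypothetical. In the first function the workers update one shared variable; in the second they share nothing and communicate only by sending messages.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Workers communicate by writing to one shared variable."""
    total = {"value": 0}      # memory location visible to every worker
    lock = threading.Lock()   # serializes updates to the shared location

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=4):
    """Workers keep private data and communicate only through messages."""
    inbox = queue.Queue()     # channel that carries messages to the collector

    def worker(chunk):
        inbox.put(sum(chunk))  # send the partial result as a message

    threads = [threading.Thread(target=worker, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One message arrives per worker; combine them.
    return sum(inbox.get() for _ in range(workers))
```

Both functions compute the same result. The difference lies in whether the workers coordinate through a common memory location or through explicit messages.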
Shared-memory multiprocessor computers may also be classified by how the memory is physically organized. In distributed shared-memory computers, the memory is divided into modules physically placed near a group of processors. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely. Because the memory access time differs based on memory location, distributed shared-memory systems are often called non-uniform memory access (NUMA) machines. By contrast, in centralized shared-memory computers, the memory is physically in just one location. Such centralized shared-memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time and space from each of the processors. Both forms of memory organization typically use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor computers with distributed shared memory are often organized into nodes with one or more processors per node. The nodes interface with each other through a network by using a protocol, such as the protocol described in the Scalable Coherent Interface (SCI) (IEEE 1596). Companies such as Intel Corporation have developed "chip sets" that may be located on each node to provide memory and I/O buses for the multiprocessor computers.
Such chip sets often have predetermined memory addresses for basic input/output systems (BIOS), interrupts, etc. The BIOS comprises the system programs for the basic input and output operations and represents the lowest-level software interface to the system hardware. Typical BIOS functions include accesses to hard disk drives, timers, and graphics adapters. An example of a chip set having predetermined memory addresses is one that follows the Industry Standard Architecture (ISA), which has memory addresses dedicated to particular functions, such as system BIOS, video BIOS, graphics adapters, expansion memory, etc. A chip set may also include an interrupt controller that has a fixed range of addresses. An example of an interrupt controller is the Advanced Programmable Interrupt Controller (APIC) developed by Intel Corporation.
When a multiprocessor computer system is first powered on or otherwise reset, the processors in the system are initialized by setting them to a known state. The reset causes a processor to jump to the system BIOS to begin code execution. The BIOS brings the system through an initialization procedure (also called booting) whereby diagnostic routines are run on the system hardware, such as memory and the processors. After the initialization procedure is complete, an operating system is loaded onto the computer system. The operating system includes a program that performs a number of tasks central to the computer's operation including managing memory, files and peripheral devices, launching application programs, and allocating system resources.
There are several problems associated with initializing a shared-memory, multinode computer system. For example, it is desirable to use standard BIOS routines, rather than developing a BIOS particular to the multinode environment. However, the standard BIOS routines are designed for a single-node environment and initialize hardware components on a node to predetermined addresses in conformance with the single-node environment. Consequently, when each node separately executes its BIOS, it sets its hardware components to the same predetermined addresses that the other nodes use for their own hardware components. In such a situation, the nodes are said to have "overlapping" memory addresses because a memory location on one node has the same physical address as a memory location on another node. However, in a shared-memory, multiprocessor system each memory location needs to have a unique address so that the system can differentiate one memory location from another.
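The overlap problem can be made concrete with a small sketch. The per-node memory size, the function names, and the offset-based relocation are all assumptions introduced for illustration; the relocation shows only one conceivable way to obtain unique addresses and is not presented as the approach taken by the invention.

```python
NODE_MEMORY_SIZE = 0x4000_0000  # hypothetical: 1 GiB of memory per node


def local_ranges(num_nodes):
    """Address range [start, end) that a single-node BIOS assigns on each node.

    Every node runs the same BIOS, so every node receives the same range.
    """
    return {node: (0, NODE_MEMORY_SIZE) for node in range(num_nodes)}


def ranges_overlap(a, b):
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def overlapping_pairs(ranges):
    """List every pair of nodes whose address ranges collide."""
    nodes = sorted(ranges)
    return [(m, n)
            for i, m in enumerate(nodes)
            for n in nodes[i + 1:]
            if ranges_overlap(ranges[m], ranges[n])]


def relocate(ranges):
    """Illustrative remedy: shift each node's range by a per-node offset."""
    return {node: (start + node * NODE_MEMORY_SIZE,
                   end + node * NODE_MEMORY_SIZE)
            for node, (start, end) in ranges.items()}
```

With three nodes, `overlapping_pairs(local_ranges(3))` reports a collision between every pair of nodes, whereas `overlapping_pairs(relocate(local_ranges(3)))` returns an empty list because each location then has a unique address.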
Another problem associated with the initialization of a multinode, shared-memory computer system is limiting a potential point of failure. For example, in a single-node or multinode environment, typically one processor in the system is given control of the booting process. This processor (often called the "centralized boot processor") brings each of the other processors in the system through a separate initialization procedure. Having a centralized boot processor is a favored architecture because it is easier to implement than coordinating multiple boot processors running separate initialization routines on each node. However, the centralized boot processor represents a potential single point of failure. That is, if the centralized boot processor fails for any reason, the entire system fails, even though other processors in the system are properly operating. Additionally, having a centralized boot processor substantially slows the system initialization, since each processor is separately initialized in series.
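The cost of serial initialization can be estimated with a simple timing model. The sketch below assumes, purely for illustration, that every processor takes the same time to initialize and that nodes with their own boot processors proceed concurrently without coordination overhead; the function names and parameters are hypothetical.

```python
def centralized_boot_time(nodes, procs_per_node, seconds_per_proc):
    """One boot processor initializes every processor in the system in series."""
    return nodes * procs_per_node * seconds_per_proc


def per_node_boot_time(nodes, procs_per_node, seconds_per_proc):
    """Each node has its own boot processor and all nodes boot concurrently.

    The node count does not affect the elapsed time in this idealized model,
    because only the processors within a node are initialized in series.
    """
    return procs_per_node * seconds_per_proc
```

For example, with 8 nodes, 4 processors per node, and 2 seconds per processor, the centralized model gives 64 seconds while the per-node model gives 8 seconds. Real systems would add coordination costs that this sketch ignores.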
An objective of the invention, therefore, is to provide, in a shared-memory, multinode computer system, an initialization procedure that substantially eliminates a potential single point of failure. A further objective of the invention is to provide such a shared-memory, multinode computer system that utilizes chip sets available for single-node computer systems. Still a further objective is to provide such a system that uses the well-established, PC-based BIOS for initialization.