In a computer system that includes Itanium Processor Family (IPF) chips, the processors are located in a plurality of cells and may be arranged in a plurality of partitions or protection domains. IPF chips are produced by Intel. Compared with monolithic systems, multi-cell or cellular computer systems are more difficult to reset. On monolithic computer architectures, all resources in the system can be reset simultaneously by asserting one pin or one wire. Cellular systems may be divided into different partitions, each of which needs to be reset individually and at a different time. Moreover, cells may be migrated from one partition to another. This makes it very difficult to reset the cells within a single partition.
FIG. 1 depicts a block diagram of a firmware model 100 for an IPF system. The firmware has three components that separate the operating system (OS) 101 from the processors 102 and the platform 103. The firmware, in general, isolates the OS 101 and other higher level software (not shown) from implementation differences in the processors 102 and the platform 103. The platform 103 includes all of the non-processor hardware. The first firmware component is the processor abstraction layer (PAL) 104. This layer includes processor implementation specific features and is part of the Itanium processor architecture. PAL 104 operates independently of the number of processors. The second firmware component is the platform/system abstraction layer (SAL) 105. SAL 105 includes the platform-specific features. The third firmware component is the extensible firmware interface (EFI) 106. This layer is the platform binding specification layer that provides a legacy-free application programming interface (API) to the operating system. PAL 104, SAL 105, and EFI 106 together provide system initialization and boot, machine check abort (MCA) handling, platform management interrupt (PMI) handling, and other processor and system functions that vary between implementations. Additional information on IPF systems may be found in the Intel manuals “Intel Itanium Architecture Software Developer's Manual” and “Itanium Processor Family System Abstraction Layer Specification”, both of which are incorporated herein by reference.
A common specification used by the OS is the advanced configuration and power interface (ACPI) 107. This specification defines an industry standard interface that enables the OS to direct motherboard configuration and system power management, which is referred to as operating system directed configuration and power management (OSPM). Additional information on ACPI may be found in the ACPI specification “Advanced Configuration and Power Interface Specification”, which is incorporated herein by reference.
FIG. 2 depicts an example of a system 200 showing an arrangement of the processors 102 and platform 103 of FIG. 1. In FIG. 2, the system 200 has five cell boards 201, with each cell board comprising a plurality of processors 202. The system has two partitions or protection domains, namely partition A 203 and partition B 204. Resources within a partition may be used by any of the processors within the partition. Access to resources in other partitions is restricted, and thus this arrangement prevents errors in one partition from migrating to another partition.
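The partition arrangement of FIG. 2 may be illustrated with a minimal sketch. The class names, the field names, the processor count per cell, and the particular assignment of the five cells to partitions A and B are all assumptions made for illustration; they are not taken from FIG. 2 or from any IPF firmware interface.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One cell board. All names and fields here are hypothetical."""
    cell_id: int
    partition: str          # partition label, e.g. "A" or "B"
    processors: int = 4     # placeholder count, not specified by the source


@dataclass
class System:
    cells: dict = field(default_factory=dict)

    def add_cell(self, cell):
        self.cells[cell.cell_id] = cell

    def can_access(self, requester_id, target_id):
        """Access is allowed only when both cells are in the same partition."""
        return self.cells[requester_id].partition == self.cells[target_id].partition

    def migrate(self, cell_id, new_partition):
        """Reassign a cell to another partition."""
        self.cells[cell_id].partition = new_partition

    def members(self, partition):
        """Return the sorted ids of the cells currently in a partition."""
        return sorted(c.cell_id for c in self.cells.values()
                      if c.partition == partition)


def build_example():
    """Five cells as in FIG. 2; the A/B split below is an assumption."""
    system = System()
    for cell_id, part in enumerate(["A", "A", "A", "B", "B"]):
        system.add_cell(Cell(cell_id, part))
    return system
```

In this sketch, a cell in partition A cannot reach a resource on a cell in partition B, and calling `migrate` changes which cells must be reset together, which is the property that complicates a partition-wide reset.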
Because cells in one partition can be reassigned to another partition, it is difficult to coherently reset all of the cells in a partition. Previous attempts to perform a coherent reset of multiple cells typically introduced spurious errors into the partition. One solution is to reset each cell as the cell is located. However, this solution has the disadvantage that resources disappear while they are still needed or in use by other cells in the partition. Thus, spurious errors are often introduced because too much time elapses between the reset of the first cell of the partition and the reset of the last cell. Another solution is to reset each cell without attempting to idle the processors. However, this solution also introduces spurious errors into the partition because the CPUs are attempting transactions that depend on other resources. Another solution is to execute the reset code from main memory. However, this solution has the disadvantage that the main memory can become incoherent as cells reset, allowing fetches to fail and thus compromising the ability to complete the reset.
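The first of these approaches, resetting each cell as it is located, may be illustrated with a simplified model. The function name, the dependency map, and the way an error is recorded are hypothetical and serve only to show why a still-running cell can observe a missing resource; the sketch does not model real hardware behavior.

```python
def reset_as_located(cells, depends_on):
    """Reset cells one at a time, in the order they are located.

    cells:      list of cell ids in discovery order.
    depends_on: dict mapping a cell id to the set of cell ids whose
                resources that cell is still using (hypothetical model).

    Returns a list of (running_cell, reset_cell) pairs, one for each time
    a cell that has not yet been reset loses a resource it depends on.
    """
    reset_done = set()
    spurious = []
    for cell in cells:
        reset_done.add(cell)  # this cell's resources disappear now
        for other in cells:
            still_running = other not in reset_done
            if still_running and cell in depends_on.get(other, set()):
                spurious.append((other, cell))
    return spurious


# Cell 1 uses resources on cell 0; cell 2 uses resources on cells 0 and 1.
errors = reset_as_located([0, 1, 2], {1: {0}, 2: {0, 1}})
print(errors)  # [(1, 0), (2, 0), (2, 1)]
```

In this model, the number of reported pairs grows with the number of cells that remain running while earlier cells are already reset, which corresponds to the time gap between the first and last reset described above.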