1. Technical Field
The present application is related to co-pending application entitled “METHOD FOR INDIRECT ACCESS TO A SUPPORT INTERFACE FOR MEMORY-MAPPED RESOURCES TO REDUCE SYSTEM CONNECTIVITY FROM OUT-OF-BAND SUPPORT PROCESSOR”, Ser. No. 11/055,404, and application entitled “METHOD FOR PROVIDING LOW-LEVEL HARDWARE ACCESS TO IN-BAND AND OUT-OF-BAND FIRMWARE”, Ser. No. 11/055,675, all filed on even date herewith. The present invention generally relates to computer systems, and more specifically to an improved method of allowing firmware to access system status and configuration registers in a multiprocessor system.
2. Description of Related Art
The basic structure of a conventional symmetric multi-processor computer system 10 is shown in FIG. 1. Computer system 10 has one or more processing units arranged in one or more processor groups; in the depicted system, there are four processing units 12a, 12b, 12c and 12d in processor group 14. The processing units communicate with other components of system 10 via a system or fabric bus 16. Fabric bus 16 is connected to one or more service processors 18a, 18b, a system memory device 20, and various peripheral devices 22. A processor bridge 24 can optionally be used to interconnect additional processor groups. System 10 may also include firmware (not shown) which stores the system's basic input/output logic, and seeks out and loads an operating system from one of the peripherals whenever the computer system is first turned on (booted).
System memory device 20 (random access memory or RAM) stores program instructions and operand data used by the processing units, in a volatile (temporary) state. Peripherals 22 may be connected to fabric bus 16 via, e.g., a peripheral component interconnect (PCI) local bus using a PCI host bridge. The PCI host bridge provides a low latency path through which processing units 12a, 12b, 12c and 12d may access PCI devices mapped anywhere within bus memory or I/O address spaces. The PCI host bridge also provides a high bandwidth path to allow the PCI devices to access RAM 20. Such PCI devices may include a network adapter, a small computer system interface (SCSI) adapter providing interconnection to a permanent storage device (e.g., a hard disk), and an expansion bus bridge such as an industry standard architecture (ISA) expansion bus for connection to input/output (I/O) devices including a keyboard, a graphics adapter connected to a display device, and a graphical pointing device (mouse) for use with the display device.
In a symmetric multi-processor (SMP) computer, all of the processing units 12a, 12b, 12c and 12d are generally identical, that is, they all use a common set or subset of instructions and protocols to operate, and generally have the same architecture. As shown with processing unit 12a, each processing unit may include one or more processor cores 26a, 26b which carry out program instructions in order to operate the computer. An exemplary processor core includes the PowerPC™ processor marketed by International Business Machines Corp. which comprises a single integrated circuit superscalar microprocessor having various execution units, registers, buffers, memories, and other functional units, which are all formed by integrated circuitry. The processor cores may operate according to reduced instruction set computing (RISC) techniques, and may employ both pipelining and out-of-order execution of instructions to further improve the performance of the superscalar architecture.
Each processor core 26a, 26b includes an on-board (L1) cache (actually, separate instruction and data caches) implemented using high speed memory devices. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from system memory 20. A processing unit can include another cache, such as a second level (L2) cache 28 which, along with a memory controller 30, supports both of the L1 caches that are respectively part of cores 26a and 26b. Additional cache levels may be provided, such as an L3 cache 32 which is accessible via fabric bus 16. Each cache level, from highest (L1) to lowest (L3), can successively store more information, but at a longer access penalty. For example, the on-board L1 caches in the processor cores might have a storage capacity of 128 kilobytes of memory, L2 cache 28 might have a storage capacity of 512 kilobytes, and L3 cache 32 might have a storage capacity of 2 megabytes. To facilitate repair/replacement of defective processing unit components, each processing unit 12a, 12b, 12c, 12d may be constructed in the form of a replaceable circuit board, pluggable module, or similar field replaceable unit (FRU), which can be easily installed in or swapped out of system 10 in a modular fashion. A command unit is a generic term that includes, among others, processor cores and the service processors (which may also be called flexible service processors).
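The trade-off between capacity and access penalty across cache levels can be illustrated with a minimal sketch. The capacities below are the example values given above; the latency figures, the class and function names, and the fill-on-miss policy are assumptions chosen only for illustration and do not describe any particular processor. Capacity limits and eviction are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    name: str
    capacity_kb: int      # example capacities from the description above
    latency_cycles: int   # hypothetical access penalty, illustration only
    lines: set = field(default_factory=set)  # addresses currently cached


def make_hierarchy():
    """Build an L1/L2/L3 hierarchy ordered from highest to lowest level."""
    return [
        CacheLevel("L1", 128, 2),
        CacheLevel("L2", 512, 10),
        CacheLevel("L3", 2048, 40),
    ]


def lookup(levels, address, memory_latency_cycles=200):
    """Search each level in order; return (source, total cycles spent)."""
    cycles = 0
    for level in levels:
        cycles += level.latency_cycles
        if address in level.lines:
            return level.name, cycles
    # Miss in every cache: take the longer step of loading from system
    # memory, then record the address in each level for later accesses.
    cycles += memory_latency_cycles
    for level in levels:
        level.lines.add(address)
    return "memory", cycles
```

Under these assumed latencies, a first access to an address pays the penalty of every level plus system memory, while a repeated access is satisfied by the L1 cache alone.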
Published US patent application US 2004/0215929, “Cross-chip communication mechanism in distributed node topology”, discloses a prior art architecture with a predecessor external scan communications command, XSCOM. That application showed an alternative method of providing supervisory commands to processor cores through a ring-connected set of registers within a chip and extending between chips. The disclosed method had the advantage that commands circulated by way of the old XSCOM did not have to interrupt commands transported on the fabric bus.
However, the old XSCOM method, because it used a relatively low bandwidth medium, required significant overhead to place locks on the so-called ‘mode ring’. Such locks could be placed by the service processor, or by individual threads operating on the cores of a chip.
In addition, the old XSCOM method had the drawback that once an XSCOM operation was initiated, the thread that owned that operation had to poll for status in order to receive the result. An old XSCOM operation thus required several instruction cycles to complete, which in turn required the processor core to gate off interrupts so that it was not interrupted during these cycles.
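The lock, initiate, and poll sequence described above can be sketched roughly as follows. This is only an illustrative model under stated assumptions: `ModeRing`, `Core`, `legacy_read`, and the fixed number of polls per operation are hypothetical names and values, not the interface of the referenced application.

```python
import threading


class ModeRing:
    """Hypothetical stand-in for the shared, low-bandwidth 'mode ring'."""

    def __init__(self, cycles_per_operation=3):
        self.lock = threading.Lock()          # lock placed by a thread or service processor
        self.cycles_per_operation = cycles_per_operation
        self._remaining = 0
        self._result = None

    def start(self, register, registers):
        """Initiate an operation; the result is not available immediately."""
        self._remaining = self.cycles_per_operation
        self._result = registers[register]

    def poll(self):
        """Model one status poll; return (done, result)."""
        if self._remaining > 0:
            self._remaining -= 1
        if self._remaining == 0:
            return True, self._result
        return False, None


class Core:
    """Hypothetical processor core with a single interrupt-enable flag."""

    def __init__(self):
        self.interrupts_enabled = True


def legacy_read(core, ring, register, registers):
    """Read a register the old way: lock, gate off interrupts, start, poll."""
    polls = 0
    with ring.lock:                           # overhead of locking the ring
        core.interrupts_enabled = False       # gate off interrupts
        try:
            ring.start(register, registers)
            while True:                       # owning thread must poll for status
                polls += 1
                done, value = ring.poll()
                if done:
                    return value, polls
        finally:
            core.interrupts_enabled = True    # re-enable once the operation ends
```

In this model the calling thread holds the lock and keeps interrupts gated off for every poll, which is the cost the passage attributes to the old method.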
The architecture described above typically used a logical partition debugger and maintenance program known as the ‘hypervisor’, which allows, among other things, multiple computing environments on the same platform. The hypervisor in a typical configuration would operate on one processor core. In order to coordinate functions on other chips, i.e., inter-chip commands, the hypervisor needed to activate functions on a third command unit. Doing so had the attendant drawback that many machine language instructions were needed, including a software lock on the resources of the target command unit. Collisions among multiple processor cores therefore had to be avoided for the duration of these instructions, and the one or more service processors had to be prevented from accessing the same facility at the same time.