1. Technical Field
The present invention relates to low-level hardware access for initialization and run-time monitoring of processor and support chips in a data processing system. More particularly, the present invention provides a method for providing low-level hardware access to in-band and out-of-band firmware.
2. Description of Related Art
Traditionally, during the power-on phase of computer systems, central processing units (CPUs) start to execute instructions and initialize the systems into a state from which the operating system can be loaded. In addition to executing user applications, the operating system also runs applications that are needed to keep the system functioning. These applications, also referred to as system-control tasks, are responsible for monitoring system integrity and processing any errors that might occur during operation. Usually, there is only one operating-system image controlling all aspects of system management. This type of system control is typically referred to as in-band control or in-band system management.
An exponential growth of computing requirements has resulted in the creation of larger, more complex systems. Power-on and initialization of these large systems up to the point at which the operating system is fully available can no longer rely only on the system CPU. Instead, systems incorporate “helpers” (e.g., embedded controllers) that facilitate the initialization of the system at power-on. However, during power-on of these complex systems, critical errors can occur, which would prevent loading the host operating system. In this initial phase, in which no operating system is available, a mechanism is required for reporting errors and performing system management functions. Furthermore, given the diversity of user applications, it is no longer true that one operating-system image controls the entire system. At the high end, today's computer systems are required to run multiple different operating systems on the same hardware. A single instance of an operating system is no longer in full control of the underlying hardware. As a result, a system-control task running on an operating system that does not have exclusive control of the underlying hardware can no longer adequately perform its duties.
As a solution, system-control operations of a large system are moved away from the operating systems and are now integrated into the computing platform at places where full control over the system remains possible. System control is therefore increasingly delegated to a set of other “little helpers” in the system outside the scope of the operating systems. This method of host OS-independent system management is often referred to as out-of-band control, or out-of-band system management. In addition, logically partitioned systems may also run a “hypervisor,” which manages multiple logical partitions. This hypervisor is a firmware layer which runs on the CPU (host firmware) and is considered in-band.
Typical servers have associated control structures some of which are composed of “cages.” A cage may be a central electronic complex (CEC) cage or an I/O cage. A CEC cage contains a set of CPUs forming an SMP system together with its cache structure, memory and cache control, and the memory subsystem. In addition, the CEC cage may contain an I/O hub infrastructure. A system may contain one or more such cages. A cage may also be an I/O cage, which may facilitate I/O fan-out by linking the I/O cage to a CEC cage on one side and by providing bus bridges for the I/O adapters on another side.
Each CEC or I/O cage may contain an embedded controller, called a cage controller (CC) or support processor, which interfaces with all of the logic in the corresponding cage and any external components. Sometimes two support processors are used to avoid any single point of failure. The support processors typically operate in a master/slave configuration. At any given time, one controller performs the master role while the other controller operates in standby mode, ready to take over the master's responsibilities if the master fails. As a master, the support processor may perform functions such as:
- At power-on, determine the configuration by reading the vital product data (VPD), which includes the model number, part number, serial number, etc.
- Initialize the functional hardware to a predetermined state by scanning start-up patterns into the chained-up latches using JTAG (Joint Test Action Group, IEEE 1149.1 boundary scan standard) or other shift interfaces.
- Initiate and control self-tests of the logic circuitry.
- At run-time, monitor and control operating environmental conditions such as voltage levels, temperature, and fan speed, and report any error conditions to system-management entities. In case of critical conditions, directly initiate preventive measures (e.g., emergency power-off) in order to prevent safety hazards.
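The master/standby arrangement described above can be illustrated with a minimal sketch. It is not taken from any particular implementation: the class names `SupportProcessor` and `RedundantPair`, the `healthy` flag, and the method `check_and_failover` are all hypothetical, and a real controller pair would detect failure through a heartbeat or similar mechanism rather than a simple flag.

```python
from dataclasses import dataclass


@dataclass
class SupportProcessor:
    """Hypothetical model of one support processor (cage controller)."""
    name: str
    healthy: bool = True  # assumed health indicator, e.g. fed by a heartbeat


class RedundantPair:
    """Two support processors: one acts as master, the other stands by."""

    def __init__(self, first: SupportProcessor, second: SupportProcessor):
        self.master = first
        self.standby = second

    def check_and_failover(self) -> bool:
        """Swap roles if the master has failed and the standby is healthy.

        Returns True if a failover took place, False otherwise.
        """
        if not self.master.healthy and self.standby.healthy:
            self.master, self.standby = self.standby, self.master
            return True
        return False


pair = RedundantPair(SupportProcessor("sp_a"), SupportProcessor("sp_b"))
pair.master.healthy = False          # simulate a failure of the current master
failed_over = pair.check_and_failover()
```

After the simulated failure, the standby controller holds the master role and the failed controller is demoted to standby, so the master's responsibilities remain covered.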
In order to perform these functions, the embedded controller typically uses one of the following interfaces for intra-cage control:
- I2C bus.
- GPIO (general-purpose I/O, sometimes referred to as digital I/O).
- UART (universal asynchronous receiver/transmitter, usually referred to as serial port).
- JTAG (Joint Test Action Group, IEEE 1149.1 boundary scan standard).
As typical cages may contain many field-replaceable units (FRUs), the cage controller is used to initialize an FRU upon its replacement. Each FRU is controlled by multiple interfaces. These interfaces are designed to support features such as upgrading of the configuration of a cage, or “hot-plugging” of FRUs in concurrent repairs. However, in low-end systems, it is sometimes prohibitive to provide the necessary connectivity from the support processor to all the chips in the system. Thus, it is desirable to limit the connectivity to a small subset of the chips, and provide an indirect mechanism to access the remaining chips from this limited subset.
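The idea of reaching most chips indirectly through a small, directly connected subset can be sketched as a simple path lookup. The chip names, the `UPSTREAM` table, and the function `access_path` are hypothetical and serve only to illustrate the concept; the text above does not specify how such routing would actually be implemented.

```python
from typing import Dict, List, Optional

# Hypothetical topology: each chip maps to the chip through which it is
# reached, or to None if it is wired directly to the support processor.
UPSTREAM: Dict[str, Optional[str]] = {
    "cpu0": None,             # directly connected to the support processor
    "mem_ctrl0": "cpu0",      # reached indirectly via cpu0
    "io_hub0": "cpu0",        # reached indirectly via cpu0
    "io_bridge0": "io_hub0",  # reached via io_hub0, which is reached via cpu0
}


def access_path(chip: str, upstream: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of chips from a directly connected chip to `chip`."""
    path = [chip]
    seen = {chip}
    # Walk upstream until a directly connected chip (upstream is None) is found.
    while upstream[path[-1]] is not None:
        nxt = upstream[path[-1]]
        if nxt in seen:
            raise ValueError("cycle in topology")
        seen.add(nxt)
        path.append(nxt)
    path.reverse()  # order from the directly connected chip to the target
    return path


route = access_path("io_bridge0", UPSTREAM)
```

In this sketch only `cpu0` needs a physical connection to the support processor; every other chip is addressed by forwarding the access along the returned path.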