1. Field of the Invention
The present invention relates to a technology for handling a failure that occurs in a hardware resource that is divided into, and shared by, a number of physical partitions created by, for example, a partitioning function installed in a server system or the like.
2. Description of the Related Art
A server system for a backbone system requires high availability and flexible use of resources (hardware resources). To achieve these purposes, a conventional server system makes use of, for example, a physical partitioning function, which divides the hardware resources into a number of physical partitions and shares those physical partitions, and a partitioning function, which arbitrarily combines the physical partitions created by the physical partitioning function to create a number of independent partitions. Together, these functions allow the server system to use resources without being constrained by the hardware.
FIG. 6 is a diagram for explaining the physical partitioning function and the partitioning function executed in a conventional server system. It shows an example in which the functions for hardware resource allocation and for information distribution are implemented and made to interact in accordance with the respective properties of an ASIC (Application Specific Integrated Circuit) and of the firmware.
In the example shown in FIG. 6, server 200 includes a hardware management unit 201 and is capable of dividing hardware resources, such as a memory, a PCI (Peripheral Component Interconnect) card and a chipset, into a number (m in the example shown in FIG. 6, where m is a natural number) of XPARs (Extended Partitionings) 202-1, 202-2, . . . , 202-m by means of the physical partitioning function.
Each of XPARs 202-1, 202-2, . . . , 202-m is a physical partition formed by dividing hardware resources (modules), exemplified by an SB (System Board) and an IOU (Input Output Unit), and recombining the divided hardware resources into a partition configuration. Hereinafter, the reference numerals 202-1, 202-2, . . . , 202-m are used when a particular XPAR must be distinguished from the remaining XPARs, and the reference numeral 202 is used to denote an arbitrary XPAR.
The example shown in FIG. 6 divides hardware resources such as ASIC 203 so that the divided hardware resources are incorporated into XPARs 202.
Further, in FIG. 6, XPARs 202 are used by a number (n+1 in the example shown in FIG. 6, where n is an integer) of partition blocks P0 through Pn by means of the partitioning function described above. For example, partition block P0 collectively uses XPARs 202-1 and 202-2, and partition block Pn uses XPAR 202-m.
Hereinafter, the reference numerals P0, P1, P2, . . . , Pn are used when a particular partition block must be distinguished from the remaining partition blocks, and the reference numeral P is used to denote an arbitrary partition block.
A partition block is the unit on which an OS (Operating System) 205 runs, and each partition block P therefore must include at least one processor.
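The relationship between XPARs and partition blocks described above can be pictured as a simple data model. The following is a minimal sketch under stated assumptions, not the implementation used in server 200: the class names XPAR and PartitionBlock, the module names such as "SB0" and "IOU0", and the per-XPAR processor counts are all hypothetical, chosen only to mirror the FIG. 6 example (P0 combining XPARs 202-1 and 202-2, Pn using XPAR 202-m) and the rule that each partition block must contain at least one processor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class XPAR:
    """Hypothetical model of a physical partition: a named set of hardware modules."""
    ref: str                   # reference numeral, e.g. "202-1"
    modules: frozenset[str]    # hypothetical module names, e.g. {"SB0"}
    processors: int = 0        # hypothetical number of processors in this XPAR


@dataclass
class PartitionBlock:
    """Hypothetical model of a partition block (the unit on which one OS runs)."""
    name: str
    xpars: list[XPAR] = field(default_factory=list)

    def processor_count(self) -> int:
        # Total processors contributed by all XPARs combined into this block.
        return sum(x.processors for x in self.xpars)

    def validate(self) -> None:
        # An OS cannot run on a partition block without at least one processor.
        if self.processor_count() < 1:
            raise ValueError(f"partition block {self.name} has no processor")


# Mirror FIG. 6: P0 combines XPARs 202-1 and 202-2; Pn uses XPAR 202-m.
x1 = XPAR("202-1", frozenset({"SB0"}), processors=2)
x2 = XPAR("202-2", frozenset({"IOU0"}), processors=0)
xm = XPAR("202-m", frozenset({"SB1", "IOU1"}), processors=1)

blocks = [PartitionBlock("P0", [x1, x2]), PartitionBlock("Pn", [xm])]
for block in blocks:
    block.validate()
    print(block.name, [x.ref for x in block.xpars], block.processor_count())
```

Note that XPAR 202-2 on its own contributes no processor in this sketch; it is still usable because it is combined with XPAR 202-1 into partition block P0, which is the kind of flexibility the partitioning function provides.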
Hardware management unit 201 manages power-on/off and error information for server 200 and includes, for example, a service processor.
Even when server 200 makes use of the physical partitioning function and the partitioning function as shown in FIG. 6, it must accurately analyze and report failure information concerning a failure that occurs in any of its hardware resources, just as it must when the physical partitioning function is inactive.
When the physical partitioning function is active, server 200 manages and divides the hardware resources by means of, for example, ASIC 203, taking into account reliability, mounting, cost and compatibility with other functions, and provides firmware 204 with resource management information including such failure information. Firmware (F/W) 204, which is executed for each partition block P, analyzes the failure information as required and handles the failure by sending the failure information to an upper layer such as OS 205, so that the hardware failure has less effect on the partition block. This allows server 200 to extend its functions flexibly.
When the hardware resources are divided into physical partitions by the physical partitioning function, the hardware resources of the resulting physical partitions are classified into hardware resources used independently (hereinafter called monopolized resources) and hardware resources shared with other physical partitions (hereinafter called shared resources).
[Patent Reference 1] Japanese Patent Application Laid-Open Publication No. 2002-229806
[Patent Reference 2] Japanese Patent Application Laid-Open Publication No. 2004-62535
However, consider the detection of a failure at the ASIC level while the physical partitioning function, which links ASIC 203 and firmware 204, is active. Assume that s physical partitions (XPARs 202), created by dividing the resources subordinate to ASIC 203, collectively operate as one independent partition block, where s is an integer of two or more (in the example shown in FIG. 6, s is two, namely XPARs 202-1 and 202-2). If a failure occurs in a hardware resource shared by XPAR 202-1 and XPAR 202-2 (a shared resource), or if a failure originating in a monopolized resource or the like propagates to both XPAR 202-1 and XPAR 202-2, then XPAR 202-1 and XPAR 202-2 each issue their own failure report, and redundant failure reports are recorded in ASIC 203.
In other words, with such a conventional failure processing method, when the physical partitioning function is active, s redundant failure reports, one for each physical partition created by the division, are recorded in ASIC 203 for a single failure. Firmware 204 analyzes the failure using the recorded failure reports and notifies OS 205 and/or hardware management unit 201 of the failure an excessive number of times. As a consequence, OS 205 and/or hardware management unit 201 cannot keep an accurate count of failure occurrences, and the server cannot be properly maintained. In addition, firmware 204 itself cannot accurately manage failure occurrences.
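The duplication problem can be illustrated with a small model of the conventional flow. This is a hypothetical sketch, not the behavior of any particular ASIC or firmware: the FailureReport type, the helper functions, the resource name "shared-bus", and especially the event_id field are assumptions made for illustration. The event_id field assumes that the reports can be traced back to one underlying failure event, which the real reports recorded in ASIC 203 need not provide. The sketch shows that one failure in a resource shared by s = 2 XPARs yields two recorded reports and, with naive per-report forwarding, two notifications to the upper layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureReport:
    xpar: str       # XPAR that issued the report, e.g. "202-1"
    resource: str   # hypothetical name of the failed hardware resource
    event_id: int   # hypothetical id of the underlying failure event


def record_failure(asic_log, event_id, resource, xpars_sharing):
    """Modelled conventional behavior: every XPAR that sees the failure issues its own report."""
    for xpar in xpars_sharing:
        asic_log.append(FailureReport(xpar, resource, event_id))


def naive_firmware_notify(asic_log):
    """Modelled conventional firmware: one upper-layer notification per recorded report."""
    return [f"failure at {r.resource} (reported by XPAR {r.xpar})" for r in asic_log]


asic_log = []
# One failure in a resource shared by the s = 2 XPARs of partition block P0.
record_failure(asic_log, event_id=1, resource="shared-bus",
               xpars_sharing=["202-1", "202-2"])

notifications = naive_firmware_notify(asic_log)
actual_failures = len({r.event_id for r in asic_log})
print(len(notifications), actual_failures)  # prints "2 1"
```

Because each notification is counted separately, an upper layer such as OS 205 that tallies notifications would see s times the actual number of failures, which is the miscount described in the preceding paragraph.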
For example, Patent Reference 1 discloses a technique for managing failures that occur in hardware and software by using a management agent and a management console that run on an OS in an open-system computer running two or more OSs.
The method disclosed in Patent Reference 1 relates to a virtualization technique in which all functions for hardware resource allocation and information distribution are executed by software such as firmware. That is, the reference discloses a method for virtually dividing server hardware resources at the software level, and it therefore does not apply to the physical partitioning function, which physically divides hardware resources. A further problem is that the method of Patent Reference 1 increases the load on the firmware and requires an additional guest OS to achieve the virtualization.
In addition, since the method of Patent Reference 1 manages failure occurrences by using a management agent or a management console that runs on an OS, the load on the OS increases, and the need for a management console raises the manufacturing cost of the server.
Still further, the server vendor cannot guarantee failure management, because the management agent is operated on the OS by the user.
Patent Reference 2 relates to a failure processing method for a multi-processor system that uses a large-scale platform formed by an aggregation of a number of nodes. In this method, if a failure occurs at one of the nodes, the failed node notifies its service processor of the failure, and that service processor in turn notifies a service processor manager of the failure.
For this purpose, the method of Patent Reference 2 must provide one service processor for each node and requires a service processor manager that collectively controls these service processors, which raises the manufacturing cost.