1. Field of the Invention
The present invention relates to a data processing apparatus configured by connecting units, including a system board equipped with one or more central processing units (CPUs) and an input/output (IO) unit for connecting a peripheral apparatus.
2. Description of the Related Art
In recent years, some computers (i.e., data processing apparatuses) have been configured to allow a plurality of physically separate units to be incorporated. Such units usually include a system board (SB) equipped with a CPU and memory, and an IO unit equipped with IO devices such as a hard disk apparatus and a peripheral component interconnect (PCI) slot. The reason for providing such units is to allocate CPU and memory resources flexibly in response to conditions, that is, to gain the advantage of utilizing these resources effectively. A computer configured in this way is equipped with one or more system boards and one or more IO units, and a crossbar is used for interconnecting these units. Such a computer can be divided so that one or more system boards and one or more IO units form one independent system. Such a dividable “independent system” is called a “partition”.
FIG. 1 is a diagram showing a configuration of a computer in which a plurality of units are connected by crossbars. As shown in FIG. 1, each of one or more system boards 1 and IO units 2 is connected to two global address crossbars (abbreviated as “address crossbar” or “GAC” hereinafter) 3 and to four global data crossbars (abbreviated as “data crossbar” or “GDX” hereinafter) 4. A management board (MMB) 5 is a dedicated management unit that is connected to each of the units 1 through 4 by way of an SM bus.
The two address crossbars 3 carry out the same request controls simultaneously. The address crossbars are thereby dualized in terms of hardware, which accomplishes a high reliability. For convenience, this specification refers to the operation mode in which the address crossbars are dualized as the “dualization mode”. The reason for providing the four data crossbars 4 is that a large volume of data is transmitted at once.
Incidentally, “#0” and “#1” are noted on the two address crossbars 3, respectively. Therefore, when referring to only one of the two address crossbars 3, “#0” or “#1” is appended to the component number. The same notation is used for the other components herein.
The two address crossbars 3 operate synchronously with each other. As for the data crossbars 4, the data crossbars 4#0 and 4#2 operate synchronously with each other, as do the data crossbars 4#1 and 4#3.
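The synchronous pairing described above can be sketched as a small lookup. This is only an illustrative model: the names ADDRESS_CROSSBARS, DATA_CROSSBARS and sync_partner are hypothetical and do not appear in the drawings.

```python
# Illustrative model of the crossbars of FIG. 1 (all names are hypothetical).
ADDRESS_CROSSBARS = ("GAC#0", "GAC#1")                # operate synchronously as a pair
DATA_CROSSBARS = ("GDX#0", "GDX#1", "GDX#2", "GDX#3")


def sync_partner(gdx_index: int) -> int:
    """Return the index of the data crossbar operating synchronously with gdx_index.

    Per the description, 4#0 pairs with 4#2 and 4#1 pairs with 4#3.
    """
    if not 0 <= gdx_index < len(DATA_CROSSBARS):
        raise ValueError(f"no data crossbar 4#{gdx_index}")
    return (gdx_index + 2) % len(DATA_CROSSBARS)


# Build the pairs from the lookup: each pair is listed once.
pairs = sorted({tuple(sorted((i, sync_partner(i)))) for i in range(4)})
```

Here pairs evaluates to the two synchronous pairs (0, 2) and (1, 3).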
Mechanisms for storing data and control information, such as memories, buffers, and queues mounted on the address crossbars 3, are configured to add an Error Correcting Code (ECC) or parity, thereby making it possible to recognize the occurrence of an uncorrectable error. The address crossbars 3 are also configured to recognize an error occurrence such as a freeze by monitoring the operation of other parts. When an error occurs during operation in the dualization mode, a conventional computer is configured to respond as described in the following.
FIG. 2 is a flow chart showing the flow of processes carried out by the respective parts of a conventional computer in the case of an error occurring in the address crossbar 3#1. The following specifically describes the operation of the individual parts, including the address crossbar 3#1 in which the error has occurred, by referring to FIG. 2. The individual parts are divided into four parts, i.e., a system board 1 and IO unit 2 (noted as “SB/IOU” in the drawing), an address crossbar 3#0 (noted as “GAC #0” in the drawing), an address crossbar 3#1 (noted as “GAC #1” in the drawing) and a management board 5 (noted as “MMB” in the drawing), according to the configuration shown by FIG. 2.
Upon recognizing (i.e., detecting) an error occurrence, the address crossbar 3#1 notifies the management board 5, each system board 1 and each IO unit 2, respectively, of the error occurrence (step SA 1; likewise noted hereinafter). The address crossbar 3#1 then transmits a signal (i.e., a GAC separation signal) to each system board 1 and each IO unit 2 requesting logical separation of the address crossbar 3#1 from the system, and subsequently stops operating (SA 2).
Having received the GAC separation signal, each system board 1 and each IO unit 2 carries out an operation (i.e., a process) of separating the address crossbar 3#1 in which the error has occurred (SC 1). They then continue the same operation as before, except that the separated address crossbar 3#1 is no longer used (SC 2).
The management board 5 reflects the notification in the system control. This includes not notifying the other address crossbar 3#0 of the error occurrence in the address crossbar 3#1, so that the address crossbar 3#0 continues the same operation as before the error occurrence.
As such, when an error occurs in one of the dualized address crossbars 3, the address crossbar 3 in which the error has occurred is no longer used and is thus separated from the system. This is in consideration of maintaining the reliability of data. Accordingly, the configuration is such that an address crossbar 3 operating in the dualization mode is made to stop operating when an error occurs therein (refer to FIG. 2).
The dualization of the address crossbar 3 achieves a higher level of reliability. If an error occurs in one of the two address crossbars 3, the system can be operated by using the other crossbar 3. There is, however, a possibility of an error occurrence in the other as well. If such an error occurs, the other is also stopped by the error occurrence, resulting in a system stoppage.
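The conventional handling described above (steps SA 1, SA 2, SC 1 and SC 2 of FIG. 2, and the system stoppage that results when the remaining crossbar also fails) can be sketched as follows. This is a minimal model under stated assumptions, not the actual hardware logic: all class, method and variable names are hypothetical, and signalling is reduced to direct method calls.

```python
from dataclasses import dataclass, field


@dataclass
class AddressCrossbar:
    name: str
    running: bool = True


@dataclass
class Unit:
    """A system board (SB) or IO unit (IOU); hypothetical model."""
    name: str
    usable_gacs: set = field(default_factory=lambda: {"GAC#0", "GAC#1"})
    errors_seen: list = field(default_factory=list)

    def on_error_notice(self, gac_name):
        self.errors_seen.append(gac_name)

    def on_separation_signal(self, gac_name):
        # SC 1: logically separate the crossbar in which the error occurred.
        self.usable_gacs.discard(gac_name)
        # SC 2: operation continues using only the remaining crossbars.


@dataclass
class ManagementBoard:
    error_log: list = field(default_factory=list)

    def on_error_notice(self, gac_name):
        # Reflected in the system control; the other crossbar is not notified.
        self.error_log.append(gac_name)


def handle_gac_error(failed, units, mmb):
    # SA 1: notify the management board and every SB/IOU of the error.
    mmb.on_error_notice(failed.name)
    for unit in units:
        unit.on_error_notice(failed.name)
    # SA 2: send the GAC separation signal, then stop operating.
    for unit in units:
        unit.on_separation_signal(failed.name)
    failed.running = False


def system_running(units):
    # The system can operate only while every unit has a usable crossbar.
    return all(unit.usable_gacs for unit in units)


gac0, gac1 = AddressCrossbar("GAC#0"), AddressCrossbar("GAC#1")
units = [Unit("SB#0"), Unit("IOU#0")]
mmb = ManagementBoard()

handle_gac_error(gac1, units, mmb)      # first error: GAC#1 is separated
after_first = system_running(units)     # True, GAC#0 still carries the load

handle_gac_error(gac0, units, mmb)      # second error: GAC#0 also stops
after_second = system_running(units)    # False, i.e., a system stoppage
```

The second call illustrates the weakness noted above: because a crossbar is stopped entirely on any error, a later error in the remaining crossbar leaves no usable crossbar.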
Some of the errors occurring in the address crossbar 3 do not necessarily require stopping the crossbar 3 per se. In many cases, a partial error occurs that influences only communication between specific units. Therefore, when a mode in which an address crossbar operates independently (noted as “singularized mode” hereinafter) is set, the configuration is such that a part uninfluenced by an error occurrence continues to operate and only a part that needs to be stopped due to the error occurrence is stopped. In order to achieve a higher availability of the system, it is also conceivably important to focus on this aspect and improve the error resistance.
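The difference between stopping the whole crossbar and stopping only the influenced part can be illustrated by the following sketch. It assumes, purely for illustration, that the set of units influenced by a partial error is already known; how that set is determined is not specified here, and the function and unit names are hypothetical.

```python
def units_to_stop(all_units, influenced_units, whole_crossbar_error):
    """Return the units that must stop after an address crossbar error.

    Assumption: in singularized mode a partial error stops only the units
    it influences, while an error of the whole crossbar stops every unit.
    """
    if whole_crossbar_error:
        return set(all_units)
    # Ignore any reported unit that is not part of this system.
    return set(all_units) & set(influenced_units)


all_units = {"SB#0", "SB#1", "IOU#0", "IOU#1"}

# Partial error influencing only SB#1 and IOU#1.
stopped = units_to_stop(all_units, {"SB#1", "IOU#1"}, whole_crossbar_error=False)
still_running = all_units - stopped
```

In this example SB#0 and IOU#0 remain in still_running, whereas an error of the whole crossbar would leave that set empty.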
Reference documents include Laid-Open Japanese Patent Application Publication No. H09-179838 and Registered Japanese Patent No. H07-82479.