1. Field of the Invention
The present invention relates to an interconnection device for interconnecting a plurality of information processing modules, and may be applied to an error control apparatus for controlling errors detected in the interconnection device.
2. Description of the Related Art
A configuration in which a large-scale server system is realized by interconnecting a plurality of information processing modules is known as a conventional technique. In such a configuration, each of the information processing modules has a CPU and a memory device, and can perform information processing in accordance with given programs. Also, it is possible to expand the server system by increasing the number of information processing modules to be interconnected.
A plurality of the information processing modules are interconnected by using a crossbar module serving as an interconnection device. The crossbar module relays/transmits information (packets containing information in this example) between the information processing modules.
In a server system of the above configuration, when a packet is sent out from an information processing module that has failed, or when a bus connecting an information processing module and a crossbar module is disconnected, invalid or inadequate packets (referred to as error packets hereinafter) are input into the crossbar module. However, many conventional crossbar modules do not have a function of handling error packets. Accordingly, there is a possibility that the influence of error packets extends to the circuit elements in the crossbar module and/or to other information processing modules, so that subsequent operations cannot be executed. In such a case, the entire server system (or many of the circuit elements in the server system) has to be halted, and then restarted after the failed portion has been examined and recovered.
A configuration in which a crossbar module has a function of detecting error packets is also known. In this configuration, when an error packet is detected, the operation of the information processing module that has sent out the error packet is halted by means of software processing. However, the error packet itself is transferred without being discarded, and the influence of the error may extend to a wider area. Also, there is a possibility that another error packet is sent out before the operation of the information processing module that sent out the first error packet is halted.
Many recent large-scale server systems employ a partitioning function by which computer resources such as CPUs, memory devices, and the like are classified into a plurality of groups (hereinafter referred to as partitions) so that the server system operates as a plurality of virtually independent computers. This partitioning function is realized by classifying, for example, a plurality of information processing modules into groups. However, in the conventional technique, there is a possibility that the influence of error packets extends across partitions, so that the entire server system (or many of the circuit elements in the server system) has to be halted. In such a case, stable operation of the server system is impeded for a long period of time.