1. Field of the Invention
The present invention relates to a computer system. More particularly, the present invention relates to a computer system that transmits and receives data by packet transfer between the modules it incorporates, and to a communication method therefor.
2. Description of the Related Art
FIG. 18 is a block diagram showing the configuration of a conventional computer system that transmits and receives data by packet transfer between a plurality of modules that it incorporates.
The computer system of FIG. 18 has a configuration in which CPU modules 20a, shown in the block diagram of FIG. 19, and I/O modules 30a, shown in the block diagram of FIG. 20, are connected on a one-to-one basis, and in which the CPU modules 20a are connected to one another via a module interconnect 40.
Referring to FIG. 19, the CPU module 20a comprises a CPU, a main memory, and a CPU module controller 21a that controls the transmission and reception of data between the CPU and the main memory. Referring to FIG. 20, the I/O module 30a comprises an I/O module controller 31a that controls the bridging function between the I/O module 30a and the I/O adapters connected at its far end.
Some conventional computer systems with similar configurations additionally include a function that allows any of the CPU modules 20a to be inserted or removed during system operation (a hot-swap function) in order to meet high-availability requirements. With this function, if any of the CPU modules 20a fails, the system as a whole does not go down, and the failed CPU module 20a can be replaced with a good one.
Conventional computer systems equipped with such failure-handling functions, however, have a problem: in order to hot-swap a failed CPU module 20a, the I/O modules 30a under that CPU module must also be removed, even though they are operating normally and could otherwise remain in service.
One conventional approach to this problem is a computer system configured to connect a set of I/O modules 30a and a set of CPU modules 20a via a switching module 10, as illustrated in FIG. 21. This configuration allows flexible connection between the I/O modules 30a and the CPU modules 20a. For example, when a failed CPU module 20a is disconnected from the system, the other CPU modules 20a can access, through the switching module 10, the I/O modules 30a that were under the disconnected CPU module 20a.
Similarly, as illustrated in FIG. 22, a large-scale computer system can be realized by connecting a set of CPU modules 20a and a set of I/O modules 30a via a network 50.
As described above, in conventional computer systems, a failure of a single CPU module 20a or of a small group of CPU modules 20a can often affect functions throughout the system, which makes high availability difficult to achieve.
A similar problem arises in the conventional computer systems illustrated in FIGS. 21 and 22: if a packet is lost or a data error occurs on the switching module 10 or the network 50, the operation of the system as a whole tends to be affected. In CPU-I/O transactions, for example, which are critical to the normal operation of the system, some extreme cases have been reported in which an error in a single packet prevented the entire system from continuing to operate, resulting in system downtime.
The failure rate of the switching module 10 or the network 50 used in such a system rises as the connection distance increases, so large-scale systems can be expected to encounter more failures attributable to these elements. Thus, in order to attain high availability, a new function is needed that ensures the continuous operation of the entire system even when a failure occurs on a path between modules.