1) Field of the Invention
The present invention relates to a multiprocessor system made up of a plurality of processor modules and, more particularly, to a multiprocessor system having improved fault tolerant performance as a result of duplicating the connection bus between processor modules (i.e., as a result of providing a connection bus with redundancy).
2) Description of the Related Art
The performance of a CPU itself is approaching a limit, and, for this reason, so-called multiprocessor systems have been developed which involve a plurality of processors connected together in order to improve system performance.
Such multiprocessor systems are basically categorized into two types depending on the connection configuration of a plurality of CPUs.
One type is a so-called TCMP (Tightly Coupled Multiprocessor) in which a plurality of processors (CPUs) are tightly coupled together. In such a TCMP, the CPUs share a program and data using the same memory. Communication between processors (CPUs) is effected through the memory.
The other type is a so-called LCMP (Loosely Coupled Multiprocessor) in which a plurality of processors are loosely coupled together. In such an LCMP, each of the processors has its own memory. Communication between the processors is effected by transmitting messages through a communication path, for example.
The TCMP is more efficient because there is less overhead during communication between the processors, but the performance of the memory bus tends to become a bottleneck as the number of processors increases. In the LCMP, overhead is relatively large because every communication between the processors requires the transmission of a message. However, the LCMP can accommodate a larger number of processors than the TCMP.
To improve system performance of the LCMP configuration, it is commonly necessary to (1) improve the performance of each processor and (2) increase the number of processors coupled together. In an attempt to meet the demand (1), in addition to improved performance of the CPU, TCMP configuration has been employed in the processor, and a large-scale cache and/or memory has been incorporated into the processor. On the other hand, in an attempt to meet demand (2), since an increase in the number of processors coupled together depends on the performance of the system bus coupled to the processors, the number of signal lines of the bus and the speed of the bus clock have been increased.
In order to improve the performance of each processor as previously mentioned, it is necessary to increase the size of the processor hardware significantly, thereby resulting in an increased physical size of the processor. The increase in the physical size of the processor and the number of processors results in a large overall length of the bus, which in turn makes it very difficult to increase the clock speed for the purpose of improving bus performance.
For these reasons, there is a growing demand for a divided system bus in which a bus extension function is added to cause a plurality of physically compact buses to operate as a single system bus. The present invention is applicable to a multiprocessor system having the LCMP configuration and relates to the control of the status of a divided system bus and to recovery from failures in the divided system bus.
FIG. 13 is a block diagram showing an example of the configuration of a common multiprocessor system. The multiprocessor system, shown in FIG. 13, is of the LCMP type, and it is made up of a plurality of processor modules 1 (hereinafter occasionally referred to as PM) connected together through a duplicated system bus 2. Each of the PMs 1 has a configuration as shown in FIG. 15, and the detailed configuration thereof will be described later.
The system bus 2 is made up of two bus lines (a main bus line and a spare bus line), that is, physical buses 2A and 2B. The physical buses 2A and 2B are connected to each of the PMs 1 through bus connecting sections 1A and 1B belonging to each PM.
The physical buses 2A and 2B are respectively connected to bus control mechanisms 3A and 3B which control the states of the physical buses. Bus status notification lines 4A and 4B for notifying the respective states (e.g. a HALT state) of the physical buses 2A and 2B are connected between the bus control mechanisms 3A and 3B and the PMs 1 through the bus connecting sections 1A and 1B of each PM. Each of the bus control mechanisms 3A and 3B has a configuration as shown in FIG. 17, and each of the physical buses 2A and 2B has a configuration as shown in FIG. 18. Their detailed configurations will be described later.
With the configuration as set forth above, the states of the physical buses 2A and 2B are controlled by the bus control mechanisms 3A and 3B. For example, when the physical bus 2A becomes unable to perform normal operations while being used as a main bus, the bus control mechanism 3A halts the physical bus 2A. Further, the bus control mechanism 3A notifies all the PMs 1 of the halted state of the physical bus 2A through the bus status notification line 4A.
Upon receipt of the notification of the halted state of the physical bus 2A, each of the PMs 1 switches from the physical bus 2A being used as the main bus to the physical bus 2B serving as the spare bus by a function of an operating system (hereinafter occasionally referred to as OS). Communication between PMs 1 is then carried out through the physical bus 2B.
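The switchover described above can be illustrated with a minimal sketch. It is not taken from any actual implementation: the class names `ProcessorModule` and `BusControlMechanism`, the method names, and the `BusState` values are hypothetical, and the bus status notification line is modeled simply as a direct method call.

```python
from dataclasses import dataclass, field
from enum import Enum


class BusState(Enum):
    RUNNING = "running"
    HALT = "halt"


@dataclass
class ProcessorModule:
    """Hypothetical model of one PM connected to physical buses 2A and 2B."""
    name: str
    main_bus: str = "2A"
    spare_bus: str = "2B"

    def on_bus_status(self, bus: str, state: BusState) -> None:
        # The OS switches to the spare bus only when the bus it is
        # currently using as the main bus is reported as halted.
        if state is BusState.HALT and bus == self.main_bus:
            self.main_bus, self.spare_bus = self.spare_bus, self.main_bus


@dataclass
class BusControlMechanism:
    """Hypothetical model of bus control mechanism 3A or 3B."""
    bus: str
    modules: list = field(default_factory=list)
    state: BusState = BusState.RUNNING

    def report_failure(self) -> None:
        # Halt the physical bus, then notify every PM. The loop stands in
        # for the bus status notification line.
        self.state = BusState.HALT
        for pm in self.modules:
            pm.on_bus_status(self.bus, self.state)


pms = [ProcessorModule(f"PM#{i}") for i in range(1, 4)]
control_3a = BusControlMechanism("2A", pms)
control_3a.report_failure()
print([pm.main_bus for pm in pms])
```

Because every PM is attached to the same notification line, all of them make the same switch at the same time, which is the property the later discussion of divided buses depends on.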
FIG. 14 is a block diagram showing an example of an expanded system bus in a multiprocessor system. The multiprocessor system shown in FIG. 14 has the same basic configuration as the multiprocessor system shown in FIG. 13. The reference numerals used in FIG. 13 are used to denote corresponding elements, and the explanation thereof will be omitted here for brevity.
In FIG. 14, adaptors 5A and 5B (hereinafter occasionally referred to as ADPs) are respectively connected to the physical buses 2A and 2B. The ADPs 5A and 5B perform I/O control operations for a hard disk or for a communication line. Each of the ADPs 5A and 5B has a configuration as shown in FIG. 16, and this configuration will be described in detail later.
Bus extender mechanisms 6A and 6B are respectively connected to the physical buses 2A and 2B. Generally, in order to enhance the adaptor function of the multiprocessor system bus, I/O physical buses 8A and 8B, (which are duplicated) respectively having adaptors (ADP) 9A and 9B, are connected to the bus extender mechanisms 6A and 6B through bus extender mechanisms 7A and 7B belonging to the I/O physical buses 8A and 8B.
In other words, as a result of the connection of the system bus 2 with the I/O physical buses 8A and 8B through the bus extender mechanisms 6A, 6B, 7A, and 7B, it becomes possible for each PM 1 to utilize the I/O control function of the adaptors 9A and 9B connected to the I/O physical buses 8A and 8B.
As shown in FIG. 14, the adaptor function for controlling an I/O device is generally expanded by the bus extender mechanisms 6A, 6B, 7A, and 7B. In the configuration shown in FIG. 14, the system bus 2 is expanded to only one pair of I/O physical buses (that is, only to the I/O physical buses 8A and 8B). However, the system bus 2 may be expanded to a plurality of pairs of I/O physical buses through the bus extender mechanisms 6A and 6B.
When any failures arise in the expanded system bus, the failures will be overcome in the following manner:
For example, in the event that the physical bus 2A of the system bus 2 becomes unable to perform normal operations while being used as the main bus, the bus control mechanism 3A halts the physical bus 2A and notifies all the PMs 1 of the halted state of the physical bus 2A through the bus status notification line 4A.
Upon detection of the halted state of the physical bus 2A through the bus status notification line 4A, the OS of each PM 1 switches from the physical bus 2A being used as the main bus to the physical bus 2B serving as the spare bus. Thereafter, the OS requests the operator to replace the currently used bus control module (the bus control mechanism 3A) with a new bus control module. After the failures have been overcome as a result of processing performed by the operator, the physical bus 2A is released from the halted state. The OS then checks the recovery of the bus control module, and resumes use of the physical bus 2A.
On the other hand, when the bus extender mechanism 7A of the expanded I/O physical bus 8A detects abnormality, the operation of the bus extender mechanism 7A comes to a halt after informing the bus extender mechanism 6A of this abnormal state. In this state, when the PM 1 accesses the expanded I/O physical bus 8A, the bus extender mechanism 7A sends a response, representing the halt of the I/O physical bus 8A, back to the PM 1.
Based on the response, the OS of the PM 1 recognizes the abnormality of the bus extender mechanism 7A belonging to the I/O physical bus 8A, and switches the communication path to the I/O physical bus 8B. Thereafter, the OS prompts the operator to replace the bus extender mechanism 7A that caused the failure. The halted state of the expanded I/O physical bus 8A itself does not affect the states of the physical buses 2A and 2B, and therefore the physical bus 2B and the normal I/O physical bus 8B remain usable. The OS of each PM 1 can check the recovery of the expanded I/O physical bus 8A from the failure by periodically polling the I/O physical bus 8A.
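The handling of a halted I/O physical bus can be sketched as follows. This is only an illustration under stated assumptions: `IOBus`, `IOPathSelector`, and the string responses `"OK"` and `"HALT"` are hypothetical names, and the sketch reports recovery without switching back, since the text does not say what the OS does once polling succeeds.

```python
class IOBus:
    """Hypothetical expanded I/O physical bus behind a bus extender mechanism."""

    def __init__(self, name):
        self.name = name
        self.halted = False

    def access(self):
        # A halted extender answers with a halt response instead of data.
        return "HALT" if self.halted else "OK"


class IOPathSelector:
    """Hypothetical OS-side choice between I/O physical buses 8A and 8B."""

    def __init__(self, primary, alternate):
        self.primary = primary
        self.alternate = alternate
        self.active = primary

    def access(self):
        response = self.active.access()
        if response == "HALT" and self.active is self.primary:
            # Switch the communication path to the other I/O physical bus.
            self.active = self.alternate
            response = self.active.access()
        return response

    def poll_primary(self):
        # Periodic poll: True once the failed bus answers normally again.
        return self.primary.access() == "OK"


bus_8a, bus_8b = IOBus("8A"), IOBus("8B")
selector = IOPathSelector(bus_8a, bus_8b)
bus_8a.halted = True              # extender 7A has detected an abnormality
print(selector.access(), selector.active.name)
bus_8a.halted = False             # operator has replaced the extender
print(selector.poll_primary())
```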
FIG. 15 is a block diagram showing an example of the configuration of a general processor module (PM). As shown in FIG. 15, each PM 1 comprises an MPU 1C, a local memory 1D, and bus connecting sections 1A and 1B. These elements 1A to 1D are connected with each other through internal buses AB (address bus), DB (data bus), and CONTROL (control signal lines).
Since the PM 1 is connected to the physical buses 2A and 2B and the bus status notification lines 4A and 4B, the PM 1 is provided with the two bus connecting sections 1A and 1B. As will be described later with reference to FIG. 18, each of the physical buses 2A and 2B comprises an individual signal line 2a for a bus request signal BRQ#n (Bus Request) and an individual signal line 2b for a bus grant signal BGR#n (Bus Grant), in addition to a tag section (TAG) and a data section (DATA).
FIG. 16 is a block diagram showing an example of the configuration of a general ADP. As shown in FIG. 16, as with the PM 1, the ADPs 5A and 5B are respectively comprised of an MPU 5a, a local memory 5b, a bus connecting section 5c, and a controller 5d for controlling a variety of inputs and outputs. Similarly, these constituent elements 5a to 5d are connected together through the internal buses AB, DB, and CONTROL. The ADPs 5A and 5B are only connected to the physical buses 2A and 2B, respectively. For this reason, each of the ADPs 5A and 5B is provided with only one bus connecting section 5c.
FIG. 17 is a block diagram showing an example of the configuration of a general bus control mechanism. As shown in FIG. 17, each of the bus control mechanisms 3A and 3B comprises an arbiter 3a, a status control section 3b, a command execution section 3c, and a transceiver section 3d. The arbiter 3a controls the rights to use the physical buses 2A and 2B. The status control section 3b controls the states of the physical buses 2A and 2B, halts them when abnormal states arise in them, and informs each of the PMs 1 of the halted state through the bus status notification lines 4A and 4B. The command execution section 3c responds to access from a PM 1 by executing processing corresponding to a command from that PM 1. The transceiver section 3d receives data (a command or the like from each PM 1) from the physical buses 2A and 2B and transmits data to the physical buses 2A and 2B.
FIGS. 17 and 18 show an example of the multiprocessor system comprising PMs 1 in the number of "n", from #1 to #n. The individual signal line 2a is connected between each of the PMs 1 and the arbiter 3a, and the arbiter 3a receives bus request signals BRQ#1 to BRQ#n from the PMs 1. Further, the individual signal line 2b is connected between the arbiter 3a and each of the PMs 1, and bus grant signals BGR#1 to BGR#n are output to the PMs 1.
FIG. 18 is a block diagram for explaining an example of the configuration of a general physical bus. As shown in FIG. 18, the PMs 1 (#1 to #n) are coupled together through the bi-directional physical buses 2A and 2B each composed of the tag section (TAG) and the data section (DATA). The bus control mechanisms 3A and 3B possess the arbitration functions (the function of the arbiter 3a shown in FIG. 17) for the physical buses 2A and 2B, and control the rights to use the physical buses 2A and 2B.
This example adopts a concentrated arbitration method, and the bus request signal BRQ for requesting the right to use the physical bus 2A (or 2B) is transmitted from each PM 1 to the bus control mechanism 3A (or 3B) through the individual signal line 2a. The bus grant signal BGR for conferring the right to use the physical bus 2A (or 2B) is transmitted from the bus control mechanism 3A (or 3B) to the PM 1 through the individual signal line 2b.
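The concentrated (centralized) arbitration described above can be sketched as a function that maps the BRQ lines to the BGR lines. The text does not state how the arbiter chooses among simultaneous requests, so the fixed-priority rule below (lowest module number wins) is purely an assumption, and the function name `arbitrate` is hypothetical.

```python
def arbitrate(brq):
    """Map bus request lines BRQ#1..BRQ#n to bus grant lines BGR#1..BGR#n.

    At most one grant is asserted per call. Assumption: fixed priority,
    with the lowest-numbered requesting module winning.
    """
    bgr = [False] * len(brq)
    for index, requested in enumerate(brq):
        if requested:
            bgr[index] = True
            break
    return bgr


# Modules #2 and #3 request the bus at the same time; only #2 is granted.
print(arbitrate([False, True, True]))
```

Keeping the decision in one place means each PM needs only its own pair of request and grant lines, which matches the individual signal lines 2a and 2b in the description.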
FIG. 19 is a timing chart for explaining the operation of a general system bus. FIG. 19 shows operations of the system bus comprising the issue of a command from the module (PM) #1 to the module (PM) #2 and the subsequent return of a reply from the module #2 to the module #1.
The module #1 issues the bus request signal BRQ #1 for data transmission. The arbiter section (see reference numeral 3a shown in FIG. 17) receives that bus request signal BRQ#1 and provides the module #1 with the bus grant signal BGR #1. In the module #1, a flip-flop BGRMF for retaining a grant signal receives the grant signal BGR #1 and starts bus transmission. Bracketed symbols in FIG. 19 represent the type of each flip-flop.
The module #1 has previously set transmission data in a flip-flop DBFO dedicated for outputting data onto the bus, and transmits the transmission data set in the flip-flop DBFO to the data bus during a period in which the flip-flop BGRMF is receiving the bus grant signal BGR #1.
In the example shown in FIG. 19, three words are transmitted. In the tag section TAG shown in FIG. 19, "F" represents First (the first word), "M" represents Middle (a middle word), and "L" represents Last (the last word). Further, in the data section DATA, "C" represents Command (a command), "A" represents Address (an address), and "D" represents Data (data).
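The tagging of words can be illustrated with a short sketch. The function name `frame` is hypothetical. The treatment of a two-word transfer as First followed by Last is an assumption, since the text describes only the three-word and single-word cases, and the Single tag "S" is the one that FIG. 19 uses for a one-word reply.

```python
def frame(words):
    """Pair each data word with its tag: F (First), M (Middle), L (Last),
    or S (Single) when the transfer consists of only one word."""
    if not words:
        raise ValueError("a transfer needs at least one word")
    if len(words) == 1:
        return [("S", words[0])]
    # Assumption: a two-word transfer is tagged F then L, with no M.
    tags = ["F"] + ["M"] * (len(words) - 2) + ["L"]
    return list(zip(tags, words))


# The three-word command of FIG. 19: Command, Address, Data.
print(frame(["C", "A", "D"]))
```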
The transmission data output onto the bus are received by all the modules connected to the multiprocessor system. Each of the modules checks a designation address included in the command, and fetches only the command addressed to itself. In FIG. 19, the flip-flop DBFI of the module #2 dedicated for inputting data from the bus receives the data transmitted from the module #1, and the received data are fed to the inside of the module #2.
Upon receipt of the command, the module #2 analyzes and executes that command, and then generates a reply comprising response data. The reply data are set in the flip-flop DBFO dedicated for outputting data to the data bus, and the module #2 issues a bus request signal BRQ #2 to the arbiter section. When the arbiter section issues the bus grant signal BGR #2 to the module #2, the module #2 sends the reply data set in the flip-flop DBFO to the data bus during a period in which the flip-flop BGRMF is receiving the grant signal BGR #2.
In the tag section TAG shown in FIG. 19, "S" represents Single (only one word). Further, "R" in the data section DATA represents Reply (a reply). The reply data on the bus are also received by all the modules. However, only the module #1 fetches the reply data, based on the address included in the reply data. When the flip-flop DBFI of the module #1 receives the reply data, one data transmission of the module #1 is completed.
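The command and reply exchange of FIG. 19 can be sketched as a broadcast followed by address filtering. This is a simplified model, not the actual signal-level behavior: the `Module` class, the `broadcast` function, and the dictionary layout of a transfer (with `src`, `dest`, and `words` keys) are all hypothetical, and arbitration and flip-flop timing are omitted.

```python
class Module:
    """Hypothetical module attached to the system bus."""

    def __init__(self, number):
        self.number = number
        self.inbox = []  # stands in for data latched through flip-flop DBFI

    def snoop(self, transfer):
        # Every module sees every transfer; only the addressee fetches it.
        if transfer["dest"] == self.number:
            self.inbox.append(transfer)
            return True
        return False


def broadcast(modules, transfer):
    """Put one transfer on the bus; return the numbers of modules that kept it."""
    return [m.number for m in modules if m.snoop(transfer)]


modules = [Module(n) for n in (1, 2, 3)]
command = {"src": 1, "dest": 2, "words": [("F", "C"), ("M", "A"), ("L", "D")]}
reply = {"src": 2, "dest": 1, "words": [("S", "R")]}

print(broadcast(modules, command))  # module #2 fetches the command
print(broadcast(modules, reply))    # module #1 fetches the reply
```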
However, in the previously mentioned general multiprocessor system, the PMs 1 are connected to the same physical buses 2A and 2B (the system bus 2), and hence all the PMs 1 always have the same recognition of the bus status.
As previously mentioned, in order to increase the speed of the system bus clock of the multiprocessor system, it is necessary to reduce the lengths of the physical buses 2A and 2B. As a result of this, it becomes necessary to divide and combine the physical buses 2A and 2B, that is, it is necessary to extend and connect a plurality of compact multiprocessor systems through the bus extender mechanism.
FIG. 14 shows an example of a conventional expanded system bus. In the example shown in FIG. 14, only an adaptor function is added for extension through the bus extender mechanisms 6A and 6B. This example differs from the case in which the system bus 2 of a general multiprocessor system such as that shown in FIG. 13 is divided, that is, the case in which the multiprocessor system itself is connected to another multiprocessor system for extension.
When the system bus 2 is divided (that is, when the system bus 2 is connected to another multiprocessor system through the bus extender mechanism for extension), the system employing such a divided system bus is fundamentally different from the system shown in FIG. 13 in that the PMs are connected to different physical buses, thereby resulting in an inconsistency in the bus status recognition. Such an inconsistency in the recognition not only complicates processing but also makes it impossible to ensure normal operation of the system.
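The inconsistency can be made concrete with a small sketch. The segment names, the PM-to-segment assignment, and the functions `views` and `is_consistent` are hypothetical; the sketch assumes only that each PM learns the status of the physical bus segment to which it is directly connected.

```python
# Hypothetical divided system bus: two physical segments joined by a
# bus extender mechanism, each segment reporting only its own status.
segment_of = {"PM#1": "segment-1", "PM#2": "segment-1", "PM#3": "segment-2"}


def views(segment_of, status):
    """Bus status as recognized by each PM (the status of its own segment)."""
    return {pm: status[segment] for pm, segment in segment_of.items()}


def is_consistent(view):
    """True when every PM recognizes the same bus status."""
    return len(set(view.values())) <= 1


status = {"segment-1": "RUNNING", "segment-2": "RUNNING"}
print(is_consistent(views(segment_of, status)))

status["segment-2"] = "HALT"  # only the second segment halts
print(views(segment_of, status))
print(is_consistent(views(segment_of, status)))
```

With a single undivided bus there is only one segment, so the views can never diverge; with a divided bus a failure in one segment leaves the PMs disagreeing about whether the system bus is usable.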
In other words, the operating systems of the PMs recognize the resources differently, and hence it becomes necessary to carry out complicated control (control of resources) by retaining a plurality of control tables. Moreover, the operating systems are resources which have already been constructed, and they are not constructed based on the assumption that inconsistencies will arise in the states of the system buses. Therefore, it is necessary to fundamentally reconstruct the operating system in order to change control (control of resources) associated with the division of the system bus. In this way, the change of the system bus significantly affects the operating system.