1. Field of the Invention
The present invention generally relates to a data processing system and, more particularly, to a data processing system having at least one memory provided in common for a plurality of clusters, each cluster having a processor. Further, the present invention is concerned with a method for controlling such a data processing system.
2. Description of the Related Art
Generally, recent data processing systems tend to be provided with at least one common memory shared by a plurality of clusters. This tendency results from the fact that the processing speed of a single processor can no longer be greatly increased and from the fact that a data processing system having enhanced reliability is required. Normally, important data used in common by a plurality of clusters is stored in the common memory, and thus in many cases, two common memories which form a duplex memory are used in order to enhance the reliability of the data processing system.
FIG. 1A shows a related data processing system, which has two common memories 1-1 and 1-2, each provided in common for clusters 2a and 2b. The common memories 1-1 and 1-2 have a control table 5. The access to each of the memories 1-1 and 1-2 is controlled by using the control table 5. As shown in FIG. 1A, the control table 5 has information showing whether or not a storage area 1 of the common memory 1-1 and a storage area 2 of the common memory 1-2 are allowed to be accessed.
If the cluster 2a detects a fault which has occurred in the area 1 of the common memory 1-1, the cluster 2a writes, into the control table 5, information showing that the area 1 cannot be used. The other cluster 2b refers to the control table 5 before starting an accessing operation, and determines whether or not use of any area is inhibited. In the case being considered, since use of the area 1 of the common memory 1-1 is not allowed, the cluster 2b is allowed to access only the area 2 of the common memory 1-2. In the above-mentioned way, use of the defective area 1 of the common memory 1-1 is inhibited, and the area 2 of the remaining common memory 1-2 can be used in common for the clusters 2a and 2b.
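The table-based control described above can be illustrated with a minimal sketch. It is only a model under stated assumptions: the class names, method names, and the area labels "area1" and "area2" are hypothetical and do not appear in the related system, which is described only at the level of FIG. 1A.

```python
class ControlTable:
    """Hypothetical model of the control table 5 on the common memory side."""

    def __init__(self, areas):
        # True means the area is allowed to be accessed.
        self.usable = {area: True for area in areas}

    def inhibit(self, area):
        # Record that the area cannot be used.
        self.usable[area] = False


class Cluster:
    """Hypothetical model of a cluster sharing one control table."""

    def __init__(self, name, table):
        self.name = name
        self.table = table

    def report_fault(self, area):
        # The detecting cluster writes the fault into the shared table.
        self.table.inhibit(area)

    def accessible_areas(self):
        # The table is consulted before every accessing operation.
        return [area for area, ok in self.table.usable.items() if ok]


table = ControlTable(["area1", "area2"])
cluster_2a = Cluster("2a", table)
cluster_2b = Cluster("2b", table)

cluster_2a.report_fault("area1")          # 2a detects a fault in area 1
remaining = cluster_2b.accessible_areas()  # 2b consults the table first
```

In this sketch the lookup in `accessible_areas` precedes every access, which corresponds to the table reference that the related system requires before each accessing operation.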
FIG. 1B shows another related arrangement of the data processing system having the common memories 1-1 and 1-2. The clusters 2a and 2b have control tables 5-1 and 5-2, respectively, in place of the control table 5 shown in FIG. 1A. When a fault occurs, the clusters 2a and 2b start to communicate with each other. For example, when the cluster 2a detects a fault which has occurred in the common memory 1-1, the cluster 2a writes information showing that the area 1 is inhibited from being accessed into the control table 5-1, and informs the cluster 2b of such information. Then, the cluster 2b writes the information showing that the area 1 is inhibited from being accessed into the control table 5-2. During the above-mentioned operation, it is necessary to stop the normal processes of all the clusters 2a and 2b. In the above-mentioned way, one of the clusters 2a and 2b detects any failure in the areas 1 and 2, and both of the clusters 2a and 2b use the remaining common memory 1-2.
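The exchange between the clusters in the FIG. 1B arrangement can be sketched as follows. This is an illustrative model only: the class `ClusterWithLocalTable`, its methods, the `running` flag, and the area labels are hypothetical names introduced here, and the inter-cluster communication is reduced to a direct method call.

```python
class ClusterWithLocalTable:
    """Hypothetical model of a cluster holding its own control table."""

    def __init__(self, name):
        self.name = name
        # Local control table (5-1 or 5-2): True means access is allowed.
        self.table = {"area1": True, "area2": True}
        self.peers = []
        self.running = True  # whether normal processing is in progress

    def on_fault_detected(self, area):
        everyone = [self] + self.peers
        # Normal processes of all clusters are stopped during the update.
        for cluster in everyone:
            cluster.running = False
        # The detecting cluster updates its own table first...
        self.table[area] = False
        # ...and then informs every other cluster.
        for peer in self.peers:
            peer.receive_inhibit(area)
        # Normal processing resumes once all tables agree.
        for cluster in everyone:
            cluster.running = True

    def receive_inhibit(self, area):
        # The informed cluster mirrors the inhibition in its own table.
        self.table[area] = False


c2a = ClusterWithLocalTable("2a")
c2b = ClusterWithLocalTable("2b")
c2a.peers = [c2b]
c2b.peers = [c2a]

c2a.on_fault_detected("area1")
```

The sketch makes the cost visible: every table must be updated separately, and all clusters are halted while the update is propagated.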
However, the related system shown in FIG. 1A has the following disadvantages. First, it is necessary to refer to the control table 5 provided on the common memory side and determine whether or not the requested access is allowed. A long time and a complex logical control are needed to complete this operation. Second, the input path which connects the common memory 1-1 set to the access inhibiting state is not physically disconnected from the clusters 2a and 2b. Thus, there is a possibility that the common memory 1-1 will be accessed. Third, if a failure occurs in the control table 5, or a contradiction in information stored in the control table 5 takes place, the system may malfunction.
The related system shown in FIG. 1B has the following disadvantages. First, it is necessary for the cluster 2a, which has detected the failure, to inform the cluster 2b of the occurrence of the failure in the area 1 of the common memory 1-1. This requires complex processing. Further, the system shown in FIG. 1B has the disadvantages as described above with regard to the system shown in FIG. 1A.
On the other hand, if a failure occurs in one of the clusters 2a and 2b, the system operates as follows. For example, as shown in FIG. 2, if the cluster 2b detects a failure which has occurred in the cluster 2a, the cluster 2b writes information showing that the cluster 2a is down or has failed into a corresponding area of the control table 5 of a common memory 1A. When the cluster 2a starts the access operation, it refers to the control table 5 in the common memory 1A, and recognizes that it is inhibited from accessing the common memory 1A. Thus, the cluster 2a stops the operation of its own processor (not shown).
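The cluster-failure handling of FIG. 2 can be modeled in the same hedged manner. The names `CommonMemory`, `FailableCluster`, `mark_down`, and `write` are hypothetical, and the sketch assumes that the failed cluster still performs the table check correctly before accessing the memory.

```python
class CommonMemory:
    """Hypothetical model of the common memory 1A with its control table 5."""

    def __init__(self, cluster_names):
        # Control table: True means the named cluster is marked as down.
        self.down = {name: False for name in cluster_names}
        self.data = {}


class FailableCluster:
    """Hypothetical model of a cluster that checks its own status first."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory
        self.processor_running = True

    def mark_down(self, other_name):
        # A healthy cluster records that another cluster has failed.
        self.memory.down[other_name] = True

    def write(self, key, value):
        # Before each access, the cluster checks whether it is inhibited.
        if self.memory.down[self.name]:
            self.processor_running = False  # stop its own processor
            return False
        self.memory.data[key] = value
        return True


memory = CommonMemory(["2a", "2b"])
f2a = FailableCluster("2a", memory)
f2b = FailableCluster("2b", memory)

f2b.mark_down("2a")             # 2b detects that 2a has failed
ok_2a = f2a.write("record", 1)  # 2a finds itself inhibited and stops
ok_2b = f2b.write("record", 2)  # 2b continues to use the memory
```

Protection in this model depends entirely on the failed cluster executing the check in `write`; nothing prevents the access if that check is skipped.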
However, if the cluster 2a having a failure fails to access the control table 5 provided in the common memory 1A, it may destroy data stored in the common memory 1A. Further, if data in the control table 5 is damaged, the clusters 2a and/or 2b may malfunction. Furthermore, each time each of the clusters 2a and 2b tries to access the common memory 1A, it is necessary for each cluster to determine whether or not it itself is held in the access inhibiting state. Such a determination requires a large amount of time and a complex logical control. Further, it takes a long time to stop the defective cluster after it has been detected.