Many fault tolerant systems have been proposed in order to reduce the risks of processor failures by using multiple processors. For example, one type of fault tolerant processor using multiple processors has been disclosed in U.S. Pat. No. 4,497,059 issued to B. Smith on Jan. 29, 1985. An extension of such a technique has been proposed in a fault tolerant multiple processor system which utilizes additional processors attached thereto, as described in U.S. Pat. No. 4,665,522 issued to J. Lala, et al. on May 12, 1987. Another form of multiprocessor fault tolerant system has been described in U.S. Pat. No. 4,015,246 issued to A. Hopkins, et al. on Mar. 29, 1977. Such systems utilize centralized fault tolerance techniques and suffer from a number of disadvantages which are typical of centralized systems.
For example, the system described in the Smith patent tends to have relatively limited information or data throughput, and its applications are correspondingly limited. The system described in the Lala et al. patent tends to improve on the throughput available with the system of the Smith patent, but with a consequent loss in overall reliability of the system.
The multiprocessor system of the Hopkins et al. patent also tends to have a higher throughput performance than the fault tolerant processor described in the Smith patent, but suffers from two major disadvantages. First, the system cannot be physically dispersed to any extent; that is, the processors, memories, and I/O modules which represent the components of the multiprocessor system need to be in relatively close physical proximity for effective operation. Such a requirement can be a relatively severe limitation when one wishes to utilize the system in applications where distributed processing at relatively remote locations is required. Second, the system described therein forces all information processing tasks to be run in a triplex redundancy mode. Even information processing tasks which have a relatively low criticality cannot be executed in a duplex redundant mode, or in a simplex, i.e., a non-redundant, mode. Moreover, tasks which have relatively high criticality cannot be executed at higher than a triplex redundancy level, e.g., in a quadruplex redundancy mode.
Thus, while the above fault tolerant systems are useful in some applications, they are not suited to the distributed fault tolerant processing that is required in many applications.
On the other hand, distributed information processing systems are available for use, for example, in a batch processing environment, but can be used only for applications of relatively low criticality, i.e., non-critical, non-fault tolerant, and non-real-time applications. Many commercial networks, for example, which interconnect a number of computers at locations remote from each other are now available so as to provide a form of distributed processing. However, such systems can fail and, in some cases, fail catastrophically, due to a single fault in any of the processors thereof or in the bus system which interconnects the processors. Thus, single-point failures can shut down an entire networked system. Moreover, such systems have relatively low reliability and, as mentioned above, are generally not set up to perform in a real-time mode.
In order to obtain both the advantages of distributed processing and the advantages of fault tolerance in an overall system, a recently proposed system uses a plurality of redundant processing sites, each of which can comprise a plurality of redundant processors, which are interconnected by the use of the same number of intersite buses. The processing sites can be physically dispersed so that they can be located relatively far from each other. The multiple processors at each site are loosely coupled, that is, they are not constrained to run in lock-step under control of the same frequency clock. The processors at each processing site can be operated either in a non-redundant, or simplex, mode or in a more complex redundancy mode, the level of redundancy being as high as the number of processors present at that site. The overall system is operated so that the level of redundancy need not be the same at all processing sites, the bus system permitting graded redundancies to be accommodated. The overall arrangement is such that the chance that a failed processor will be permitted to disable a bus is greatly reduced.
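The graded redundancy arrangement described above can be illustrated by a minimal sketch. The sketch is not taken from the cited documents; the names `ProcessingSite`, `set_redundancy`, `redundancy_map`, and the site labels are hypothetical, and the only rules it encodes are those stated above: a site's redundancy level may range from simplex up to the number of processors present at that site, and the level need not be the same at all sites.

```python
from dataclasses import dataclass

# All names here are hypothetical and chosen for illustration only.
MODE_NAMES = {1: "simplex", 2: "duplex", 3: "triplex", 4: "quadruplex"}


@dataclass
class ProcessingSite:
    name: str
    processors: int      # redundant processors physically present at the site
    redundancy: int = 1  # current redundancy level; 1 means simplex

    def __post_init__(self) -> None:
        if self.processors < 1:
            raise ValueError("a site needs at least one processor")
        self.set_redundancy(self.redundancy)

    def set_redundancy(self, level: int) -> None:
        # The level may range from simplex up to the number of processors present.
        if not 1 <= level <= self.processors:
            raise ValueError(
                f"site {self.name}: level {level} outside 1..{self.processors}"
            )
        self.redundancy = level

    def mode(self) -> str:
        return MODE_NAMES.get(self.redundancy, f"{self.redundancy}-fold")


def redundancy_map(sites):
    """Return each site's current mode; levels may differ from site to site."""
    return {site.name: site.mode() for site in sites}


# Three sites with different processor counts and different redundancy levels.
sites = [
    ProcessingSite("A", processors=3, redundancy=3),
    ProcessingSite("B", processors=4, redundancy=2),
    ProcessingSite("C", processors=1),
]
print(redundancy_map(sites))
```

In this sketch, a request for a level exceeding the processors present at a site is rejected, while a site with spare processors, such as site B, can be raised to a higher level for a more critical task.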
Such a system is described generally in the following documents: "Advanced Information Processing System," J. H. Lala, CSDL-P-1952, The Charles Stark Draper Laboratory, Inc., September, 1984, and "A Fault-Tolerant Processor To Meet Rigorous Failure Requirements," J. H. Lala et al., CSDL-P-2705, The Charles Stark Draper Laboratory, Inc., July, 1986.
While such a system has been described generally in the aforesaid documents as providing desirable distributed, fault-tolerant operation, no effective technique has been proposed or devised for managing communication among the processing sites thereof in a manner which appropriately arbitrates access to an inter-processor communication network by the multiple processing sites involved. It is desirable to provide such an effective network access contention scheme.