The invention relates to fault tolerant computing systems and in particular to fault tolerant systems where application programs are synchronized at the processor level.
There is a need for a low cost but high performance fault tolerant computing system that does not greatly increase the difficulty of application software design. It is generally recognized that there is a need to employ digital computers in applications in which improper operation could have severe consequences. For example, a sophisticated flight hazard warning system has been developed for aircraft that utilizes a number of independent warning systems, including a ground proximity warning system, a wind shear detection system and a collision avoidance system. This system is generally described in U.S. Pat. No. 6,002,347, filed Apr. 23, 1997 and entitled "Integrated Hazard Avoidance System", which is incorporated herein by reference. In the preferred embodiment described therein, a central computer, which may include multiple processors for redundancy, receives through input/output (I/O) modules various types of flight data useful for anticipating and warning of hazardous flight conditions. Such information may include, but is not limited to, barometric altitude, radio altitude, roll and pitch, airspeed, flap setting, gear position, and navigation data. This information is communicated to the central computer via a data bus.
For such an integrated warning system to provide warnings with a high degree of integrity, the data operated upon and the instructions issued by the central computer must be accurate. A bus architecture that transfers data between the I/O modules in an orderly manner must therefore exist, and the data placed on the bus must be accurate and free of errors. It is also important to ensure, to the extent possible, that the individual systems execute the warning programs correctly.
Various approaches to solving these problems have been proposed. One such system is described in ARINC Specification 659, entitled "Backplane Data Bus", published on Dec. 27, 1993 by Aeronautical Radio, Inc. In this system the bus includes four data lines, and each processor or node in the system has a pair of Bus Interface Units ("BIUs"), each BIU being connected to two of the data lines. Data is transferred according to a time schedule contained in a table memory associated with each BIU. The tables define the length of the time windows on the bus and contain the source and destination addresses in processor memory for each message transmitted on the bus. For some applications, these types of systems also use two processors operating in a lock-step arrangement, with additional logic provided to cross-compare the activity of the two processors. The two processors, each with its own memory, execute identical copies of a software application in exact synchrony. This approach usually requires that the two processors be driven by synchronized clock signals.
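To make the table-driven scheme concrete, the following Python sketch models a single schedule table of time windows, each naming the node allowed to transmit and the source and destination memory addresses of its message. The `Window` record, the window lengths, the node numbers and the addresses are all hypothetical, and the layout is a simplification for illustration only: it is not the table format defined in ARINC Specification 659, and it ignores the paired BIUs and the four-line bus redundancy.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Window:
    start_us: int     # offset of the window from the start of the schedule frame
    length_us: int    # duration of the window in microseconds
    source_node: int  # the only node allowed to drive the bus in this window
    src_addr: int     # where the message is read in the sender's memory
    dst_addr: int     # where the message is written in each receiver's memory

# Hypothetical schedule; in the scheme described above each BIU holds a copy.
SCHEDULE: List[Window] = [
    Window(0, 100, source_node=1, src_addr=0x1000, dst_addr=0x2000),
    Window(100, 50, source_node=2, src_addr=0x1100, dst_addr=0x2100),
    Window(150, 250, source_node=3, src_addr=0x1200, dst_addr=0x2200),
]

def frame_length_us(schedule: List[Window]) -> int:
    """The schedule repeats every frame; its length is the end of the last window."""
    return max(w.start_us + w.length_us for w in schedule)

def active_window(t_us: int, schedule: List[Window] = SCHEDULE) -> Optional[Window]:
    """Return the window that owns the bus at absolute time t_us, if any."""
    offset = t_us % frame_length_us(schedule)
    for w in schedule:
        if w.start_us <= offset < w.start_us + w.length_us:
            return w
    return None  # gap in the schedule: no node may transmit

def may_transmit(node: int, t_us: int, schedule: List[Window] = SCHEDULE) -> bool:
    """A BIU lets its node transmit only inside that node's own window."""
    w = active_window(t_us, schedule)
    return w is not None and w.source_node == node
```

Because every BIU holds its own copy of a table like `SCHEDULE`, adding or moving a message means editing every copy, which is the reprogramming burden that makes such systems hard to adapt to new applications.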
Although such systems have high data integrity and provide fault tolerant operation, they have a number of disadvantages. For example, the use of tables holding the data source and destination addresses in processor memory for each application program makes it difficult to reprogram the system for new applications, because every table in the system must be reprogrammed. In addition, operating two processors in lock-step reduces the flexibility of the system, since the two processors cannot run two different programs at the same time.
This invention provides a way of using hardware facilities that are part of commercially available microprocessors, together with control software, to implement a fault tolerant computing system. Using the technique of this invention, a robust fault tolerant computing system can be built. Application software that executes on the system can remain simple because it does not need to be aware of the measures taken to achieve the fault tolerant characteristics of the system; that is, no special redundancy management code is built into the application code. The redundancy management code resides entirely at the operating system level. In addition, the application software does not need to adhere to restrictive design rules to allow the system's fault detection and containment mechanisms to work. The invention thus separates the concerns of fault tolerance mechanisms and application logic, which makes robust fault tolerant computing systems much easier, and therefore less expensive, to build.
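The separation of concerns described above can be illustrated with a small Python sketch. The application function contains only application logic, while a separate, hypothetical `redundant_call` layer, standing in for operating-system-level redundancy management, runs each processor's copy of the application and compares the outputs. The function names, the altitude rule and the use of majority voting are assumptions made for illustration; the text above does not specify how outputs are compared.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

def hazard_warning(altitude_ft: float, vertical_speed_fpm: float) -> bool:
    """Application logic only; it knows nothing about redundancy.
    Hypothetical rule for illustration, not taken from any real warning system."""
    return altitude_ft < 1000.0 and vertical_speed_fpm < -2000.0

class RedundancyError(Exception):
    """Raised by the redundancy layer when the copies cannot be reconciled."""

def vote(results: Sequence[Hashable]) -> Hashable:
    """Return the strict-majority result, or raise if there is none."""
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        raise RedundancyError(f"no majority among {list(results)!r}")
    return value

def redundant_call(copies: Sequence[Callable[..., Hashable]], *args) -> Hashable:
    """Stand-in for OS-level redundancy management: run every processor's copy
    of the application on the same inputs and vote on the outputs.
    Here the copies simply run one after another in a single process."""
    return vote([copy(*args) for copy in copies])
```

With two copies the vote can only detect a disagreement; with three it can also mask a single faulty copy. In either case `hazard_warning` itself is unchanged.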
This invention has commercial value because it allows robust fault tolerant computing systems to be built with low cost commercial off-the-shelf components. For example, by using the counters or event monitors built into microprocessor chips, it is possible to count the instructions executed by an application program and thereby cause the application programs to execute in congruent frames. Systems built using this technology will therefore have a substantial advantage over systems built with fault tolerant architectures that require custom electronics, custom integrated circuits, or complex and expensive application software design techniques.
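The following Python sketch illustrates how counting instructions can give replicated processors congruent frames. A hypothetical `SimulatedCounter` stands in for a hardware event counter programmed to signal after a fixed number of retired instructions; each signal ends a frame, at which point the state is hashed so that the replicas can be compared frame by frame. In this sketch frames are defined by instruction count rather than elapsed time, so the copies need not share a clock. The toy program, the frame length, the fault-injection hook and the use of SHA-256 digests are assumptions for illustration, and the sketch does not model how a real processor delivers a counter-overflow interrupt.

```python
import hashlib
from typing import Callable, List, Optional

class SimulatedCounter:
    """Hypothetical stand-in for an on-chip event counter that counts retired
    instructions and signals an overflow every `period` instructions."""
    def __init__(self, period: int) -> None:
        self.period = period
        self.count = 0

    def tick(self) -> bool:
        """Record one retired instruction; return True at a frame boundary."""
        self.count += 1
        if self.count == self.period:
            self.count = 0
            return True
        return False

def lcg_step(state: int) -> int:
    """Toy 'instruction': one step of a 64-bit linear congruential generator."""
    return (state * 6364136223846793005 + 1442695040888963407) % 2**64

def digest(state: int) -> str:
    return hashlib.sha256(state.to_bytes(8, "little")).hexdigest()

def run_replica(step: Callable[[int], int], state: int, n_frames: int,
                period: int, fault_at: Optional[int] = None) -> List[str]:
    """Execute `step` repeatedly and return one state digest per frame.
    `fault_at` optionally flips one bit after that many instructions."""
    counter = SimulatedCounter(period)
    digests: List[str] = []
    executed = 0
    while len(digests) < n_frames:
        state = step(state)
        executed += 1
        if executed == fault_at:
            state ^= 1  # injected single-bit fault
        if counter.tick():
            digests.append(digest(state))
    return digests

def first_divergent_frame(a: List[str], b: List[str]) -> Optional[int]:
    """Compare replicas frame by frame; return the first mismatching frame index."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None
```

For example, with `period=1000`, a single-bit fault injected after instruction 2500 first shows up in frame 2 (instructions 2001 to 3000), while frames 0 and 1 still match the fault-free replica.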
This invention is also valuable because it uses technology that will be enhanced and extended as part of the natural growth path of microprocessor technology. Future microprocessors and microcontrollers now under development that include hardware for monitoring the execution of application programs will almost certainly increase the advantage of this approach to fault tolerant systems.