Distributed, fault-tolerant communication systems are used, for example, in applications where a failure could result in injury or death to one or more persons. Such applications are referred to here as “safety-critical applications.” One example of a safety-critical application is a system that monitors and manages sensors and actuators in fields such as automotive, aerospace electronics, and industrial control.
Architectures considered for safety-critical applications are commonly time-triggered architectures where nodes use the synchronized time to coordinate access to common resources, such as the communication bus. One architecture that is commonly considered for use in such safety-critical applications is the Time-Triggered Architecture (TTA). In a TTA system, multiple nodes communicate with one another over two replicated high-speed communication channels using, for example, a time-triggered protocol such as the Time-Triggered Protocol/C (TTP/C).
Fault-tolerant protocols such as TTP/C use time-division multiple access (TDMA) as the medium access strategy: each node is permitted to periodically use the full transmission capacity of the bus for a fixed amount of time called a TDMA slot. As long as each node uses only its statically assigned TDMA slot, collision-free access to the bus can be assured.
Typically, transmissions of messages by nodes in a TTP network are controlled by a schedule table that determines which node has permission to transmit in each TDMA slot and defines the starting time and duration of that slot. The starting time and duration together define a node's permitted transmission window. A node's transmitter starts to send its message after the start of its window and should finish before the window ends. Nodes without permission to transmit listen for transmissions from the time a TDMA slot begins until its duration has elapsed. The times at which a node transmits and receives are controlled by the node's local clock, which is synchronized to the other nodes in the system by a distributed clock synchronization algorithm. In practice, perfect synchronization of all of the nodes' clocks is not possible, so the clocks are slightly skewed from one another. Because of this, a node's transmitter may begin to transmit a message before one or more of the receiving nodes are ready to listen. Similarly, a node may continue transmitting after the other nodes have stopped listening. Additionally, a degraded node may attempt to transmit well outside of its assigned window.
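The relationship between the schedule table, the transmission window, and clock skew can be illustrated with a minimal Python sketch. The slot values, node names, and function names below are hypothetical and chosen only for illustration; they are not taken from the TTP/C specification. Each local clock is modeled as the reference time plus a fixed skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    node: str          # node permitted to transmit in this slot
    start_us: int      # slot start, relative to the start of the TDMA round
    duration_us: int   # slot length


# Hypothetical three-node schedule table; the values are illustrative only.
SCHEDULE = [Slot("A", 0, 100), Slot("B", 100, 100), Slot("C", 200, 100)]
ROUND_US = sum(s.duration_us for s in SCHEDULE)


def slot_at(time_us: int) -> Slot:
    """Return the slot that is active at the given time."""
    t = time_us % ROUND_US
    for slot in SCHEDULE:
        if slot.start_us <= t < slot.start_us + slot.duration_us:
            return slot
    raise ValueError("schedule does not cover the whole round")


def may_transmit(node: str, local_time_us: int) -> bool:
    """A node may transmit only while its own statically assigned slot is active."""
    return slot_at(local_time_us).node == node


def fully_heard(slot: Slot, sender_skew_us: int, receiver_skew_us: int,
                tx_offset_us: int, tx_len_us: int) -> bool:
    """Check whether a receiver listens for the whole transmission.

    Assumed model: local clock = reference time + skew, so a local event
    scheduled for time T happens at reference time T - skew.
    """
    tx_start = slot.start_us + tx_offset_us - sender_skew_us
    tx_end = tx_start + tx_len_us
    listen_start = slot.start_us - receiver_skew_us
    listen_end = listen_start + slot.duration_us
    return listen_start <= tx_start and tx_end <= listen_end
```

In this model, a sender whose clock runs ahead of a receiver's clock by more than the transmit offset starts before the receiver is listening, and a sender whose clock lags far enough is still transmitting after the receiver has stopped.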
A centralized guardian has been conceived to limit the propagation of such failures. These guardians (or central guardians) ensure that a degraded node transmitter cannot broadcast to the network outside its allotted window. At the beginning of a TDMA slot, after a predefined delay, the guardian opens a window that allows a node to transmit messages to the network. If the node is operating correctly, it will begin transmission shortly after the guardian's window opens and complete transmission before the window closes. Ideally, receiving nodes (i.e., listening nodes) listen from the beginning of the TDMA slot until the guardian's window closes. The guardian blocks any transmission from a node that does not occur within the transmission window.
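The guardian's windowing behavior can be sketched as follows. This is a simplified model under stated assumptions, not a description of any actual guardian implementation: the class name, field names, and timing values are hypothetical, and a transmission is treated as either forwarded whole or blocked whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardianWindow:
    slot_start_us: int   # beginning of the TDMA slot
    open_delay_us: int   # predefined delay before the window opens
    window_len_us: int   # how long the window stays open

    @property
    def opens_at(self) -> int:
        return self.slot_start_us + self.open_delay_us

    @property
    def closes_at(self) -> int:
        return self.opens_at + self.window_len_us

    def forwards(self, tx_start_us: int, tx_end_us: int) -> bool:
        """Forward a transmission only if it lies entirely inside the window."""
        return self.opens_at <= tx_start_us and tx_end_us <= self.closes_at


# Hypothetical values: slot starts at 100 us, window is open from 105 us to 195 us.
window = GuardianWindow(slot_start_us=100, open_delay_us=5, window_len_us=90)
```

With these values, a transmission from 110 us to 180 us is forwarded, while one that starts at 100 us (before the window opens) or runs until 250 us (after it closes) is blocked.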
One problem with the current state of the art for guardians is that realizations of guardian functions have been required to duplicate the protocol logic engine implemented at the nodes in order to have independent knowledge of the communication schedule and timing parameters, such as slot order and transmission start time. Implementation of the protocol logic engine within the guardian has led to highly complex guardian designs. With the centralization of the guardian's role in network data flow, guardians themselves have become critical architecture components. The complexity of a guardian design is a significant issue with respect to the viability of the design in safety-critical applications. For example, in some cases gate-level failure analysis is required before a guardian design may be used for safety-critical applications. In these cases, the complexity of performing a failure analysis for such a guardian has a significant financial impact in terms of product development costs. In some applications, guardian circuitry may be required to perform self-tests to ascertain its own health. The complexity of these self-tests is also directly related to the complexity of the guardian.
Another problem is that, for some protocols, current guardian designs based on internally implemented protocol logic engines require that the guardians within a network be coupled together. Embodiments of the present invention eliminate this requirement.
Implementing the protocol logic engine within the guardian has further introduced the possibility of failure in the form of inconsistency between the guardian and the nodes it is protecting. Requiring the guardian to maintain knowledge of current or past states, in the form of transmission orders, leaves the implementation vulnerable to state upsets, which can be induced by environmental factors such as high-energy neutrons.
For the reasons stated above and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the specification, there is a need in the art for a simplified guardian design.