In general, the present invention relates to communication systems based on time triggered message exchange incorporating measures for error containment in the time domain using a bus guardian for each node of the communication system.
In particular, the present invention refers to a method for monitoring a communication media access schedule of a communication controller of a communication system by means of a bus guardian. The communication system comprises a communication media and nodes connected to the communication media. Each node comprises a communication controller and a bus guardian assigned to the communication controller. Messages are transmitted among the nodes across the communication media based on a time triggered communication media access scheme. Of course, a node may comprise further components, such as a host controller.
Further, the invention refers to a computer program, which is able to run on a computer, in particular on a microprocessor.
Furthermore, the present invention refers to a bus guardian assigned to a communication controller of one of a number of nodes connected to a communication media. Messages are transmitted among the nodes across the communication media based on a cyclic time triggered communication media access scheme. The one node comprises a communication controller and the bus guardian. The bus guardian has means for monitoring the communication media access schedule of the communication controller.
Moreover, the invention refers to one of a number of nodes connected to a communication media. The node comprises a communication controller and a bus guardian assigned to the communication controller. Messages are transmitted among the nodes across the communication media based on a cyclic time triggered communication media access scheme. The bus guardian has means for monitoring the communication media access schedule of the communication controller.
Finally, the present invention refers to a communication system comprising a communication media and nodes connected to the communication media. Messages are transmitted among the nodes across the communication media based on a cyclic time triggered communication media access scheme. Each node comprises a communication controller and a bus guardian assigned to the communication controller. The bus guardian monitors the communication media access schedule of the communication controller.
In time-triggered communication systems, which can be used for message exchange in safety-critical applications in vehicles, it is not tolerable for one of the nodes, due to a local malfunction, to temporarily or permanently impede communication between the nodes of the communication system (the so-called “babbling idiot” failure). The function of a bus guardian is known in the art from communication systems that have been proposed for X-by-wire applications, for example a FlexRay communication system.
For such time-triggered communication systems a bus guardian is proposed to prevent a communication controller of one of the nodes from continuously sending data and thereby blocking the communication media, so that no other communication controller of another node can transmit data. Apart from the communication controller, the bus guardian constitutes a second source for the generation of a communication media access control signal. The bus guardian derives a corresponding control signal from an independent set of configuration data. The bus guardian enables communication media access for the communication controller it is assigned to only for specific time slots, for which the communication controller is allowed to transmit data across the communication media.
The bus guardian is configured with a communication media access schedule. By means of this configuration data the bus guardian can enable access to the communication media for those time slots in which a data transmission by the communication controller to which the bus guardian is assigned is expected. For the remaining time slots access to the communication media is inhibited.
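The slot-based enabling described above can be illustrated with a minimal sketch. It assumes a cyclic schedule of equally long slots measured in abstract clock ticks; the names `GuardianSchedule`, `slots_per_cycle`, `slot_length`, `allowed_slots` and `transmit_enabled` are hypothetical and chosen only for illustration. They are not taken from any bus guardian specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardianSchedule:
    """Hypothetical, simplified configuration data held by a bus guardian."""
    slots_per_cycle: int      # number of slots in one communication cycle (assumed)
    slot_length: int          # length of one slot in clock ticks (assumed uniform)
    allowed_slots: frozenset  # slot indices in which the assigned controller may send

    def transmit_enabled(self, tick: int) -> bool:
        """Return True if media access is enabled at the given tick."""
        cycle_length = self.slots_per_cycle * self.slot_length
        # Position within the current cycle, then the slot that position falls in.
        slot = (tick % cycle_length) // self.slot_length
        return slot in self.allowed_slots


# Example: 4 slots of 10 ticks each, the node owns slot 1 only.
schedule = GuardianSchedule(slots_per_cycle=4, slot_length=10,
                            allowed_slots=frozenset({1}))
```

In this sketch access is enabled for ticks 10 to 19 of every 40-tick cycle and inhibited otherwise. A real bus guardian would additionally need its own time base and tolerance windows, which are omitted here.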
In a fault-tolerant communication system, methods are often implemented for the initial synchronization of a particular media access method (for example TDMA, Time Division Multiple Access), in which more than one node of the communication system can attempt an initial synchronization after power-on. In this phase (the so-called schedule setup), an attempt is made to synchronize all available nodes in such a way that they act on a global communication scheme that defines when the nodes can occupy the communication media exclusively. Once this synchronization has succeeded, the established time scheme for the nodes no longer changes. Each node then operates in the so-called normal mode. Only a critical operating situation or an intended shutdown of the communication among the nodes can lead to a cessation of communication of the node. A change in the access behavior is not permitted.
However, during a startup phase it can happen for various reasons that a communication controller interrupts the execution of a communication schedule and resumes it later on. In the following, this incident is referred to as a schedule-reset. In contrast to the normal mode described above, the bus guardian must tolerate this incident during a schedule setup phase. When two or more nodes try to achieve an initial synchronization of all nodes at the same time and thereby collide, a situation arises in which all but one of the nodes have to retreat after detecting the collision. The node which does not have to retreat (the remaining node) could be the node which first sends a regular frame. Alternatively, it could be the node which sends the regular frame last or which sends a regular frame of a certain length.
When retreating, a node abandons a transmitting process based on its own local communication time schedule and adopts the time schedule proposed by the remaining node. The temporal position of its own time slots is then adjusted to this time schedule. In this way the local communication time schedules of all nodes are brought into global agreement. This means that the schedule setup is completed and the communication controllers are synchronized with respect to their media access behavior.
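The adjustment performed by a retreating node can be sketched as a simple shift of its local cycle start. The sketch assumes that each schedule is fully described by the tick at which its cycle starts and by a common cycle length; the function names `schedule_shift` and `slot_start` are hypothetical, and the way a real node learns the remaining node's timing is not modelled.

```python
def schedule_shift(local_cycle_start: int, remaining_cycle_start: int,
                   cycle_length: int) -> int:
    """Ticks by which a retreating node must delay its local cycle start
    so that it coincides with the cycle of the remaining node.

    The result lies in the range 0 <= shift < cycle_length.
    """
    return (remaining_cycle_start - local_cycle_start) % cycle_length


def slot_start(cycle_start: int, slot_index: int, slot_length: int) -> int:
    """Tick at which a given slot begins, relative to a cycle start."""
    return cycle_start + slot_index * slot_length


# Example: the retreating node started its cycle at tick 7, the remaining
# node at tick 23, and a cycle is 40 ticks long (all values assumed).
shift = schedule_shift(local_cycle_start=7, remaining_cycle_start=23,
                       cycle_length=40)
adjusted_start = 7 + shift
own_slot_begin = slot_start(adjusted_start, slot_index=2, slot_length=10)
```

After the shift, the retreating node's slots fall at the positions the remaining node's schedule assigns to them. From the bus guardian's point of view, this shift is exactly the deviation from the previously observed access pattern that has to be tolerated during schedule setup.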
Hence, although the media access pattern during normal operation of the communication system has a cyclic, repetitive structure, in a startup phase the local communication subsystem has to support the initial establishment of a global access schedule, for example a TDMA (Time Division Multiple Access)-based schedule, valid for all nodes of the distributed communication system. Communication startup in a distributed system in which more than one node is allowed to perform the initial schedule synchronization means that a node might have to adapt its own access schedule to that of another node, even though it has already attempted to start up the communication actively. This is effected, for example, by a schedule-reset (SR) initiated by the node that has to adapt its access schedule to another node. This situation during startup of the communication may lead to allowable deviations (for example a schedule-reset, SR) from the usual communication media access schedule.
In order for the communication controller to perform tasks such as sending trigger signals (ARM) etc., it must be functioning. In other words, the communication controller has already been started up when communication across the communication system is started up. The same applies to the bus guardian, which has likewise already been started (powered up and executing internal logic). The two devices are simply not yet cooperating to communicate on the channel. What the present invention addresses is how the communication controller and the bus guardian interact to start communication.
For a bus guardian that is to ensure error containment in the time domain for the respective node, this scenario poses a fundamental difficulty: distinguishing between allowed changes in the communication media access schedule of the communication controller, caused by startup behavior in conformity with the specification, and forbidden changes caused by a failure of the communication controller, which then have to be recognized as such.
One way to distinguish between an allowed and a forbidden deviation from the communication media access schedule during startup is explicit signaling of the communication controller's current status, for example a startup mode or a normal mode. The problem with explicit signaling is that the communication controller can indicate a schedule-reset to the bus guardian whenever it wants to. This allows error scenarios in which a communication controller continuously blocks the communication media by periodically signaling schedule-resets to the bus guardian. However, this is exactly what the bus guardian is supposed to prevent.
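The weakness of explicit signaling can be made concrete with a small simulation. It is only a sketch under stated assumptions: the hypothetical class `NaiveGuardian` restarts its cycle whenever the controller signals a schedule-reset and accepts every such signal unconditionally, and the node's own slot is assumed to be the first slot of the cycle. None of these names or parameters come from a real bus guardian design.

```python
class NaiveGuardian:
    """Hypothetical guardian that trusts every explicit schedule-reset."""

    def __init__(self, slots_per_cycle: int, slot_length: int, own_slot: int):
        self.slots_per_cycle = slots_per_cycle
        self.slot_length = slot_length
        self.own_slot = own_slot
        self.cycle_start = 0  # tick at which the current cycle began

    def schedule_reset(self, tick: int) -> None:
        # Accepted without any plausibility check: this is the weakness.
        self.cycle_start = tick

    def transmit_enabled(self, tick: int) -> bool:
        cycle_length = self.slots_per_cycle * self.slot_length
        slot = ((tick - self.cycle_start) % cycle_length) // self.slot_length
        return slot == self.own_slot


def enabled_ticks(guardian: NaiveGuardian, total_ticks: int,
                  reset_period=None) -> int:
    """Count ticks with media access enabled; optionally the controller
    signals a schedule-reset every `reset_period` ticks."""
    count = 0
    for tick in range(total_ticks):
        if reset_period is not None and tick % reset_period == 0:
            guardian.schedule_reset(tick)
        if guardian.transmit_enabled(tick):
            count += 1
    return count


# 4 slots of 10 ticks, the node owns slot 0, observed over two cycles.
healthy = enabled_ticks(NaiveGuardian(4, 10, 0), 80)
babbling = enabled_ticks(NaiveGuardian(4, 10, 0), 80, reset_period=10)
```

With a well-behaved controller, access is enabled for 20 of the 80 ticks. A faulty controller that signals a reset at every slot boundary keeps the guardian permanently in the node's own slot, so access is enabled for all 80 ticks and the guardian no longer contains the fault.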
To avoid jeopardizing the independence of the bus guardian, there are various approaches that do not use the direct interface between the communication controller and the bus guardian. In such cases the communication controller signals a schedule-reset to a higher-level processing unit, for example a host controller. The processing unit is usually part of the node and provides the actual functionality of the distributed control system, comprising the communication capability and the vehicle control functionality. An indicated schedule-reset can be validated by the higher-level processing unit and forwarded to the bus guardian across an independent interface.
The problem is that the higher-level processing unit places an entity in the critical path which has nothing to do with the execution of procedures of the node. A functionality of the node is outsourced to a higher-level entity, which only wants to use the data transmission capabilities of the node for realizing the functionality of the distributed control system. This means that the processing unit, too, has to meet appropriate demands concerning processing speed and reaction time, which usually is not the case. In the application software of the processing unit, special routines have to be provided which offer a guaranteed latency for detecting, processing and forwarding an event, namely the schedule-reset. Thus, a node of the communication system cannot be considered a closed entity and is therefore difficult to certify and to check for conformity.
Hence, the signaling of the current status or an explicit reset command from the communication controller to the bus guardian would violate the requirement for independent operation of the communication controller and the bus guardian. Independent operation of these two entities is important in order to reduce their susceptibility to so-called common-mode errors.
The realization of a separate receive circuit, which would allow the messages transmitted by the communication controller to be received, decoded and interpreted, would be much too complex and too expensive.
The bus guardian can assure an independent surveillance of the communication controller it is assigned to only if the communication controller sticks to a definite access scheme. A problem arises if the access behavior of the communication controller to be monitored by the bus guardian changes in compliance with the access scheme and if such an allowable change has to be distinguished from a faulty change caused by a defective communication controller.
Therefore, it is an object of the present invention to provide a mechanism that allows the bus guardian to monitor the communication media access scheme of the communication controller even during startup of the communication.