Several protocols are used in distributed communication systems. In the automotive field, time-triggered protocols are used in particular. One such protocol is the FlexRay protocol. The FlexRay communication protocol is the automotive industry's answer to the increasing demand for reliable, high-speed data communication in automotive and similar applications. It is based on a TDMA (time division multiple access) scheme that coordinates the access of participating devices to the communication system. However, it avoids employing a master synchronizing node within the automotive communication system in order to achieve a decentralized, more fault-robust bus architecture. This requires a mutual synchronization of all participating nodes when the communication system is started, so that the nodes agree on a global time base.
The FlexRay communication protocol provides a mechanism for this start-up phase by using start-up and sync frames. Further, the FlexRay protocol allows symbols to be transmitted for avoiding collisions. Normally, data is transmitted in frames which are aligned within slots, wherein each frame includes a header and a data part.
During operation of a communication system based on the FlexRay communication protocol, it has been recognized that failures of a single node may appear especially during the start-up phase. Such a failure could either prevent start-up of the node, which lowers the availability, or lead to a clique formation which influences the whole communication system and causes a logical network partitioning. The FlexRay communication protocol is a so-called two-channel transmission system. When a node outputs different or differently timed synchronization frames on its two output channels, it may be possible to establish a group of nodes which are synchronized to each other, but not to other groups of nodes. Thus, another group within the communication system may use a different time base, since it is based on the temporally displaced sync frames. Such formation of cliques within the communication system may reduce either the availability or the reliability of the communication system if it remains undetected.
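The clique effect described above can be illustrated with a deliberately simplified sketch. It assumes, purely for illustration, that each node listens on only one channel and adopts the time base of the sync frame it receives there. The node names, channel labels and offsets are hypothetical and are not taken from the FlexRay specification.

```python
from collections import defaultdict

def form_cliques(sync_offset_by_channel, channel_of_node):
    """Group nodes by the time base they adopt from the sync frame on their channel."""
    cliques = defaultdict(list)
    for node, channel in sorted(channel_of_node.items()):
        cliques[sync_offset_by_channel[channel]].append(node)
    return dict(cliques)

# Hypothetical listening channel for each node
listeners = {"N1": "A", "N2": "A", "N3": "B", "N4": "B"}

# Healthy sync node: identical timing on both channels, one common time base
healthy = form_cliques({"A": 0, "B": 0}, listeners)

# Faulty sync node: the frame on channel B is displaced by 50 (hypothetical) ticks
faulty = form_cliques({"A": 0, "B": 50}, listeners)

print(healthy)  # {0: ['N1', 'N2', 'N3', 'N4']}
print(faulty)   # {0: ['N1', 'N2'], 50: ['N3', 'N4']}
```

In the faulty case the nodes within each group agree with each other, so neither group observes anything wrong locally, although the two groups no longer share a time base.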
Other failures during synchronization in the start-up phase, or general failures during the transmission of data, could be caused by transient or permanent hardware faults such as stuck bits, flipped bits, timing errors or spurious resets.
Within the FlexRay protocol, each node is assigned certain slots for transmitting its data. During such a slot, no other node shall transmit. Therefore, it is essential that all nodes adhere to the slot structure and the general scheduling plan of the communication system, which is based on the global time base defined during the start-up of the communication system.
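The slot-based access rule can be sketched as follows. This is a minimal illustration that assumes a single segment of equally long slots. The slot length, the schedule and the node names are hypothetical, and the sketch does not model the segments of a real FlexRay communication cycle.

```python
SLOT_LENGTH = 50  # ticks per slot (hypothetical)
SCHEDULE = ["node_a", "node_b", "node_c", "node_a"]  # owner of each slot (hypothetical)
CYCLE_LENGTH = SLOT_LENGTH * len(SCHEDULE)

def slot_at(global_time):
    """Return the index of the slot that is active at the given global time."""
    return (global_time % CYCLE_LENGTH) // SLOT_LENGTH

def may_transmit(node, global_time):
    """A node may transmit only while a slot assigned to it is active."""
    return SCHEDULE[slot_at(global_time)] == node

print(slot_at(120))                 # 2
print(may_transmit("node_c", 120))  # True
print(may_transmit("node_b", 120))  # False

# A node whose local time lags the global time base by 60 ticks evaluates the
# schedule at the wrong instant and believes that its own slot is active.
print(may_transmit("node_b", 120 - 60))  # True
```

The last call shows why agreement on the global time base matters: the schedule itself is unchanged, yet a displaced local time leads a node to transmit in a slot owned by another node.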
Currently there are two solutions for preventing a failure of a single node from resulting in a failure of the complete communication system. In the first solution, several architectures use a so-called bus guardian, which is added in parallel to each communication controller of a node. The bus guardian observes the access of the communication controller to the medium and prevents the node from accessing the medium when this is not allowed, because another node is permitted to transmit during the respective time slot. Such a bus guardian has to form its own opinion on the state of its node and on the state the medium should have. Therefore, the bus guardian has basically the same complexity as the communication controller of a node. The bus guardian receives the same commands from the host as the communication controller. Thus, it may not detect faults of the host. Further, to detect whether a node transmits in an illegal slot, the bus guardian only roughly checks the timing of frames on the transmitting path and not their content. Thus, errors such as small timing differences or wrong frame contents due to a broken counter in the communication controller are not directly detectable by the bus guardian.
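The coarse nature of the bus guardian check can be illustrated with a minimal sketch. It assumes that the guardian merely compares the start of a transmission with the start of a slot owned by its node, within a tolerance window. The schedule, the slot length and the tolerance are hypothetical values chosen for illustration.

```python
SLOT_LENGTH = 50  # ticks per slot (hypothetical)
SCHEDULE = ["node_a", "node_b", "node_c"]  # owner of each slot (hypothetical)
TOLERANCE = 5  # coarse timing window of the guardian (hypothetical)

def guardian_allows(node, start_time, payload=None):
    """Coarse check: does the transmission start near a slot owned by the node?

    The payload is deliberately ignored, since the guardian described above
    checks only the rough timing of frames and not their content.
    """
    cycle = SLOT_LENGTH * len(SCHEDULE)
    for index, owner in enumerate(SCHEDULE):
        if owner != node:
            continue
        # Distance from the slot start, taking the cycle wrap-around into account
        offset = (start_time - index * SLOT_LENGTH) % cycle
        if offset <= TOLERANCE or offset >= cycle - TOLERANCE:
            return True
    return False

print(guardian_allows("node_b", 100))  # False: slot 2 belongs to node_c
print(guardian_allows("node_b", 50))   # True: start of node_b's own slot
print(guardian_allows("node_b", 53))   # True: small timing error goes unnoticed
print(guardian_allows("node_b", 50, payload=b"wrong counter"))  # True: content unchecked
```

The first call shows the fault class the guardian is able to block, while the last two calls correspond to the faults named above that it cannot detect directly.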
A second possible mechanism is executed solely by a host (CPU) of the node, which checks the data passed from the communication controller to the host. The host monitors this data for inconsistencies indicating a possible failure of the communication controller.
However, both solutions have shortcomings. As already indicated, the bus guardian doubles the complexity of the controller. In return, it protects the network against nearly all possible failures in various states, not only during start-up. The second solution, in which the host detects failures based on information provided by the communication controller, suffers from the fact that it has to rely on that very information. Thus, the second solution may suffice for many simple errors, but a communication controller with a more complex error could fake this information or simply repeat the correct information from a previous time. The host may then decide based on corrupt information, so that the communication controller continues to operate incorrectly, which may result in a failure of the complete network.
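The limitation of the host-based check can be illustrated with a minimal sketch. It assumes that the host applies a simple plausibility test to a status record reported by the communication controller. The status fields, the limit and the replaying controller are hypothetical and serve only to show the principle.

```python
from dataclasses import dataclass

MAX_CORRECTION = 20  # largest plausible clock correction (hypothetical limit)

@dataclass(frozen=True)
class ControllerStatus:
    """Status as reported by the communication controller (hypothetical fields)."""
    synchronized: bool
    clock_correction: int

def host_accepts(status):
    """Plausibility check that uses only what the controller itself reports."""
    return status.synchronized and abs(status.clock_correction) <= MAX_CORRECTION

class ReplayingController:
    """Faulty controller that keeps reporting its last healthy status."""

    def __init__(self, last_good):
        self._last_good = last_good
        self.actually_synchronized = False  # true internal state, invisible to the host

    def report(self):
        return self._last_good

simple_fault = ControllerStatus(synchronized=True, clock_correction=400)
replayer = ReplayingController(ControllerStatus(synchronized=True, clock_correction=3))

print(host_accepts(simple_fault))       # False: implausible value is detected
print(host_accepts(replayer.report()))  # True: stale but plausible data is accepted
```

The host rejects the implausible value, but it has no independent source against which to compare the replayed status, so the faulty controller passes the check.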
Therefore, a solution is needed which provides increased error detection, which avoids basing the error detection and mitigation solely on information provided by the communication controller, and which has a complexity significantly below that of the bus guardian.