Electronic devices communicate with each other in a variety of ways, often based upon the requirements of a given context. One such context is that of control systems. Unlike simple communication systems, which merely allow the connected devices to communicate with one another, control systems communicate for the purpose of explicit control over the modules connected to the control system. Such systems then allow other applications to run on the various modules. Those applications in a distributed embedded control system, however, should work in concert.
To provide that group control, most distributed embedded control systems are built around a communication protocol standard, examples of which include CAN (ISO 11898), SERCOS, FlexRay, EtherCAT, and sometimes even Ethernet among others. Higher layer protocols are embedded on top of the communication standard to provide rules for data exchange among participating applications at Electronic Control Units participating in the control network, timing rules, sequence rules, and the like to facilitate communications between the distributed applications that are exchanging information. CANopen, DeviceNet, SDS, J1939, and NMEA 2000 are just a few examples of protocols that are layered on top of the CAN standard. Even meta protocols like CanKingdom are used, by which higher layer protocols can be constructed and optimized for specific distributed embedded control systems.
Each protocol standard has its own strengths and weaknesses. The ideal communication would have infinite bandwidth, no latency, and full data integrity. Available communication alternatives are far from the ideal one, and compromises have to be found. For instance, Ethernet has a high bandwidth but poor timeliness due to its handling of message collisions. CAN has an efficient collision resolution but low bandwidth and no synchronization support. SERCOS is fast, but all nodes have to support the communication requirement of the most demanding node in the system. Accordingly, one big difficulty when designing a distributed embedded control system is to choose the basic communication system to fit the given system's needs. Another complication is that different parts of a system often have different needs. Some parts may involve advanced feedback loops requiring accurate time synchronization and short latencies, while other parts may not be time critical at all but instead depend on a correct sequence of events. In another example, a system may work well during runtime conditions with a low-bandwidth communication protocol but would need a high bandwidth for re-flashing modules in a maintenance mode. Moreover, industry requires a number of development and analysis tools and a pool of engineers with an in-depth familiarity with the chosen communication protocol to find the correct compromises. Applying the given technologies in a way that takes advantage of the good properties of a protocol and minimizes its shortcomings typically requires long practical experience in the design and maintenance of distributed embedded control systems based on the chosen protocol and its associated tools.
In the example of CAN systems, the CAN-FD protocol has been developed in an attempt to address the CAN protocol's data bandwidth limitations. This system, however, is not backward compatible with previous CAN-based modules. Accordingly, modules using the CAN-FD protocol cannot be installed into a control network having CAN-based modules and effect communication with those modules. Another shortcoming is that the CAN-FD protocol is based on the modules' looking for a given set point in time, which requires the modules to have highly accurate clocks and processors. Specifically, CAN-FD requires a switch from a first bit-rate to a second bit-rate relative to an edge in combination with the sample point location. This solution demands stable clocks over the time from the edge to the sample point and a common location of the sample point in the definition of the first bit-rate. The need for a precisely defined sample point limits the clock frequencies that can be used to run the CAN-FD controller. Moreover, although speed is improved over previous CAN-based systems, the maximum message length is still limited to 64 bytes. Such a system lacks flexibility for system designers.
Another problem can arise in communication protocols such as CAN in defining an end of the error check portion, i.e., the CRC sequence, and guaranteeing that all participating communication devices are using the same number of bits in their respective CRC calculations. More specifically, the mathematical theory that makes the CRC part of a message useful for confirming the quality of the data portion, combined with uncertainty in the start and the length of that data portion, makes it necessary to start reading the CRC at exactly the correct bit position. For example, classic CAN uses a 15-bit CRC value that protects any CAN frame with a Hamming Distance of six, which means that any frame with fewer than six bit-flip errors in the message would be detected by the CRC value. The theory of CRC protection demands that both the sender and the receiver use exactly the same number of bits when the CRC value is calculated. This can present a problem in CAN, where the number of bits between the Start of Frame (SOF) and the CRC-sequence varies with the number of stuff-bits needed and the number of data bytes given by the 4-bit Data Length Code (DLC).
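The dependence of the CRC check on an agreed bit count can be illustrated with a minimal sketch. It assumes the 15-bit generator polynomial of the classic CAN specification (0x4599) and operates on already de-stuffed bits; the helper names crc15, to_bits, and receiver_accepts, as well as the payload, are hypothetical and chosen only for illustration.

```python
# Sketch only: bit-serial CRC-15 as used by classic CAN, applied to de-stuffed bits.
CAN_CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1


def crc15(bits):
    """Return the 15-bit CRC register after shifting in `bits` (MSB first)."""
    reg = 0
    for bit in bits:
        nxt = bit ^ ((reg >> 14) & 1)   # incoming bit XOR current top bit
        reg = (reg << 1) & 0x7FFF       # shift left, keep 15 bits
        if nxt:
            reg ^= CAN_CRC15_POLY
    return reg


def to_bits(value, width):
    """Expand an integer into a list of `width` bits, MSB first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def receiver_accepts(wire_bits, n_data_bits):
    """Receiver-side check: the CRC over the first n_data_bits must equal
    the 15 bits that follow. n_data_bits is what the receiver BELIEVES
    the length of the protected part to be."""
    data = wire_bits[:n_data_bits]
    received_crc = wire_bits[n_data_bits:n_data_bits + 15]
    return to_bits(crc15(data), 15) == received_crc


# Hypothetical protected bits (SOF through data field), then the sender's CRC.
payload = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
frame = payload + to_bits(crc15(payload), 15)

# A receiver that agrees on the bit count accepts the frame.
print(receiver_accepts(frame, len(payload)))
```

If the receiver's idea of n_data_bits differs from the sender's, it compares a CRC over a different bit string against a shifted 15-bit window, so the guarantees of the CRC no longer apply to that comparison.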
There are several solutions to harden CAN against this uncertainty in the number of bits between the SOF and the CRC. A powerful solution is to have a fixed stuffing. With fixed stuffing, the number and the location of the stuff-bits are known in advance. There is still the risk of getting out of synch, which results in reading the CRC-field one or more bits too early or too late, thereby corrupting the bit error detection process. Also, a bit error in the DLC will put the receiver out of synch by a multiple of eight bits.
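The difference between data-dependent stuffing and fixed stuffing can be sketched as follows. The stuff function follows the classic CAN rule of inserting a complementary bit after five equal consecutive bits; the fixed scheme shown (one stuff-bit after every four bits) is only an assumed example of a fixed rule, and both function names are hypothetical.

```python
# Sketch only: data-dependent bit stuffing versus an assumed fixed-interval scheme.

def stuff(bits):
    """Insert a complementary stuff-bit after every run of five equal bits.
    The inserted stuff-bit itself counts as the first bit of the next run."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:
            stuff_bit = 1 - b
            out.append(stuff_bit)
            run_bit, run_len = stuff_bit, 1
    return out


def fixed_stuff_count(n_bits, interval=4):
    """Number of stuff-bits under an ASSUMED fixed rule (one per `interval`
    bits). It depends only on the length, never on the bit values."""
    return n_bits // interval


alternating = [0, 1] * 5   # ten bits, no run longer than one
all_dominant = [0] * 10    # ten bits, worst case for stuffing

# Same length, different number of stuff-bits under the data-dependent rule.
print(len(stuff(alternating)) - len(alternating))
print(len(stuff(all_dominant)) - len(all_dominant))
# Under the fixed rule the count is known from the length alone.
print(fixed_stuff_count(10))
```

With the data-dependent rule, the receiver can only locate the CRC-field after correctly removing every stuff-bit; with a fixed rule, the offset follows from the frame length alone.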
Nine possible problems could result in a failed comparison between the internal CRC and the received CRC-field produced by the sender in CAN-type communications.
1) Missing the Start of Frame, SOF. This will result in a misalignment between the received bits by one or more bits.
2) Misinterpretation of the RTR-bit will cause a misinterpretation of the DLC.
3) Misinterpretation of the IDE-bit will cause the receiver to receive 18 bits too many or too few.
4) Misinterpretation of the FDF-bit will cause some problem because the res-, BRS- and ESI-bit will be treated as DLC bits or vice versa. In other words, the CRC will be about three bits out of synch.
5) Misinterpretation of the res-bit.
6) Misinterpretation of the BRS-bit will cause the receiver to use wrong bit-rate and this will force the CRC to be completely out of synch.
7) Misinterpretation of a DLC-bit will cause the receiver to expect one or more bytes too many or too few.
8) Misinterpretation of the edges in the bits. This will result in a phase error that could cause the receiver to start sampling one bit too early or too late.
9) Misinterpretation of the stuff-bits. This will result in removing too few or too many stuff-bits, which will put the receiver out of synch with the start of the CRC-field.
The entire list describes a number of problems that will result in an incorrect alignment between the internal CRC and the CRC received from the CAN-bus. Problem numbers 1, 2, 3, 7, 8, and 9 are common to both Classical CAN and CAN-FD. The added functionality in CAN-FD given by the FDF and res bits (problems 4 and 5) increases the problem by creating more opportunities for a bit-flip error to cause a misalignment error. CAN-FD also supports more variants in the DLC, which increases the error caused by problem 7 compared to Classical CAN. CAN-FD also supports more data bytes, which increases the risks with respect to problem number 9 because 12 more stuff-bits could be missed. Due to the increased number of bits and bit-edges, the exposure to problem number 8 will increase about eightfold in CAN-FD as compared to Classical CAN.
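The effect of problem 7 can be quantified with a short sketch. The DLC-to-length tables below are taken from the CAN and CAN-FD specifications rather than from this text, and the function names are hypothetical. The sketch asks how far, in bytes, a single flipped DLC bit can move the receiver's expectation of where the data field ends.

```python
# Sketch only: worst-case data-length error caused by one flipped DLC bit.

# CAN-FD payload length in bytes for DLC values 0..15 (per the CAN-FD specification).
FD_LENGTHS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 16, 20, 24, 32, 48, 64]


def classic_len(dlc):
    """Classical CAN: DLC values above 8 are treated as 8 data bytes."""
    return min(dlc, 8)


def fd_len(dlc):
    """CAN-FD: DLC values above 8 select the longer payload sizes."""
    return FD_LENGTHS[dlc]


def worst_single_flip(length_of):
    """Largest byte-count difference between any DLC and the DLC obtained
    by flipping exactly one of its four bits."""
    worst = 0
    for dlc in range(16):
        for bit in range(4):
            other = dlc ^ (1 << bit)
            worst = max(worst, abs(length_of(dlc) - length_of(other)))
    return worst


print(worst_single_flip(classic_len))  # bytes, Classical CAN
print(worst_single_flip(fd_len))       # bytes, CAN-FD
```

Each byte of error corresponds to eight bits of misalignment at the start of the CRC-field, before any stuff-bits are taken into account.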
The probability of any of those problems actually occurring may be very low, but it is possible to describe cases where this could occur. We could even say that the probability is very low because CAN works very well in most cases, and if this were a major problem, it would have been known by the community. It is known, however, that if there is a problem, it will increase by some magnitude when CAN-FD is implemented to the full extent. For instance, in the classic CAN context, these errors are of reduced magnitude and frequency for a variety of reasons, including that CAN's bit-rate is relatively low, less than 1 Mbit/s; the energy in the dominant bit is relatively high, 4 Volt over 60 Ohm (0.25 W); the recessive level demands a relatively high energy, 0.5 Volt over 60 Ohm (4 mW); the sampled bus level is the same DC-level throughout the electrical bus, covering all units; the typical automotive bus-length is less than 40 meters; the cable is normally mounted close to some metallic part, reducing the exposure to electrical disturbance; the logic has filters preventing disturbances from causing bad phase shift; and extensive error-handling prevents bad data parts from being usable. The conclusion, however, is that CAN-FD will be more sensitive, thereby increasing the expected number of bits with errors. At the same time, the new bits introduced in CAN-FD increase the probability of undetectable errors.
Individual solutions to each of the above problems generally aim to reduce the ability of those nine error causes to cause reading the bits at the wrong position. Another built-in protection for classic CAN lies in its EOF structure. In a CAN data format, the pattern after the CRC-field is unique (101111111111S). The sequence “10” is followed by at least 1 (ACK-delimiter) + 7 (EOF) = 8 recessive bits, typically with three more recessive bits before the next frame is started. This sequence is unique and cannot exist within the CAN-frame because a sequence of six recessive bits will violate the stuff-rule. Accordingly, if the pattern 1011111111 . . . follows directly after the CRC-field, all the CRC-bits are in correct relation to the internal CRC-value, and indirectly the CRC-comparison must have started at the correct position, so none of the nine problems that could cause a loss of bit position has occurred.
The problem with this protection built into Classical CAN is that this pattern is weak. There are several bit-patterns that, with one bit-flip, will look like a correct EOF. For a data pattern consisting only of recessive bits except for a dominant bit in any of the first five bits, an error that flips the dominant bit will turn the data pattern into an EOF.
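The weakness can be illustrated with a minimal sketch. It assumes that the end-of-frame check amounts to finding “10” followed by eight recessive bits, as counted above, and uses one hypothetical wire-level sequence; the function names and the example bits are not taken from any specification.

```python
# Sketch only: one bit-flip turns a stuff-legal bit sequence into an EOF look-alike.

def looks_like_frame_end(bits, start, recessive_needed=8):
    """True if bits[start:] begins with 1, 0 and then `recessive_needed`
    recessive (1) bits, i.e. ACK-delimiter plus EOF in the count used above."""
    window = bits[start:start + 2 + recessive_needed]
    return window == [1, 0] + [1] * recessive_needed


def obeys_stuff_rule(bits):
    """True if the sequence never contains six equal consecutive bits."""
    run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        if run >= 6:
            return False
    return True


# Hypothetical wire bits: "10", then recessive bits with a single dominant
# bit in the fourth position. This is legal inside a frame.
wire = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]

# A disturbance flips that single dominant bit to recessive.
flipped = list(wire)
flipped[5] ^= 1

print(obeys_stuff_rule(wire), looks_like_frame_end(wire, 0))
print(looks_like_frame_end(flipped, 0))
```

The sketch covers only a single example; it does not enumerate which dominant-bit positions are affected.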