Aspects disclosed herein relate to the field of computer processors. More specifically, aspects disclosed herein relate to periodic non-intrusive diagnosis of lockstep systems.
Automated systems for vehicle control are becoming more prevalent. For automotive driver assist systems (ADAS) alone, some predictions call for a 24% compound annual growth rate over the next five years. Functional safety is a key requirement for these systems, which may include ADAS, unmanned aerial vehicle (UAV) systems, aeronautics systems, and defense systems. For example, in a car, an emergency braking system and an adaptive cruise control system cannot afford failures, as a failure may result in unacceptable consequences, such as a car accident. Similarly, aeronautic control systems cannot afford failures.
ISO Standard 26262 requires compliant systems to be designed and configured to avoid unreasonable risks due to hazards caused by malfunctioning behavior of electrical and/or electronic systems. Faults in systems can be random failures due to soft errors, hardware aging, or circuit failure. To be resilient to failures, one approach is to run more than one compute engine in lockstep for redundancy and to compare every activity (e.g., outputs of the compute engines) at memory interfaces, bus interfaces, and/or compute block input/output (I/O) interfaces. If there is a fault in one or more of the compute engines, the fault will be reflected in a comparison mismatch. Systems in which more than one compute engine runs in lockstep for redundancy and every activity is compared at one or more interfaces are referred to herein as lockstep systems.
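By way of illustration only, the lockstep comparison described above may be modeled in software as follows. This is a minimal sketch rather than a description of any particular hardware: the names `lockstep_step`, `healthy_engine`, and `faulty_engine` are hypothetical, and the single-bit flip in `faulty_engine` is an assumed stand-in for a random fault such as a soft error.

```python
from typing import Callable, Sequence, Tuple


def lockstep_step(engines: Sequence[Callable[[int], int]], x: int) -> Tuple[int, bool]:
    """Run every engine on the same input and compare the outputs.

    Returns the first engine's output and a flag that is True on mismatch.
    """
    outputs = [engine(x) for engine in engines]
    mismatch = any(out != outputs[0] for out in outputs[1:])
    return outputs[0], mismatch


def healthy_engine(x: int) -> int:
    # Hypothetical computation performed by each redundant compute engine.
    return 2 * x + 1


def faulty_engine(x: int) -> int:
    # Assumed fault model: the least significant bit of the result is flipped.
    return healthy_engine(x) ^ 0x1
```

In this model, two healthy engines produce identical outputs and no mismatch is reported, whereas pairing a healthy engine with the faulty one causes the comparison to report a mismatch.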
If a comparison circuit of a control system (e.g., a control system of a vehicle) develops a fault, then faults in the control system might go undetected, possibly resulting in an unreasonable risk. One technique used to avoid this possibility is to periodically halt the computing activity of the control system, save a context for the control system, perform a hardware diagnosis of the comparison circuit and the rest of the control system, and, if the hardware diagnosis does not detect any problems, restore the saved context and resume the activity of the control system. This technique imposes a serious limitation on the software architecture of control systems and is frequently very difficult to implement, as there is typically an idle-time duration constraint on the operations of the control system. That is, periods of time for the control system to be idle have a maximum allowed length, because the vehicle under the control of the control system cannot be uncontrolled for more than a very short period. This is a serious difficulty in designing a system that requires both safe operation and reliability. Furthermore, as systems are developed that have greater complexity, there are increasing risks of systematic and/or random hardware failures.
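By way of illustration only, the halt, save, diagnose, and restore sequence described above, together with its idle-time duration constraint, may be modeled as follows. This is a simplified sketch under stated assumptions: the function names, the test patterns 0xA5 and 0x5A, and the microsecond timing parameters are all hypothetical and are not taken from any particular control system.

```python
import copy
from typing import Callable, Dict

# Hypothetical comparator model: returns True when its two inputs mismatch.
Comparator = Callable[[int, int], bool]


def good_comparator(a: int, b: int) -> bool:
    return a != b


def stuck_comparator(a: int, b: int) -> bool:
    # Assumed fault: the comparator never reports a mismatch.
    return False


def comparator_self_test(compare: Comparator) -> bool:
    """Pass only if the comparator is silent on equal inputs and flags unequal ones."""
    return (not compare(0xA5, 0xA5)) and compare(0xA5, 0x5A)


def intrusive_diagnosis(context: Dict[str, int], compare: Comparator,
                        diagnosis_us: int, max_idle_us: int) -> Dict[str, int]:
    """Halt, save the context, diagnose the comparator, and return the saved context."""
    if diagnosis_us > max_idle_us:
        raise TimeoutError("diagnosis would exceed the idle-time budget")
    saved = copy.deepcopy(context)  # save the context while activity is halted
    context.clear()                 # model the diagnosis clobbering live state
    if not comparator_self_test(compare):
        raise RuntimeError("comparison circuit fault detected")
    return saved                    # caller restores this context and resumes
```

The first check models the difficulty noted above: whenever the diagnosis takes longer than the maximum allowed idle period, the sequence cannot be performed at all without leaving the controlled vehicle uncontrolled for too long.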
Therefore, techniques for improving the reliability of control systems using comparator circuits are desirable.