The present disclosure relates to error detection in hardware designs in general, and in particular to detection of errors caused by soft errors that were not handled properly.
Computerized devices are an important part of modern life. They control almost every aspect of our life, from writing documents to controlling traffic lights. However, computerized devices are bug-prone, and thus require a verification phase in which the bugs should be discovered. The verification phase is considered one of the most difficult tasks in developing a computerized device. Many developers of computerized devices invest a significant portion, such as 70%, of the development cycle in discovering erroneous behaviors of the computerized device, also referred to as a target computerized system. The target computerized system may comprise hardware, software, firmware, a combination thereof, and the like. In some cases, a target device is defined by a design, such as one provided in a hardware description language such as VHDL, SystemC, or the like.
A soft error, or a fault, is a transient bit-flip or similar value modification that occurs spontaneously. In some cases, the soft error may be caused by a particle strike. An error in a design occurs when a fault results in data corruption. Typically for hardware designs, an error may be a situation in which a corrupted value appears on the outputs of the design (or on a predefined set of cut-points). A fault does not always become an error; it may vanish through logical masking, electrical masking, fault detection modules, and the like. Whether or not a fault becomes an error may depend on the state of the target computerized system when the fault occurs and on input values in subsequent cycles.
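The distinction between a fault and an error can be illustrated with a minimal sketch of logical masking. The combinational function `design` and the helper `fault_becomes_error` below are hypothetical and are not taken from any particular hardware design; they only show that the same bit-flip may or may not reach the output, depending on the other values present when it occurs.

```python
def design(a: int, b: int, c: int) -> int:
    """Hypothetical combinational design: out = (a AND b) OR c."""
    return (a & b) | c


def fault_becomes_error(a: int, b: int, c: int) -> bool:
    """Flip input `a` and report whether the output differs from the fault-free run."""
    golden = design(a, b, c)      # fault-free reference output
    faulty = design(a ^ 1, b, c)  # same inputs, with a single bit-flip on `a`
    return golden != faulty


# With b = 0 the AND gate hides the flipped value of `a`: the fault is masked.
masked_by_and = fault_becomes_error(a=1, b=0, c=0)
# With c = 1 the OR gate forces the output to 1: the fault is masked as well.
masked_by_or = fault_becomes_error(a=1, b=1, c=1)
# With b = 1 and c = 0 the flipped value propagates to the output: an error.
propagated = fault_becomes_error(a=1, b=1, c=0)
```

In this sketch only the last case corrupts the output, so only that fault counts as an error.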
Some hardware designs contain fault detection logic configured to detect, correct, and/or recover from a fault. In response to detection of a fault, the design may, in some cases, recover, for example by re-loading a previously saved clean state and re-computing values.
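One possible form of such detection and recovery logic is sketched below, under stated assumptions: an 8-bit accumulator protected by a single parity bit, which keeps a saved copy of its previous clean state and last operand so that the value can be re-computed. The class `ProtectedAccumulator` and all of its fields are hypothetical illustrations. A single parity bit detects only an odd number of flipped bits, and faults in the saved copies themselves are not covered here.

```python
def parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, otherwise 0."""
    return bin(value).count("1") & 1


class ProtectedAccumulator:
    """Hypothetical 8-bit accumulator with parity detection and replay recovery."""

    def __init__(self) -> None:
        self.value = 0          # the protected latch contents
        self.parity_bit = 0     # parity stored alongside the value
        self.saved_value = 0    # clean state saved before the last update
        self.saved_operand = 0  # operand of the last update, kept for replay
        self.recoveries = 0     # number of faults detected and recovered

    def _check_and_recover(self) -> None:
        # Detection: the stored parity no longer matches the stored value.
        if parity(self.value) != self.parity_bit:
            # Recovery: re-load the saved clean state and re-compute the value.
            self.value = (self.saved_value + self.saved_operand) & 0xFF
            self.parity_bit = parity(self.value)
            self.recoveries += 1

    def step(self, operand: int) -> None:
        self._check_and_recover()
        self.saved_value, self.saved_operand = self.value, operand
        self.value = (self.value + operand) & 0xFF
        self.parity_bit = parity(self.value)

    def read(self) -> int:
        self._check_and_recover()
        return self.value


acc = ProtectedAccumulator()
acc.step(5)
acc.step(7)
acc.value ^= 0b1000  # simulated soft error: flip bit 3 of the latch
recovered = acc.read()
```

Here the injected bit-flip changes the parity of the stored value, so the read detects the mismatch and replays the last addition from the saved state.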
Soft error verification may be performed to detect scenarios in which faults are not handled and consequently turn into errors. In some cases, a simulated execution of the design is performed, and a fault is simulated by modifying a value of a variable, such as by flipping a value of a latch (also referred to as injecting a bit-flip into the latch). In case the fault detection logic does not handle the fault, it may result in an error during simulation. A huge number of simulation runs may be required in order to achieve appropriate coverage, and this is rarely accomplished on industrial designs.
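A simulation-based fault injection campaign of this kind can be sketched as follows. The toy design, the stimulus, and the function names `simulate` and `campaign` are assumptions made for illustration: a 4-bit latch that is optionally loaded from an input each cycle, of which only the two low bits reach the outputs. Each run injects one bit-flip at one (cycle, bit) position and compares the outputs with a fault-free reference run.

```python
from itertools import product


def simulate(inputs, fault=None):
    """Run the hypothetical design; fault=(cycle, bit) flips one latch bit."""
    reg = 0
    outputs = []
    for cycle, (load, data) in enumerate(inputs):
        if fault is not None and fault[0] == cycle:
            reg ^= 1 << fault[1]      # inject the bit-flip into the latch
        if load:
            reg = data & 0xF          # a load overwrites any earlier corruption
        outputs.append(reg & 0b0011)  # only the two low bits are observable
    return outputs


def campaign(inputs, width=4):
    """Inject every single-bit fault once and classify the outcome of each run."""
    golden = simulate(inputs)
    results = {}
    for cycle, bit in product(range(len(inputs)), range(width)):
        faulty = simulate(inputs, fault=(cycle, bit))
        results[(cycle, bit)] = "error" if faulty != golden else "masked"
    return results


# Each stimulus entry is (load, data) for one cycle.
stimulus = [(1, 0b0101), (0, 0), (1, 0b0010), (0, 0)]
results = campaign(stimulus)
error_count = sum(1 for outcome in results.values() if outcome == "error")
```

Even this toy campaign needs one run per (cycle, bit) pair, 16 runs here, and the count grows with the number of latches, the number of cycles, and the number of input sequences, which suggests why exhaustive coverage is rarely practical on industrial designs.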
Soft error verification is especially critical in computerized devices that are operated in a hazardous environment, such as outer space. Such devices may be extremely expensive, and an undetected bug in them may be very costly. For example, consider a bug in a satellite that may cause the satellite to crash. Even a relatively simple bug, such as one that causes the satellite to function incorrectly, may be very expensive to fix, as fixing it may require sending people to outer space.