1. Field of the Invention
The disclosure relates generally to a verification environment for designs, and, more particularly, to methods and systems for evaluating the checker quality of a verification environment.
2. Description of the Related Art
Due to increasing design complexity, verification has become the major bottleneck in the chip design industry. Various verification methodologies have been adopted to verify whether design behaviors are correct. Among these approaches, dynamic verification (i.e., simulation) is still one of the major forces driving verification progress. A typical chip design is usually implemented in a hardware description language (HDL) such as Verilog or VHDL. At the same time, a reference model, which is constructed at a much higher level of abstraction such as in C or C++, is developed for verification purposes. The idea behind dynamic verification is to apply various test stimuli to both the design and the reference model and compare the simulation results. The verification environment (VE) will flag an error if there is any difference in the simulation results between the design and the reference model.
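The dynamic-verification flow described above can be illustrated with a minimal sketch. The function names `run_design` and `run_reference` are hypothetical stand-ins for an HDL simulator and a high-level reference model, respectively; they are not part of any actual tool.

```python
def run_design(stimulus):
    # Placeholder for RTL simulation of the HDL design.
    return [x * 2 for x in stimulus]

def run_reference(stimulus):
    # Placeholder for the high-level reference model (e.g., C/C++).
    return [x * 2 for x in stimulus]

def verify(stimuli):
    """Apply each stimulus to both models and collect any mismatch."""
    mismatches = []
    for stim in stimuli:
        if run_design(stim) != run_reference(stim):
            mismatches.append(stim)
    return mismatches

# With identical toy models, no mismatch is reported.
print(verify([[1, 2], [3, 4]]))  # → []
```

In a real environment the comparison is performed by checkers embedded in the VE rather than a single top-level diff, which is precisely why checker quality matters.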
There are many possible stimuli and configurations which can be simulated during the verification process. One of the key challenges is to generate enough stimuli to cover as many design corners as possible. To achieve this, the industry currently utilizes random test generation (RTG) to produce widely distributed test stimuli. Once the tests are generated, the second key challenge is to measure the simulation progress and determine whether the design has been well verified. If not, more tests have to be generated and simulated. The process of determining whether a set of tests is sufficient is called the “closure decision”. The industry currently adopts various verification coverage metrics to measure the simulation progress and, based on that, make the closure decision.
Typical coverage metrics are code coverage and functional coverage. The idea is to measure the stimulus “distributions” under different criteria. For example, line coverage (one type of code coverage) measures which sections of the implementation code are not executed under the applied stimuli. Similarly, functional coverage measures the occurrence of particular values inside the implementation. Having well distributed stimuli could imply that most of the design functions are executed. However, coverage results themselves cannot determine whether a simulated behavior is correct. Coverage-based closure usually assumes that any abnormal behavior will be captured by the verification environment. However, this assumption is not always true. The checking mechanism (i.e., checkers) in the verification environment is just like any other source code, which is error-prone. In reality, verification teams might not have a robust checker system, or might misconnect or disable the checkers by accident. In such cases, any error propagated to that checker will be missed. Without a dedicated tool to analyze checker quality, the verification environment could be non-robust in many cases. For example, a verification environment whose stimuli achieve 100% code/functional coverage cannot verify anything without a good checker connected to it.
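The coverage idea above can be illustrated with a toy sketch: each branch of the "design" records its ID when executed, and coverage is the fraction of branch IDs exercised by the applied stimuli. All names here are illustrative assumptions, not an actual coverage tool's API.

```python
executed = set()  # branch IDs exercised so far

def design_step(x):
    # A trivial two-branch "design"; each branch logs its coverage point.
    if x > 0:
        executed.add("pos_branch")
        return x
    else:
        executed.add("neg_branch")
        return -x

def coverage(all_points):
    """Fraction of coverage points hit by the stimuli applied so far."""
    return len(executed & all_points) / len(all_points)

ALL_POINTS = {"pos_branch", "neg_branch"}

design_step(5)                  # only the positive branch fires
print(coverage(ALL_POINTS))     # → 0.5
design_step(-3)                 # now the negative branch fires too
print(coverage(ALL_POINTS))     # → 1.0
```

Note that `design_step` could return a wrong value on every call and the coverage figure would still reach 1.0, which is exactly the gap between coverage and checking described above.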
The checker is designed specifically to check the correctness of design behaviors under various stimuli and conditions. There are many mechanisms to implement checkers, such as shell diff, scripting languages such as Perl, assertions, waveform diff, signatures, scoreboards, and monitors. In reality, each checker only covers certain regions of the design functions. Therefore, each checker has a different level of effectiveness at capturing erroneous behaviors. It is critical to quantify checker quality in order to know the quality of the whole verification environment. If a checker is not robust, the verification team should fix it to ensure the effectiveness of the verification environment.
As described, the quality of the test stimuli and of the checkers should both be considered in the verification closure decision. Stimulus quality, which is usually modeled as coverage, has been well addressed. However, the lack of dedicated checker-related metrics and tools is the missing piece in current verification environments.
Conventionally, a mutation-testing method injects artificial faults into a given design (HDL) and runs simulation on the fault-injected design with a given set of test stimuli to see whether the injected faults can be detected by the verification environment. If a fault can be propagated to the circuit boundary but can never be detected by the verification environment, this implies that checkers capable of capturing this particular fault behavior are missing. This kind of fault, which is activated and propagated to the output but not detected, is classified as a non-detected (ND) fault in the mutation-testing method. Injecting various kinds of artificial faults and analyzing the reported ND faults can reveal checker issues.
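The ND classification described above can be sketched as follows. The `FaultSimResult` record and the category names are illustrative assumptions; actual mutation-testing engines use their own result formats, but the activated/propagated/detected distinction is the one described in the text.

```python
from dataclasses import dataclass

@dataclass
class FaultSimResult:
    activated: bool    # the fault site was exercised by this test
    propagated: bool   # the faulty value reached a design output
    detected: bool     # some checker flagged the difference

def classify_fault(results):
    """Classify one injected fault from its per-test simulation results."""
    if any(r.detected for r in results):
        return "detected"
    if any(r.propagated for r in results):
        return "non-detected"     # ND: reveals a checker weakness
    if any(r.activated for r in results):
        return "not-propagated"
    return "not-activated"

# A fault that reaches an output in one test but is never detected is ND.
runs = [FaultSimResult(activated=True, propagated=True, detected=False),
        FaultSimResult(activated=True, propagated=False, detected=False)]
print(classify_fault(runs))  # → non-detected
```

Note that the ND verdict requires simulating every test that activates the fault, which is the source of the run-time cost discussed next.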
However, there are several missing pieces in the mutation-based technique if verifiers would like to adopt it to address checker problems more systematically. First, there is a lack of metrics for systematic measurement of checker quality. The ND faults reported by the mutation-testing method do point out checker weaknesses; however, it is not intuitive to build a systematic quality measurement on them alone. Secondly, the run-time of mutation-testing can be very long. In order to classify a given fault as ND, the mutation-testing method has to simulate every test that activates that fault, which might take a long time because many tests can activate a particular fault. Moreover, before any ND fault is reported by the mutation-testing engine, there is no information pertaining to checker weakness at all. Thirdly, there is no flexible support for merging and aggregating historical data. Because the nature of the mutation-testing method is that faults are injected inside the design itself, if the design or the verification environment evolves during the design cycle (e.g., new HDL code or a new version of the design), historical data generated from the previous version of the design could become totally invalid. For example, a simulated fault could disappear after the HDL changes. Even if the fault itself is unchanged, the corresponding behavior could be very different due to functional changes in the surrounding circuit. Taking historical data into account is a key feature for any kind of verification metric. This is especially true because mutation-testing run-time is costly, and any invalidated data is simply a waste of time and resources. Without the capability of aggregating historical data, designers cannot monitor the progress of improvements to their verification environment.
For these reasons, directly utilizing the mutation-testing method to identify problematic checkers might not be an optimal approach. The absence of a quality metric and of historical data management prevents it from serving as a proper verification metric.