1. Field of the Invention
Embodiments of the present invention relate generally to integrated circuit design and, more specifically, to a method and system for automating unit performance testing in integrated circuit design.
2. Description of the Related Art
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Integrated circuit (IC) design involves the creation of electronic components, such as transistors, resistors, and capacitors, and the metallic interconnect of these components on a piece of semiconductor. Broadly speaking, digital IC design can be divided into three phases: 1) electronic system-level (ESL) design phase, 2) register transfer level (RTL) design phase, and 3) physical design phase. In the ESL phase, the user functional specification is defined and verified. In the RTL design phase, this user functional specification is converted into an RTL description, which specifies in detail how each bit of the IC should behave on every clock cycle. Lastly, in the physical design phase, a chip design is generated based on the RTL description and a library of available logic gates. Here, issues such as which logic gates to use, where to place the gates, and how to wire them together are addressed. After these issues are resolved, the chip is taped out, and the design data is converted into photomasks.
As ICs become increasingly complex and contain a large number of units, designing and verifying such ICs becomes more and more difficult. FIG. 1A illustrates a conventional approach to verifying both the functional correctness and the performance correctness of a chip 100 prior to entering the physical design phase. Suppose the chip 100 is a graphics processing unit. In step 102, system level stimuli are generated to be fed into the functional model verification environment (FModel). In step 104, the FModel generates a set of expected test results according to the user functional specification for the entire chip 100. The system level stimuli here generally refer to graphics related commands. In step 106, the full-chip RTL implementation generates a set of actual test results based on the same system level stimuli. Then, the expected test results are compared to the actual test results in step 108 to determine whether the chip 100 as a whole performs the specified functions as expected. Similarly, the same system level stimuli are fed into the performance simulator in step 110 and also into the full-chip RTL implementation of the chip 100 in step 112. The performance simulator estimates the number of clock cycles that the entire chip 100 needs to respond to or operate on the system level stimuli. This estimated clock cycle count is then compared with the actual clock cycle count that the full-chip RTL implementation takes to determine whether the performance of the full-chip RTL implementation is considered to be correct.
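For illustration only, the full-chip flow of FIG. 1A may be sketched in Python as follows. This is a minimal sketch and not an implementation taken from any actual verification environment: the names `verify_full_chip`, `FullChipReport`, `fmodel`, `rtl`, and `perf_sim` are hypothetical, the models are passed in as plain callables, and the `tolerance` parameter is an assumption, since the criterion by which the estimated and actual clock cycle counts are compared is not specified above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FullChipReport:
    functional_pass: bool   # outcome of the comparison in step 108
    performance_pass: bool  # outcome of the cycle count comparison
    estimated_cycles: int
    actual_cycles: int


def verify_full_chip(
    stimuli: Sequence[int],
    fmodel: Callable[[Sequence[int]], List[int]],
    rtl: Callable[[Sequence[int]], Tuple[List[int], int]],
    perf_sim: Callable[[Sequence[int]], int],
    tolerance: float = 0.05,  # assumed acceptance margin, not from the text
) -> FullChipReport:
    # Step 104: the FModel produces the expected test results.
    expected = fmodel(stimuli)
    # Steps 106 and 112: the full-chip RTL produces the actual test results
    # and the actual clock cycle count for the same stimuli.
    actual, actual_cycles = rtl(stimuli)
    # Step 110: the performance simulator estimates the clock cycle count.
    estimated_cycles = perf_sim(stimuli)
    # Step 108: functional comparison for the chip as a whole.
    functional_pass = list(expected) == list(actual)
    # Performance comparison: the actual count may exceed the estimate
    # by at most the assumed tolerance.
    performance_pass = actual_cycles <= estimated_cycles * (1 + tolerance)
    return FullChipReport(
        functional_pass, performance_pass, estimated_cycles, actual_cycles
    )
```

Note that the report describes the chip 100 only as a whole. A failing comparison does not indicate which unit is responsible.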
The aforementioned approach has several shortcomings. One, given the complexity of the chip and of the traffic streams that it may receive, developing and deploying the programs and environment to effectively and thoroughly verify the entire chip 100 is difficult and time consuming, because such efforts require the consideration of all possible operating paths in the chip 100. Two, even if the full-chip tests are developed and deployed and even if they successfully detect deviations from the expected results, they lack the flexibility to efficiently identify the failing units in the chip that cause such deviations. Three, testing at the full-chip level also means that such testing cannot begin until the RTL implementation of the chip 100 is completed. This serial dependency between the RTL implementation of the chip and the testing of the chip often leads to either an insufficient amount of testing on the chip or intolerable delays in releasing the chip.
FIG. 1B illustrates yet another conventional approach, which verifies the functional correctness of the chip 100 prior to entering the physical design phase, but at the unit level of the chip 100. Here, the system level stimuli generated in step 122 still go into the FModel in step 124. However, unlike in step 104 of FIG. 1A, the FModel generates interface transactions associated with the stimuli and also the expected functional output for each unit in the chip 100. Each “interface transaction” refers to work for a particular unit in the chip 100 to operate on. Then, in steps 126 and 128, the interface transactions are applied to the RTL implementations of the designated units, such as unit 1 and unit N. In step 130, the actual functional outputs of the RTL implementations of unit 1 and unit N are compared with the expected functional outputs of the same units from the FModel to determine whether functional correctness of each RTL implementation of the unit is achieved. One significant drawback of this conventional approach is the inability to spot performance issues at the unit level. Worse yet, some performance bugs at one unit may hinder efforts to identify other performance bugs associated with other units.
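For illustration only, the unit-level functional flow of FIG. 1B may be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `verify_units`, `UnitWork`, `fmodel`, and `unit_rtls` are hypothetical, and the FModel is assumed to return, for each unit, a pair consisting of the interface transactions and the expected functional output.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical per-unit record: (interface transactions, expected output).
UnitWork = Tuple[Sequence[int], Sequence[int]]


def verify_units(
    stimuli: Sequence[int],
    fmodel: Callable[[Sequence[int]], Dict[str, UnitWork]],
    unit_rtls: Dict[str, Callable[[Sequence[int]], List[int]]],
) -> Dict[str, bool]:
    # Step 124: the FModel turns the system level stimuli into interface
    # transactions and an expected functional output for each unit.
    per_unit = fmodel(stimuli)
    results: Dict[str, bool] = {}
    for name, (transactions, expected) in per_unit.items():
        # Steps 126 and 128: apply the transactions to that unit's RTL.
        actual = unit_rtls[name](transactions)
        # Step 130: compare actual and expected functional outputs.
        results[name] = list(actual) == list(expected)
    return results
```

The result identifies which unit fails functionally, but nothing in this flow records or checks a clock cycle count, which is why performance issues at the unit level go undetected.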
As the foregoing illustrates, what is needed is an improved way of automating unit-level performance testing in IC design that addresses at least the problems set forth above.