The present invention relates generally to microelectronic hardware product testing. More specifically, the present invention relates to techniques for testing multi-core integrated circuits.
Modern microprocessor systems derive significant performance enhancements from the implementation of multiple, identical cores on a single integrated circuit (IC). Designing such multi-core integrated circuits generally begins with the design of a single core. Then, several identical, or nearly identical, replications of the core may be fabricated on a single die. For example, a multi-core processor chip may contain a core that is replicated multiple times, wherein each replicated core serves as a single processor. Additional circuitry may be added to the multi-core IC to couple the replicated cores together. Resources shared by the replicated cores may also be included in the IC. Such resources may include memory accessible to all cores and/or a controller configured to coordinate operations among the replicated cores. It is noted that a multi-core integrated circuit, as the term is used herein, includes any IC having a circuit design repeated at least twice.
Multi-core design offers numerous advantages compared to other techniques known in the art for improving the performance of ICs. Generally, when circuits are repeated several times using multi-core principles, performance is increased. Specifically, multi-core ICs may exploit CMOS device scaling to increase device density per unit of area. The resulting parallel operation of an increased number of CMOS devices advantageously increases performance. Furthermore, chip shrink factor and clock frequency limitations are becoming increasingly important considerations in IC design. Multi-core design practices are an effective response to these considerations. Other benefits offered by multi-core design include lower design cost and improved power consumption.
Historically, the most common strategy for improving the performance of a microprocessor has been to increase its clock speed. Most microprocessors include an internal clock that pulses at a well-defined frequency. The frequency is generally expressed in megahertz (MHz) or gigahertz (GHz). For example, a microprocessor with a clock speed of 1 GHz pulses 1,000,000,000 (10^9) times per second. The clock pulses may drive the operation of the microprocessor and may synchronize the various units of logic within the microprocessor with each other. A primitive operation performed by a microprocessor generally requires a specific number of clock cycles to be completed. Thus, the number of operations that can be performed per second increases in direct proportion to the increase in the clock speed.
Unfortunately, in the present state of the art, there are diminishing returns to increasing the frequency of the internal clock. Inherent physical limitations may interfere with increasing the micro-architectural frequency. An example of an inherent physical limitation is non-scaling, or even reverse scaling, of wire delay. In some cases, overcoming such limitations may be possible but may involve significant costs, such as research and development expenses and more expensive materials. In other cases, the limitations may be theoretical constraints that are impossible to overcome within the context of integrated circuit design paradigms known in the art. Therefore, including multiple cores within the same microprocessor may improve performance more effectively and/or at a lower cost per microprocessor than increasing clock speed.
Additionally, using multiple cores may result in lower design costs than would be required by other methods of improving performance. As more transistors are utilized in an integrated circuit, more time is generally required for circuit design, resulting in increased development cost. Multi-core design helps address such costs. In a multi-core design approach, a single core is typically designed and then replicated on a die. Because the core thus designed has fewer transistors than the entire integrated circuit, design costs may be reduced. While there are design costs associated with ensuring correct interaction of the replicated cores, the net design cost is still generally reduced through the application of multi-core design practices.
Moreover, multi-core designs may allow chip manufacturers to fabricate products with improved power consumption characteristics. Performance derived from parallel execution is generally more power efficient than performance derived from increased clock frequency. Thus, multiplying the number of cores in an IC by a given factor may provide similar performance but less power consumption compared to multiplying the clock frequency of the IC by the same factor. Decreasing power consumption is beneficial because in many environments, the cost of the power required to operate a system represents a significant proportion of the system's total operating cost. Furthermore, many systems operate in environments offering a limited power supply. For example, portable computers typically include a battery to allow operation when no electrical outlet is available. If the battery is out of power and no electrical outlets are available, the computer must cease operation.
Whether or not an integrated circuit has multiple cores, an integrated circuit design generally must be tested to ensure that circuits designed and simulated on a computer function as expected in the real world. Thus, during the design of the integrated circuit, testing is performed to ensure that the integrated circuit works as anticipated. Later, once the design has been finalized and the integrated circuit is manufactured, each die produced must generally be tested to ensure that it operates correctly. IC testing is necessary during the manufacturing process because IC fabrication is a complex and precise process susceptible to minute contaminants and variations that can cause the integrated circuit not to function properly.
Testing, during both design and production, may include functional testing. In functional testing, input values are provided to the integrated circuit. The IC then performs one or more operations using the input values. The results are then analyzed to ensure their validity.
Functional testing of integrated circuits may be facilitated by design techniques known in the art as Design For Test (DFT), also known as Design For Testability. DFT techniques may include adding circuitry to an integrated circuit, wherein the primary purpose of the circuitry is to facilitate testing of the IC. DFT techniques often facilitate reading and writing the internal state of the IC more directly than is possible during normal operation. It is noted that such circuitry generally has no harmful effects during functional operation. The circuitry can thus be incorporated into the IC even though an end user will never use its capabilities.
One DFT technique known in the art is scan design. In scan design, a circuit under test (CUT) is initialized with test patterns using inputs on the IC housing the CUT. The CUT then performs one or more operations using the loaded test patterns. These operations are known in the art as “capture cycle(s)”. The contents of each register in the CUT resulting from these operations may be observed directly via outputs on the IC. If there is some divergence between the output and an expected result, a problem in the CUT likely exists. If the output matches the expected result, there is more confidence that the CUT is functioning properly.
In scan design, one or more scan chains are used to input test patterns and to output the results of the capture cycles. A scan chain connects registers within the CUT into one long shift register. Registers within the CUT may include flip-flops, latches and any other technological device capable of storing data. A shift register can be conceptualized as a bucket brigade where at each pulse of a clock, every bucket (test pattern datum) in the bucket brigade is shifted one increment in the same direction. Thus, when a special test signal called “scan enable” is activated, test patterns may be shifted into the registers. Upon each pulse of a clock, a new datum is shifted from an input pin into the first register in the scan chain, and each datum already in the scan chain is shifted forward to the next register. It is noted that the clock driving the scan chain need not be the same as the functional clock used during normal operation of the IC. In fact, the scan chain clock may pulse at a lower frequency than the general IC clock due to considerations such as power dissipation and the quality of the wiring used to assemble the scan chains.
Once all data are loaded into the scan chain, scan enable mode may be deactivated, causing the CUT to resume normal operation. Functional clock signals may then be pulsed one or more times to cause the CUT to perform one or more operations. To view the results of these operations (the capture cycles), scan enable mode may be reactivated. Then, upon each pulse of a clock, the last datum in the scan chain is shifted to an output pin, and every other datum shifts forward to the next register. This allows the contents of each register in the scan chain to be viewed directly and compared to an expected result.
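The load, capture and unload sequence described above may be illustrated with a minimal software sketch. The sketch is not taken from any particular implementation: the names `shift` and `scan_test`, the modeling of the scan chain as a Python list of bits, and the `capture` function standing in for the logic of the CUT are all assumptions made for illustration only.

```python
from typing import Callable, List


def shift(chain: List[int], scan_in: int) -> int:
    """Model one scan clock pulse while scan enable is active.

    chain[0] is the register nearest the input pin and chain[-1] is the
    register nearest the output pin (a hypothetical convention).
    Returns the datum shifted out to the output pin.
    """
    scan_out = chain[-1]
    chain[1:] = chain[:-1]  # every datum moves forward one register
    chain[0] = scan_in      # a new datum enters from the input pin
    return scan_out


def scan_test(
    pattern: List[int],
    capture: Callable[[List[int]], List[int]],
    capture_cycles: int = 1,
) -> List[int]:
    """Load a test pattern, run the capture cycle(s), and unload the result."""
    chain = [0] * len(pattern)

    # Scan enable active: one clock pulse per datum loads the pattern.
    for bit in pattern:
        shift(chain, bit)

    # Scan enable inactive: functional clock pulses let the CUT operate.
    for _ in range(capture_cycles):
        chain = capture(chain)

    # Scan enable active again: one clock pulse per register unloads the chain.
    return [shift(chain, 0) for _ in range(len(pattern))]


# Hypothetical CUT whose logic inverts every register on each capture cycle.
def invert_all(registers: List[int]) -> List[int]:
    return [bit ^ 1 for bit in registers]


observed = scan_test([1, 0, 1, 1], invert_all)
expected = [0, 1, 0, 0]
print("pass" if observed == expected else "fail")
```

In this sketch, a mismatch between `observed` and `expected` corresponds to the divergence that indicates a likely problem in the CUT. It is noted that loading and unloading each require one clock pulse per register in the chain.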
Scan testing of complex integrated circuits is expensive due to the number of scan steps that must be performed to test an increasing number of registers. When using scan design methods, one clock cycle is required for each register in a scan chain. For example, if a scan chain includes 100,000 latches, then 100,000 clock cycles are required to pass test data onto the scan chain. For reasons previously noted, the clock cycles used by the scan chains may be significantly longer than the functional clock cycles, further increasing the time required. It directly follows that the amount of time required to test an IC is proportional to the number of registers to be tested. This is important because the cost of testing an IC is proportional to the amount of time required for testing. Systems known in the art for testing ICs typically have a high cost; the cost can be divided by the system's anticipated lifespan to determine an estimated cost per unit of time. Additional costs of testing, such as power consumption, may also be approximately proportional to the amount of time required.
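The arithmetic in the preceding example may be sketched as follows. The function names are hypothetical, and the 10 MHz scan clock frequency is an assumed value chosen purely for illustration; no particular scan clock frequency is implied.

```python
def scan_load_cycles(num_registers: int) -> int:
    """One scan clock cycle is required for each register in the scan chain."""
    return num_registers


def scan_load_seconds(num_registers: int, scan_clock_hz: float) -> float:
    """Time to shift one test pattern into the chain at a given scan clock."""
    return scan_load_cycles(num_registers) / scan_clock_hz


# A chain of 100,000 latches, as in the example above.
cycles = scan_load_cycles(100_000)

# ASSUMPTION: a 10 MHz scan clock, chosen only for illustration.
seconds = scan_load_seconds(100_000, 10e6)

print(cycles, seconds)
```

Because the time per pattern scales linearly with the number of registers, doubling the chain length in this model doubles the load time at a fixed scan clock frequency.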
It follows that the cost to test multi-core integrated circuits using scan design methods is proportional to the number of replications of the core within the IC. For example, an IC with four instances of a core may cost approximately four times as much to test as an IC with only one instance of the same core. Clearly, this increased cost is disadvantageous. It may become even more disadvantageous in the future, as present trends are for ICs to include an increasing number of replications of a core.
Techniques known in the art for avoiding such increased costs in testing multi-core integrated circuits have significant drawbacks. One such technique is to utilize multiple shorter scan chains in lieu of one longer scan chain. A scan chain with 100,000 latches, for example, can be divided into 100 scan chains containing 1,000 latches each. In particular, a separate scan chain may be assembled for each core. The chief advantage of this approach is reducing the time required to perform a scan test by shifting in multiple values at the same time. As a result, the time required for testing is divided by a factor approximately equal to the number of scan chains (assuming the scan chains are of equal length).
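The reduction in shift time may be sketched as follows, assuming that all scan chains are shifted in parallel and that registers are distributed as evenly as possible among the chains. The function name and the four-core example with 25,000 registers per core are hypothetical.

```python
def parallel_load_cycles(total_registers: int, num_chains: int) -> int:
    """Clock cycles needed to load all chains when they shift in parallel.

    ASSUMPTION: registers are split as evenly as possible, so the longest
    chain (the ceiling of the quotient) determines the number of cycles.
    """
    if num_chains < 1:
        raise ValueError("at least one scan chain is required")
    return -(-total_registers // num_chains)  # ceiling division


# The example above: 100,000 latches in one chain versus 100 chains.
single_chain = parallel_load_cycles(100_000, 1)
hundred_chains = parallel_load_cycles(100_000, 100)

# Hypothetical four-core IC with 25,000 registers per core, one chain per core.
per_core_chains = parallel_load_cycles(4 * 25_000, 4)

print(single_chain, hundred_chains, per_core_chains)
```

Under these assumptions, each additional chain also requires its own input and output path, which is the cost discussed next.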
Unfortunately, this approach has several disadvantages. First, the number of scan chains supported by systems for testing is often limited. Increasing the number of supported scan chains increases testing system cost. Second, the number of pins required for input and output is proportional to the number of scan chains. Adding pins to an IC clearly increases manufacturing costs. Even if the increased costs are acceptable, the number of pins available on high capacity ICs is subject to physical and electrical limitations. As a result, this approach can be infeasible as a long-term solution, since the number of available pins is growing less rapidly than the number of cores.
It is noted that while techniques exist in the art to multiplex input and output pins of a scan chain onto a single pin (which may also be used for other purposes), such techniques are of limited utility in avoiding the IO pin bottleneck. This is due to the inherent limitation that at a specific moment in time, a pin may only be providing data to a single scan chain. For example, an IC with 30 pins cannot simultaneously provide inputs to 32 scan chains, even if the input for each scan chain is multiplexed over pins used during functional operation for another purpose. Furthermore, increased integration reduces the number of design signal pins that can be easily shared for routing scan test data. This is often due to the adoption of new signaling mechanisms, such as differential signaling, which interfere with the sharing of pins between design, functional and test structures.
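The pin bottleneck may be illustrated with a simple model. The model is an assumption made for illustration and is not described above: each input pin feeds exactly one scan chain at a time, and any chains that cannot be fed simultaneously are loaded in a later pass. The function names and the chain length of 1,000 registers are hypothetical.

```python
def load_passes(num_chains: int, num_input_pins: int) -> int:
    """Passes needed when each pin can feed only one chain at a time."""
    if num_input_pins < 1:
        raise ValueError("at least one input pin is required")
    return -(-num_chains // num_input_pins)  # ceiling division


def multiplexed_load_cycles(
    num_chains: int, num_input_pins: int, chain_length: int
) -> int:
    """Total shift cycles, ASSUMING equal-length chains loaded pass by pass."""
    return load_passes(num_chains, num_input_pins) * chain_length


# The example above: 32 scan chains but only 30 pins.
passes = load_passes(32, 30)

# ASSUMPTION: each chain holds 1,000 registers.
cycles = multiplexed_load_cycles(32, 30, 1_000)

print(passes, cycles)
```

In this model, exceeding the pin count by even two chains doubles the number of shift cycles, which illustrates why multiplexing alone does not remove the bottleneck.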
Another apparent workaround would be to increase the speed at which data may be entered into the scan chains. However, input speeds are limited by power dissipation considerations. Additionally, it may be cost-ineffective to utilize internal wiring of a sufficient quality to allow for rapid data transfer when that wiring will generally never be used again once the IC has been tested. Thus, it may be infeasible to increase the data rate for the scan chains.
Another workaround involves compressing the input data used to initialize the scan chains. One compression technique is to store the test pattern to be scanned into the registers on the IC. This approach is disadvantageous because it clearly increases manufacturing costs. Another compression technique involves compressing the data loaded into the scan chains and expanding it within the IC. This approach is limited in effectiveness because the proportion by which data may be compressed is subject to theoretical limits.
Those skilled in the art may wonder why Array Built-In Self Test (ABIST) cannot be used to test multi-core integrated circuits. ABIST is specialized for testing replicated arrays of circuits used for storage. It is generally unsuitable for testing logic such as that incorporated into microprocessors and other integrated circuits.
In summary, multi-core design of integrated circuits is highly advantageous, but its advantages can be counteracted by limitations in testing methods known in the art.