Semiconductor IC devices are manufactured on wafers or other bulk substrates of semiconductor material. Conventionally, many devices are manufactured on a single wafer and individual devices or groups of devices are singulated from the wafer and packaged. The IC devices are tested at various points during the manufacturing process, i.e., with electrical probes while they are still on the wafer and then after packaging. The terms “IC,” “device” and “IC device” are used interchangeably herein.
During testing, a particular signal or combination of signals is input to the IC device and the output value or values read from the device are compared with expected values from a properly functioning device. Tests may involve a particular signal or combination of signals being delivered repetitively, perhaps under extreme environmental conditions (temperature, voltage, etc.) in order to identify a device which would fail after a shorter than usual period of use (“burn-in” testing). Other tests may involve a number of different signals or signal combinations delivered in sequence. One method for testing a memory device is to deliver the same signal or signal combination to multiple identical subsections of a memory array in the memory device simultaneously and compare the values read from the subsections (“compression testing”). If all of the respective read values match, the test has been passed, while a mismatch between respective values read from any of the subsections indicates a memory device malfunction and failure of the test.
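The compression test described above can be sketched in a few lines. This is only an illustrative model, not an implementation from any cited reference: the function name `compression_test` and the representation of each subsection's read data as a list of integers are assumptions made for the example.

```python
from typing import Sequence


def compression_test(subsection_reads: Sequence[Sequence[int]]) -> bool:
    """Return True when every subsection returned identical read values.

    Hypothetical helper: each inner sequence holds the values read back
    from one of the identical memory subsections after the same signal
    combination was delivered to all of them.
    """
    if not subsection_reads:
        raise ValueError("at least one subsection is required")
    reference = list(subsection_reads[0])
    # The test passes only if each remaining subsection matches the first.
    return all(list(reads) == reference for reads in subsection_reads[1:])


# Three subsections that agree, and three where the second one disagrees.
matching = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 1]]
mismatching = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]
print(compression_test(matching))
print(compression_test(mismatching))
```

Because the read values are compared only with one another, a sketch of this kind would not flag a fault that corrupts every subsection identically.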
A particular test or test sequence often includes multiple test steps. Moreover, a complete test flow will often require that IC devices move from one piece of test equipment to another. For example, a first piece of test equipment and test fixtures may be utilized for probe testing, another for burn-in testing and yet another for packaged device testing after burn-in.
After a particular test or test sequence has been completed, IC devices that have failed some portion, or all, of a test may be separated from the good devices. However, an IC device that has failed one portion of the test sequence may pass subsequent test sequences. So, if the failing IC device is erroneously placed into the “good” bin and then passes subsequent tests, it may eventually be classified and sold as fully functional. One way to avoid this type of error is to store information regarding the test history of the device on the device itself in nonvolatile memory elements. One example describing storage of test results in nonvolatile memory on a semiconductor device is co-pending U.S. Pat. No. 6,190,972, issued Feb. 20, 2001, the disclosure of which is hereby incorporated herein by reference for all purposes. A method and system of storing device test information on a semiconductor device using on-device logic for determination of test results are disclosed in co-pending U.S. Pat. No. 6,829,737, issued Dec. 7, 2004, the disclosure of which is hereby incorporated herein by reference for all purposes.
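The idea of keeping test history on the device itself can be illustrated with a simple model. The sketch below is an assumption-laden illustration and does not reproduce the methods of the patents cited above: the class `DeviceTestHistory`, the step names in `TEST_STEPS`, and the one-bit-per-step layout are all hypothetical.

```python
from typing import List

# Hypothetical names for the stages of a test flow.
TEST_STEPS = ("probe", "burn_in", "packaged")


class DeviceTestHistory:
    """Models an on-device record with one sticky fail bit per test step."""

    def __init__(self) -> None:
        # Stands in for nonvolatile memory elements on the device.
        self.fail_bits = 0

    def record(self, step: str, passed: bool) -> None:
        """Store the outcome of one test step."""
        if not passed:
            # A fail bit is set once and is never cleared by a later pass.
            self.fail_bits |= 1 << TEST_STEPS.index(step)

    def is_good(self) -> bool:
        """A device is good only if no step in its history has failed."""
        return self.fail_bits == 0

    def failed_steps(self) -> List[str]:
        """List the steps whose fail bit is set."""
        return [s for i, s in enumerate(TEST_STEPS) if self.fail_bits & (1 << i)]


# A device that fails probe testing but passes every later step.
history = DeviceTestHistory()
history.record("probe", False)
history.record("burn_in", True)
history.record("packaged", True)
print(history.is_good(), history.failed_steps())
```

Binning from the stored history rather than from the most recent result alone means that a device misplaced in the “good” bin after an early failure can still be identified later in the flow.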
Defects in a finished IC chip assembly can prevent it from operating as intended. In spite of painstaking attention to detail, defects may be introduced at various levels of production. For example, manufacturing defects in the IC die may cause a failure. It has been found, however, that some defects manifest themselves immediately, while other defects manifest themselves only after the IC die has been operated for some period of time.
“Burn-in” refers to the process of accelerating failures that occur during the infant mortality phase of component life in order to remove the inherently weaker ICs. Burn-in testing has been regarded as a critical process for assuring product reliability since the semiconductor industry began. There are various types of conventional burn-in testing. During a process known as “static” burn-in (also referred to as “dumb” burn-in testing), temperatures may be increased (or sometimes decreased) while the pins are biased but not toggled. No data is written to the IC, nor is the IC exercised under stress during static burn-in. During “unmonitored dynamic” burn-in testing, temperatures may be increased while the pins on the test IC are toggled. For a memory IC undergoing unmonitored dynamic burn-in testing, data patterns are written to the memory IC, but not read, while being cycled under stress. Hence, with unmonitored dynamic burn-in testing, there is no way of knowing whether the data written is retained by a memory cell.
In recent years, as memory systems have grown in complexity, the need for more reliable components has escalated. More sophisticated methods of screening infant mortalities have been developed. As IC manufacturing practices have become more consistent, it has become clear that burn-in systems that simply provide stress stimuli in the form of high temperature and VCC (power) to the IC under test may be inadequate because such burn-in systems cannot detect and screen infant mortality failure rates measured in small fractions of a percent.
To address these issues, “intelligent” burn-in (sometimes referred to as “smart” burn-in) testing can be utilized. The term “intelligent burn-in,” as used in this discussion, refers to the ability to combine functional, programmable testing with the traditional burn-in cycling of an IC under test in the same chamber. Advantages of this approach include: (1) the ability to identify when a failure occurs and, thereby, compute infant mortality rates as a function of burn-in time (and as a result, an optimal burn-in time for each product family may be established); (2) the ability to correlate burn-in failure rates with life test data typically obtained by IC manufacturers to determine the field failure rates of their products; and (3) the ability to incorporate into the burn-in process certain tests traditionally performed using automatic test equipment (ATE) systems, thereby reducing costs.
Reliability curves are used to express an instantaneous failure rate f(t) over time t, and often have a “bathtub” shape. The reliability curves for many, if not all, ICs are generally like that shown in FIG. 1. The reliability curve in FIG. 1 may be divided into three regions: (1) an infant mortality region, (2) a random failures region, and (3) a wearout region.
The infant mortality region begins at time t0, which may occur upon completion of the manufacturing process and initial electrical test. Some ICs, of course, fail the initial electrical test. Inherent manufacturing defects are generally expected in a small percentage of ICs, even though the ICs are functional at time t0. Because of these inherent manufacturing defects (that may be caused by contamination and/or process variability), these ICs have shorter lifetimes than the remaining population. While ICs with failures occurring in the infant mortality region may constitute a small fraction of the total population, they are the largest contributor to early-life failure rates.
Once ICs subject to infant mortality failure rates have been removed from the IC population, the remaining ICs have a very low and stable field failure rate. The relatively flat, bottom portion of the bathtub curve (FIG. 1), referred to as the random failures region, represents stable field-failure rates which occur after the IC failures due to infant mortalities have been removed and before IC wearout occurs. Eventually, as wearout occurs, the failure rate f(t) of the ICs begins to increase rapidly. However, the average lifetime of an IC is not typically well known, because most laboratory tests simulate only a few years of normal IC operation.
FIG. 2 illustrates reliability curves measured for exemplary IC device lots A, B and C. FIG. 2 only includes the front end of the bathtub curve illustrated in FIG. 1 for each device lot. Lot A is characterized by a high instantaneous failure rate f(t) that does not improve after any length of time. Lot B is characterized by a stable failure rate that has improved after the infant mortality region, but remains above a selected manufacturing process standard (dotted line). If devices must meet the selected manufacturing process standard, then lots A and B must be scrapped because the stable failure rates obtained in the random failures region are too high. Lot C is characterized by a stable failure rate that is within the selected manufacturing process standard and, thus, may be considered a good lot and suitable for sale. Burn-in testing provides data for determining instantaneous failure rate curves. Determining the instantaneous failure rate curves serves at least two useful functions: (1) one can determine when burn-in testing is complete, i.e., how long burn-in testing must be performed to weed out the infant mortality failures, and (2) one can determine if burn-in testing will complete, i.e., occasionally lots like A or B illustrated in FIG. 2 occur and must be scrapped because they never clean up.
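The two uses of the instantaneous failure rate curve can be illustrated with a short sketch. Everything in it is assumed for illustration only: the function names, the per-interval failure counts, the lot size, the 0.001 standard, and the rule that the rate must stay within the standard for two consecutive intervals are hypothetical and are not taken from FIG. 2.

```python
from typing import List, Optional, Sequence


def interval_failure_rates(failures_per_interval: Sequence[int],
                           lot_size: int) -> List[float]:
    """Fraction of surviving devices that fail in each burn-in interval."""
    rates = []
    survivors = lot_size
    for failures in failures_per_interval:
        if survivors <= 0:
            break
        rates.append(failures / survivors)
        survivors -= failures
    return rates


def burn_in_complete_at(rates: Sequence[float], standard: float,
                        stable_intervals: int = 2) -> Optional[int]:
    """Index of the first interval of a run that stays within the standard.

    Returns None when the lot never stays at or below the standard for
    `stable_intervals` consecutive intervals, i.e. it never cleans up.
    """
    run = 0
    for i, rate in enumerate(rates):
        run = run + 1 if rate <= standard else 0
        if run == stable_intervals:
            return i - stable_intervals + 1
    return None


# Hypothetical failure counts per burn-in interval for lots of 10,000 devices.
lot_a = interval_failure_rates([50, 50, 50, 50, 50], 10_000)  # never improves
lot_c = interval_failure_rates([50, 20, 5, 1, 1], 10_000)     # cleans up
print(burn_in_complete_at(lot_a, standard=0.001))
print(burn_in_complete_at(lot_c, standard=0.001))
```

Under these assumed numbers, the lot modeled on lot C first stays within the standard from the third interval onward, while the lot modeled on lot A never does and would be a candidate for scrapping.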
There are at least four approaches to ensuring IC reliability: (1) evaluation of data on a per-die basis by burn-in stressing at the individual IC die level, (2) evaluation of data on a per-die basis by burn-in stressing at the packaged component level, (3) evaluation of data on a per-wafer basis by burn-in stressing at the wafer level, and (4) evaluation of data on a per-lot basis by analyzing a sample of wafers per lot. The first of these approaches may be impractical if the IC die has failures and redundancy has not yet been enabled. The second approach is quite common in the industry, but does not allow for identification of infant mortalities until all of the packaging steps have been completed. The fourth approach is unacceptable from a reliability standpoint since any single wafer within a lot may have unique reliability problems.
Thus, there exists a need in the art for a system, circuit and method for determining wafer level burn-in reliability.