1. Field of the Invention
This invention relates to parallel computers and semiconductor test equipment, and in particular to the engineering of such equipment for use in the production of semiconductor devices for purposes of design and quality control.
2. Description of the Related Art
The predominant application of test during chip production is to determine whether a processed die conforms to the chip designer's expectations of its functional and electrical performance. A tester is a piece of capital equipment that allows the chip maker to perform this comparison at the "back end" of the semiconductor manufacturing pipeline.
During test, a chip is referred to as "device under test", or DUT. As is well known, such a chip has a plurality of electrical contact pins which serve as paths for input and output signals communicating with associated circuitry for controlling the chip and receiving responses therefrom. A tester studies a DUT by driving waveforms into its inputs while simultaneously observing its outputs during the test process. The requirements of a tester are stringent: A tester must be able to produce and measure waveforms whose transition rates are an order of magnitude greater than the DUT clock rate, and the tester must be able to accurately maintain timing precision that is an order of magnitude finer than the resolution of the DUT specifications.
The results of test are used in a variety of ways. During production, the observations of output waveforms are used to establish that there is an absence of electrical faults within the chip. In failure analysis, the tester may stimulate the chip in an indefinitely repeating pattern while engineers probe internal nodes. An individual input signal transition or output signal measurement during a test is called a "pin event" or simply an "event". For characterization of a recently designed chip, the tester varies the times at which events occur so as to measure setup and hold margins for input signals with respect to input latch-control signals and to measure propagation delays from inputs to outputs. During the chip's design phase, test results validate the principles of operation of key circuits.
The electrical characteristics generated for input waveforms and expected for output waveforms are created from the chip maker's voltage, current, and slew rate specifications. The collection of sub-circuits that produces input waveforms and measures output waveforms is a PE (Pin Electronics circuit) 80. PEs 80 are analog devices whose accuracy, flexibility, and power-bandwidth product are among the tester's key specifications. The internal design of the PEs 80 is outside the scope of this invention.
During a functional test, the sequence of logical states to be produced in an input waveform or expected to be present in an output waveform is created algorithmically from a test program. The heart of a modern tester is a digital system 10 that generates a sequence of events for every DUT pin. A drive event on a pin directs that pin's PE driver to change the input waveform. A strobe event on a pin is a measurement of the pin's logical value as translated from its electrical value by the pin's PE comparator. Each drive event consists of a precisely timed change in the control signals going to the PEs, and each strobe event is a precisely timed measurement of the pin's output level against expectation.
The digital representation of an event specifies a type and a time. The event type is typically a short code word that names one of the possible events. Some example events are "drive input to logic 0", "stop driving input", "check output for logic 1", and "stop checking output". The event time is typically specified as an integer multiple of some pre-determined time interval. Each digitally coded event is translated to an event that occurs close to the specified time. The tester's guaranteed closeness to the specified event time, known as the "edge placement accuracy", is among the tester's key specifications.
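The event representation described above can be illustrated with a minimal sketch. The names EventType, Event, TICK_PS, and within_accuracy are hypothetical, as is the 10 ps time unit; the text specifies only that an event carries a short type code and a time expressed as an integer multiple of a pre-determined interval.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    """Short code words for the four example events named in the text."""
    DRIVE_0 = 0     # drive input to logic 0
    DRIVE_OFF = 1   # stop driving input
    CHECK_1 = 2     # check output for logic 1
    CHECK_OFF = 3   # stop checking output


# Hypothetical time unit in picoseconds (an assumption, not from the text).
TICK_PS = 10


@dataclass(frozen=True)
class Event:
    kind: EventType
    ticks: int  # event time as an integer multiple of TICK_PS

    def time_ps(self) -> int:
        """Specified event time in picoseconds."""
        return self.ticks * TICK_PS


def within_accuracy(event: Event, actual_ps: float, accuracy_ps: float) -> bool:
    """True if the realized edge lies within the edge placement accuracy."""
    return abs(actual_ps - event.time_ps()) <= accuracy_ps
```

Under these assumptions, an Event with ticks equal to 125 is specified for 1250 ps, and within_accuracy reports whether a measured edge time falls inside the guaranteed window around that value.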
It is not widely acknowledged among tester designers that in producing a sequence of events on each DUT pin, the tester digital system 10 performs what parallel computer engineers would recognize as an archetypal scalable data-parallel computation. The portion of a tester's manufacturing cost represented by this digital system 10 has grown over the last 25 years to approximately 60%, and there are signs that this portion will continue to increase as provision is made for per-pin APG (Algorithmic Pattern Generation) or DSP (Digital Signal Processing). With the advent in 1982 of IBM's Tester-per-Pin architecture, and its subsequent adoption in some form by nearly every tester manufacturer, the digital systems of testers have become very similar to SIMD (Single Instruction-Stream/Multiple Data-Stream) computers.
This application incorporates by reference the disclosure of my patent called "I-Cached SIMD" (U.S. Pat. No. 5,511,212, issued Apr. 23, 1996, Multi-Clock SIMD Computer and Instruction-Cache-Enhancement Thereof). That invention relates to Single Instruction-stream Multiple Data-stream (SIMD) computer architecture. A SIMD computer typically comprises one or more single-chip Processing Element modules, each having one or more Processing Elements and interfaces to multi-chip subsystems (MCSs). The Processing Elements bear the brunt of a SIMD computation's workload, while MCSs provide coordination among Processing Elements.
This disclosure of STAR-I teaches means that allow the tester digital system 10 to exploit the construction flexibility and programming scalability advantages of SIMD computer architecture. In a further aspect, STAR-I contains a circuit that allows the event-generation circuits to be flexibly allocated to DUT pins, thus reducing the cost of constructing a system capable of achieving required event rates when the event rate requirement varies across pins. In a further aspect, STAR-I applies Multi-Clock SIMD computer architecture to allow the multi-chip and intra-chip circuits within the tester digital system to operate each at its maximum rate as determined by the circuit topology and the signaling characteristics of the VLSI-based technology in which the circuits are realized. STAR-I maximizes the tester digital system's performance-to-hardware-cost ratio by applying what is taught in the I-Cached SIMD patent.
The architect of a tester faces the daunting challenge of creating a system using today's readily available component and assembly technologies that, in terms of logic signal transition rate and logic signal transition accuracy, out-performs devices that are planned to be made using tomorrow's exotic technologies. This difficult requirement has led naturally to the exploitation in tester architecture of the algorithmic parallelism inherent in the test computation. Surprisingly, the fields of tester design and parallel computer engineering have heretofore been distinct and mutually exclusive. This invention arises in part from a novel intersection of these two fields; part of what is claimed applies the optimizations and improvements known for parallel computer engineering to the practice of manufacturing tester digital systems.
The digital system 10 (otherwise known as the high-speed system) is the tester's primary algorithmic component. The digital system 10 comprises an array of timing generators 70 (TGs, sometimes referred to as event generators) that are collectively supervised by a single system controller 20. A group of timing generators 70 connects to each DUT pin's pin electronics (PE) 80 circuit via a pin channel 82. The PE circuit comprises a driver that produces input waveforms and a comparator that measures output waveforms against reference levels. The pin channel 82 fans in the TGs' drive control outputs to the PE's drive control inputs, and the pin channel 82 fans out the PE's comparator outputs to the TGs' measured pin value inputs. Individual events fall into the following four classes of precisely timed actions:
a change of the state of the PE driver driving the pin input (driver on/driver off),
a change of the value driven onto the pin by the PE driver (logic 1/logic 0),
a change of the observation status of the pin output (begin comparing/stop comparing), or
an instantaneous observation of the pin output value (compare logic 1/compare logic 0).
In one aspect, STAR-I contains a TG 170 whose architecture is that of a generally programmable processing element. This enhancement over the conventional tester's somewhat-configurable TG 70 increases flexibility and scalability, for example allowing the TG design cost to be amortized over a larger number of instances sold.
In an independent aspect, STAR-I includes an inter-TG communication subsystem 112. A subsystem whose topology is a major discriminating characteristic among the parallel computers used for more general-purpose applications, an inter-TG communication subsystem 112 is absent from conventional testers because production test as it is commonly realized requires no inter-TG communication. The advantage of including an inter-TG communication network 112 is that it enables the TGs 170 to share intermediate results, such as are generated during the execution of APG or DSP algorithms. An inter-TG communication network 112 is sketched in FIG. 6, FIG. 8, and FIG. 10.
In a further aspect, the STAR-I digital system includes a software-configurable reconfigurable allocator circuit 152, through which a subset of the set of TGs is associated with one member of a subset of the set of DUT pin channels. The association achieved by the reconfigurable allocator circuit 152 is to multiplex the drive events produced by each member of the subset of TGs onto the corresponding drive control input of the pin channel and also to fan out that pin's observed logical value for use by strobe events within each member of the subset of TGs.
The appropriate size and hierarchical decomposition of the reconfigurable allocator circuit 152 is determined by the geometries of the elements composing the integration hierarchy used in the physical realization of the tester digital system. The most general reconfigurable allocator circuit 152 is an N×M cross-bar through which any of the tester's N TGs 170 is associated with any of the DUT's M pins.
A further aspect of STAR-I is the method for configuring the reconfigurable allocator circuit 152. A simplest way of deciding how to configure a reconfigurable allocator circuit 152 restricts the subsets of TGs connected to each DUT pin to be totally disjoint subsets of the set of TGs 170. In other words, a simplest method of configuring a reconfigurable allocator circuit 152 imposes a many-to-one mapping from TGs to DUT pins. At the maximally complex other end of the spectrum, STAR-I includes a reconfigurable allocator circuit 152 implementing a many-to-many mapping, so that each TG 170 in the tester digital system 110 is associated with some number of DUT pins and each DUT pin is associated with a plurality of TGs 170. This more complex TG-to-DUT-pin allocation method is able to take advantage of scenarios wherein a single TG's outputs may be shared among a plurality of DUT pins. The simpler method allows for a circuit interconnect topology that requires relatively few active elements for its realization.
FIG. 9 shows an example of a reconfigurable allocator circuit 152 implementation that is appropriate for the simple (many-to-one) allocation method and which is less costly than a full cross-bar. In the example sketched in FIG. 9, the reconfigurable allocator circuit 152 is capable of realizing many-to-one associations between 64 TGs 170 and 8 pin channels.
A general mathematical formulation for the topology of this class of reconfigurable allocator circuit 152 interconnection for a set of I TGs 170 numbered from 0 up to I-1 and a set of J pin channels numbered from 0 up to J-1 may be described with the following two principles:
A further enhancement of the reconfigurable allocator circuit 152 is to allow re-configuration during functional test. This enhancement allows a given TG 170 to be connected successively to members of a group of DUT pins. This "run-time TG 170 reallocation" capability caters for applications such as edge search, wherein a single member of a group of pins receives a relatively large number of events during some interval of the test. By re-configuration of the reconfigurable allocator circuit 152, a relatively small number of timing generators 170 is able to meet the edge rate requirement that, in a fixed allocation of timing generators 170 to DUT pins, would require many more timing generators 170 and thus greater expense.
A further enhancement is to include some subset of the digital system's reconfigurable allocator circuit in the TGBB 150.
In a further aspect, STAR-I incorporates a compilation method for analyzing the event rate requirements of test programs. By determining at the time the test program is created how many TGs 170 need to be associated with each pin to achieve the required event rate for that pin, STAR-I minimizes the total number of TGs 170 included in the tester provided to a customer for specific test purposes, thereby minimizing the cost of the tester. This method restricts the topology by which the collection of DUT pin PEs is associated with tester TGBBs 150, because it would be most cost-effective to evenly distribute the high-event-rate pins across the set of TGBBs 150.
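The compile-time sizing step, combined with the simple many-to-one allocation, can be sketched as follows. This is an illustration under stated assumptions and not the disclosed method: the function names tgs_needed and allocate_disjoint are hypothetical, each TG is assumed to sustain the same fixed event rate, every pin is assumed to receive at least one TG, and the allocator is treated as a full cross-bar (the restricted topology of FIG. 9 is not modeled).

```python
from math import ceil


def tgs_needed(pin_event_rates, tg_event_rate):
    """TGs required per pin so the pooled TG rate meets the pin's event rate.

    Rates are in the same arbitrary unit (for example, millions of events/s).
    Assumption: every pin gets at least one TG.
    """
    return {pin: max(1, ceil(rate / tg_event_rate))
            for pin, rate in pin_event_rates.items()}


def allocate_disjoint(needs, num_tgs):
    """Assign consecutive TG indices to pins as disjoint subsets (many-to-one)."""
    total = sum(needs.values())
    if total > num_tgs:
        raise ValueError(f"need {total} TGs, only {num_tgs} available")
    allocation, next_tg = {}, 0
    for pin in sorted(needs):
        # Each TG index is handed out exactly once, so subsets never overlap.
        allocation[pin] = list(range(next_tg, next_tg + needs[pin]))
        next_tg += needs[pin]
    return allocation
```

For example, with hypothetical pin rates of 400, 100, and 250 against a per-TG rate of 100, the pins need 4, 1, and 3 TGs respectively, so 8 TGs suffice where a fixed allocation sized for the fastest pin would need 12.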
A further aspect of STAR-I is to include a local controller 168 in the TGM 160 that is capable of decoding globally broadcast instructions into a single-clock-cycle control word for the TGs 170 realized within the TGM 160.
Another aspect of STAR-I includes a Local External Memory interface 166 in the TGM 160.
Another aspect of STAR-I includes a generalized response network interface 167 in the TGM.
A further aspect of STAR-I augments the TGM 160 to contain a Multi-Clock Generator 300 as taught in my I-Cached SIMD patent.
A further aspect of STAR-I includes an I-Cache 310 in one of its many forms as taught in my I-Cached SIMD patent.
During a test run, the digital system's system controller 20 executes a test program. The system controller 20 broadcasts a sequence of instructions to the array of TGs 70, which in response produce an event sequence for each DUT pin. A TG 70 is primarily a digital circuit that represents event types and application times as digital codes. In response to an instruction broadcast from the system controller 20, the TG 70 digitally calculates an event type, as well as the precise time of the application of that event to the DUT pin. These digital event codes are converted at the TGs' periphery to precisely timed driver control signal transitions (for input events) or to precisely timed pin value measurements (for strobe events). The conversion circuit is commonly called a formatter 74. The formatter 74 performs a digital-to-analog conversion of drive events and an analog-to-digital conversion of strobe events. The time-domain digital-to-analog converter sub-circuit of the formatter 74 is called a vernier. The linearity, jitter, and re-trigger interval of the vernier contribute directly to a tester's most important performance characteristics.
A sketch of a conventional tester digital system 10 is shown in FIG. 1. The system controller runs a number of programs including the tester's operating system, test program development environment (compiler and debugger), results analysis tools, and DUT failure analysis tools. The primary function performed by the system controller 20 for the purposes of the digital system is storing and sequencing the test program.
FIG. 2 shows some detail of the system controller 20. Via the operator console, the system controller 20 displays logged data to the operator, allows the operator to vary test parameters (pertaining to electrical, thermal, and timing characteristics of the DUT), and allows the operator to monitor and alter the test flow. The system controller's 20 disk storage device is the ultimate repository of test programs.
The system controller 20 generates a system clock 30 and, on each cycle of that clock 30, an instruction which is distributed through a global instruction broadcast network 40, shown in FIG. 1. The globally broadcast instruction specifies the logical DUT activity for a tester machine instruction cycle in addition to DUT clock phase information. The globally broadcast instruction specifies one of a known set of collections of per-pin event sequences, one event sequence per DUT pin.
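The broadcast scheme described above, in which one instruction per cycle selects a per-pin event sequence held locally by each TG, can be sketched as a data-parallel loop. The class TimingGenerator, the function run_test, the table layout, and the tick-based time unit are all hypothetical, and the sketch assumes one TG per pin for simplicity.

```python
class TimingGenerator:
    """Hypothetical TG model: a local table maps instruction codes to events."""

    def __init__(self, table):
        # table: {instruction_code: [(event_name, offset_ticks), ...]}
        self.table = table

    def execute(self, code, cycle_start):
        """Return (event_name, absolute_time) pairs for one broadcast instruction."""
        return [(name, cycle_start + offset)
                for name, offset in self.table.get(code, [])]


def run_test(program, tgs, cycle_ticks):
    """Broadcast each instruction to every TG; collect one event sequence per TG."""
    sequences = [[] for _ in tgs]
    for cycle, code in enumerate(program):
        start = cycle * cycle_ticks
        # Same instruction for all TGs, different local data: the SIMD pattern.
        for seq, tg in zip(sequences, tgs):
            seq.extend(tg.execute(code, start))
    return sequences
```

In this model the controller sends only a short instruction code each cycle, while the event types and timing offsets stay in each TG's local table, which is why the per-pin event sequences can differ even though every TG receives the same instruction stream.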
The global instruction broadcast network 40 conveys the system clock 30 and instructions to an array of TGs 70. The TGs are realized within Timing Generator Building Blocks 50 (TGBBs). The TGBBs 50 occupy the preponderance of the circuits included in the digital system. The TGs respond to the system controller 20 with FAIL information indicating whether some DUT output pin strobe value differed from expectation on some vector. The TGs also provide RDBACK information allowing the system controller to monitor the state of each system element.
FIG. 3 sketches a conventional tester's TGBB 50. It contains an array of Timing Generator Modules 60 (TGMs), each of which in turn contains an array of TGs 70, associated with local memory 62 for storing TG configuration information including pattern data and local test result information including log data. As shown in FIG. 3, a conventional tester's TGMs 60 are associated on a 1-to-1 basis with DUT pin channels. A conventional tester immutably associates the collection of TGs 70 within a TGM 60 with a uniquely determined DUT pin channel 82.
FIG. 4 sketches a conventional tester's TGM 60 containing K TGs 70 and a K-to-1 TG-to-pin aggregator 64. The TG-to-pin aggregator 64 in a conventional tester combines drive events from the TGs 70 within the TGM 60 to send to the DUT pin channel 82, and it fans out the measured pin value to all TGs within the TGM for reference in strobe events. The TGM 60 also contains a response network interface 65 and a local external memory interface 66. The local controller 68 shown in FIG. 4 serves the modest function of electrically standardizing the clock received from the globally broadcast instruction for re-broadcast within the TGM. The local controller 68 also may provide pipeline stages for the globally broadcast instruction for subsequent re-distribution within the TGM via the local instruction broadcast network 69.
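The two roles of the K-to-1 TG-to-pin aggregator 64 can be modeled in a few lines. This is a behavioral sketch only; the function names and the (time, event) tuple layout are assumptions made for illustration, and each TG's drive events are assumed to arrive already sorted by time.

```python
import heapq


def aggregate_drive_events(per_tg_events):
    """Merge K time-sorted lists of (time, event) into one time-ordered stream."""
    return list(heapq.merge(*per_tg_events))


def fan_out(pin_value, k):
    """Deliver the same measured pin value to each of the K TGs."""
    return [pin_value] * k
```

Merging three hypothetical TG streams such as [(0, "drive_1"), (40, "drive_0")], [(10, "drive_off")], and [(25, "drive_on")] yields a single stream ordered by time, which is how several TGs of limited individual rate can jointly produce a denser event sequence on one pin.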
FIG. 5 shows a conventional TG 70. The TG 70 contains a number of storage elements that are read-only during functional test and are used to construct the event sequence. The digital-to-analog-to-digital event converter shown in FIG. 5 represents the TG's formatter circuit 74. The formatter 74 converts digitally coded drive events to PE driver control signal transitions that occur at the desired point in time, and it converts digitally coded strobe events to fail outputs that are achieved by sampling the logical value of the DUT pin (as represented by the PE comparator outputs) at the desired point in time. The conventional TG 70 also contains a fail pipeline through which log data is synchronized with the data logging requirements indicated in ensuing globally broadcast instructions.
When early testers were developed by semiconductor component makers in the 1960s, the common organization included a set of timing resources that were shared among all DUT pin channels in generating events. The system controller's 20 globally broadcast instruction included a collection of timing edges. A single TG 70 was associated with each DUT pin. Programming the tester required selecting, for each DUT pin, which of the timing resources applied to its events. As DUT timing complexity increased, the number of distinct timing resources required grew too large for this shared timing resource organization to remain practical for production of high-performance devices.
In 1982, IBM introduced the "timing-per-pin" organization, wherein the TG 70 associated with each DUT pin channel contained a timing generator circuit. This organization had the flexibility advantage of providing each DUT pin with potentially unique timing characteristics.
In the 1980s and into the 1990s, a number of tester manufacturers expanded on the timing-per-pin organization to include a collection of TGs 70 per pin. The TGs 70 in such systems are not necessarily replications of a single circuit design, but instead may be special-purpose circuits. The TGs 70 do not function independently, but produce events per tester machine instruction cycle as directed by a waveform memory associated with each DUT pin. This architecture is widely used in test equipment sold today.
In 1988, Schlumberger developed the "sequencer-per-pin" organization [West and Napier, "Sequencer Per Pin™ Test System Architecture", International Test Conference Proceedings, pp. 355-361, 1990]. This digital system architecture provided for each DUT pin channel 82 a fixed-size collection of timing generators 70 and a sequencer for assigning events to timing generators 70 for application to the DUT pin channel 82. This organization had the advantage of a high degree of flexibility in the timing characteristics of the waveform generated independently for each DUT pin.
The present invention is distinguished from these architectures in several aspects: the TG 170 is a generally programmable circuit that is replicated to provide the required per-pin event generator resource; the TGs 170 are interconnected so as to exchange intermediate data; the TGs 170 are flexibly allocated to DUT pins under software control; multi-clocking 300 allows the local generation of high-rate clocks within the TGMs 160; and the SIMD instruction cache 310 eliminates the need for high-speed global instruction broadcast.
In 1989, ASIX proposed a digital system organization wherein the TGs 70 were inter-connected via a linear array network [Lesmeister, "The Linear Array Systolic Tester (LAST)", International Test Conference Proceedings, pp. 543-549, 1989]. This digital system organization allowed the TGs 70 to share access to a common pattern memory, thus decreasing the memory bandwidth requirement, thereby decreasing the cost of the test system.
The inter-TG communication subsystem 112 of the present invention 100 is distinguished from that of the ASIX architecture in two aspects: first, higher-dimensional interconnects (including 2-D and 3-D meshes), as well as bi-directional communication links are claimed; second, the present invention allows for exchange of TG register file data under control of the globally broadcast instruction stream 140, whereas the ASIX design provides only a fixed (hard-wired) flow of information from common pattern memory through the array of TGs.
In 1992, LTX/Trillium proposed a single-chip TG 70 design that contained an on-chip phase-lock loop (PLL)-based clock generator [Alton, "TGEN: Flexible Timing Generator Architecture", International Test Conference Proceedings, pp. 439-443, 1992]. The PLL output oscillated at 4 times the system clock rate but was not used to multiply the event rate above the system clock rate. Rather, the high-rate reference clock was used to simplify the implementation of a sub-clock-interval vernier (edge converter), whose linearity is critical to the tester's overall timing accuracy. Linearity tends to decrease as the length of the clock interval spanned increases, so the high-rate on-chip clock was used to shorten the interval spanned by the vernier.
Multi-clocking 300 as proposed in the present invention 100 is distinguished from the LTX invention in that in the present invention, the high-rate local clock 302 is used to regulate the digital portion of the TG 170. It is interesting to note that, with multi-clocking 300 as proposed in the present invention, it is yet possible and perhaps desirable to provide the verniers with a globally distributed clock. PLLs for local clock generation are known to exhibit clock jitter; with current technology, that jitter is on the order of 50 ps to 500 ps. While the digital portion of a TG is insensitive to jitter that is less than 10% of the interval of its regulating clock, the vernier circuit's timing accuracy is directly degraded by such jitter. Therefore, whereas 500 ps clock jitter would likely not disrupt a digital circuit operating at 200 MHz, as little as 50 ps jitter on the clock signal regulating the vernier would alone consume all of a modern high-performance tester's timing accuracy budget.
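The jitter figures quoted above follow from simple arithmetic on the clock period, which the following sketch makes explicit. The function names are hypothetical, and the 10% threshold is taken directly from the text as a rule of thumb rather than a derived limit.

```python
def clock_period_ps(freq_mhz: int) -> int:
    """Clock period in picoseconds for a frequency in MHz.

    Integer division is used, so the result is truncated for frequencies
    that do not divide 1,000,000 evenly.
    """
    return 1_000_000 // freq_mhz


def digital_jitter_budget_ps(freq_mhz: int, percent: int = 10) -> int:
    """Jitter tolerated by a digital circuit, as a percentage of its clock period."""
    return clock_period_ps(freq_mhz) * percent // 100
```

For a 200 MHz clock the period is 5000 ps, so the 10% budget is 500 ps. That matches the upper end of the quoted PLL jitter range, which is why PLL jitter is acceptable for the digital portion of the TG while remaining far too large for the vernier.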
In 1992, Hewlett-Packard developed the "processor-per-pin" organization [Schoettmer and Minami, "Challenging the `High Performance--High Cost` Paradigm in Test", International Test Conference Proceedings, pp. 870-879, 1995]. This digital system architecture provided for each DUT pin channel an independently programmed test processor for generating sequences of logical values and controlling the generation of successive events for application to the DUT pin channel. This organization had the advantage of reducing the amount of information delivered through the global instruction broadcast network, thus reducing its cost.
By moving sequencing from the system controller 20 into the TG 70, the HP invention surmounts the global instruction broadcast bottleneck, which limits the flexibility and scalability of a test system and therefore tends to increase its cost. The HP digital system architecture is a MIMD computer. The present invention has a number of advantages over the HP architecture. First, the present invention is a SIMD computer and therefore enjoys an inherently lower implementation cost: a SIMD processing element is known to cost as little as 20% as much as its MIMD counterpart. Second, the HP architecture does not provide for global control of the TGs 170 during a test run, whereas an I-Cached SIMD tester digital system 110 provides control over the TGs 170 at a moderate granularity via the global instruction broadcast network 140. Third, the per-pin processors in the HP architecture do not exchange intermediate data, whereas the TGs of the present invention have that capability. Finally, as do the conventional test systems, the HP architecture allocates a fixed processing resource to each pin of the DUT; the present invention, by contrast, allocates a number of TGs to each DUT pin as dictated by the requirements of the test program.
The state of the art of tester digital system architecture has progressed over the last 30 years, through a series of independent innovations, from "shared timing", to "timing-per-pin", to "sequencer-per-pin", to "processor-per-pin". In this light, the present invention may be seen to constitute a "re-configurable array of processing elements per pin" architecture. The various innovations claimed, applied independently or together, provide the flexibility needed to engineer a high-performance tester whose digital system costs less and is smaller than those of conventional testers.