Complex electronic systems are typically designed using multiple integrated circuit (IC) packages mounted on a printed circuit board with interconnects between the packages provided by traces on the printed circuit board. This approach to system design is highly versatile, because a designer can develop a wide range of different systems by selecting from a comparatively limited number of standard IC packages having well defined functionality. Isolation and correction of defects is also simplified, because each IC package can be tested individually, and any packages found to be defective can be replaced. In the case of parallel processing systems, this modular design approach is frequently extended by placing each processor element (and its associated interface, memory, and power management devices) on a single printed circuit (PC) board. Several such PC boards are then connected to a communications “backplane” to construct the parallel processing system.
Well-known disadvantages of such a modular design approach include the comparatively large physical distances separating each IC of the system (which adversely affects system speed), and increased system noise due, among other things, to Radio Frequency Interference (RFI) and impedance mismatches between “on-chip” circuit traces and the PC board lines between the IC packages. All of these factors negatively impact system performance. Designers have recognized that wafer-scale integration (WSI), that is, the manufacture of entire systems (or major system components) on a single wafer, provides an effective means of maximizing system performance by simultaneously addressing both of these limitations.
However, a primary limiting factor in integrated circuit fabrication is that, as the size of an IC increases, so does the probability that manufacturing defects (e.g., a dust particle on the surface of the IC or physical defects in the IC) will damage at least part of the circuitry on the wafer. This problem is typically expressed in terms of “yield”; that is, the number of serviceable IC's that can be manufactured from each wafer. With current manufacturing methods using 30 cm diameter wafers (approx. 700 cm² surface area), the yield for IC's covering about 5 mm² is close to 100%. By contrast, for IC's of about 5 cm², the yield can be up to about 80% depending on process maturity. As the size of an IC increases toward the total wafer area, the expected yield drops to close to zero, thereby effectively prohibiting the development of practical systems covering the entire surface of the wafer.
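The trend in these figures can be illustrated with a simple Poisson yield model. This is a minimal sketch only: the Poisson model and the function names are assumptions introduced here for illustration and are not taken from the present disclosure. The defect density is calibrated from the approximately 80% yield quoted above for a 5 cm² IC.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Expected fraction of defect-free ICs of the given area.

    Assumes (hypothetically) that defects are randomly and independently
    distributed, so the chance of zero defects in an area A is exp(-A * D).
    """
    return math.exp(-area_cm2 * defect_density_per_cm2)


def density_from_yield(area_cm2: float, observed_yield: float) -> float:
    """Invert the Poisson model to estimate the defect density D."""
    return -math.log(observed_yield) / area_cm2


# Calibrate D from the ~80% yield quoted for a 5 cm^2 IC.
D = density_from_yield(5.0, 0.80)

for label, area in [("5 mm^2", 0.05), ("5 cm^2", 5.0), ("700 cm^2", 700.0)]:
    print(f"{label:>9}: yield = {poisson_yield(area, D):.3g}")
```

Under this assumed model, the 5 mm² IC has a yield above 99%, while an IC covering the full 700 cm² wafer has a yield that is effectively zero, which matches the qualitative behavior described above.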
For the purposes of the present disclosure, a “defect” shall be understood to refer to a physical disturbance or an anomaly that affects a manufactured component. Common sources of defects include dust particles on a wafer surface during manufacture. A “fault” shall be understood to refer to a change or anomaly in the logical behavior of one or more components. In many cases, defects will produce faults, and thus can be inferred from detected anomalous behavior of an IC. However, this is not always the case; some faults are not related to defects, and some defects do not produce faults. An example of the latter case is a defect that pinches, but does not sever, a wire.
Various techniques are known for producing fault-tolerant IC's. For example, U.S. Pat. No. 4,621,201 to Amdahl et al. teaches a fault-tolerant IC system in which plural copies of each circuit are manufactured on a single wafer. Predetermined circuit element groups of the copies are interconnected, following manufacture, to assemble the working system. According to Amdahl et al., a majority voting scheme is used, which requires at least triple redundancy for each circuit to be protected. Alternatively, fused links can be used to statically remove defective elements from the circuit. This implies having at least a one-for-one redundancy for each circuit to be protected.
Both of these techniques imply a highly inefficient utilization of the surface of the wafer. In particular, the technique of Amdahl et al. results in a utilization efficiency of about 33%, while the use of fused links achieves, at best, a 50% efficiency. Successful implementation of WSI systems requires a fault-tolerant system architecture that remains serviceable in the presence of manufacturing defects, while maximizing utilization efficiency of the wafer surface.
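The efficiency figures above follow directly from the number of copies fabricated for each protected circuit. The following is a minimal sketch, with hypothetical function names, of a majority voter and of the resulting utilization efficiency; it is not intended to represent the implementation of Amdahl et al.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of copies, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def utilization(copies_per_circuit: int) -> float:
    """Fraction of wafer area doing useful work when every protected
    circuit is fabricated copies_per_circuit times."""
    return 1.0 / copies_per_circuit


# Triple redundancy: one faulty copy is outvoted by the other two.
print(majority_vote([1, 1, 0]))
# With only two copies, a disagreement cannot be resolved by voting.
print(majority_vote([0, 1]))
print(f"majority voting (3 copies): {utilization(3):.0%}")
print(f"fused links (2 copies):     {utilization(2):.0%}")
```

A voter needs at least three copies to resolve a disagreement, which is why majority voting costs about two thirds of the wafer area, whereas one-for-one sparing costs half.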
Another difficulty with WSI is the size of the reticle used for generating circuit traces and components on the surface of the wafer. In practice, the largest usable reticle covers an area of about 2 cm×2 cm, so that multiple reticle images must be used to cover the entire wafer. However, the cost of preparing multiple unique reticles for a single wafer is prohibitive. In the case of a parallel processing system, each processor element of the system can be identical, and a reticle image may cover one or more processor elements. Thus, in principle, a single reticle could be used to generate every processor element. However, the connectivity of each processor element to associated communications busses must necessarily be unique. Development of a cost-effective WSI parallel processing system requires a system architecture in which a single reticle can generate multiple cells of the parallel processing system, each cell having a unique connectivity to a communications bus.
As noted above, defects during manufacture cannot be entirely eliminated. It is therefore essential to be able to determine the operability of each cell (and bus) of a system following manufacture. Traditional techniques, such as functional or “edge connector” tests, and in-circuit or “bed-of-nails” testing are impractical for WSI parallel processing systems. Currently, boundary scan testing is one of the best practical ways of detecting faults in highly complex digital integrated circuits. Boundary scan testing uses built-in test logic to verify the operation of internal elements (including the absence of manufacturing defects) of complex systems and integrated circuits. The Joint Test Action Group (JTAG) has developed a boundary scan test implementation that is the only systematic technique that has been standardized for this task, and is sanctioned by the Institute of Electrical and Electronics Engineers (IEEE) under standard 1149.1.
According to the IEEE 1149.1 standard, a standard Test Access Port (TAP) is located “on chip” for testing each IC of the system. Each TAP has a closely similar structure, including Instruction, Bypass, and test data registers, as well as instruction decode logic, output multiplexers and TAP controller logic for controlling operation of the TAP. In a WSI parallel processing system, each cell may include one or more TAPs for testing the cell's processor element and associated circuitry. Every TAP of the system is connected, in series, on a four- or five-wire “scan chain bus” to form one or more continuous “scan chains” spanning the entire system. Each scan chain has an external appearance (that is, from the point of view of external system test logic) of a linear shift register, in which every TAP can be represented by a unique set of addresses. This enables each TAP to be individually accessed to test specific IC's of the system, and test results obtained for analysis. As a result, IEEE 1149.1 provides an efficient system for testing very complex systems, such as WSI parallel processing systems. The individual accessibility provided by IEEE 1149.1 is often used for other tasks such as boot-strapping, debugging and configuring.
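The “linear shift register” appearance of a scan chain can be sketched as follows. This is a deliberately simplified model: the class names are hypothetical, and the TAP controller state machine, the clock and mode-select signals, and instruction decoding are all omitted. Each TAP is reduced to whichever register is currently selected between its serial input and output; a one-bit register corresponds to a TAP in bypass.

```python
class Tap:
    """One TAP, modeled only as its currently selected shift register."""

    def __init__(self, register_bits: int = 1):
        self.reg = [0] * register_bits  # 1 bit models the Bypass register

    def shift(self, tdi: int) -> int:
        """Clock one bit in at the serial input; return the bit shifted out."""
        tdo = self.reg[-1]
        self.reg = [tdi] + self.reg[:-1]
        return tdo


class ScanChain:
    """TAPs connected in series: each output feeds the next input."""

    def __init__(self, taps):
        self.taps = taps

    def shift(self, tdi: int) -> int:
        bit = tdi
        for tap in self.taps:
            bit = tap.shift(bit)  # each TAP passes on its previous output bit
        return bit

    def bit_ranges(self):
        """Bit positions occupied by each TAP within the overall register."""
        ranges, start = [], 0
        for tap in self.taps:
            ranges.append((start, start + len(tap.reg) - 1))
            start += len(tap.reg)
        return ranges


chain = ScanChain([Tap(1), Tap(4), Tap(1)])  # middle TAP exposes 4 bits
print(chain.bit_ranges())
print([chain.shift(b) for b in [1, 0, 0, 0, 0, 0, 0]])
```

In this sketch a bit shifted in at the start of the chain emerges after as many clock cycles as there are register bits in the chain (six here), and each TAP occupies a distinct range of bit positions, which is what allows external test logic to address it individually.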
However, implementation of a boundary scan system requires that several TAPs be connected, in series, on a common scan chain. In a system architecture that uses a single reticle to image every cell of a parallel processing system, it is desirable to incorporate traces for this scan chain into the common reticle. This means that scan chain bus traces will be identically duplicated in every reticle image. Within a matrix of identical cells on a wafer, this situation is satisfactory, yielding desired scan chain bus lines and connections spanning multiple reticles. However, at the edges of the matrix, a different set of connections will normally be desired. In particular, the signal paths will normally need to “wrap” from one row of cells to a next row of cells (rather than continue to a next successive cell along the same row, for example). It would be desirable to use a single reticle to generate both small scale circuit structures (e.g., of each cell) and large scale circuit structures (such as the scan chain) spanning several reticles on a wafer.
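The difference between interior and edge connections can be made concrete with a short sketch that enumerates the links of a scan chain threaded through a matrix of cells. The raster ordering used here (the end of one row connects to the start of the next) and the function name are assumptions chosen for illustration; the text above requires only that the path wrap between rows in some manner.

```python
def scan_links(rows: int, cols: int):
    """List (source_cell, destination_cell, kind) for a scan chain that
    visits a rows x cols matrix of cells in raster order (assumed)."""
    links = []
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                # Interior link: the same trace pattern in every cell.
                links.append(((r, c), (r, c + 1), "row"))
            elif r + 1 < rows:
                # Edge link: must wrap to the next row instead.
                links.append(((r, c), (r + 1, 0), "wrap"))
    return links


for src, dst, kind in scan_links(2, 3):
    print(src, "->", dst, kind)
```

Every "row" link has the same geometry and can therefore come from one repeated reticle image, whereas the "wrap" links occur only at the matrix edge and have a different geometry, which is the source of the difficulty described above.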
Implementation of a scan chain in a WSI system involves generating circuit structures of the scan chain (e.g. TAPs and scan chain busses) on the wafer. As will be appreciated, if a manufacturing defect severs the scan chain, then the entire wafer becomes both untestable and unconfigurable, and must therefore be discarded. A known method of reducing the effect of this vulnerability is to implement parallel redundant scan chains that follow physically diverse paths through the wafer. For each IC under test, a pair of TAPs perform a boundary scan, and the results are combined using a voting scheme. The use of physically diverse paths reduces the probability that a single defect will sever both chains. Furthermore, the voting scheme implemented for each pair of TAPs reduces the probability that defects at different locations on the wafer (and affecting both chains) will render the scan chain as a whole inoperative. However, if a manufacturing defect has the effect of disabling both of the TAPs testing an IC, then both of the redundant scan chains will be severed, and the scan chain as a whole will fail. In a multi-cell parallel processing system, this situation may arise, for example, when the manufacturing defect affects a large area (covering most or all of a cell), or affects control (e.g. for power supply or clock signal) lines to a cell.
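The failure modes described above can be summarized in a small sketch. It is a simplification under stated assumptions: each cell is reduced to a pair of booleans indicating whether its TAP on each of two redundant chains (here called A and B, hypothetical names) is operative, and the per-pair voting scheme is assumed to pass the scan data onward whenever at least one TAP of the pair works.

```python
def survives_without_voting(cells) -> bool:
    """Two independent chains: at least one must be intact end to end."""
    return all(a for a, _ in cells) or all(b for _, b in cells)


def survives_with_voting(cells) -> bool:
    """Per-cell combining: every cell needs at least one working TAP."""
    return all(a or b for a, b in cells)


# Defects at different locations break chain A in one cell, chain B in another.
scattered = [(True, False), (False, True), (True, True)]
print(survives_without_voting(scattered), survives_with_voting(scattered))

# A single large defect disables both TAPs of one cell.
large_defect = [(True, True), (False, False), (True, True)]
print(survives_without_voting(large_defect), survives_with_voting(large_defect))
```

In the first case the per-pair scheme keeps the scan chain usable even though neither chain is intact on its own; in the second case no amount of path diversity helps, because both TAPs of the same cell are lost.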
Accordingly, a system architecture for a wafer-scale parallel processing system that remains serviceable in the presence of manufacturing defects remains highly desirable.