1. Field of Invention
The present invention relates generally to repairable processor arrays and more particularly to an automatically repairable chain of processors.
2. Description of Related Art
Multi-processor arrays may possess millions, possibly even billions, of transistors. With such large numbers, the likelihood that an individual transistor fails may be non-negligible.
Clearly, it is not acceptable to replace the entire array of processors, and in many cases it is not feasible even to replace an individual processor or element of the system, particularly if the failed processor is part of an array of many processors implemented on a single substrate. Therefore, a means of detecting failure and taking corrective action becomes increasingly important.
Multi-processor arrays have been built since the 1970s, generally with large numbers of very simple processors. Today, technology offers many ways to implement many processors on a chip. It is expensive to replace single processor chips manually, particularly chips with hundreds of pins and particularly chips in ball grid array (BGA) surface-mount packages. Thus extreme efforts are made to detect any failure during manufacturing and qualification. For example, 'full-scan' testing procedures build test circuitry into every register, almost the most expensive approach conceivable. Such circuitry allows every register to be tested for 'stuck' faults, which occur when a normally two-state element remains stuck at one of its states. The tests are performed at various stages of manufacture, typically around the time the chips are packaged. The farther along in manufacturing, the more expensive a failure becomes, so every effort is made to eliminate failures as early as possible. By the time the system is deployed, failures have reached their maximum cost. Manually repairing or replacing such components is prohibitively expensive.
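To make the notion of a 'stuck' fault concrete, the following Python sketch models a register in which some bits may be stuck at 0 or at 1, and detects them by writing all-zeros, then all-ones, and reading the register back after each write. The `Register` class and `find_stuck_bits` function are hypothetical illustrations of the fault model only; they are not the full-scan circuitry described above, which in practice shifts test patterns through dedicated scan chains.

```python
from dataclasses import dataclass, field


@dataclass
class Register:
    """Hypothetical n-bit register; stuck_at maps bit index -> forced value."""
    width: int
    stuck_at: dict = field(default_factory=dict)
    value: int = 0

    def write(self, word: int) -> None:
        # Stuck bits ignore the written value and keep their forced state.
        for bit, forced in self.stuck_at.items():
            if forced:
                word |= 1 << bit
            else:
                word &= ~(1 << bit)
        self.value = word & ((1 << self.width) - 1)

    def read(self) -> int:
        return self.value


def find_stuck_bits(reg: Register) -> dict:
    """Return {bit: stuck_value} for every bit that fails to toggle."""
    all_ones = (1 << reg.width) - 1
    reg.write(0)
    after_zeros = reg.read()
    reg.write(all_ones)
    after_ones = reg.read()
    faults = {}
    for bit in range(reg.width):
        if (after_zeros >> bit) & 1:
            faults[bit] = 1   # reads 1 after writing 0: stuck-at-1
        elif not (after_ones >> bit) & 1:
            faults[bit] = 0   # reads 0 after writing 1: stuck-at-0
    return faults
```

For example, `find_stuck_bits(Register(8, stuck_at={2: 1, 5: 0}))` reports `{2: 1, 5: 0}`, while a fault-free `Register(8)` yields an empty dictionary. Two patterns suffice here only because this toy model has no interactions between bits.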
A method of testing for faults, excising such faults, and re-connecting an otherwise broken array of cells is described, with examples presented in terms of the basic cell architecture supporting cell excision and net healing. A cell replacement mechanism is developed, and the limiting cases of 100% and 0% replacement are considered along with their associated costs. Thus the system allows either replacement of bad cells or bypassing of bad cells, with corresponding differences in cost and operation. Both level-sensitive and edge-sensitive excision mechanisms are described and the consequences of each are discussed. The invention applies to processor arrays with one cell per physical chip or many cells per chip, and handles uni-directional or bi-directional data flows. The limiting cases of all uni-directional busses and all bi-directional busses are treated. The invention is generally both interface independent and technology independent. The extension to N dimensions is developed, with the two-dimensional case presented diagrammatically.
An object of the invention is to repair a chain of processing elements, as shown in FIG. 1, without having to manually reconfigure the chain. This is traded off against the minimal cost of the associated circuitry described herein, and the software recovery procedures necessary to synchronize operation of the healed chain. It may also be traded off against 'full-scan' tests.
An apparatus in accordance with the present invention is an extended processor element having an extended upstream interface and an extended downstream interface. The apparatus includes (i) a processing cell having a cell upstream interface and a cell downstream interface, where the processing cell performs the processing operations of the extended processor element; and (ii) bypass circuitry for bypassing the processing cell, connected to at least one select line to receive a select signal, and connected between the extended upstream interface and said cell upstream interface and between the extended downstream interface and said cell downstream interface, where said bypass circuitry is operative to connect the extended upstream interface to the extended downstream interface in response to an active select signal, and to connect said cell upstream interface to the extended upstream interface and said cell downstream interface to the extended downstream interface in response to an inactive select signal.
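The effect of the select signal on the bypass circuitry can be modeled behaviorally. The sketch below assumes uni-directional data flow from the extended upstream interface to the extended downstream interface and represents the processing cell as an arbitrary function; `ExtendedElement`, `cell`, and `pass_downstream` are hypothetical names, and the model describes only the routing behavior, not the actual circuit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExtendedElement:
    """Behavioral model of one extended processor element (hypothetical)."""
    cell: Callable[[int], int]  # stands in for the processing cell
    select: bool = False        # state of the select line; True means active

    def pass_downstream(self, word_from_upstream: int) -> int:
        if self.select:
            # Active select: the extended upstream interface is connected
            # directly to the extended downstream interface; the cell is
            # cut out of the data path.
            return word_from_upstream
        # Inactive select: the cell upstream interface receives the word
        # from the extended upstream interface, and the cell downstream
        # interface drives the extended downstream interface.
        return self.cell(word_from_upstream)
```

With `e = ExtendedElement(cell=lambda x: x * 2)`, `e.pass_downstream(5)` returns `10`; after setting `e.select = True`, the same call returns `5` because the cell has been bypassed.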
In a chain of extended processing elements, where each element has a processing unit for carrying out the processing operations of the extended processing element, and bypass circuitry that connects the processing unit to an extended upstream interface and an extended downstream interface of the extended processing element when the bypass circuitry is not activated, and connects the extended upstream interface to the extended downstream interface when the bypass circuitry is activated, the chain being formed by connecting upstream and downstream interfaces to each other, a method in accordance with the present invention includes (i) receiving information indicating that testing of the upstream processor element is required; (ii) testing the upstream processor element to determine whether the processing unit of the upstream element responds correctly; and (iii) activating the bypass circuitry of the upstream processor element to connect its upstream interface to its downstream interface if the upstream processor element does not respond correctly.
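The three method steps can be sketched in software as follows. This is a minimal illustration under stated assumptions: each processing unit is modeled as a function, step (i) is represented by a hypothetical set of `flagged` element indices, and step (ii) applies hypothetical (stimulus, expected response) test vectors directly to the unit. The names `heal_chain`, `responds_correctly`, and `run_chain` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical test-vector format: (stimulus, expected response).
Vector = Tuple[int, int]


@dataclass
class ExtendedElement:
    cell: Callable[[int], int]  # stands in for the processing unit
    select: bool = False        # True -> bypass circuitry activated

    def pass_downstream(self, word: int) -> int:
        # A bypassed element forwards the word unchanged.
        return word if self.select else self.cell(word)


def responds_correctly(element: ExtendedElement, vectors: List[Vector]) -> bool:
    # Apply each stimulus to the processing unit and compare its response.
    return all(element.cell(stimulus) == expected for stimulus, expected in vectors)


def heal_chain(chain: List[ExtendedElement], flagged: Iterable[int],
               vectors: List[Vector]) -> List[int]:
    """Test each flagged element and bypass those that fail.

    Returns the indices of the elements whose bypass was activated.
    """
    bypassed = []
    for index in sorted(flagged):                      # (i) testing required
        element = chain[index]
        if not responds_correctly(element, vectors):   # (ii) test the element
            element.select = True                      # (iii) activate bypass
            bypassed.append(index)
    return bypassed


def run_chain(chain: List[ExtendedElement], word: int) -> int:
    # Pass a word through the chain from the first element to the last.
    for element in chain:
        word = element.pass_downstream(word)
    return word


def increment(x: int) -> int:
    return x + 1  # expected behavior of a good unit in this example


def dead(x: int) -> int:
    return 0      # a failed unit whose output is stuck at zero


if __name__ == "__main__":
    chain = [ExtendedElement(increment) for _ in range(4)]
    chain[2] = ExtendedElement(dead)
    vectors = [(0, 1), (1, 2), (7, 8)]
    print(run_chain(chain, 0))              # 1: the dead unit breaks the chain
    print(heal_chain(chain, {2}, vectors))  # [2]
    print(run_chain(chain, 0))              # 3: three good units remain
```

Note that the healed chain has one fewer working stage, which is why software recovery is still needed to resynchronize the chain after healing.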
An advantage of the present invention is that a physical processor chain can be "healed" without having to manually excise the failed cell and manually repair the break. This advantage leads to savings in the time and cost of manually repairing a broken chain.
Another advantage is that a chain can be reconfigured by excising some elements and restoring other elements as needed for a particular processing task. This can save power if the unneeded elements are powered down.