This invention relates to reconfiguration apparatus, and more particularly to such apparatus which enables the use of very large scale integrated circuits in spite of faults which occur during fabrication or operation of the IC or of the circuit board containing those ICs.
The prior art recognizes that many very large scale integrated (VLSI) circuits contain a number of repeated modules or circuit configurations which operate in conjunction with one another to produce a desired result. A VLSI chip may be extremely complicated and contain hundreds of thousands of cooperating components. Certain VLSI devices are employed to perform complicated mathematical processing, and hence many of their circuit components are similar in construction and configuration, especially in so-called parallel processors, and operate in concert to speed the processing of data. In any event, as the size of the chip increases, the yield decreases: as the chip gets larger and more complicated, the likelihood that a given VLSI chip will be fully functional decreases. It is therefore impractical to furnish very complex chips on a mass production basis.
As indicated, an example of the type of circuitry which would be implemented by VLSI techniques is the so-called parallel processor. Such processors operate upon parallel data streams under control of a single instruction, such as vector add or vector multiply. The processor may consist of a rectangular array of like single bit components, or cells, many of which are implemented in a single chip. The cells may cooperate to form words of varying size, can communicate with their neighbors in all four directions (right, left, up and down), and can also communicate with external devices, such as memory, for input and output. Hence the processors can be applied to problems requiring matrix arithmetic, as found in image processing, pattern recognition and engineering analysis.
These devices can perform fixed point and floating point arithmetic. The calculating ability of the processor is dependent upon the size of the array, the clock rate, the word size and the fraction of the array which is enabled for a particular operation. For example, a 128-cell × 128-cell array operating as 2048 8-bit processors simultaneously using a 10 MHz clock is estimated to achieve on the order of 20 billion additions or logical operations per second and on the order of 2.5 billion multiplications per second.
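The arithmetic behind the example above can be checked with a short sketch. The function name and the cycles-per-operation values are assumptions made for illustration: the text gives only the array size, word size, clock rate and resulting rates, so one clock per addition and eight clocks per multiplication are chosen here simply because they reproduce the stated orders of magnitude.

```python
def array_throughput(rows, cols, word_bits, clock_hz, cycles_per_op):
    """Return (number of word processors, operations per second).

    Assumes the whole array is enabled and that every word processor
    completes one operation every `cycles_per_op` clock cycles
    (an illustrative assumption, not a figure from the text).
    """
    cells = rows * cols                 # single bit cells in the array
    processors = cells // word_bits     # cells grouped into words
    ops_per_second = processors * clock_hz / cycles_per_op
    return processors, ops_per_second


# 128 x 128 cells grouped into 8-bit words, clocked at 10 MHz.
processors, adds_per_second = array_throughput(128, 128, 8, 10_000_000, cycles_per_op=1)
_, mults_per_second = array_throughput(128, 128, 8, 10_000_000, cycles_per_op=8)

print(processors)         # number of 8-bit processors
print(adds_per_second)    # additions per second under the 1-cycle assumption
print(mults_per_second)   # multiplications per second under the 8-cycle assumption
```

Enabling only a fraction of the array for an operation would scale both rates by that fraction.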
A special case of a parallel processor is the associative processor which generally performs only search operations. Associative processors are sometimes referred to as content addressable memories and are generally well known. See for example, U.S. Pat. No. 4,010,452 entitled ASSOCIATIVE DATA PROCESSING APPARATUS AND METHOD issued Mar. 1, 1977 to J. Cazanove. See also U.S. Pat. No. 4,296,475 entitled WORD RECOGNITION CONTENT ADDRESSABLE MEMORY issued on Oct. 20, 1981 to L. Nederlof et al.
There are many patents as well as technical articles which describe such arrays. See IEEE COMPUTER, June 1985, "Parallel Processor Programs in the Federal Government" (pages 43 to 56). See especially page 52 concerning the MPP.
In any event, as indicated above, there is a substantial problem in implementing such array chips using VLSI techniques because, as the chips become larger, providing more cells or more complex cells, the chip yield, based on current fabrication techniques, decreases. In addition, the more such chips are connected together, the greater is the likelihood that a chip failure will cause the loss of much or all of the system.
It is, therefore, an object of the present invention to enable one to utilize a plurality of integrated circuits in spite of the fact that these chips contain faults and to improve the reliability of systems containing large numbers of these chips.
According to this invention, the array configuration apparatus to be described enables a certain class of fault tolerant structures to be used after fabrication. Means are provided whereby two major classes of faults may be excluded so that proper operation of the system may proceed following the occurrence, detection and location of a fault. The first class of faults is defective wiring which may occur between portions of the system. These defects, such as open or shorted connections, may occur at the time of system manufacture or during the operating life of the system.
These defects commonly occur between distinct mechanical structures such as printed circuit boards or integrated circuit packages but may also occur between logical blocks on a single integrated circuit chip. Such multiple defects may also be corrected by this invention.
The second class of defects is defective logic blocks. When collections of blocks are connected together, such as the cells in a parallel processor, defects may occur in one or more of these blocks. In order to localize the effect of a defect, it may be necessary to dynamically avoid the defect in order to restore operation of the system, and this restructuring or reconfiguration must be done without affecting the system or programming in general except during a brief repair interval. This invention, as will be explained, is particularly adapted for arrays of single bit processors. In this case, a small percentage of spare parts, typically 25 percent, may be provided which, as will be explained, dramatically improves system reliability.
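The effect of spare parts on reliability can be illustrated with a simple binomial model. This is a sketch under assumptions that are not taken from the text: cells fail independently, each works with the same probability, any good cell can replace any bad one, and the reconfiguration means themselves never fail. The function name and the numbers in the example (16 required cells, 4 spares, a 0.95 chance that a cell works) are hypothetical.

```python
from math import comb


def system_reliability(required, spares, p_cell_ok):
    """Probability that at least `required` of `required + spares` cells work.

    Assumes independent, identically distributed cell failures and
    ideal reconfiguration (illustrative assumptions only).
    """
    total = required + spares
    return sum(
        comb(total, good) * p_cell_ok**good * (1 - p_cell_ok) ** (total - good)
        for good in range(required, total + 1)
    )


# Hypothetical example: 16 required cells, each working with probability 0.95.
without_spares = system_reliability(16, 0, 0.95)
with_spares = system_reliability(16, 4, 0.95)   # 25 percent spares

print(without_spares)
print(with_spares)
```

Under these assumptions the system with spares tolerates up to four defective cells, whereas the system without spares fails if any single cell is defective.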
A few wires may be considered to be spare parts, and the control of these wires may be integrated into the array of processors, in which case the control is fault tolerant if the processors are fault tolerant. Under the present state of the art, no fine grain, dynamically controllable repair means exists. Prior art techniques verify transferred data by encoding the data or by providing one or more parity bits. An error correction code typically allows a single bit error to be corrected. However, such encoding is efficient only on relatively large words, such as 16 bits, whereas the methods and apparatus of this invention are concerned with single bits and may also correct multiple errors. An error correction code would presumably correct the error instantaneously, but it provides no means to correct multiple errors of the kind that can be corrected by the array reconfiguration apparatus according to this invention.