This invention relates to a data storage system which comprises an array of integrated circuits or semiconductor chips, each such chip including a memory consisting of an array of memory locations, some of which may be faulty.
Memory chips suffer from a small number of faults, which can arise during manufacture or develop subsequently. Systems have been devised which can tolerate such faulty chips. The most common form of fault tolerance in use is the incorporation of spare rows and columns of bits into each memory chip. After manufacture, any defective row or column is identified by chip testing, and one or more spares are programmed to replace the defective elements. The programming is permanent, using, for example, laser cutting or electrical fuse blowing. This has proved to be a successful technique and is in use today by all memory manufacturers; see the review paper "Redundancy--the new device technology for circuits of the 80's", R. J. Smith, International Electron Devices Meeting, December 1982.
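The spare-row replacement described above can be sketched as a fixed lookup table built once at test time. This is a minimal illustration only: the function names, the chip dimensions, and the software table itself are assumptions made for the example, since in a real chip the mapping is fixed in hardware by laser cutting or fuse blowing.

```python
from typing import Dict, List, Set


def build_row_remap(faulty_rows: Set[int], spare_rows: List[int]) -> Dict[int, int]:
    """Assign one spare row to each faulty row, fixed once at test time.

    Raises ValueError when there are more faulty rows than spares,
    i.e. the chip cannot be repaired with the spares designed in.
    """
    if len(faulty_rows) > len(spare_rows):
        raise ValueError("not enough spare rows to repair this chip")
    # sorted() makes the assignment deterministic; zip stops at the shorter input.
    return dict(zip(sorted(faulty_rows), spare_rows))


def physical_row(logical_row: int, remap: Dict[int, int]) -> int:
    """Return the row actually accessed: the spare if remapped, else the original."""
    return remap.get(logical_row, logical_row)


# Hypothetical chip: 8 main rows (0-7) plus 2 spare rows (8 and 9).
remap = build_row_remap(faulty_rows={2, 5}, spare_rows=[8, 9])
```

With this table, an access to row 2 is redirected to spare row 8, an access to row 5 to spare row 9, and all other rows are accessed unchanged. A third faulty row would raise an error, which mirrors the case where the number of designed-in spares is insufficient.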
The so-called redundancy approach suffers from limitations, especially in the early part of a memory product's manufacturing life cycle. The manufacturer has to predict the number and type of faulty elements expected from a specific manufacturing process in order to design in the appropriate number of spare rows and columns. The manufacturer's choice of spares is constrained by limits imposed by chip area and performance. Furthermore, the programming technique may restrict the efficiency of element replacement. Because the programming is permanent, there is no provision for faults that develop after some period of operational use.
An alternative approach is described in GB 2 184 268. Instead of trying to manufacture perfect storage devices by repairing imperfect chips in the factory (redundancy), it was proposed that imperfect devices be salvaged and only their good bits used. This technique suffers from one main drawback: during data transfer, large blank gaps periodically appear. Whilst a local computer could identify these gaps and suspend input or output, a more remote system would have difficulty in stopping and starting. These gaps lead to a greatly reduced data transfer rate. A further disadvantage is a reduction in storage capacity.
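The effect of such gaps can be illustrated with a toy model. It is not the scheme of GB 2 184 268; it assumes, purely for illustration, that locations are read sequentially at one location per clock cycle, that the set of faulty locations is known, and that a faulty location produces an empty slot in the stream. All names and values are hypothetical.

```python
from typing import List, Optional, Sequence, Set


def read_stream(cells: Sequence[int], faulty: Set[int]) -> List[Optional[int]]:
    """One entry per clock cycle: the stored value, or None for a skipped faulty location."""
    return [None if addr in faulty else value for addr, value in enumerate(cells)]


def usable_fraction(n_locations: int, faulty: Set[int]) -> float:
    """Fraction of locations (and, in this model, of clock cycles) that carry real data."""
    return (n_locations - len(faulty)) / n_locations


cells = [10, 11, 12, 13, 14, 15, 16, 17]
faulty = {2, 3, 4}  # assumed run of adjacent faulty locations
stream = read_stream(cells, faulty)
fraction = usable_fraction(len(cells), faulty)
```

In this model the run of adjacent faulty locations appears as three consecutive empty slots in the stream, and the same fraction, 5 of 8 locations, bounds both the effective transfer rate and the usable storage capacity.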