1. Field of Invention
This invention relates in general to fault tolerant memory systems and, in particular, to an improved method for realigning memory chips within respective memory columns to prevent two chips that were initially the source of a multi-bit error at an address from being aligned at some future time.
2. Cross-Referenced Applications
Application Ser. No. 388,834, filed concurrently herewith and assigned to the assignee of the present invention, describes a fault tolerant memory system in which fault alignment exclusion is provided by (1) data steering logic connected between the memory and the multi-word buffer and/or (2) memory address permutation logic to effectively rearrange the chip addresses in the chip column associated with the defective bit position.
Application Ser. No. 388,830, filed concurrently herewith and assigned to the assignee of the present invention, is directed to a method for storing data in a fault tolerant memory system in which the data portion of the word is stored in either the true or complement form while the check byte portion is always stored in true form to reduce the number of words read from memory that contain more errors than can be corrected by the ECC system.
Application Ser. No. 388,831, filed concurrently herewith and assigned to the assignee of the present invention, is directed to a fault tolerant memory system of the type described in application Ser. No. 388,834 which includes means for transferring the data from each memory chip associated with the memory column that has been detected as containing a defective bit position to other chips in the same column. The data transfer involves addressing the defective column with the old permute address, reading the data into a buffer, and writing the data from the buffer back to the chips in the same column using a new permute address.
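The column data transfer summarized above can be pictured with a small simulation. This is a sketch only. It assumes, for illustration, that the permute address is combined with the logical chip index by exclusive-OR and that the buffer holds one bit from every chip at a single on-chip position at a time. `migrate_column` and its parameters are hypothetical names, not taken from that application.

```python
def migrate_column(chips: list[list[int]], old_v: int, new_v: int) -> None:
    """Re-store one memory column so data addressed with `old_v` is found with `new_v`.

    chips[physical_chip][position] holds the single bit stored in that cell.
    Hypothetical model: logical chip L is served by physical chip L ^ vector,
    so len(chips) must be a power of two larger than both vectors.
    """
    n_chips = len(chips)
    n_positions = len(chips[0])
    for pos in range(n_positions):
        # Read every logical chip at this position using the old permute address.
        buffer = [chips[logical ^ old_v][pos] for logical in range(n_chips)]
        # Write the buffered bits back to the same column using the new permute address.
        for logical, bit in enumerate(buffer):
            chips[logical ^ new_v][pos] = bit
```

Because the whole buffer is read before any write at a given position, no bit is overwritten before it has been copied, and because XOR with a fixed vector is one-to-one, every chip in the column receives exactly one bit at each position.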
Application Ser. No. 388,832, filed concurrently herewith and assigned to the assignee of the present invention, is directed to an arrangement for maintaining an up-to-date map of defective bit positions in the memory during actual use of the memory in its working environment.
3. Description of Prior Art
The desirability of large, fast, inexpensive semiconductor memories is well recognized in the data processing art. Large memories, such as 16-megabyte memories, are generally made up of a number of 64K-bit array chips. In one typical arrangement of a 16-megabyte memory, the 64K-bit chips are mounted 128 to a card as four 32-chip arrays, and 18 such cards make up the total system. The system is arranged to provide one bit from each 32-chip array in parallel to form a 72-bit data word, which includes an 8-bit ECC check character designed to automatically correct a single-bit error in any bit position of the 72-bit word by conventional ECC syndrome processing techniques.
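The figures in the preceding paragraph are mutually consistent, which a short capacity calculation makes explicit. The sketch assumes only that the 8 ECC check bits are the sole non-data bits of each 72-bit word; the constant names are illustrative.

```python
# Capacity check for the example organization described above.
BITS_PER_CHIP = 64 * 1024      # 64K-bit chips
CHIPS_PER_ARRAY = 32
ARRAYS_PER_CARD = 4
CARDS = 18
WORD_BITS = 72                 # bits delivered per word, one from each array
CHECK_BITS = 8                 # ECC check character; assumed to be the only non-data bits

arrays = ARRAYS_PER_CARD * CARDS             # one array per bit position of the word
chips = arrays * CHIPS_PER_ARRAY
words = CHIPS_PER_ARRAY * BITS_PER_CHIP      # each array supplies one bit of every word
data_bytes = words * (WORD_BITS - CHECK_BITS) // 8

print(arrays, chips, words, data_bytes // 2**20)   # 72 2304 2097152 16
```

The 72 arrays match the 72-bit word width, and 2^21 words of 64 data bits give exactly 16 megabytes.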
A 16-word, 72-bit buffer is connected between the central processing unit of the data processing system and the memory system. A store operation involves first loading the 16-word buffer from the CPU and then transferring the 16 words in parallel to memory in response to a store or write memory command. Addressing the memory involves selecting 16 chips in each 32-chip array and using 16 bits of the address to select one of the 64K storage positions on each of the 16 selected chips.
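The addressing just described can be illustrated by splitting a word address into its parts. The bit assignment below is hypothetical, chosen only so that each 16-word transfer uses one group of 16 chips per array and one 16-bit on-chip position; the source does not say which address bits perform which selection.

```python
def decode_word_address(word_addr: int) -> tuple[int, int, int]:
    """Split a 21-bit word address into (chip_index, on_chip_position, word_in_block).

    Hypothetical bit assignment (not taken from the source):
      bits 0-3  : word within the 16-word buffer block
      bit  4    : which group of 16 chips (of 32) is selected
      bits 5-20 : 16-bit position within each 64K-bit chip
    """
    if not 0 <= word_addr < 2**21:
        raise ValueError("word address out of range")
    word_in_block = word_addr & 0xF
    chip_group = (word_addr >> 4) & 0x1
    position = word_addr >> 5
    chip_index = chip_group * 16 + word_in_block   # chip within the 32-chip array
    return chip_index, position, word_in_block
```

Under this assignment, each of the 2^21 word addresses (32 chips × 64K positions per array) maps to a distinct (chip, position) pair, and the 16 words of one buffer load share a position but come from 16 different chips.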
As is well known, a 64K memory chip does not necessarily have all 64K 1-bit storage positions operative. Since the memory system can tolerate a single-bit error in each 72-bit data word transferred from memory, considerable cost savings can be achieved by using memory chips that are not necessarily perfect. It is very likely, however, that when the various chips are assembled into 72 multi-chip arrays on 18 separate cards to form the 16-megabyte memory, the placement of chips with defective storage locations will result in some of the (32 × 64K) word addresses containing more than one defective bit position. Moreover, in addition to single-bit failures on a chip, complete row and column failures are possible, each rendering 256 bit storage positions defective, so that occasionally a memory address will contain more than one defective bit position.
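The following sketch shows how a row failure on one chip and a column failure on another can meet at a single word address. It assumes, for illustration only, that a 64K-bit chip is a 256 × 256 cell matrix with on-chip position `row * 256 + col`, and that, with no steering or permutation, word address `(chip, position)` takes one bit from chip `chip` of every bit-position array. The function names and defect data are hypothetical.

```python
from collections import Counter

ROWS = COLS = 256   # assumed 256 x 256 cell matrix for a 64K-bit chip

def row_failure(row: int) -> set[int]:
    """All 256 on-chip positions lost when one whole row fails."""
    return {row * COLS + c for c in range(COLS)}

def col_failure(col: int) -> set[int]:
    """All 256 on-chip positions lost when one whole column fails."""
    return {r * COLS + col for r in range(ROWS)}

def multi_defect_addresses(defects: dict[tuple[int, int], set[int]]) -> set[tuple[int, int]]:
    """defects maps (bit_position_array, chip_index) -> defective on-chip positions.

    Returns the word addresses (chip_index, position) that receive more than one
    defective bit, i.e. more errors than a single-error-correcting code can fix.
    """
    hits = Counter()
    for (_array, chip), positions in defects.items():
        for pos in positions:
            hits[(chip, pos)] += 1
    return {addr for addr, n in hits.items() if n > 1}

defects = {
    (3, 7): row_failure(10),    # array 3, chip 7: row 10 dead
    (40, 7): col_failure(20),   # array 40, chip 7: column 20 dead
    (12, 7): {5},               # array 12, chip 7: one bad cell
    (50, 8): {2580},            # different chip index, so a different word
}
print(multi_defect_addresses(defects))   # {(7, 2580)}
```

A row and a column failure on the same chip index in two arrays always intersect in exactly one cell, so one such pairing is enough to produce an uncorrectable word.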
When such a situation occurs, the prior art suggests various arrangements for avoiding the problem.
One suggestion in the prior art involves merely skipping memory locations that have more than one defective bit location. Another arrangement is disclosed in the cross-referenced copending application Ser. No. 388,834. In that application, data steering logic is provided in each array channel between the multi-chip array and the multi-word buffer register. The logic is responsive to the contents of a failure alignment exclusion register to scatter defective bit positions among different data words or memory positions, minimizing the occurrence of more than one defective bit position at any one memory address. To further improve the scattering of defective bit positions, the disclosed system also provides an address permutation logic block for each of the 32-chip arrays, which replaces one chip (having a defective bit position) with another associated chip in the same 32-chip array in response to control signals supplied from the associated data processing system.
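The permutation logic of the cross-referenced application is not detailed here. Purely as an assumed model, the sketch below realizes chip substitution by XOR-ing the 5-bit chip index within one 32-chip array with a per-array permute vector, and shows how changing one array's vector separates two defects that would otherwise share a word. The XOR scheme, names, and defect data are illustrative assumptions.

```python
def to_logical(phys_chip: int, vector: int) -> int:
    """Logical chip slot served by a physical chip (assumed XOR permutation, 32 chips).

    XOR with a fixed vector is its own inverse, so the same operation also maps
    a logical slot back to the physical chip that serves it.
    """
    return (phys_chip ^ vector) & 0x1F

# One defective cell at on-chip position 1234 of physical chip 7, in each of two
# different bit-position arrays (hypothetical example data).
DEFECT_IN_ARRAY_3 = (7, 1234)
DEFECT_IN_ARRAY_40 = (7, 1234)

def aligned(vec_3: int, vec_40: int) -> bool:
    """True if both defects fall in the same memory word."""
    chip_3, pos_3 = DEFECT_IN_ARRAY_3
    chip_40, pos_40 = DEFECT_IN_ARRAY_40
    return (to_logical(chip_3, vec_3), pos_3) == (to_logical(chip_40, vec_40), pos_40)

print(aligned(0, 0))   # True: both defects land in the word at logical chip 7
print(aligned(0, 1))   # False: array 40's defective chip now serves logical chip 6
```

Since the mapping is a permutation of the 32 chips, no data capacity is lost; the defective chip simply serves a different set of words.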
As discussed in the cross-referenced application, the control signals are developed before data is stored in the memory system by a suitable test program that (1) identifies all defective locations in the 16-megabyte memory, and (2) identifies all memory address positions at which the number of defective bit locations exceeds the correction capability of the system's error correcting code, i.e., two or more errors. The control signals for the steering logic and the address permutation logic are then developed in accordance with a suitable algorithm that effectively realigns one of the two defective bit positions to another address where no defective bit position occurs. The complexity of the algorithm will, of course, vary with the size of the memory, the width of the data word transferred between the system and the memory, and the number and types of errors that may occur in each of the 64K memory chips.
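The "suitable algorithm" is left open in the text. As one hedged illustration, the sketch below uses a simple greedy search, not the algorithm of the cross-referenced application, over an assumed XOR permutation: arrays are processed in order, and each array receives the first permute vector that places none of its known defects at a word address already holding a defect. `choose_vectors` and the defect-map layout are hypothetical.

```python
from typing import Optional

def choose_vectors(defect_map: dict[int, set[tuple[int, int]]],
                   n_arrays: int = 72,
                   n_chips: int = 32) -> Optional[list[int]]:
    """Greedily pick one XOR permute vector per array (hypothetical algorithm).

    defect_map[array] is the set of known defective cells (physical_chip, position)
    in that array. A cell maps to word address (physical_chip ^ vector, position).
    Returns one vector per array, or None if the greedy pass gets stuck.
    """
    occupied: set[tuple[int, int]] = set()   # word addresses already holding a defect
    vectors: list[int] = []
    for array in range(n_arrays):
        cells = defect_map.get(array, set())
        for v in range(n_chips):
            mapped = {(chip ^ v, pos) for chip, pos in cells}
            if occupied.isdisjoint(mapped):
                occupied |= mapped
                vectors.append(v)
                break
        else:
            # No vector works given earlier choices; a fuller search might backtrack.
            return None
    return vectors

vectors = choose_vectors({3: {(7, 1234)}, 40: {(7, 1234)}, 41: {(6, 1234)}})
print(vectors[3], vectors[40], vectors[41])   # 0 1 2
```

Because the search sees only the defect map it is given, a defect that testing missed can still end up sharing a word with another one, which is the weakness the following paragraph discusses.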
In such systems, once control signals have been developed that scatter the defective bit positions among the memory addresses so that no address contains more than one defective bit position, it is generally assumed that the realignment has been achieved without inadvertently creating another address with more than one defective bit position. Unfortunately, for a number of valid technical reasons, most test systems designed to detect defective bit positions in memory chips, and in large memory systems built from many chips, cannot identify every single-bit defect in a large (e.g., 16-megabyte) memory. In addition, new permanent errors develop in memory chips after they are put into service, and these are not necessarily detected when they occur. Consequently, developing new control signals that eliminate a known alignment of two defective bit positions does not guarantee that the new alignment will have only one defect per address. It is, therefore, desirable to provide a method that reduces the possibility of substituting chips whose defective bit positions would result in a memory address having two defective storage locations. Such an arrangement becomes more significant as the memory ages and acquires more defective storage locations, since the likelihood of successfully realigning defective positions decreases as the number of new defects increases. The system further solves the problem of intermittent faults, which arise during normal operation of the memory under particular pattern-dependent conditions that are not exercised during diagnostic testing, by identifying the particular faulty chips immediately at the time the double error occurs.
When such a condition occurs, it must be assumed that the initial memory test, which identifies the defective locations in each chip, missed one of the locations that has now been identified as defective. If a new address permute vector is developed in accordance with the basic algorithm to replace the initial vector, the fault alignment algorithm that developed the initial control signals may recreate the same alignment at some future time, since the newly identified defect is not in the originally developed error map. There is, therefore, a need to prevent the two chips that contributed to the uncorrectable error from ever again being paired at the same address or in the same relationship at some other address. The present invention provides a method to prevent these two chips from being so paired in the future.
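The requirement that two offending chips never be paired again can be expressed as a constraint on candidate permute vectors. The sketch below is not the method of the present invention. Under an assumed XOR permutation of the chip index, it records each pair of chips that contributed to an uncorrectable error and rejects any set of vectors that would place the two chips in the same words again, whether at the original logical chip slot or at any other. `ExcludedPair`, `paired`, and `allowed` are hypothetical names.

```python
from typing import NamedTuple, Sequence

class ExcludedPair(NamedTuple):
    """Two chips that together produced an uncorrectable (double) error."""
    array_a: int   # bit-position array of the first chip
    chip_a: int    # physical chip index (0-31) within array_a
    array_b: int
    chip_b: int

def paired(vectors: Sequence[int], p: ExcludedPair) -> bool:
    """True if both chips serve the same logical chip slot, and so share every word there."""
    return (p.chip_a ^ vectors[p.array_a]) == (p.chip_b ^ vectors[p.array_b])

def allowed(vectors: Sequence[int], exclusions: Sequence[ExcludedPair]) -> bool:
    """Accept a candidate vector set only if no recorded pair is brought together again."""
    return not any(paired(vectors, p) for p in exclusions)

# Double error observed with all vectors at 0: physical chip 7 of arrays 3 and 40.
exclusions = [ExcludedPair(3, 7, 40, 7)]
vectors = [0] * 72
print(allowed(vectors, exclusions))   # False: the pair is still aligned
vectors[3] = vectors[40] = 1
print(allowed(vectors, exclusions))   # False: aligned again, now at logical chip 6
vectors[40] = 0
print(allowed(vectors, exclusions))   # True: the two chips serve different words
```

Because the check works on whole chips rather than on recorded defect cells, it still applies when one of the two contributing defects is missing from the error map.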