This invention relates to an improved system of identifying memory cards used as components of semiconductor chip memory arrays which make up the computer RAM.
S/390 and IBM are registered trademarks of International Business Machines Corporation, Armonk, N.Y., U.S.A., and Lotus is a registered trademark of Lotus Development Corporation, an independent subsidiary of International Business Machines Corporation, Armonk, N.Y. Other names may be registered trademarks or product names of International Business Machines Corporation or other companies.
In computers, there is a need to be able to fetch a unique identification of each memory card used in the computer RAMs so that when a memory related event occurs on a memory card, a log of the memory related event can be kept in correlation with the specific memory card on which the event occurred. Examples of memory related events which need to be logged are the occurrences of correctable errors, uncorrectable errors, sparing events, store key errors and chip error logs. Correctable errors are errors which can be corrected by the error correction code forming part of each stored double word. Uncorrectable errors are errors which are detectable by the error correction code, but which cannot be corrected by the error correction code. A sparing event is a process in which a DRAM found to be defective is logically replaced on a memory card by a spare DRAM mounted on the memory card for that purpose.
Prior to the present invention, the identification of memory cards has been carried out by field engineers entering the cards' serial numbers into the system console. This system of identification has not been completely satisfactory because the plug locations of memory cards are frequently swapped and memory cards are switched between computer systems in the process of identifying and isolating a failing memory card. This swapping process often results in the loss of traceability of the memory card hardware. The loss of traceability of memory card hardware is detrimental to quality control because once serial number traceability is lost, it is impossible to precisely identify a specific memory card for replacement or recall due to quality problems. The resulting uncertainty regarding the memory cards in which errors or other problems occur results in the recall or replacement of more hardware than would otherwise be required and in increased customer outages.
Modern memory cards have mounted thereon EPROMs which are capable of uniquely and remotely identifying memory cards. However, the EPROM data on memory cards is intended to comprise vital physical data. It can be read or fetched through the universal power controller but cannot be read by the central processing unit addressing system, which addresses and fetches data from selected locations in the RAM. As a result, the EPROM data is not directly available to be used in logging memory related events that occur on the memory card.
In addition, EPROMs are known to have a high failure rate. The data stored on the EPROM is used to dynamically configure the system storage during the time of the initial program loading. Because of the vulnerability of EPROMs to failure, EPROM data from multiple memory cards in the system is used to dynamically configure the system storage. When an EPROM failure occurs, if all the cards do not contain identical data, initial program loading failure could result. Accordingly, the storing of unique identifications in each EPROM would not be desirable.
In accordance with the invention, each memory card is provided with a unique identification called ECID. The ECID of each memory card is permanently stored on the re-drive chips of the memory card by being fuse blown in those chips. The re-drive chips are provided with the capability of fetching the ECID stored on the re-drive chips in response to a fetch command received from the central processing unit. In the preferred embodiment, there are two re-drive chips on each memory card and there is a unique ECID associated with each of the re-drive chips. Each ECID comprises 18 bits identifying the chip and six error correction code (ECC) bits by which errors in the ECID can be detected and, in the case of single bit errors, can be corrected. When it is desired to read the memory card identification, the ECIDs from both re-drive chips are read out; each ECID uniquely identifies its re-drive chip and hence the memory card. The received bits of each ECID read from the re-drive chips are examined by ECC decoding logic or an ECC decoding algorithm. If a single bit error is indicated by the ECC decoding logic or algorithm, the single bit in error is corrected and the ECID as corrected is valid. Thus, valid data will be obtained from each ECID if it contains no errors or only a single bit error. If the ECC decoding logic or algorithm indicates that a multiple bit error exists in an ECID, then that ECID is uncorrectable and is not valid. When an ECID is not valid, it is not used as an identification, and only the valid ECID from the other re-drive chip is used. If both chips have valid data, then either one of the ECIDs can be used as the identification of the memory card.
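The validation and selection steps described above can be illustrated with a short sketch. The text does not specify which error correction code protects the ECID, so the example below assumes an extended Hamming code, which fits 18 identification bits and six check bits into a 24-bit word, corrects single bit errors, and detects double bit errors. The bit layout and the names `encode`, `decode`, and `card_id` are hypothetical and chosen only for illustration.

```python
# Sketch only: an extended Hamming (24,18) code is ASSUMED here; the source
# does not name the ECC actually used for the ECID.
DATA_BITS = 18
PARITY_POSITIONS = (1, 2, 4, 8, 16)           # five Hamming check bits
DATA_POSITIONS = [p for p in range(1, 24) if p not in PARITY_POSITIONS]
# Position 0 holds a sixth check bit: overall parity of the whole word.


def _syndrome(word: int) -> int:
    """XOR of the positions (1..23) of all set bits in the word."""
    s = 0
    for pos in range(1, 24):
        if (word >> pos) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """Build a 24-bit ECID word from an 18-bit identification value."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("identification must fit in 18 bits")
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            word |= 1 << pos
    # Set each Hamming check bit so that the syndrome of the word is zero.
    s = _syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << p
    # Overall parity bit makes the total number of set bits even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def decode(word: int):
    """Return (data, status); data is None when the word is uncorrectable."""
    s = _syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < 24:
        word ^= 1 << s                        # flip the single bit in error
        status = "corrected"
    else:
        return None, "uncorrectable"          # multiple bit error detected
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status


def card_id(ecid_a: int, ecid_b: int):
    """Use the first valid ECID of the two re-drive chips, else None."""
    for raw in (ecid_a, ecid_b):
        data, _ = decode(raw)
        if data is not None:
            return data
    return None
```

In this sketch, a word read back with one flipped bit still decodes to the original identification, while a word with two flipped bits is rejected and `card_id` falls back to the ECID of the other re-drive chip.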