1. Field of the Invention
The subject invention pertains to the field of data processing and, more particularly, to a backed-up memory system for use with multiprocessors or uniprocessors.
2. Description of the Prior Art
In both large and small data processing systems, it has been found desirable to be able to reconstruct data in the event that a memory unit fails. In a closely coupled computer system, where a number of processors have access to a shared memory, it is particularly important to be able to reconstruct data stored in memory if a portion of it is lost through the failure of one of the memory units. Such a shared memory includes several independent memory units, each of which contains storage, a storage controller, a microprocessor, and an interface to the communication medium of the system. That medium can be either a system of buses interconnected to the processors or serial transmission lines, and various interconnection methods are used in the art, such as point-to-point links, multidrop links, or crossbar switches. Data is typically transferred between processors and memory units in blocks on the order of 4096 bytes.
A highly reliable system requires that data stored in any of the memory units will be available at all times, even in the event of the failure of an individual memory unit. While the reliability of a memory unit can be improved by duplicating some of its components, in general such measures are more complex and less economical than means that allow the reconstruction of data from alternate sources.
Classically, data availability in a system with shared memory is achieved by storing two copies of the data in two separate units, a one-to-one backup known as duplexing. This is an expensive solution, however, since it requires a complete duplication of all memory units and thus double the storage that would otherwise be necessary. In systems that require large amounts of memory, duplexing is costly not only because of the cost of the memories themselves but also because of economic factors such as floor space and service. It is thus desirable to be able to back up a memory system without duplicating each of its units.
In U.S. Pat. No. 4,092,732, to Ouchi, a system for recovering data stored in disk drives is described, in which data records are subdivided into segments that are distributed over n independent disk drives. A checksum segment contains the modulo-2 checksum of all other segments and is stored on an additional disk drive. This patent relates specifically to direct access storage devices (DASD) and requires a particular segmentation and distribution of data over multiple devices. In contrast, the present invention relates to all memory types and does not require special formatting or distribution of data over multiple memory units.
In Riggle et al., U.S. Pat. No. 4,413,339, a Reed-Solomon code is implemented for checking data transferred between a single data processing device and a data storage device. This patent does not deal with storing data in multiple storage systems accessible to multiple data processing systems.
Murphy et al., U.S. Pat. No. 4,412,280, describes cross-checking data between two data processing systems using a checksum. Murphy et al., however, does not describe storing data in multiple memory units accessible to multiple data processing systems, nor the use of a single backup memory unit to back up the data in all memory units.
The basic concept of using a checksum to recover from single data failures is well known in the art and is described in Siewiorek et al., "The Theory and Practice of Reliable System Design", Digital Press, 1982, and in Ornstein et al., "Pluribus: A Reliable Multiprocessor", AFIPS Conference Proceedings, Volume 44, pages 551-559, AFIPS Press, Montvale, N.J., 1975. Neither work, however, discloses backing up multiple memory units with a single backup memory unit, or the associated problems and solutions of the present invention.
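The general modulo-2 checksum technique mentioned above can be illustrated with a minimal Python sketch. It assumes that every memory unit holds equal-sized blocks (4096 bytes here, following the typical transfer size noted earlier) and that a single backup unit stores the XOR of the corresponding block from every unit. The names `xor_blocks`, `checksum`, `reconstruct`, and `update_checksum` are hypothetical and are not taken from the cited works; the sketch shows only the textbook technique, not the particular arrangement of the present invention.

```python
from functools import reduce

# Assumed block size, taken from the "on the order of 4096 bytes" figure above.
BLOCK_SIZE = 4096


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise modulo-2 sum (XOR) of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def checksum(blocks: list) -> bytes:
    """Backup-unit block: XOR of the corresponding block held by every memory unit."""
    return reduce(xor_blocks, blocks)


def reconstruct(surviving: list, check: bytes) -> bytes:
    """Recover the single lost block by XORing the checksum with every surviving block."""
    return reduce(xor_blocks, surviving, check)


def update_checksum(check: bytes, old: bytes, new: bytes) -> bytes:
    """Fold one overwritten block into the checksum: check ^ old ^ new."""
    return xor_blocks(xor_blocks(check, old), new)


if __name__ == "__main__":
    # Four hypothetical memory units, each holding one block of deterministic test data.
    units = [bytes((7 * i + j) % 256 for j in range(BLOCK_SIZE)) for i in range(4)]
    backup = checksum(units)

    # Unit 2 fails; its block is rebuilt from the other three units plus the backup.
    assert reconstruct(units[:2] + units[3:], backup) == units[2]

    # A write to unit 1 is folded into the checksum without rereading the other units.
    new_block = bytes(BLOCK_SIZE)
    backup = update_checksum(backup, units[1], new_block)
    units[1] = new_block
    assert backup == checksum(units)
    print("ok")
```

Because XOR is its own inverse, any one lost block equals the checksum XORed with all surviving blocks. If two units are lost at once, the checksum alone no longer determines either block, which is why such schemes assume that no two units fail concurrently.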
It is thus an object of the present invention to provide a highly reliable and available memory system.
It is a further object of the invention to provide a highly available and reliable memory system with only a minimal increase in the cost and other overhead of the system.
It is still a further object of the invention to provide a highly reliable and available memory system for use in a multiprocessing system that requires the addition of only one memory unit, assuming that no two memory units or processors fail concurrently.
These and other objects, advantages and features of the invention will be more apparent upon reference to the following specification and the annexed drawings.