1. Field of the Invention
The present invention relates to a memory system with a hot swapping function capable of replacing a defective memory module without stopping the memory system when a memory error occurs in a system, such as an information processing device or the like, that uses a memory module as a storage device, and to a method for replacing the defective memory module.
The present application claims priority of Japanese Patent Application No. 2005-086814 filed on Mar. 24, 2005, which is hereby incorporated by reference.
2. Description of the Related Art
In information processing devices such as a server or the like, a memory module (or a plurality of memory modules) is used as main storage in many cases. When an error or a failure occurs in a memory module, the memory system is generally stopped so that the memory module can be replaced. In ordinary cases, the memory module cannot be replaced without stopping the memory system, because access to the memory cannot be suspended while operations of the system continue.
To solve this problem, in addition to a method by which the device itself is duplicated, a memory mirroring method is known in which only the memory system is duplicated and memory data is stored redundantly for every memory bus. FIG. 9 is a simplified diagram explaining the memory mirroring method, in which a memory controller 1, memory buses 2 and 3, memory modules 4, 5, and 6, and memory modules 7, 8, and 9 are shown. As shown in FIG. 9, the memory controller 1, by making both the memory buses 2 and 3 perform the same operation, can write the same data to each pair of memory modules, that is, to the memory modules 4 and 7, the memory modules 5 and 8, and the memory modules 6 and 9. Therefore, when a memory module on either side is to be replaced, the memory module can be replaced by stopping the memory bus to which that memory module is connected. In this state, the system can continue operations through the memory bus that has not been stopped, using the memory modules on that side. After the replacement is completed, the data stored in the memory modules on the side that continued to operate is simply written to the replaced memory module through the memory bus on the side where the replacement was made.
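The mirroring sequence described above can be illustrated with a minimal Python sketch. It is a toy model only, not taken from the present specification: the class name `MirroredMemory`, its methods, and the use of one list of cells per bus are hypothetical, and the bus numbers 2 and 3 are borrowed from FIG. 9 purely as labels.

```python
class MirroredMemory:
    """Toy model of memory mirroring: two buses hold identical copies."""

    def __init__(self, size):
        # One list of cells per memory bus (hypothetical stand-ins for buses 2 and 3).
        self.buses = {2: [0] * size, 3: [0] * size}
        self.active = {2: True, 3: True}

    def write(self, addr, value):
        # The controller drives every running bus with the same write.
        for bus, cells in self.buses.items():
            if self.active[bus]:
                cells[addr] = value

    def read(self, addr):
        # Any running bus can serve the read.
        for bus, cells in self.buses.items():
            if self.active[bus]:
                return cells[addr]
        raise RuntimeError("no memory bus is running")

    def stop_bus(self, bus):
        # Stop one side so that a module on that side can be pulled out.
        self.active[bus] = False

    def replace_and_resync(self, bus):
        # The replaced module starts blank, so copy the surviving side back
        # before the stopped bus is restarted.
        source = next(b for b in self.buses if b != bus and self.active[b])
        self.buses[bus] = list(self.buses[source])
        self.active[bus] = True


mem = MirroredMemory(size=8)
mem.write(3, 0xAB)                  # written to both sides
mem.stop_bus(3)                     # bus 3 stopped for replacement
mem.write(4, 0xCD)                  # system keeps running on bus 2 only
value_during_swap = mem.read(3)     # served from bus 2
mem.replace_and_resync(3)           # copy bus 2 contents to the new module
```

In this sketch every stored value occupies a cell on both buses, which is the reason the mirroring method needs twice the installed capacity.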
As is apparent from the operating method described above, in the case of the memory mirroring method, the memory capacity that contributes to the performance of the device is one-half of the memory capacity actually installed in the device. Thus, if the memory mirroring method is employed, it is necessary to double the installed memory capacity. However, under present circumstances, the price of memory greatly influences the price of an entire system and, therefore, the mirroring method is too expensive to be adopted readily.
Moreover, a memory system is disclosed in Patent Reference 1 (Japanese Patent Application Laid-open No. 2004-185199) which has a plurality of memory modules, buffer sections of which are connected in series through a bus, and which also has a hard disc device onto which data stored in the memory modules is copied. According to the disclosed memory system, a hot swapping function is realized in such a manner that, when a request for access to a memory module to be replaced is made, the corresponding address of the hard disc device is accessed instead, so that the memory module can be replaced, and, after completion of the replacement, the data on the hard disc device is copied to the corresponding address of the replaced memory module. However, in the invention described in Patent Reference 1, the mirroring is performed by providing the hard disc device instead of duplicating a memory module, which presents a problem in that access to the hard disc device requires more time than access to a memory module in the memory mirroring method.
Also, a method is conceivable in which operations of the memory system are continued by providing only one memory module as a spare memory module and by switching access from a defective memory module to the spare memory module when an error occurs. However, although this method makes it possible to stop the use of the memory module in which an error has occurred, it does not make it possible to physically remove the defective memory module and replace it with a non-defective memory module. This is because the route of the memory bus is cut by the physical removal of the defective memory module, which causes operations of the entire system to be stopped.
In this case, a method is conceivable in which one spare memory module is provided and switching between a defective memory module and the spare memory module is performed by using a switch. However, in a memory bus circuit in which a plurality of memory modules are daisy-chained, if switching among memory modules is performed using a switch without stopping access to the memory modules, the connections among the switching circuits become complicated and long. As a result, the transmission waveform of the memory bus is affected, which presents a problem in terms of stable operation of the memory system.
FIG. 10 is a simplified diagram explaining a memory system having one spare memory module 15, in which a memory controller 10, a memory bus 11, and memory modules 12, 13, 14, and 15 are shown. In the memory system shown in FIG. 10, in ordinary cases only the memory modules 12, 13, and 14 are used and the memory module 15 is kept as a spare. If an error occurs in, for example, the memory module 13 and the use of the memory module 13 is to be stopped, the data stored in the memory module 13 is transferred to the memory module 15, the memory controller 10 is made to recognize the memory module 15 as a substitute for the memory module 13, and no further access to the memory module 13 is made.
However, in this case, even if the memory controller 10 makes no access to the memory module 13, the wiring between the memory bus 11 and the memory module 13 remains connected. Therefore, removal of the memory module 13 affects the transmission waveforms on the memory bus 11, which makes stable operation of the memory system impossible. Even if the memory module 13 is disconnected at this point by a switch or the like, a similar influence on the transmission waveforms is unavoidable.
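The spare-module switching explained with reference to FIG. 10 can be sketched as a logical remapping inside the controller. The following Python sketch is an illustration under stated assumptions: the class `SpareModuleSystem`, its `remap` table, and the `attached` set are hypothetical names introduced here, and the module numbers 12 to 15 are used only as labels taken from FIG. 10.

```python
class SpareModuleSystem:
    """Toy model of a single-bus memory system with one spare module."""

    def __init__(self, module_ids, spare_id, size):
        self.spare_id = spare_id
        # Contents of every module, including the spare.
        self.cells = {m: [0] * size for m in list(module_ids) + [spare_id]}
        # Logical-to-physical substitution table kept by the controller.
        self.remap = {}
        # Modules whose wiring is still physically connected to the bus.
        self.attached = set(self.cells)

    def _physical(self, module_id):
        # Resolve a logical module number to the module actually accessed.
        return self.remap.get(module_id, module_id)

    def write(self, module_id, offset, value):
        self.cells[self._physical(module_id)][offset] = value

    def read(self, module_id, offset):
        return self.cells[self._physical(module_id)][offset]

    def fail_over(self, bad_id):
        # Transfer the data to the spare, then redirect all later accesses.
        self.cells[self.spare_id] = list(self.cells[bad_id])
        self.remap[bad_id] = self.spare_id


system = SpareModuleSystem(module_ids=[12, 13, 14], spare_id=15, size=4)
system.write(13, 0, 0x5A)
system.fail_over(13)                 # module 15 now substitutes for module 13
system.write(13, 1, 0x7E)            # lands in module 15, not module 13
value_after_failover = system.read(13, 0)
still_wired = 13 in system.attached  # remapping does not detach the wiring
```

The sketch changes only which module is addressed. The defective module stays in `attached`, which corresponds to the point made above: stopping access does not remove the electrical connection to the memory bus.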
Moreover, in a memory system of a serial-transmission type, which is now approaching practical use and commercialization, there is a problem that, if the power supply to a memory module is stopped by using a switch or the like, data cannot be transmitted to any memory module connected subsequent to the memory module to which the power supply has been stopped.
FIG. 11 is a simplified diagram showing an example of a memory system of a serial-transmission type in which a memory controller 16, a read signal line 17, a write signal line 18, memory modules 19, 20, and 21, and buffers 22, 23, and 24 are shown. The buffers 22, 23, and 24 are mounted on the memory modules 19, 20, and 21 respectively and are used for serial transmission of data.
In the memory system shown in FIG. 11, if, for example, the memory module 20 is stopped or disconnected, the memory controller 16 cannot access the memory module 21. As is apparent from this example, even in the case of the memory system of the serial-transmission type, it is impossible to disconnect a memory module without stopping the memory system so long as the conventional method is used.
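The dependency of each memory module on the modules ahead of it in a serial chain can be shown with a short Python sketch. This is a simplified model under an assumption made here for illustration: a module is treated as reachable only while every buffer between it and the controller, including its own, is powered. The function name `reachable_modules` is hypothetical, and the numbers 19, 20, and 21 are labels taken from FIG. 11.

```python
def reachable_modules(chain, powered):
    """Return the modules the controller can still reach.

    chain   : module numbers in order of connection from the controller
    powered : mapping of module number to True (powered) or False (stopped)
    """
    reachable = []
    for module in chain:
        # A stopped buffer cannot forward data, so the walk ends here.
        if not powered.get(module, False):
            break
        reachable.append(module)
    return reachable


chain = [19, 20, 21]
all_on = reachable_modules(chain, {19: True, 20: True, 21: True})
module_20_off = reachable_modules(chain, {19: True, 20: False, 21: True})
```

With the memory module 20 stopped, the walk ends before the memory module 21 even though the memory module 21 itself is powered.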
Thus, the conventional memory system has a problem in that, although replacement of a defective memory module is possible by using the mirroring method, a rise in costs is unavoidable due to system duplication using memory modules, other storage devices, or the like.
Moreover, in the conventional memory system having one spare memory module, though the rise in costs is small, there is a problem in that stable operation cannot be achieved because of fluctuations of transmission waveforms on the memory bus caused by circuit switching or the like required when a defective memory module is pulled out for removal.
Furthermore, in the conventional memory system of a serial-transmission type, a memory module cannot be disconnected without stopping the memory system.