The present invention relates to fault tolerant computer systems having redundant memory subsystems. More particularly, the present invention relates to the mirroring of information on one memory subsystem to another memory subsystem.
Many businesses cannot tolerate having their computer systems unavailable (i.e., “down”) for even a small amount of time. Examples include call centers, order entry systems, financial transaction tracking systems, telecom servers, process control servers in critical environments (e.g., chemical plants, foundries, traffic control systems) and other such businesses. Typically, these businesses use computer systems having a large number of hardware components, which increases the likelihood that one of the components will fail and the computer system will become unavailable. For example, if a 4 gigabyte random access memory (RAM) has a mean time between failures (MTBF) measured in years, increasing the amount of RAM to 64 gigabytes may decrease the MTBF to just a few months.
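The MTBF arithmetic above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the larger memory is built from identical 4 gigabyte modules, that the modules fail independently at a constant rate, and that any single module failure takes the system down; under those assumptions the system MTBF is the module MTBF divided by the module count. The 5 year module MTBF and the name `system_mtbf_years` are hypothetical and do not come from the text.

```python
def system_mtbf_years(module_mtbf_years: float, module_count: int) -> float:
    """MTBF of a set of identical modules where any one failure is fatal.

    Assumes independent modules with constant failure rates, so the
    failure rates add and the MTBF divides by the module count.
    """
    if module_count < 1:
        raise ValueError("module_count must be at least 1")
    return module_mtbf_years / module_count


# Hypothetical figures: a 4 GB module with a 5 year MTBF, scaled to 64 GB.
modules = 64 // 4
mtbf = system_mtbf_years(5.0, modules)
print(f"{modules} modules: MTBF is about {mtbf * 12:.2f} months")
# prints "16 modules: MTBF is about 3.75 months"
```

With these assumed figures, a sixteenfold increase in memory takes the MTBF from years to a few months, which is the order of magnitude the example in the text describes.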
In order to accommodate the demands of these businesses, computer manufacturers have designed computer systems that are fault tolerant, also known as continuously available systems. Typically, these fault tolerant systems include fault tolerant hardware having mirrored components that can be individually removed, replaced or re-synchronized without requiring a long down time. One of the components that is frequently mirrored is the memory.
However, there is an inherent problem with mirroring memory when applications are continuously using the memory. The problem is that correctly copying information residing in the memory causes a visible interruption to the applications. For example, some fault tolerant computer systems with memory mirroring functionality prevent access to the memory while the information in one memory component is copied to the mirrored memory component. As one can imagine, the down time increases significantly as the amount of memory increases. Another problem is that, because access to the memory is so intertwined with the operating system, fault tolerant computer systems are typically designed for a specific business application or for a specific hardware configuration. Thus, the cost of purchasing a fault tolerant computer system with memory mirroring functionality is drastically increased.
Briefly described, the present invention includes a method of mirroring memory that reduces the down time for copying information from one physical memory subsystem to a redundant physical memory subsystem by separating the mirroring process into phases. One phase allows applications to access the memory and another phase restricts applications from accessing the memory. Thus, by striving to maximize the amount of information mirrored during the first phase, the present invention provides a method of mirroring memory having an acceptable down time for many businesses.
The first phase copies information from the first physical memory subsystem to the redundant physical memory subsystem. During the first phase, applications are not restricted from accessing the first physical memory subsystem while the memory mirroring operation copies the information. Thus, the first phase is relatively transparent to running applications. Also, the first phase is designed to copy an optimal amount of the information, if possible.
The second phase of the memory mirroring operation copies active information to the redundant physical memory subsystem. The active information includes information that was not copied during the first phase and information that changed during the first phase. During the second phase, applications are restricted from accessing the first physical memory subsystem. However, because the second phase typically copies a smaller amount of information than the first phase, the down time associated with the second phase is minimal.
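The two phases described above can be sketched as a small single-threaded simulation. This is an illustrative model under stated assumptions, not the claimed implementation: memory is modeled as a list of pages, changed pages are tracked in a hypothetical `dirty` set, and application activity is simulated by a callback invoked between page copies. The names `MirroredMemory`, `phase_one`, `phase_two`, `budget` and `on_step` are all invented for this sketch. A real system would need hardware or operating system support to track changes and to make each page copy atomic with respect to writes.

```python
from typing import Callable, List, Optional, Set


class MirroredMemory:
    """Toy model of a primary memory and its redundant mirror (hypothetical)."""

    def __init__(self, pages: List[bytes]) -> None:
        self.primary: List[bytes] = list(pages)
        self.mirror: List[Optional[bytes]] = [None] * len(pages)
        self.dirty: Set[int] = set()  # pages the mirror does not yet match
        self.locked = False           # True only while phase two is running

    def write(self, page: int, data: bytes) -> None:
        """Application write; refused while access is restricted."""
        if self.locked:
            raise RuntimeError("memory access is restricted during phase two")
        self.primary[page] = data
        self.dirty.add(page)

    def phase_one(
        self,
        budget: int,
        on_step: Optional[Callable[["MirroredMemory", int], None]] = None,
    ) -> int:
        """Copy up to `budget` pages while application writes stay allowed."""
        self.dirty = set(range(len(self.primary)))  # nothing mirrored yet
        copied = 0
        for page in range(len(self.primary)):
            if copied >= budget:
                break
            self.mirror[page] = self.primary[page]
            self.dirty.discard(page)
            copied += 1
            if on_step is not None:
                on_step(self, page)  # simulated application activity
        return copied

    def phase_two(self) -> int:
        """Restrict access, then copy only the active (dirty) pages."""
        self.locked = True
        try:
            active = sorted(self.dirty)
            for page in active:
                self.mirror[page] = self.primary[page]
            self.dirty.clear()
            return len(active)
        finally:
            self.locked = False


# Usage: eight pages; the application rewrites page 2 after page 5 is copied.
mem = MirroredMemory([bytes([i]) * 4 for i in range(8)])


def app(m: MirroredMemory, page: int) -> None:
    if page == 5:
        m.write(2, b"new!")


first = mem.phase_one(budget=7, on_step=app)
second = mem.phase_two()
print(first, second, mem.mirror == mem.primary)  # prints "7 2 True"
```

In this run the first phase copies seven of the eight pages while writes remain allowed, and the second phase copies only two pages with access restricted: the one page the first phase did not reach and the one page that changed after it was copied. The restricted window therefore scales with the amount of active information rather than with the total memory size.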