The invention relates to an apparatus and method for memory modification tracking.
The invention finds particular, but not exclusive, application to fault tolerant computer systems such as lockstep fault tolerant computers which use multiple subsystems that run identically.
In such lockstep fault tolerant computer systems, the outputs of the subsystems are compared within the computer and, if the outputs differ, some exceptional repair action is taken.
U.S. Pat. No. 5,953,742 describes a fault tolerant computer system that includes a plurality of synchronous processing sets operating in lockstep. Each processing set comprises one or more processors and memory. The computer system includes a fault detector for detecting a fault event and for generating a fault signal. When a lockstep fault occurs, state is captured, diagnosis is carried out and the faulty processing set is identified and taken offline. When the processing set is replaced, a Processor Re-Integration Process (PRI) is performed, the main component of which is copying the memory from the working processing set to the replacement for the faulty one. A special memory unit is provided that is used to indicate the pages of memory in the processing sets that have been written to (i.e. dirtied) and is known as a 'dirty memory', or 'dirty RAM'. (Although the term "dirty RAM" is used in this document, and such a memory is typically implemented using Random Access Memory (RAM), it should be noted that any other type of writable storage technology could be used.) Software accesses the dirty RAM to check which pages are dirty, and can write to it directly to change the status of a page to dirty or clean. Hardware automatically changes to 'dirty' the state of the record for any page of main memory that is written to. The PRI process consists of two parts: a stealthy part and a final part. During Stealthy PRI the working processing set is still running the operating system, the whole of memory is copied once and whilst this is going on, the dirty RAM is used to record which pages are written to (dirtied). Subsequent iterations only copy those pages that have been dirtied during the previous pass.
International patent application WO 99/66402 relates to a bridge for a fault tolerant computer system that includes multiple processing sets. The bridge monitors the operation of the processing sets and is responsive to a loss of lockstep between the processing sets to enter an error mode. It is operable, following a lockstep error, to attempt reintegration of the memory of the processing sets with the aim of restarting a lockstep operating mode. As part of the mechanism for attempting reintegration, the bridge includes a dirty RAM for identifying memory pages that are dirty and need to be copied in order to reestablish a common state for the memories of the processing sets.
In the previously proposed systems, the dirty RAM comprises a bit map having a dirty bit for each block, or page, of memory. However, with the trend towards larger main memories and a desire to track dirtied areas of memory at a finer granularity (e.g. 1 KB) to minimise the amount of memory that needs to be copied, the size of the dirty RAM needed to track memory modifications is increasing. For example, main memories in the processing sets of systems of the type described above have typically been of the order of 8 GB, but are tending to increase to 32 GB or more, for example to 128 GB and beyond. At the same time, as mentioned above, there is a desire to reduce the granularity of dirtied regions to less than the typical 8 KB page size (e.g., to 1 KB). This is to minimise the copy bandwidth required to integrate a new processing set.
With the increasing size of main memory and/or the reduced page sizes, the number of bits, and consequently the size of the dirty RAM that is needed to track memory changes, can become large. As a result, the time needed to search the dirty RAM to identify pages that may have been modified and will need to be re-copied can increase to the point where it impacts the time taken to re-integrate the main memory in the processing sets. Another problem that can occur is an increased risk of errors in the dirty RAM.
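By way of illustration only, the growth in the size of a flat dirty bit map can be worked through with the figures quoted above. The following sketch assumes one dirty bit per block and binary units (1 KB taken as 1024 bytes); the function name `dirty_bits` is hypothetical and is not taken from the cited systems.

```python
KIB = 1024
GIB = 1024 ** 3


def dirty_bits(main_memory_bytes: int, block_bytes: int) -> int:
    """Number of bits in a flat dirty bit map: one bit per block of main memory."""
    return main_memory_bytes // block_bytes


# 8 GB of main memory tracked at the typical 8 KB page size.
small = dirty_bits(8 * GIB, 8 * KIB)
# 128 GB of main memory tracked at a 1 KB granularity.
large = dirty_bits(128 * GIB, 1 * KIB)

print(small, "bits =", small // 8 // KIB, "KiB")        # 1048576 bits = 128 KiB
print(large, "bits =", large // 8 // KIB ** 2, "MiB")   # 134217728 bits = 16 MiB
print("growth factor:", large // small)                 # growth factor: 128
```

Under these assumptions the bit map to be searched grows by a factor of 128, from about 1 Mbit to about 128 Mbit.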
Accordingly, an aim of the present invention is to provide a more efficient approach to memory modification tracking.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims.
In one aspect, the invention provides a hierarchically configured dirty memory for a computer system. The hierarchical configuration of a dirty memory greatly enhances the access thereto for identifying parts of the main memory that have been dirtied. Particularly where main memory is very large, and/or the granularity (e.g. page size) employed for the main memory is small, the number of entries in the dirty memory can also be large. The purpose of a dirty memory is typically to identify the specific parts of main memory that may need to be copied to provide memory reinstatement following a failure (to avoid having to copy the whole of memory and thereby to speed the recovery process, or to facilitate an iterative stealthy copy process that proceeds while the operating system and applications continue to dirty memory). Having to search all of a large dirty memory can therefore negate, or at least significantly impact, the advantages of having a dirty memory.
An embodiment of the invention takes account of the fact that the dirty memory is typically sparsely populated (i.e., only a relatively small proportion of main memory will be dirty). Accordingly, the use of a hierarchical structure enables efficient access to the populated parts of the dirty memory, without needing to access unpopulated parts thereof.
A dirty memory according to an embodiment of the invention includes a lower level dirty memory that includes groups of dirty indicators (e.g. dirty bits), each dirty indicator being settable to a given state (e.g. a 1 or a 0) indicative that a block (e.g., a page) of memory associated therewith has been dirtied. It further includes at least one higher level dirty memory that includes dirty group indicators (e.g., dirty group bits) settable to a predetermined state (e.g., a 1 or a 0) indicative that a group of the lower level dirty memory associated therewith has at least one dirty indicator in a state indicative that a block (e.g., a page) of memory associated therewith has been dirtied. With this structure, access to the higher level can be used to identify parts of the lower level that are populated, avoiding the need to access the whole of the lower level.
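The two-level arrangement described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the class name `TwoLevelDirtyMemory`, its method names and the group size of 64 are hypothetical, a set bit (1) is assumed to mean "dirty", and Python integers stand in for hardware words.

```python
class TwoLevelDirtyMemory:
    """Illustrative two-level dirty memory (all names here are hypothetical)."""

    def __init__(self, num_blocks, group_size=64):
        self.group_size = group_size
        num_groups = -(-num_blocks // group_size)  # ceiling division
        # Lower level: one integer per group, one dirty bit per block.
        self.lower = [0] * num_groups
        # Higher level: one dirty group bit per group of the lower level.
        self.upper = 0

    def mark_dirty(self, block):
        """Record a write to a block by setting its bit in both levels."""
        group, bit = divmod(block, self.group_size)
        self.lower[group] |= 1 << bit
        self.upper |= 1 << group

    def is_block_dirty(self, block):
        group, bit = divmod(block, self.group_size)
        return bool((self.lower[group] >> bit) & 1)

    def is_group_dirty(self, group):
        return bool((self.upper >> group) & 1)


dm = TwoLevelDirtyMemory(num_blocks=256, group_size=64)
dm.mark_dirty(70)  # block 70 lies in group 1 (blocks 64..127), at bit 6
```

After the single write, only one dirty group bit is set in the higher level, so a search need only examine group 1 of the lower level.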
In an embodiment of the invention, logic is provided to search the higher level dirty memory for a dirty group indicator set to the predetermined state. The logic could be implemented in hardware, firmware or software, or a combination thereof, as appropriate in any particular implementation. Where a dirty group indicator in the higher level dirty memory is set to the predetermined state, the logic can be arranged to search a group of dirty indicators in the lower level dirty memory associated with that dirty group indicator. The logic can be configured to search the group of dirty indicators in the lower level dirty memory for any dirty indicators set to the given state. The blocks of memory to be copied are those associated with any dirty indicators set to the given state. It can be seen that such logic can ensure that a group of dirty indicators in the lower level dirty memory is not searched if it is associated with a dirty group indicator in the higher level dirty memory that is set to a state other than the predetermined state.
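The search logic described above might be sketched in software as follows. The function name `find_dirty_blocks`, its parameters and the small group size are assumptions made for illustration, and a set bit (or `True`) is assumed to be the predetermined state; a hardware or firmware realisation could differ substantially.

```python
def find_dirty_blocks(upper, lower, group_size):
    """Return (dirty block numbers, number of lower level groups searched).

    upper: list of booleans, one dirty group indicator per group.
    lower: list of integers, each holding group_size dirty bits.
    """
    dirty = []
    groups_searched = 0
    for group, group_is_dirty in enumerate(upper):
        if not group_is_dirty:
            continue  # a clean dirty group indicator means the group is skipped
        groups_searched += 1
        for bit in range(group_size):
            if (lower[group] >> bit) & 1:
                dirty.append(group * group_size + bit)
    return dirty, groups_searched


# Four groups of eight blocks; only groups 1 and 3 contain dirtied blocks.
lower = [0, 0b00100100, 0, 0b10000000]
upper = [False, True, False, True]
blocks, searched = find_dirty_blocks(upper, lower, group_size=8)
print(blocks, searched)  # [10, 13, 31] 2
```

In this example only two of the four lower level groups are examined, which reflects the benefit of the hierarchy when the dirty memory is sparsely populated.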
The dirty memory could include more than two hierarchical levels. A level higher than a given level that is higher than the lowest level can include dirty group indicators associated with respective groups of dirty group indicators in the given level and being settable to a predetermined state indicative that an associated group in the given level has at least one dirty group indicator set to a predetermined state. The number of levels employed can be chosen to optimise the speed of access and/or to provide a compromise between the speed of access and the complexity of the logic for controlling the hierarchical dirty memory access according to any specific embodiment.
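A hierarchy with more than two levels can be sketched along the same lines. The following is an illustrative assumption rather than a description of any particular embodiment: the class name `HierarchicalDirtyMemory`, the group size of 4 and the choice of three levels are hypothetical, and `True` is assumed to be the predetermined state at every level.

```python
class HierarchicalDirtyMemory:
    """Illustrative multi-level dirty memory (hypothetical names throughout)."""

    def __init__(self, num_blocks, group_size=4, num_levels=3):
        self.group_size = group_size
        self.levels = []  # levels[0] is the lowest level (one entry per block)
        size = num_blocks
        for _ in range(num_levels):
            self.levels.append([False] * size)
            size = -(-size // group_size)  # ceiling division for the next level up

    def mark_dirty(self, block):
        """Set the indicator for the block and for its group at each higher level."""
        index = block
        for level in self.levels:
            level[index] = True
            index //= self.group_size

    def dirty_blocks(self):
        """Descend from the highest level, visiting only groups marked dirty."""
        found = []

        def descend(level, index):
            if not self.levels[level][index]:
                return  # nothing dirty beneath this indicator
            if level == 0:
                found.append(index)
                return
            start = index * self.group_size
            end = min(start + self.group_size, len(self.levels[level - 1]))
            for child in range(start, end):
                descend(level - 1, child)

        top = len(self.levels) - 1
        for index in range(len(self.levels[top])):
            descend(top, index)
        return found


hdm = HierarchicalDirtyMemory(num_blocks=64, group_size=4, num_levels=3)
hdm.mark_dirty(5)
hdm.mark_dirty(62)
print(hdm.dirty_blocks())  # [5, 62]
```

With 64 blocks and groups of four, the three levels hold 64, 16 and 4 indicators respectively, so the search starts from just four indicators at the highest level.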
Where reference is made to a predetermined state, this will typically be the same for each of the levels (e.g., a 1 or a 0) to simplify the logic, but alternatively different states may apply in different levels.
To provide for rapid examination of the contents of a level in the dirty memory, a group of indicators may have a length of one word. Similarly, in one example of the invention, a highest level dirty memory has a length of one word.
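One reason a word-length group can be examined rapidly is that a whole group can be tested for any dirty indicator with a single comparison against zero, and the set indicators can then be extracted with simple bit operations. The sketch below assumes a 64-bit word and a set bit meaning "dirty"; the function name `set_bit_positions` is hypothetical.

```python
WORD_BITS = 64  # assumed word length, for illustration only


def set_bit_positions(word):
    """Return the positions of the set bits in a word, lowest first."""
    positions = []
    while word:  # a single test tells whether any indicator remains set
        lowest = word & -word  # isolates the lowest set bit
        positions.append(lowest.bit_length() - 1)
        word ^= lowest  # clear that bit and continue
    return positions


print(set_bit_positions(0b10010010))                   # [1, 4, 7]
print(set_bit_positions((1 << (WORD_BITS - 1)) | 1))   # [0, 63]
```

The loop runs once per set bit rather than once per bit position, which suits a sparsely populated dirty memory.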
Another aspect of the invention provides a computer system comprising a dirty memory as defined above with at least one processing set that includes main memory. The computer system may be a fault tolerant computer system and include a plurality of processing sets that each includes main memory. The processing sets can be configured normally to operate in lockstep, wherein the computer system includes logic operable to attempt to reinstate an equivalent memory state in the main memory of each of the processing sets following a lockstep error.
A further aspect of the invention provides a method of managing reinstatement of an equivalent memory state in the main memory of a plurality of processing sets of a fault tolerant computer following a lockstep error. The method includes the performance of at least one cycle of copying any block (e.g., page) of memory that has been dirtied from a first processing set to each other processing set. Each cycle includes a step of identifying any block (e.g. page) of memory that has been dirtied from a dirty memory organised hierarchically and a step of copying a block (e.g., page) identified as having been dirtied.
In this method, operating system and direct memory access to main memory can be permitted during at least one cycle of copying any block (e.g., page) of memory that has been dirtied from a first processing set to each other processing set. It is advantageous to permit accesses to continue as this means that the system can remain responsive during the reinstatement, although this will cause further memory pages to be dirtied. However, as the cycles of reinstatement get faster (ideally with fewer pages to be copied on each pass), the number of pages dirtied during each cycle should also reduce.
As there may still be some dirtied pages after a number of passes, a time can be reached where the system is quiesced to prevent any further dirtying, to permit a final cycle of copying any page of memory that has been dirtied from a first processing set to each other processing set to be performed.
At this time, direct memory access by I/O devices is inhibited and the operating system is effectively suspended, again to prevent memory being updated further. However, a DMA operation to copy state from one processing set to the other remains operable to copy the remaining dirty pages.
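The overall sequence of copy cycles followed by a final quiesced cycle can be modelled in simplified form as follows. This is a sketch under stated assumptions, not the method as implemented: the function name `reintegrate` and its parameters are hypothetical, a plain Python set stands in for the hierarchical dirty memory, and writes by the operating system and I/O devices are applied after each copy pass rather than concurrently with it.

```python
def reintegrate(primary, replacement, writes_per_pass, max_stealthy_passes=4):
    """Copy primary memory to replacement; return pages copied on each pass.

    primary, replacement: lists of page contents of equal length.
    writes_per_pass: for each stealthy pass, a list of (page, value) writes
    that the still-running system makes to primary memory during that pass.
    """
    dirty = set(range(len(primary)))  # the first pass copies the whole of memory
    copied_per_pass = []
    for pass_no in range(max_stealthy_passes):
        to_copy, dirty = dirty, set()  # start recording newly dirtied pages
        for page in sorted(to_copy):
            replacement[page] = primary[page]
        copied_per_pass.append(len(to_copy))
        # Simplification: writes made during the pass are applied here.
        writes = writes_per_pass[pass_no] if pass_no < len(writes_per_pass) else []
        for page, value in writes:
            primary[page] = value
            dirty.add(page)
        if not dirty:
            break
    # Final pass: the system is quiesced, so no further pages are dirtied.
    for page in sorted(dirty):
        replacement[page] = primary[page]
    copied_per_pass.append(len(dirty))
    return copied_per_pass


primary = ["page%d" % i for i in range(8)]
replacement = [None] * 8
writes = [[(1, "a"), (5, "b"), (6, "c")], [(5, "d")]]
passes = reintegrate(primary, replacement, writes, max_stealthy_passes=2)
print(passes)                  # [8, 3, 1]
print(replacement == primary)  # True
```

In this example the number of pages copied falls from eight to three to one, and the final quiesced pass leaves the two memories in an equivalent state.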