1. Technical Field
The present invention relates to a method and system for copying the data contents of a computer system processor memory during application program run-time without suspending application program access to data stored in the memory during the data-copying process.
2. Description of the Related Art
In data processing environments, computer systems employ processors to execute application programs. In the case of a stand-alone computer such as a personal computer (PC), for example, these may comprise a word-processing application, an accounting application, a desktop publishing application, or a plurality of the same or similar applications running concurrently on the PC. An application program operates on one or more logical objects, each comprising logically related blocks of data associated with that program. Such data is stored on a suitable local storage device such as a hard disk, a floppy disk or a tape or, where the PC is connected to a network, on a remote storage medium such as a database or a server.
When an application program performs an operation on a logical object, it reads the related blocks of data for that object from the storage device and writes the data to a memory associated with the processor. The data is temporarily stored in the processor memory while the application program performs operations on the logical object. When the application program has finished performing operations on the logical object, it reads the data (logical object) associated with that application program from the processor memory and writes the (updated) data to the storage device.
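By way of illustration only, the read, operate and write-back cycle described above can be sketched as follows. The dictionaries `storage` and `memory`, the object name `"report"` and the helper functions are hypothetical stand-ins for a storage device, a processor memory and an application program, and are not taken from any particular system.

```python
# Hypothetical model: dictionaries stand in for the storage device
# and the processor memory.
storage = {"report": b"draft"}
memory = {}


def load_object(name):
    """Read the object's data from storage and write it to memory."""
    memory[name] = bytearray(storage[name])


def store_object(name):
    """Read the (updated) data from memory and write it back to storage."""
    storage[name] = bytes(memory.pop(name))


# An application program operating on the logical object.
load_object("report")
memory["report"] += b" v2"   # the operation is performed on the in-memory copy
store_object("report")
```

Between `load_object` and `store_object`, the only up-to-date copy of the object resides in the processor memory, which is why an image of that memory is of interest when diagnosing a problem.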
Where the computer system comprises a cluster of networked computing nodes (devices), execution of an application program may be performed on a single computing node or distributed between a number of the computing nodes. The process of executing an application program, and of writing data to and reading data from a processor memory or memories, is nevertheless logically the same as in the case of running an application program on a stand-alone PC.
Computer systems often include a control application program that is continuously or periodically executed to record control data indicative of system events and/or operating parameters. The control application program is executed in association with other application programs.
In the event of a problem or error occurring in the computer system, it is desirable to be able to obtain a consistent point-in-time image of the computer system's processor memory or memories to allow the problem or error to be determined and rectified.
A known method of obtaining a point-in-time image of control and/or application program run-time data from a processor memory comprises a process known as “quiescing.” This process comprises pausing or stopping all currently running programs in order to prevent their accessing the processor memory and then implementing a dump of the data contents of the processor memory to another memory or storage device. It is necessary to suspend access to the processor memory during the dumping (copying) process in order to prevent the memory contents from changing (being modified or updated) while the dump is in progress. Any such change would corrupt the dumped data and render it useless as a point-in-time image of the processor memory data content.
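The quiescing process described above can be approximated by the following minimal sketch, in which a single lock models the suspension of program access for the duration of the dump. The class `ProcessorMemory` and its methods are hypothetical names introduced for illustration; an actual system would pause programs by whatever mechanism its operating environment provides.

```python
import threading


class ProcessorMemory:
    """Hypothetical memory whose accesses are suspended while a dump runs."""

    def __init__(self, size):
        self._data = bytearray(size)
        self._lock = threading.Lock()  # held by a writer or by the dump

    def write(self, offset, payload):
        end = offset + len(payload)
        if offset < 0 or end > len(self._data):
            raise ValueError("write outside memory bounds")
        with self._lock:               # blocks for as long as a dump is running
            self._data[offset:end] = payload

    def quiesced_dump(self):
        """Copy the whole memory while all writers are held off."""
        with self._lock:
            return bytes(self._data)


mem = ProcessorMemory(8)
mem.write(0, b"abcd")
image = mem.quiesced_dump()   # consistent point-in-time image
mem.write(0, b"wxyz")         # later writes do not affect the image
```

The time for which `quiesced_dump` holds the lock corresponds to the period during which the application programs are denied access to the memory.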
A disadvantage of the quiescing process is that the time taken to complete the data dumping process constitutes system downtime for the affected application programs and may thus constitute an unacceptable interruption to service. While such an interruption to service is rarely a critical issue when using a stand-alone PC, such interruptions can be critical to the operation of a computer network whose functions may include providing continuous, uninterrupted service.
One solution to the problem of system downtime during a quiescing process is to copy only a portion of the processor memory at any one time. However, this still requires that those application programs accessing that portion of the processor memory be paused or stopped for the duration of the quiescing process. This still constitutes system downtime for those applications, albeit of a shorter duration than in the case where the whole processor memory is being dumped.
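A minimal sketch of such portion-by-portion quiescing follows, assuming the memory is divided into fixed-size regions each guarded by its own lock. The class `RegionedMemory`, the region count and the region size are hypothetical and serve only to illustrate the idea.

```python
import threading


class RegionedMemory:
    """Hypothetical memory split into regions that are quiesced one at a time."""

    def __init__(self, num_regions, region_size):
        self.regions = [bytearray(region_size) for _ in range(num_regions)]
        self.locks = [threading.Lock() for _ in range(num_regions)]

    def write(self, region, offset, payload):
        end = offset + len(payload)
        if offset < 0 or end > len(self.regions[region]):
            raise ValueError("write outside region bounds")
        with self.locks[region]:       # blocks only if this region is being dumped
            self.regions[region][offset:end] = payload

    def dump_region(self, region):
        """Copy one region; only programs using that region are held off."""
        with self.locks[region]:
            return bytes(self.regions[region])


mem = RegionedMemory(num_regions=2, region_size=4)
mem.locks[0].acquire()        # simulate a dump of region 0 being in progress
mem.write(1, 0, b"ab")        # region 1 remains accessible meanwhile
region0_busy = not mem.locks[0].acquire(blocking=False)
mem.locks[0].release()
```

Programs using region 0 are still blocked while that region is being copied, which is the residual downtime noted above.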
An alternative to copying the whole of the data contents of a processor memory to obtain a snap-shot (point-in-time) image of the memory is disclosed in U.S. Pat. No. 6,564,219 issued to EMC Corporation. U.S. Pat. No. 6,564,219 proposes a process of obtaining a snap-shot image of the data content of a processor memory by copying only those data elements that have changed since a reference point-in-time at which the data content of the memory was known. Combining the copied data elements with the known content at the reference point-in-time enables the whole of the current data content of the memory to be known. While the process of copying only those data elements that have changed since a reference point-in-time greatly speeds up the quiescing process, it still requires the application programs to be paused or stopped for the duration of the process.
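The general idea of copying only changed data elements can be illustrated by the following sketch. It is a generic dirty-block illustration and is not the process actually disclosed in U.S. Pat. No. 6,564,219; the class `TrackedMemory`, the function `apply_changes` and the block size are hypothetical.

```python
BLOCK_SIZE = 4  # hypothetical block size, in bytes


class TrackedMemory:
    """Hypothetical memory that records which blocks change after a reference copy."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.dirty = set()             # indices written since the last copy

    def write_block(self, index, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        self.blocks[index] = bytes(payload)
        self.dirty.add(index)

    def full_copy(self):
        """Establish the reference point-in-time: copy everything once."""
        self.dirty.clear()
        return list(self.blocks)

    def copy_changes(self):
        """Copy only the blocks changed since the reference point-in-time."""
        changed = {i: self.blocks[i] for i in self.dirty}
        self.dirty.clear()
        return changed


def apply_changes(reference, changed):
    """Rebuild the current image from the reference image plus changed blocks."""
    image = list(reference)
    for index, block in changed.items():
        image[index] = block
    return image


mem = TrackedMemory(num_blocks=4)
reference = mem.full_copy()
mem.write_block(2, b"abcd")
delta = mem.copy_changes()             # only block 2 is copied
current = apply_changes(reference, delta)
```

In this sketch, programs would still have to be held off while `copy_changes` runs in order for the result to be consistent, which reflects the remaining disadvantage noted above.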
The process disclosed in U.S. Pat. No. 6,564,219 can equally be applied to a selected portion of a processor memory associated with one or more application programs, which further speeds up the quiescing process with respect to the one or more application programs. The disadvantage remains, however, that the applications are denied access to that portion of the processor memory during the data dumping process.