Dump systems allow the state of a machine (the core) to be saved at crash time for later analysis. Traditionally, the dump subsystem of a UNIX kernel writes relevant data to a disk-based dump device at crash time.
Both memory configurations and disk input/output (I/O) interfaces have advanced considerably in the past few years. However, disk I/O has not kept pace with memory speeds. Hence the disk I/O required to write data to the disk-based dump device when the system panics (crashes) increases system downtime in a manner perceptible to the user. This is most significant when the dumpable memory is large, as in mid-range and high-end UNIX boxes, which typically have terabytes of primary memory.
However, there exist some systems which do not rely on a disk subsystem at the time of a system panic.
Mission Critical Linux's system, “mcore”, does not rely on a disk-based dump device but instead uses system memory to save the core. On a subsequent reboot of the system, the core can be transferred to the file system. However, actual experiments with “mcore”, and the documentation at oss.missioncriticallinux.com/projects/mcore/readme.php, indicate that the system may be in an unstable state following the first reboot, so a second reboot is necessary. This may not be an issue in low-end UNIX boxes, where primary memory is smaller, but it becomes significant in high-end boxes, as the additional reboot adds considerably to system downtime. In addition, “mcore” is a 32-bit solution and is not available for the mid-range to high-end UNIX boxes running 64-bit operating systems.
Tru64 is a system which is capable of dumping memory pages to main memory when a system panics. However, the dumping method used by Tru64 has a number of disadvantages. Firstly, it does not treat a portion of the RAM in main memory as a system dump device, so the Tru64 method dumps to main memory only when there is enough space within main memory for the entire dump. Secondly, this method is used only for diskless machines; to date it is not used for machines with disks.
It is an object of the present invention to provide a method and apparatus for dumping memory which avoids some of the above disadvantages or at least provides a useful alternative.