A core dump is the process of saving the working memory state of a computer program at a specific time, generally when the program has terminated abnormally; such an abnormal termination is commonly referred to as a “crash.” The program may be, e.g., system software of a computing device, such as an operating system (OS) of a conventional computing device or a hypervisor of a virtualized computing device. The working memory state of the computer program at the time of the crash is saved to a special partition on a storage device or, more generally, to any writable persistent storage device that is accessible at the time of the crash. When the computing device is stateless, i.e., it is not provisioned with a storage device, the core dump is performed over a network to a network dump server.
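As a rough illustration of what performing a core dump over a network involves, the following sketch splits a memory image into numbered chunks that a dump server could reassemble. It is a minimal sketch only: the header layout (sequence number, byte offset, payload length), the chunk size, and the names frame_chunks and reassemble are assumptions made for illustration and do not correspond to any particular dump server's wire format.

```python
import struct

# Hypothetical header: sequence number, byte offset, payload length.
# "!" selects network byte order with no padding.
HEADER = struct.Struct("!IQH")


def frame_chunks(image: bytes, chunk_size: int = 1024):
    """Yield one datagram payload per chunk of the memory image."""
    for seq, offset in enumerate(range(0, len(image), chunk_size)):
        payload = image[offset:offset + chunk_size]
        yield HEADER.pack(seq, offset, len(payload)) + payload


def reassemble(packets) -> bytes:
    """Server side: rebuild the image, tolerating out-of-order arrival."""
    parts = {}
    for pkt in packets:
        _seq, offset, length = HEADER.unpack_from(pkt)
        parts[offset] = pkt[HEADER.size:HEADER.size + length]
    # Ordering by byte offset restores the original layout.
    return b"".join(parts[o] for o in sorted(parts))
```

Carrying the byte offset in every chunk lets the receiving side place data correctly even if chunks arrive out of order; detecting and retransmitting lost chunks is deliberately left out of this sketch.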
In order to perform a network core dump, the network device and the network stack that includes the network device driver must be functioning correctly. If the core dump is triggered by a failure in the network stack, the computing device will be unable to carry out the core dump over the network. In some cases, the network device may be locked up or wedged in a particular state, and it cannot function properly until it is reset and its driver is reinitialized. When that happens, the core dump over the network cannot be performed easily, because it is difficult to reinitialize the network device driver when it has already been loaded into memory. In view of the aforementioned issues, the conventional network core dump process has not been very reliable.
A variation of the above approach is to use a special mini-kernel. During boot, this mini-kernel is loaded into a reserved memory region. When a crash occurs, control is transferred to the mini-kernel, which then resets the network device, initializes the network stack, and performs the core dump over the network. A limitation of this variation is, again, its reliance on the network device driver. If the network device driver caused the crash, the core dump over the network cannot be carried out.
Another approach for performing a network core dump is to save the working memory state of the computer program in a predetermined region of memory and to perform the network core dump from that region after rebooting the computing device. This technique, however, relies on the contents of the predetermined region of memory persisting across reboots, and many of today's computing devices do not provide such a capability. The use of this technique is further limited by the fact that some computing devices employ hardware memory scrubbers that clean up memory on boot.
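The dependence of this technique on memory persisting across reboots can be illustrated with a short simulation. The sketch below is hypothetical throughout: the reserved region is modeled as a bytearray, and the header layout (a magic marker, a length, and a CRC32 checksum) and the names stash and recover are assumptions chosen for illustration. It shows why a region that has been scrubbed or altered on boot yields no usable dump.

```python
import struct
import zlib
from typing import Optional

MAGIC = 0x434F5245  # hypothetical marker, ASCII "CORE"
HEADER = struct.Struct("<IQI")  # magic, payload length, CRC32 of payload


def stash(region: bytearray, state: bytes) -> None:
    """Before reboot: copy the memory state into the reserved region."""
    if HEADER.size + len(state) > len(region):
        raise ValueError("state does not fit in the reserved region")
    HEADER.pack_into(region, 0, MAGIC, len(state), zlib.crc32(state))
    region[HEADER.size:HEADER.size + len(state)] = state


def recover(region: bytearray) -> Optional[bytes]:
    """After reboot: return the saved state only if it survived intact."""
    magic, length, crc = HEADER.unpack_from(region, 0)
    if magic != MAGIC or HEADER.size + length > len(region):
        return None  # region was scrubbed or never written
    state = bytes(region[HEADER.size:HEADER.size + length])
    return state if zlib.crc32(state) == crc else None
```

If recover returns None after a reboot, there is nothing to send to the dump server, which is the failure mode that arises on devices whose memory does not persist or is scrubbed on boot.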