1. Field of the Invention
The present invention relates generally to improved systems and methods for retaining valuable memory contents of computer systems during a failure. Among other things, this can enable the diagnosis of faults in computer systems that lead to system failures. One example is the use of dynamic random access memory (DRAM) as a retentive device that records run-time data on a continuous basis and that may be read and analyzed after a system failure to assist with diagnosing the failure.
2. Relevant Background
Computer systems are widely used in countless applications, including personal computers, consumer products, data servers, and the like. Generally, computer systems include at least a processor, memory, and one or more buses that couple the processor to the memory. The memory may include dynamic random access memory (DRAM), which is typically used as the system's main memory, non-volatile memory such as hard disks, read-only memory (ROM), and other types of memory. Computer systems often also include a plurality of I/O devices such as a keyboard, a mouse, a DVD player, a network interface, or the like.
Computer systems often encounter hardware and software problems that may lead to a system failure or an unintended system state (e.g., a "crash"). System crashes are undesirable because the computer system can no longer perform its intended function. In many cases, much of the memory content is not corrupted by the fault, and it is often desirable to recover a portion of it. For example, the recovered contents can be analyzed to diagnose the cause of the system failure, so that changes can be made to prevent future failures and reduce the incidence of such problems. However, this task can be difficult and time-consuming.
One method of retaining computer system memory contents during a crash is to save the system memory to persistent storage (e.g., a hard disk). Special tools and analyzers can then be used to examine the saved contents to try to determine the cause of the failure or to recover the contents. However, this approach to memory retention has certain limitations. First, a persistent memory device such as a hard disk is needed to save the system memory. This can add significant cost and power requirements to computer systems that do not otherwise require a hard disk (i.e., "diskless systems"). Second, this approach requires a device driver that copies the system memory to the hard disk when a crash occurs. Under fault conditions, however, the operating system may be in a state in which the device driver is unable to save the system memory, rendering the approach ineffective. Third, even when the operation succeeds, saving the memory state to disk may take several minutes, during which the application is unavailable; this lowers the overall availability of any larger system of which the computer is a component.
In diskless computer systems, a special memory component such as static random access memory (SRAM) or flash memory may be used as the persistent memory device. However, this approach also has several limitations. Adding an SRAM or flash memory device increases the cost and complexity of the computer system. Additionally, such a device provides only a fixed memory capacity, which can be changed only by redesigning the computer system. Further, the write bandwidth of these devices is relatively low, which may reduce system performance. Finally, flash memory supports only a finite number of write cycles, so it cannot be used as an active device for continuous data logging.
Therefore, there remains a need for systems and methods that facilitate the retention of computer system memory contents during system failures while avoiding some or all of the above-noted limitations. Preferably, such systems and methods would provide reliable memory retention without increasing the cost or complexity of the computer system.