As the value and use of information continue to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes, thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, an information handling system may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
In the context of a datacenter or a similar environment, information handling systems, including enterprise class servers, may occasionally and unexpectedly “hang” or “crash.” As used in this disclosure, “hang” refers to a state, often detected by a management resource function generally referred to as a watchdog timer, in which the system is substantially or entirely non-responsive to keyboard and other conventional inputs. If a watchdog timeout event occurs, the management resource may execute a system reset that overwrites volatile memory.
For purposes of this disclosure, a system reset refers to an operation that re-initializes the core hardware components of the system and reboots the system, restarting the operating system. While a system reset will most likely return the information handling system to a functional operational state, a system reset will also most likely result in the permanent loss of any uncommitted data.
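By way of illustration only, the watchdog behavior described above may be modeled in software as sketched below. The sketch is not taken from this disclosure: the class name `WatchdogTimer`, its `kick` and `poll` methods, and the `on_timeout` callback are hypothetical, and an actual management resource would typically implement the timer in hardware or firmware. The monitored system periodically signals that it is responsive; if no signal arrives within the timeout interval, the management resource treats the system as hung and invokes its reset action.

```python
import time
from typing import Callable


class WatchdogTimer:
    """Hypothetical software model of a watchdog timer (illustrative names only)."""

    def __init__(
        self,
        timeout_s: float,
        on_timeout: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout  # e.g. a routine that requests a system reset
        self._clock = clock            # injectable so the model can be tested
        self._last_kick = clock()
        self._fired = False

    def kick(self) -> None:
        """Called by the monitored system to show that it is still responsive."""
        self._last_kick = self._clock()
        self._fired = False

    def poll(self) -> bool:
        """Called by the management resource.

        Returns True while the timeout interval has elapsed without a kick,
        and invokes on_timeout once per detected hang.
        """
        expired = self._clock() - self._last_kick >= self._timeout_s
        if expired and not self._fired:
            self._fired = True
            self._on_timeout()
        return expired
```

In this model, a timeout callback that performs a system reset immediately would discard whatever uncommitted data the hung system still holds, which is the loss described above.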
As used in this disclosure, “crash” refers specifically to an unanticipated and unintended soft-reset, during which power is present.
Because a system that hangs is substantially inoperative, it is desirable to identify and resolve hanged systems quickly and automatically. However, a hanged system may have a considerable amount of uncommitted data, i.e., data that has not been saved to system memory or to any type of persistent mass storage.