Computer failures can generally be traced to either hardware or software problems. This Background section discusses those failures that can be identified through a careful examination of the data stored by the computer at the time of failure.
When an operating system or application program crashes, it is useful, if possible, to save information about the reasons for the crash and the state of the system at the time of the crash for later reference. Conventional techniques for collecting such "postmortem information" (or "dump information") require that an enormous amount of data be stored. For example, when an operating system (OS) crashes, the common technique for collecting postmortem information is to save the entire contents of the computer's RAM to permanent storage (e.g., disk, tape, or floppy).
As computer memory sizes and the amount of data associated with the OS continue to increase, the time it takes to store this postmortem information upon failure increases correspondingly. Indeed, many users simply do not have time to generate postmortem information and instead opt to manually reboot the computer, losing all postmortem debugging data. Consequently, problems may or may not be reported, and those that are reported lack the critical data necessary to debug them. This means that a random, yet commonly occurring, problem may not get the attention it requires. For example, consider the effort required to generate a complete postmortem debug information file for a large file server with 64 gigabytes (GB) of system memory. It would likely take the system many hours to store all of this postmortem information, if enough disk space could even be found to store it. And once the data was finally stored to disk, it would be very difficult to move later. Even with very fast networks, it would require hours to copy a 64 GB file over a network connection. Even a conventional personal computer (PC) having only 64 megabytes of system memory could still take an inordinate amount of time to complete a full postmortem dump.
To avoid such delays, some operating systems are configured to output only that portion of the system memory that is allocated for use by the operating system kernel. While this tends to reduce the amount of data generated, the resulting postmortem dump file is still quite large. For example, when configured to save only the kernel portion of the system memory, the postmortem files for the Microsoft Windows 2000 kernel range in size from 32 megabytes to 8 gigabytes. For reference, a 32-megabyte file would take about 3 hours to transfer over a 28.8K Baud modem connection.
This same problem occurs with non-operating system programs, also called user-mode programs. As with OS components, the main problem with user-mode postmortem debug data is that it is typically quite large and takes a long time to generate. User-mode dump files for Windows 2000 are typically 50 to 100 megabytes in size. As discussed above, files this large are very difficult to transmit back to the computer or operating system vendor for analysis.
Consequently, the above-stated problems and conventional solutions frustrate the desire of many users and manufacturers for improved online support of the OS and applications. For example, it would be unacceptable and potentially expensive for a user having a 28.8K Baud modem to transmit a 64 MB memory dump file to the manufacturer for postmortem analysis (it would take more than 5 hours).
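The transfer times quoted above can be checked with a short calculation. The following Python sketch is illustrative only and is not part of any described system: the function name `transfer_hours` and the constants `MIB` and `MODEM_BPS` are hypothetical, a megabyte is assumed to mean 2^20 bytes, and the modem is assumed to deliver its full nominal 28,800 bits per second with no protocol overhead or compression.

```python
def transfer_hours(size_bytes: int, link_bits_per_second: float) -> float:
    """Idealized transfer time in hours (no overhead, no compression)."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600


MIB = 1024 * 1024    # assumption: one megabyte = 2**20 bytes
MODEM_BPS = 28_800   # assumption: full nominal rate of a 28.8K modem

# Dump sizes taken from the examples in the text above.
for label, size in [("32 MB dump", 32 * MIB), ("64 MB dump", 64 * MIB)]:
    print(f"{label}: {transfer_hours(size, MODEM_BPS):.1f} hours")
# prints:
# 32 MB dump: 2.6 hours
# 64 MB dump: 5.2 hours
```

Under these idealized assumptions, the results agree with the figures of about 3 hours and more than 5 hours given above. A real connection would add framing and protocol overhead, so actual transfers would likely take somewhat longer.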
As such, there is a need for improved methods and arrangements that substantially reduce the amount of data required to conduct a meaningful postmortem analysis following an operating system or application failure. Preferably, the methods and arrangements will be advantageously configured to allow for online user support for a variety of users, computing devices, operating systems, applications, and the like.