1. Field of the Invention
The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, it relates to a method, system and computer-usable medium for persisting Lightweight Memory Trace (LMT) data across reboots of a system.
2. Description of the Related Art
Known distributed computer systems typically include multiple computers such as application servers, e-commerce servers, web servers, database servers, and other servers. A web server interfaces to client computers via the Internet to provide various services to client computers. An e-commerce server is a web server that enables advertising, information about products, and the sale of products via the web. Other types of application servers interface to client computers via some type of network, such as the Internet, to make their associated applications available to client computers. Oftentimes, a web server or other type of application server accesses a database server to obtain data, such as web pages, needed by client computers.
During the course of their operations, these various servers may experience a malfunction that requires rebooting the system to resolve. For example, a server may suddenly experience a thousand or more processes running simultaneously as a result of not being able to reach a target device. Once the system is rebooted, the processes are no longer running and the server is performing normally. However, determining the cause of the problem can often be challenging.
One approach to addressing this issue is analysis of information contained in a system dump, which typically consists of the recorded state of the working memory of a server at the time it malfunctioned. While many server operating systems (OSs) provide a method to perform a full system dump, one may not have been performed prior to rebooting the server. There may be several reasons for this, such as the additional time required, the operator's inexperience in doing so, a lack of documented procedures, or a reboot that was automatically initiated at an application's request.