1. Field of the Invention
The present invention relates generally to the field of computer operating systems and more particularly to methods and systems for the remote tracking of reboot status of components in a distributed system.
2. Background and Material Information
Computer operating systems operate generally to control and manage the resources of a computer system. Typically, an operating system begins execution upon a power-on or start-up of the computer system by a sequence of events known as bootstrapping or booting. The operating system is started or “booted” by executing a portion of code commonly referred to as boot code.
Once the operating system is properly booted, the computer system may begin normal operations. During these operations, the computer system may experience an event that causes an interrupt to the system. Various forms of events may cause the system interrupt, such as a shut down event (e.g., the user directing the shut down of the computer) or a computer crash event. During a shut down event, the operating system receives a shut down command, which directs the computer system to perform standard shut down procedures that may result in a reboot of the operating system. A crash event may occur when the operating system does not receive the shut down command following a shut down event.
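By way of illustration only, the distinction drawn above between a shut down event and a crash event may be sketched as follows. The sketch assumes a marker file that is created when a session begins and removed when the shut down command is received, so that a marker left over at the next boot suggests the command never arrived. The file name `session_open`, the functions `begin_session` and `handle_shutdown_command`, and the labels "clean" and "crash" are hypothetical and do not describe any particular operating system.

```python
from pathlib import Path

# Hypothetical marker file name, chosen for this sketch only.
MARKER_NAME = "session_open"


def begin_session(state_dir: Path) -> str:
    """Run at boot: classify how the previous session ended, then open a new one.

    Returns "crash" if the previous session's marker is still present (no shut
    down command was received), otherwise "clean". A first-ever boot has no
    marker and is therefore reported as "clean".
    """
    marker = state_dir / MARKER_NAME
    previous = "crash" if marker.exists() else "clean"
    marker.write_text("open")  # mark the new session as in progress
    return previous


def handle_shutdown_command(state_dir: Path) -> None:
    """Run when the shut down command is received: close the session cleanly."""
    (state_dir / MARKER_NAME).unlink(missing_ok=True)  # requires Python 3.8+
```

In such a sketch, a boot that follows `handle_shutdown_command` would be classified as "clean", whereas two consecutive calls to `begin_session` with no intervening shut down command would cause the second call to report "crash".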
To aid in prevention and/or recovery efforts associated with shut down and crash events, conventional computer systems employ reporting tools that may log information associated with the detected shut down or crash events. These reporting tools may provide local “health” reports that indicate, among other things, the type of event detected (e.g., shut down and/or crash event). Although these tools provide information that may aid in the analysis, and perhaps recovery, of such events, their use is restricted to the local system that runs the tool. For example, in a distributed computing environment including a plurality of computer systems, each system may run a reporting tool that provides error and/or fault status information associated with its respective system.
Currently, there are no systems that collect information associated with reboot, shut down, and/or crash events at a centralized remote location. Therefore, managers of such distributed environments may find it difficult to track the status and ramifications of localized shut down or crash events. In some instances, the manager may be required to visit each local computer system to collect a corresponding health report. Accordingly, there is a need for methods and systems that allow computer system health reports to be provided to a centralized location in a distributed computing environment.