This patent application relates in general to the field of computing devices, and more particularly to a method and system for automating support for computers.
Personal computer systems have become increasingly common in businesses and households. Although the term "personal computer" implies a generic device, personal computers generally have a wide diversity of hardware and software components. For instance, different personal computers may have processors and buses of different speeds, hard drive and RAM memories of different sizes, and peripheral devices, such as audio devices, interfaced with different types of interface cards. Further, a large array of manufacturers produce computer components so that in a given personal computer even components having substantially similar operating characteristics may have important differences based on each component's manufacturer specification.
With respect to software, generally all personal computers have a common need for an operating system that coordinates the operation of hardware components. However, each individual personal computer may have one of many possible operating systems. For instance, Microsoft operating systems have evolved from the original Disk Operating System ("DOS") to Windows systems, including Windows 3.1, Windows 95, Windows 98, Windows CE and Windows NT. In addition to these Microsoft operating systems, other types of operating systems are available, such as different versions of Unix, including Linux.
In addition to this wide diversity of operating systems, personal computers may operate a large number of different types of software applications. A given software application may interact in different manners with different operating systems. Thus, even with substantially similar hardware components, personal computers having different software may operate in substantially different manners.
Computer users can experience difficulties in system operation for many reasons. Lack of knowledge, hardware faults, software incompatibilities, and many other causes can lead to problems for the computer user. Given the wide range of hardware and software available (which implies an even greater range of hardware/software combinations that a user can experience), it is difficult to determine if the computer has a problem.
This situation is further complicated by the fact that personal computers do not have good mechanisms to automatically determine if the hardware/software system is having a problem. While certain operating systems contain code that helps sense some types of problems with specific pieces of hardware, such mechanisms may be insufficiently uniform for determining if the operating system has a problem. Indeed, a common symptom of an operating system problem is a failure to boot, in which case the operating system cannot be counted on to help. Another common symptom of an operating system problem is a hang, in which case the operating system becomes unresponsive to the keyboard and mouse for any of a wide variety of possible causes. It should be noted that this type of problem can be caused by pieces of software which have been installed on top of the operating system, such as an application or driver, or by some incompatibility between pieces of software that have been loaded. A system that was operational may stop functioning at some later point due to software incompatibilities.
Another issue is the lack of a uniform mechanism for the user to invoke assistance. If the user has a question or the system has a problem, or at least the user perceives a problem, there is currently no uniform mechanism to get the system to attempt to provide assistance to the user. Although there are various types of help available to the user, they rely on one or more working input devices, such as a mouse and/or a keyboard, and a sufficient level of user knowledge to be able to navigate to one of a variety of information sources on the system and on a global information source such as the internet.
Therefore, a need has arisen for a method and system for identifying and resolving personal computer system problems which is accessible through a uniform mechanism regardless of the functional state of the operating system and other software.
A further need exists for a method and system which detects when an operating system has failed to boot and can take appropriate corrective actions.
A further need exists for a method and system which can detect when an operating system hangs and can then take appropriate actions.
A further need exists for a standard mechanism which can be invoked to attempt to resolve operating system failure to boot and operating system hang conditions.
In accordance with the present disclosure, a method and system are provided that substantially eliminate or reduce disadvantages and problems associated with previously developed methods and systems for resolving computer system problems. A monitoring system detects problems with a computer system and aids in identifying and resolving the problems. The current level of functionality of the computer system is determined, and technical support is provided for the computer system in accordance with the functionality of the computer system.
More specifically, a state machine monitors operating system functionality to detect computer system failures. A watchdog timer is initiated substantially simultaneously with computer system boot and cleared at a predetermined point of the computer system boot sequence. A computer system failure is determined to exist if the watchdog timer remains uncleared after a predetermined time period. For instance, the watchdog timer is cleared with an operating system service routine before expiration of the predetermined time period, thus indicating that the operating system has booted through the service routine point of the boot sequence within the predetermined time period. Failure to clear the watchdog timer with the service routine indicates failure of the boot process through the boot sequence point at which the service routine is called.
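The boot watchdog described above can be illustrated with a minimal simulation sketch. The class name `WatchdogTimer`, the helper `boot_failed`, and the 30-second timeout used in the usage note are assumptions chosen for illustration; the disclosure does not specify an implementation or a particular time period.

```python
from typing import Optional


class WatchdogTimer:
    """Simulated watchdog: armed at boot, cleared by an OS service routine.

    Times are plain numbers (seconds since boot) supplied by the caller,
    so the sketch needs no real clock or hardware.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None
        self.cleared = False

    def start(self, now: float) -> None:
        # Armed substantially simultaneously with system boot.
        self.started_at = now
        self.cleared = False

    def expired(self, now: float) -> bool:
        # Expired means: armed, never cleared, and the period has elapsed.
        if self.started_at is None or self.cleared:
            return False
        return now - self.started_at >= self.timeout_s

    def clear(self, now: float) -> None:
        # A clear that arrives after expiration is too late and is ignored.
        if not self.expired(now):
            self.cleared = True


def boot_failed(timeout_s: float, service_routine_at: Optional[float]) -> bool:
    """Return True if the boot is judged to have failed.

    service_routine_at is the (hypothetical) time at which the operating
    system service routine runs, or None if the boot never reaches it.
    """
    watchdog = WatchdogTimer(timeout_s)
    watchdog.start(now=0.0)
    if service_routine_at is not None:
        watchdog.clear(now=service_routine_at)
    return watchdog.expired(now=timeout_s)
```

Under these assumptions, `boot_failed(30.0, 12.0)` reports a successful boot because the service routine clears the timer in time, while `boot_failed(30.0, None)` reports a failure because the boot never reaches the service routine.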
In one embodiment, a user initiates operating system monitoring by pressing a service button to indicate a problem with the computer system. Pressing the service button initiates support functions, such as the initiation of a service application, which allows testing of the computer system by the monitoring system. Alternatively, or in addition to initiating the watchdog timer that monitors the boot through the calling of the operating system, the service button initiates another watchdog timer that acts as a hang detection timer. If the service button is pressed during computer system boot, the hang detection timer is initiated at a predetermined point of the computer system boot sequence, such as after a user provides log in information, and is cleared upon initiation of the service application. An operating system hang-up error is identified if the hang detection timer remains uncleared after a predetermined hang detection time.
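The hang detection timer can be sketched as a replay over a timestamped event log. The event names, the function `hang_detected`, and the rule that a button press during boot defers arming of the timer until log in completes are illustrative assumptions drawn from the description above, not a definitive implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical event names; the disclosure does not define an event vocabulary.
SERVICE_BUTTON = "service_button"
LOGIN_COMPLETE = "login_complete"
SERVICE_APP_STARTED = "service_app_started"


def hang_detected(
    events: List[Tuple[float, str]],
    hang_timeout_s: float,
    observed_until: float,
) -> bool:
    """Replay (time, event) pairs and report an operating system hang.

    The hang detection timer is armed when the service button is pressed
    after log in, or at log in if the button was pressed during boot.
    It is cleared when the service application starts.
    """
    button_pressed = False
    past_login = False
    armed_at: Optional[float] = None

    for t, name in sorted(events):
        # The timer may already have run out before this event arrived.
        if armed_at is not None and t - armed_at >= hang_timeout_s:
            return True
        if name == SERVICE_BUTTON:
            button_pressed = True
            if past_login and armed_at is None:
                armed_at = t
        elif name == LOGIN_COMPLETE:
            past_login = True
            if button_pressed and armed_at is None:
                armed_at = t  # deferred arming for a press during boot
        elif name == SERVICE_APP_STARTED and armed_at is not None:
            return False  # timer cleared in time: no hang

    # No clearing event was seen; check the timer at the end of observation.
    return armed_at is not None and observed_until - armed_at >= hang_timeout_s
```

For example, a press at 100 seconds followed by the service application starting at 105 seconds clears a 20-second timer, whereas a press during boot with no service application start after log in is reported as a hang once the hang detection time elapses.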
In one embodiment, detection of a computer failure results in a reboot of the computer system to a service mode. The service mode boots up a service mode operating system to enable analysis of the computer system even if the computer system's primary operating system has failed. Initiation of a service mode boot also starts a watchdog timer. The watchdog timer is cleared at a predetermined point of the service mode operating system boot sequence. A computer system failure is determined to exist if the watchdog timer remains uncleared after a predetermined time period. If the service mode boot was initiated by a previous user press of the service button and ensuing fault detection, then a service mode hang detection timer monitors the service mode operating system boot sequence to detect any hang-up of the service mode operating system.
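The transitions between primary boot, service mode boot, and failure can be sketched as a small table-driven state machine. The state and event names below are hypothetical, and treating a service mode hang as a system failure is an assumption; the disclosure states only that the hang-up is detected.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class State(Enum):
    PRIMARY_BOOT = auto()
    PRIMARY_RUNNING = auto()
    SERVICE_BOOT = auto()
    SERVICE_RUNNING = auto()
    SYSTEM_FAILURE = auto()


class Event(Enum):
    WATCHDOG_CLEARED = auto()
    WATCHDOG_EXPIRED = auto()
    HANG_TIMER_EXPIRED = auto()


# (current state, event) -> next state. Pairs not listed leave the state as is.
TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.PRIMARY_BOOT, Event.WATCHDOG_CLEARED): State.PRIMARY_RUNNING,
    # A failed primary boot reboots the system into the service mode.
    (State.PRIMARY_BOOT, Event.WATCHDOG_EXPIRED): State.SERVICE_BOOT,
    # A hang detected after a service button press also reboots to service mode.
    (State.PRIMARY_RUNNING, Event.HANG_TIMER_EXPIRED): State.SERVICE_BOOT,
    (State.SERVICE_BOOT, Event.WATCHDOG_CLEARED): State.SERVICE_RUNNING,
    (State.SERVICE_BOOT, Event.WATCHDOG_EXPIRED): State.SYSTEM_FAILURE,
    # Assumption: a service mode hang is treated here as a system failure.
    (State.SERVICE_BOOT, Event.HANG_TIMER_EXPIRED): State.SYSTEM_FAILURE,
}


def run(events: Iterable[Event], start: State = State.PRIMARY_BOOT) -> State:
    """Apply the events in order and return the resulting state."""
    state = start
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state
```

With this table, an expired primary boot watchdog followed by a cleared service mode watchdog ends in `State.SERVICE_RUNNING`, while two consecutive watchdog expirations end in `State.SYSTEM_FAILURE`.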
The present invention provides many important technical advantages. One important technical advantage is integrated support for detecting problems associated with computer systems. Monitoring a computer system boot sequence for hardware or operating system failure enables automation of problem detection and support for resolving the problems. Further, detection of an operating system failure allows analysis and correction of the computer system problem through use of the service mode operating system.
Another important technical advantage is the automatic confirmation that a problem exists with a computer system. Indication that the monitoring system has detected a problem provides, at a minimum, confirmation to technical support staff with reduced dependence on the verbal description given by the computer system user. Problem confirmation limits the number of basic items that technical support staff need to check over the course of a telephone call. Further, if the monitoring system does not detect a problem, then technical support staff can limit the number of problems needing investigation. For instance, failure to detect a problem with the monitoring system indicates that the hardware and operating system have booted in a normal manner, and the system is capable of initiating the service application.
Another important technical advantage is the identification of the problem associated with the computer system. For instance, monitoring of the computer system boot allows identification of problems as associated with hardware or with the operating system, or alternatively may indicate proper hardware and operating system functionality that indicates user or application related difficulties. If operating system software is the problem, use of a service mode operating system supports full analysis to further identify and analyze the problem. For instance, if the main operating system is inoperable, then the service mode operating system supports computer system operation and allows operation of the computer system for automatic analysis and correction of the problem with the main operating system.
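The way the timer outcomes narrow down the problem can be sketched as a simple classification. The function name `classify` and the wording of the returned labels are assumptions for illustration; the disclosure does not state how a hardware failure is distinguished from an operating system failure, so the sketch keeps those two together.

```python
def classify(boot_watchdog_cleared: bool, hang_timer_cleared: bool) -> str:
    """Map the two timer outcomes to a coarse problem category.

    boot_watchdog_cleared: the boot reached the service routine in time.
    hang_timer_cleared: the service application started in time after
    the service button was pressed.
    """
    if not boot_watchdog_cleared:
        # The boot never reached the service routine point.
        return "hardware or operating system failure during boot"
    if not hang_timer_cleared:
        # The boot succeeded, but the service application never started.
        return "operating system hang"
    # Both timers were cleared: hardware and operating system appear functional.
    return "no fault detected; likely user or application related"
```

When both timers are cleared, the result points technical support staff toward user or application related difficulties rather than toward the hardware or the operating system.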
Another important technical advantage is a robust user interface that is simple and uncomplicated to use. For instance, a user with a question or problem simply pushes a single service button. Pressing the service button generates an interrupt directly into the chip set to, for instance, initiate a service application with hang detection monitoring. The direct interface of the service button to the chip set enhances reliability and simplicity. For instance, the user's input to the service button does not have to rely on the operation of computer components, such as a keyboard or mouse. Further, once the service button is pressed, the computer system may perform an in-depth analysis of potential problems, even when the operating system has failed, by using the service mode operating system to run computer components.