This invention relates generally to digital data processing systems and methods and, more particularly, to methods and apparatus for ameliorating the adverse effects of data processing system malfunctions.
As computer operating systems have grown more complex and feature-rich, they have also grown significantly larger. In order to more effectively manage these large software systems the functionality of the operating system has been partitioned into separate modules or components. The underlying rationale in partitioning the operating system is that dividing up a complex software system into smaller pieces makes the operating system easier to develop and maintain. While this may be true to a first order, the fact is that personal computer (PC)-based operating systems such as, but not limited to, Microsoft Windows 95™, Microsoft Windows 98™, and Microsoft Windows 2000™ all experience "crashes" or "hangs", requiring the user to reboot the PC, or in some cases to cycle the power off and then on in order to restore the PC to an operational status.
It should be noted that not all of these problems are the fault of the operating system; some may instead be the fault of a running application. For example, the applications programmer may have forgotten to test a program path, or a program path was never traversed during testing so that some problem did not appear. In some cases, faults in the application may simply be the result of poor programming, or the failure to check the success or failure of a particular function call. Whatever the reason, the result is that the user's PC can become unresponsive and effectively unusable until the system is rebooted or powered off and then back on. As many users have experienced, such problems typically occur at the most inopportune times, such as when a document or a spreadsheet must be quickly completed in order to make a deadline. Furthermore, the occurrence of these types of system failure events is unpredictable, and may occur at any time during use of the operating system or application.
One common failure mode is that a program is "leaking" memory, i.e., gradually using up small amounts of memory without releasing the used portions. This results in physical RAM capacity being slowly consumed until there is no more available. This type of failure mode typically causes the operating system to stop responding to user input such as keyboard strokes or mouse clicks. Because this type of failure occurs slowly over time, the user may have been working on a document or presentation for quite a while, and may have made many changes or additions to the work in progress. When the system stops responding, the user will be unable to save the work in progress, resulting in a high probability that the work in progress will be lost. In order to restore the system to a functional state the user is typically required to reboot or power cycle the system, resulting in the loss of all or at least some edits to the work in progress.
Another less common problem occurs when a program consumes most or all of the computer's processing capability, thereby starving other programs of the processor time they need for proper execution. In this situation, the system may not respond to the user's mouse movements and certain keyboard commands, including the command to restart the system, requiring the user to power cycle the system in order to restore the system to a functional state.
As can be appreciated, a need exists to lessen and ideally avoid the negative impact of such system failures on the user's work environment.
The foregoing and other problems are overcome by methods and apparatus in accordance with embodiments of this invention. The teachings of this invention provide methods and apparatus to enable an orderly shutdown of a malfunctioning computer system, while saving user-desired work in progress.
In one aspect this invention provides a software program and software device driver mechanism that is installed on a computer system to be protected. In accordance with these teachings of this invention the software program "hooks" or connects or links to the operating mechanism that is the single point of entry for a file open request. When a program makes a request to open a file, the request is detected, and the name of the requesting program is identified and saved along with other information, including the name of any file or files that the program is requesting to open. The data is saved as a protected database in a protected area of system memory, preferably but not necessarily a local disk drive.
The software program also "hooks" or connects or links to the single entry point for a file close function. Each time a program closes a file the close request is intercepted. The name of the program closing the file is detected, along with other information, including the name of the file or files to be closed. The files are closed normally, and the protected database is updated with the new information.
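By way of illustration only, the bookkeeping performed on each intercepted open and close request can be sketched as follows. This is a minimal sketch and not the disclosed driver: the class name `OpenFileRegistry` and its methods are hypothetical, and the sketch assumes that some hook mechanism (not shown) supplies the program name and file name for each request.

```python
from collections import defaultdict


class OpenFileRegistry:
    """Hypothetical record of which files each program currently has open."""

    def __init__(self):
        # Maps a program name to the set of file paths it has open.
        self._open = defaultdict(set)

    def on_open(self, program, path):
        # Called by the (assumed) file-open hook.
        self._open[program].add(path)

    def on_close(self, program, path):
        # Called by the (assumed) file-close hook.
        files = self._open.get(program)
        if files is None:
            return  # close request for a program that was never recorded
        files.discard(path)
        if not files:
            del self._open[program]  # program has no open files left

    def open_files(self, program):
        # Sorted list of files the program still has open.
        return sorted(self._open.get(program, ()))

    def snapshot(self):
        # Plain-dict copy suitable for writing to the protected database.
        return {name: sorted(files) for name, files in self._open.items()}


registry = OpenFileRegistry()
registry.on_open("editor.exe", "report.doc")
registry.on_open("editor.exe", "notes.txt")
registry.on_close("editor.exe", "notes.txt")
print(registry.open_files("editor.exe"))
```

Keying the record by program name keeps the later lookup of a non-responding program's open files to a single dictionary access.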
Each time a file is opened or closed, the protected program/file database on the disk is updated, as well as a copy stored in computer memory (RAM). When the system stops responding to user input, the offending program may still have one or more files open, and may be unable to respond to any user request to properly close the program. When this occurs, the user invokes the emergency shutdown procedure in accordance with these teachings by using a predetermined "button". The button may be located on the keyboard, or it may be mounted internally or externally to the computer system. The button may be implemented using a conventional type of pushbutton or other type of switch, or it may be implemented using some predetermined keyboard key sequence where the user depresses one or more keys, either sequentially or together. The special key combination is detected, and is used to invoke the operation of the orderly shutdown procedure in accordance with these teachings.
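The dual bookkeeping described above, in which the program/file database is kept both on disk and in RAM, can be sketched as follows. This is a sketch under stated assumptions: the class `ProtectedDatabase`, the JSON file format, and the policy of silently tolerating a failed disk write are all illustrative choices that are not specified by these teachings.

```python
import json


class ProtectedDatabase:
    """Hypothetical program/file database kept on disk with a RAM copy."""

    def __init__(self, path):
        self.path = path        # location of the on-disk copy (assumed JSON)
        self.memory_copy = {}   # memory-resident copy of the same records

    def update(self, records):
        # Refresh the RAM copy first so it is current even if the disk fails.
        self.memory_copy = {prog: list(files) for prog, files in records.items()}
        try:
            with open(self.path, "w", encoding="utf-8") as handle:
                json.dump(self.memory_copy, handle)
        except OSError:
            pass  # disk not accessible; the RAM copy remains usable

    def load(self):
        # Prefer the disk copy; fall back to the RAM copy if it is unreadable.
        try:
            with open(self.path, "r", encoding="utf-8") as handle:
                return json.load(handle)
        except (OSError, ValueError):
            return {prog: list(files) for prog, files in self.memory_copy.items()}
```

In this sketch `update` would be called after every intercepted open or close, and `load` would be called by the shutdown procedure once the user activates the button.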
In operation, the orderly shutdown procedure enumerates the list of currently running programs and then by using a program state table determines which program or programs are no longer responding to user input. In accordance with a further aspect of these teachings, the resulting list of non-responding programs is compared to a list of non-responsive programs, i.e., a list of programs that are normally in a non-responsive state, to determine which user program or programs are not responding. The list of non-responsive programs preferably includes those programs that are part of the operating system and that normally run in a suspended or "not responding" mode.
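The comparison described above amounts to filtering the non-responding programs against the list of programs that are normally non-responsive. A minimal sketch follows; the function name, the representation of the program state table as a dictionary of booleans, and the choice to treat a program missing from the table as responding are assumptions made for illustration only.

```python
def find_hung_user_programs(running, is_responding, normally_non_responsive):
    """Return running programs that are hung and are not expected to be.

    running: iterable of program names (the enumerated running programs)
    is_responding: dict mapping program name to True/False (assumed form
        of the program state table)
    normally_non_responsive: set of program names that normally run in a
        suspended or "not responding" mode
    """
    hung = []
    for program in running:
        if is_responding.get(program, True):
            continue  # responding, or unknown and assumed to be responding
        if program in normally_non_responsive:
            continue  # expected to be non-responsive; not a user failure
        hung.append(program)
    return hung


running = ["editor.exe", "sheet.exe", "idle_service"]
state_table = {"editor.exe": False, "sheet.exe": True, "idle_service": False}
print(find_hung_user_programs(running, state_table, {"idle_service"}))
```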
After the identities of the user non-responding programs are determined, the orderly shutdown procedure attempts to read from the protected database to determine if any of the non-responding user programs have any files open. If the disk is not accessible, the information is read from the copy stored in the memory-resident database. The orderly shutdown procedure matches each non-responding program to any open file or files that it may have, and copies the identified file or files to a temporary space on the local or a network disk while identifying these files with, for example, a date and time stamp, as well as the name of the non-responding program. When the open file copy procedure terminates, the orderly shutdown procedure may display a message box indicating that any open files have been saved, and that it is safe for the user to shut down the system.
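The matching and copying step can be sketched as follows. This is an illustrative sketch only: the function name `rescue_open_files`, the file-naming pattern that combines the time stamp, the program name, and the original file name, and the decision to skip files that cannot be copied are assumptions and are not prescribed by these teachings.

```python
import os
import shutil
from datetime import datetime


def rescue_open_files(hung_programs, open_files, rescue_dir, now=None):
    """Copy each hung program's open files into rescue_dir.

    hung_programs: list of non-responding user program names
    open_files: dict mapping program name to a list of open file paths
    rescue_dir: temporary space on a local or network disk
    now: optional datetime used for the stamp (defaults to the current time)
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    os.makedirs(rescue_dir, exist_ok=True)
    saved = []
    for program in hung_programs:
        for source in open_files.get(program, []):
            # Tag the copy with the date/time stamp and the program name.
            name = f"{stamp}_{program}_{os.path.basename(source)}"
            target = os.path.join(rescue_dir, name)
            try:
                shutil.copy2(source, target)
            except OSError:
                continue  # file missing or unreadable; try the next one
            saved.append(target)
    return saved
```

The returned list of saved paths could then be used to populate the message box that tells the user which files were preserved.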
After the computer system is rebooted, the user locates the saved version of the file or files in the protected area and may then resume working at the point where work stopped.
In accordance with these teachings a method is disclosed for operating a digital data processing system, as is a digital data processing system that operates in accordance with the method.
The method includes steps of (A) detecting an activation of a user input that indicates that the system, or a program executed by the system, has become non-responsive to a user; (B) determining an identification of any currently open files and programs with which currently open files are associated; (C) determining an identification of those programs that are normally not in a non-responsive state; and (D) saving those currently open files that are associated with programs that are identified as being not normally in the non-responsive state. A next step notifies the user that any currently open files that are associated with programs identified as being not normally in the non-responsive state have been saved. In a further step the user may restart the digital data processing system, and retrieve at least one of the saved files.
The step of detecting can be executed in response to the user manually activating a switch or by the user activating one or more keyboard keys.
The step of saving saves in association with the currently open files an identification of their associated programs, and saves the currently open files in a data storage device that forms a part of the system, and/or in a data storage device that is remote from the system, and that is reached through a data communication network.
The step of determining an identification of any currently open files preferably includes the preliminary step of monitoring system file open and file close operations, and maintaining a record of those files that are currently open and a record of the program that opened each file.