1. Field of the Invention
The invention relates to a client/server feature providing an emergency diagnosis and repair facility between client devices and a repair server. More specifically, the invention relates to a process for detecting a software malfunction in a client computer and connecting to a server to repair the malfunction.
2. Description of the Related Art
A personal video device (for example, as manufactured by TiVo, Inc. of Alviso, Calif.) is a complicated product, containing both hardware and software components. Individual devices occasionally suffer hardware failures and must be returned to the factory for repair. These hardware failures tend to be sporadic and random in their occurrence, and (because software is not involved) there is no way to repair the failure other than to have a technician fix the hardware.
Software failures can also present a troubling problem. When a software failure occurs, it can potentially affect a large number of devices because the same software is running in thousands of systems. A software failure may occur in what looks like a random fashion. For example, the "nines problem" (bad DMA data) tends to occur in a seemingly random fashion because it depends on many timing- and usage-sensitive conditions in an individual device. On the other hand, a software failure may occur en masse if it results from a latent bug in the software being triggered by data sent to the device.
Some software failures simply crash the machine, or cause the application to malfunction in an annoying but non-critical way. Other software failures can "poison" the device, damaging its stored data in a way that prevents the device from functioning at all.
When a software failure occurs and a device becomes nonfunctional, it is often fairly easy to correct the problem and return the failed device to normal use, if and when the service facility can run a repair program of some sort to correct the damaged data. The hard part is running the repair program on the affected device. Currently, it is difficult or impossible to run any sort of repair program on a device if the device is unable to make its normal daily (program guide) phone call to the personal video service. As a result, when this sort of failure occurs, the service facility must issue a return merchandise authorization (RMA), have the device returned to the factory for repair, and ship a replacement device to the customer. This is expensive even when it occurs only occasionally. It could be fatally damaging to a company if the company had to do it en masse because of a software failure affecting tens of thousands of devices.
Fault detection and recovery after a software crash is a common issue for software-based machinery. A. Federico, Control Fault Detection for Machine Recovery and Diagnostics Prior to Malfunction, U.S. Pat. No. 4,514,846, describes a process for monitoring software crashes and preventing them from halting the productivity of the machine. However, this type of fault prevention is contained entirely within the machine and thus has no means of using an outside source to repair software failures. Because Federico's machine is self-contained, this technique has no way of learning from its mistakes, nor does it offer a way to prevent these malfunctions from occurring in other machines performing the same functions.
D. C. Cromer, Data Processing System and Method for Generating a Detailed Repair Request for a Remote Client Computer System, U.S. Pat. No. 6,003,081, describes a technique for detecting a fault, connecting to a remote server, and informing the remote server of the fault. However, it does not go on to fix the fault. In a complex device, such as a personal video device, it is very important to repair detected software failures as soon as possible to keep the system working continuously.
The invention disclosed herein provides a way of repairing software failures in the field. It enhances the current software architecture of the device by giving the device the ability to "phone home for help" during its boot-and-startup process. A special diagnostic server located at the service facility takes control of the device, performs diagnostics, retrieves log information, and downloads and executes software to repair whatever has gone wrong. This greatly reduces the number of system components that must be working correctly for the invention to function.
The basic philosophy of operation is simple: the device phones home, turns over control to a diagnostic server located at the service facility, and executes the commands issued by the server. Once the connection to the server is established, all of the intelligence in the diagnosis and repair process is driven by the server. This allows the service facility to react to newly discovered problems and to enable the diagnostic server to identify and repair them, without having to change any of the software in the device.
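The client side of this "server drives everything" arrangement can be illustrated with a minimal sketch. It assumes a connection object offering `receive()` and `send()`, and a small table of command handlers on the device; the command names (`GET_LOG`, `ECHO`, `DONE`), the `FakeServerLink` class, and `run_diagnostic_session` are all hypothetical, since the text above does not specify a command set or wire protocol. The loop itself holds no repair logic: it only dispatches whatever the server sends and reports the results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A command from the diagnostic server: (name, argument).
# The command names used in this sketch are hypothetical.
Command = Tuple[str, str]


@dataclass
class FakeServerLink:
    """Hypothetical stand-in for the phone-home connection.

    It replays a scripted list of server commands and records every reply,
    so the client loop can be exercised without a modem or network.
    """
    commands: List[Command]
    replies: List[str] = field(default_factory=list)

    def receive(self) -> Optional[Command]:
        # Next command from the server, or None if the server hung up.
        return self.commands.pop(0) if self.commands else None

    def send(self, reply: str) -> None:
        self.replies.append(reply)


def run_diagnostic_session(link, handlers: Dict[str, Callable[[str], str]]) -> int:
    """Turn control over to the server and execute its commands in order.

    Unknown commands are reported back rather than guessed at; the session
    ends on DONE or when the server stops sending. Returns the number of
    commands actually executed.
    """
    executed = 0
    while True:
        cmd = link.receive()
        if cmd is None or cmd[0] == "DONE":
            break
        name, arg = cmd
        handler = handlers.get(name)
        if handler is None:
            link.send(f"ERROR unknown command {name}")
            continue
        link.send(handler(arg))
        executed += 1
    return executed


if __name__ == "__main__":
    device_log = ["example log line 1", "example log line 2"]  # illustrative only
    handlers = {
        "GET_LOG": lambda _arg: "\n".join(device_log),
        "ECHO": lambda arg: arg,
    }
    link = FakeServerLink([("GET_LOG", ""), ("ECHO", "ping"), ("DONE", "")])
    print(run_diagnostic_session(link, handlers))  # prints 2
```

Because new diagnosis and repair behavior arrives as server commands, adding a capability on the server side requires no change to this loop, which is the property the paragraph above emphasizes.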
The existing device architecture assumes that the devices almost always work correctly. If a device malfunctions in any persistent way, it is difficult for the manufacturer to offer its owner any remedy other than to box it up and send it back to the manufacturer for repair or replacement.
Accordingly, it is an object of the invention to provide a reliably running device by detecting and repairing software errors as they occur on the device.
It is another object of the present invention to provide a client computer for running software in which any detected software errors are logged as error messages so that they can be repaired.
It is still another object of the present invention to provide a dial-up modem connection between a client computer and a server to fix any software errors found on the client computer. These software errors are logged when they are detected, and the logged information is uploaded to the server. The server is then able to take control of the client computer to download and run repair scripts on the client computer during boot-up to repair the software errors.
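The log-then-repair-at-boot sequence can be sketched in a few lines, assuming an in-memory error log and modeling each downloaded repair script as a function that modifies device state. `ErrorLog`, `boot`, the `phone_home` callback, and the `guide_db` state key are hypothetical names chosen for illustration; a real device would persist the log across reboots and reach the server over the dial-up connection rather than through a Python callback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A repair script is modeled as a callable that mutates device state.
# On a real device it would be code downloaded from the server (assumption).
RepairScript = Callable[[Dict[str, str]], None]


@dataclass
class ErrorLog:
    """Hypothetical error log; kept in memory here, persistent on a device."""
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        # Called whenever a software error is detected during operation.
        self.entries.append(message)


def boot(state: Dict[str, str], log: ErrorLog,
         phone_home: Callable[[List[str]], List[RepairScript]]) -> bool:
    """Boot-time check: if errors were logged, upload them and run repairs.

    Returns True if a repair session took place. The log is cleared after
    the scripts run, which assumes every script succeeded.
    """
    if not log.entries:
        return False  # nothing logged: normal boot, no call to the server
    scripts = phone_home(list(log.entries))  # upload log, receive scripts
    for script in scripts:
        script(state)
    log.entries.clear()
    return True


if __name__ == "__main__":
    def fake_server(uploaded: List[str]) -> List[RepairScript]:
        # Stand-in for the diagnostic server: picks a fix based on the log.
        if any("guide_db" in line for line in uploaded):
            return [lambda s: s.update(guide_db="rebuilt")]
        return []

    state = {"guide_db": "corrupt"}
    log = ErrorLog()
    log.record("guide_db checksum mismatch")  # illustrative error message
    boot(state, log, fake_server)
    print(state)  # prints {'guide_db': 'rebuilt'}
```

Keeping the choice of repair on the server side (inside `phone_home` here) means the client only needs enough working software to log errors, place the call, and run what it receives.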
The invention offers several advantages over previously known repair architectures, particularly over software failure detection and repair techniques that rely on connecting to a remote server. It improves on these architectures by providing a simple way for a server to take control of a client computer and repair the software errors found on it.