As shown in prior art FIG. 1, it is known to provide a computer system 10 formed of a host computer 11 connected via a backup data/communication path 13 to a tape drive 12. To back up host system data, the tape drive 12 uses a removable medium 14, such as a data tape, which is inserted into the drive 12 via a loading slot 9. The drive 12 has tape read/write hardware 15 and an embedded specialized application computer 8 containing firmware stored in a firmware memory 16 and data stored in a data memory 17.
It is known that failures can occur in the backed-up computer system 10 of FIG. 1. These failures can be of various types. For example, a malfunction may be caused by the tape read/write hardware, by the firmware, or by data stored in the data memory, or errors may occur because the data tape itself is damaged or has quality problems. Errors can also occur if faulty instructions are sent from the host computer 11 to the tape drive 12.
The main difficulty in performing a failure analysis is a lack of proper and useful information about the problem. A tape drive vendor may spend a very significant amount of time investigating the root causes of the different types of problems that the computer system reports with respect to the data backup tape drive. These problems may have to be recreated by the vendor in its own laboratory, at a location remote from the host computer system being backed up with the tape drive. The host system may have to be simulated in that remote laboratory in order to catch the failure mode and the events leading up to the failure. In many situations, different types of debugging tools must be provided and prepared at the time the failure occurs. As a result, the vendor may spend inordinate amounts of time at the user's location while the computer system is operating, which may interfere with operation of the host computer. Extensive time may also be required at the vendor's remote laboratory, where the tape drive/host computer system is simulated in an attempt to recreate the problems that occurred. This process may take a long time and require several retries before the correct information is trapped.
The tape drive vendor normally assigns people from its development laboratory to the failure analysis after it receives the tape drive back from the host computer system user's location.
In the prior art, laboratory personnel combined previous experience with information dumped from the firmware memory and data memory of the returned tape drive in an attempt to solve the failure problem. However, information obtained from such a memory dump is stale and difficult to analyze, because what is stored in memory after the drive is returned is after-the-fact information, part of which has been rewritten, for example by overwriting in buffer memories.
Based on previous experience, it was known in the prior art to solve problems occurring in tape drives with embedded firmware systems by developing comprehensive debugging tools based on event traces and various logs. Together with a complete mapping of the firmware (both code and data) and complete access to all hardware registers, these tools provided observability into what may have caused the problem. However, as explained above, the dump typically would not provide the information stored in memory at the actual time of the failure because, as is known in the art, portions of memory useful for the failure analysis are rewritten during continued operation of the tape drive after the error has occurred. Thus, because the people in the vendor's laboratory attempting to solve the failure problem are remote from where the problem actually occurred, namely at the backed-up computer system user's location, valuable information is lost during the operating time that elapses after the failure and before the tape drive is transported to the vendor's laboratory for failure analysis.
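To illustrate why a dump taken after the drive is returned can miss the failure entirely, the following is a minimal sketch that assumes the drive records events into a fixed-size circular trace buffer. The buffer capacity, the event names, and the function `run_drive` are hypothetical illustrations, not details of any actual drive firmware.

```python
from collections import deque

# Hypothetical trace buffer capacity; no real firmware value is implied.
TRACE_CAPACITY = 8

def run_drive(events, capacity=TRACE_CAPACITY):
    """Record each event into a fixed-size circular trace buffer.

    A deque with maxlen discards its oldest entry once full, mimicking
    a firmware ring buffer that overwrites old trace data (hypothetical model).
    """
    trace = deque(maxlen=capacity)
    for event in events:
        trace.append(event)
    return list(trace)  # contents a later memory dump would see

# Hypothetical events leading up to a failure, followed by continued
# operation before the drive is returned to the vendor's laboratory.
events = ["write block 41", "write block 42", "ERROR: servo fault"]
events += [f"read block {n}" for n in range(10)]

dump = run_drive(events)
print("ERROR: servo fault" in dump)  # False: the failure record was overwritten
```

Had the buffer been captured at the moment of the error, the failure record would still be present. The longer the drive keeps operating after the failure, the more of the pre-failure history is lost.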
Typically in the prior art, the tape drive vendor laboratory would have to begin from scratch in an attempt to simulate the host computer environment in which the tape drive failed at the user's location.
Furthermore, the backed-up computer system user is focused on doing their job, not on understanding problems with the peripheral backup tape drive unit. Although end users may be very willing to perform simple tasks in order to help, they do not like to interrupt their organization's use of the computer system for very long. For example, rebooting the computer system server in order to prepare debugging tools is not a welcome operation for the backed-up computer system user.
As shown in FIG. 2, with the prior art system it was thus first necessary, as shown at step 18, to recreate the failed system with the drive and either the host computer system or a simulation of it. Once the failure occurred, it was necessary, as shown at step 19, to dump and collect the information from the firmware memory and data memory of the embedded computer 8. The available information was then analyzed, as shown at step 20, and actions and/or improvements to the drive were implemented, as shown at step 21.