The present invention relates to hardware and software development, and in particular to a method and system for testing error detection programs dedicated to detecting hardware failures in a computer system, in which error case patterns comprising stimuli values are generated and response patterns to the hardware are evaluated.
Computer systems often have a complex structure that comprises many components, on-chip or off-chip, which are interconnected for tight co-operation with each other. The complexity increases further when more than one processor unit is coupled together or when the processor units co-operate via an electronic network system.
A problem arises when any technical component has a failure which is not made fail-safe by a redundant component. Then, the system operation may be interrupted, which, when the system is used for business purposes, generally causes high additional costs.
In order to minimize the time required for repairing a hardware failure, prior art software programs exist which check the operability and functionality of individual hardware units. Those units may comprise one or more hardware subunits. In case of a hardware failure, the failing technical component can be localized very quickly, and technical staff can replace that component with a new one within a relatively short time, without a longer time being required for localizing the failing component. Such error detection programs dedicated to detecting hardware failures are referred to in prior art as so-called field-replaceable-unit (FRU) isolation programs. Such programs are complex as well, and thus need some time to be developed and debugged in order to provide sufficient quality.
In prior art, those FRU isolation programs are tested and debugged in close co-operation with the new hardware itself after it has been fabricated, at least on a prototype basis.
This situation is depicted in FIG. 1: a service element (SE), e.g. a laptop 10 on which such an FRU isolation program under development is installed, is connected via a LAN connection to a computer system 15 in which the newly developed hardware 16 is running.
A separate error generator computer 18 has its own user interface 17 and is arranged to force errors in the form of so-called stimuli voltage levels in the hardware itself, forced at latch level, for example. Thus, the hardware itself is stimulated with errors.
Then, the error latch status values resulting from this hardware stimulation are read by the service element 10, which runs a respective driver, and are evaluated by the FRU isolation software under development.
Usually, after the Initial Microprogram Load (IML) on the new hardware 16, so-called shift-chain technology is used in order to track a plurality of different paths within a chip. Within a chain, a plurality of dedicated latches is tracked, and the respective voltage level as a function of the respective clock is recorded in a view file provided per chain. Such status values can thus be evaluated, and bugs in the FRU isolation program can be found and corrected, because the hardware can be selectively fed with a continued sequence of new error stimuli values. Thus, an iterative operation of
    a) new hardware stimulation,
    b) watching the resulting error propagation effects in the error latches,
    c) checking whether the FRU isolation program is able to detect those errors,
    d) repeated amendment of said FRU isolation program,
is the way in which said FRU isolation program is debugged in prior art, which involves two different user interfaces: user interface 17 for generating errors and user interface 12 for evaluating errors and debugging the FRU isolation software.
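For illustration only, the iterative operation of steps a) to c) may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the prior art implementation: the class `SimulatedHardware`, the functions `isolate_fru` and `run_debug_pass`, and all latch and FRU names are hypothetical, and the real hardware 16 is replaced here by a simple in-memory model. Step d), the amendment of the FRU isolation program, remains a manual task of the developer and is represented only by the list of misdiagnosed stimuli that the pass returns.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedHardware:
    """Hypothetical stand-in for hardware 16: a set of named error latches."""
    # Assumed wiring: stimulated latch -> error latches the error propagates to.
    propagation: dict
    error_latches: dict = field(default_factory=dict)

    def stimulate(self, latch):
        # Step a): force an error at latch level (all latches are cleared first).
        self.error_latches = {
            name: False
            for targets in self.propagation.values()
            for name in targets
        }
        for name in self.propagation[latch]:
            self.error_latches[name] = True

    def read_error_latches(self):
        # Step b): read out the resulting error latch status values.
        return dict(self.error_latches)


def isolate_fru(latch_status, latch_to_fru):
    """Toy FRU isolation program: report every FRU owning a set error latch."""
    return {latch_to_fru[name] for name, is_set in latch_status.items() if is_set}


def run_debug_pass(hardware, stimuli, latch_to_fru, expected):
    """Run steps a) to c) per stimulus; return the stimuli that are misdiagnosed."""
    failures = []
    for latch in stimuli:
        hardware.stimulate(latch)
        status = hardware.read_error_latches()
        # Step c): check whether the program isolates the expected FRU.
        if isolate_fru(status, latch_to_fru) != expected[latch]:
            failures.append(latch)
    return failures


# Hypothetical example data.
hw = SimulatedHardware(
    propagation={
        "cpu0.parity": ["err_cpu0"],
        "mem1.ecc": ["err_mem1", "err_bus"],
    }
)
latch_to_fru = {
    "err_cpu0": "CPU card 0",
    "err_mem1": "Memory card 1",
    "err_bus": "Memory card 1",
}
expected = {"cpu0.parity": {"CPU card 0"}, "mem1.ecc": {"Memory card 1"}}
failures = run_debug_pass(hw, list(expected), latch_to_fru, expected)
```

A non-empty result would identify the stimuli for which the FRU isolation program needs to be amended before the next pass.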
A first disadvantage thereof is that this is a time-consuming, laborious debug procedure.
A second disadvantage of said prior art approach is that the FRU isolation program cannot be analyzed for debugging purposes before the new hardware exists physically, at least in the form of a prototype. As, however, the FRU software is an essential component of new hardware offered for sale, it should be in a releasable state when the hardware itself is releasable. This, however, is not possible in prior art.