In the field of computer system testing, it is generally desirable to develop tools for reliably identifying defects in computer systems pertaining to both hardware and software operation. One prior art approach involves generating pseudo-random code to run on the system being tested and comparing emulated results for this test code with results actually obtained on the computer system being tested. Where there is a discrepancy between emulated and actual results, the test program generally flags the existence of a possible defect as a consequence of the discrepancy. However, the discrepancy between the emulated and actual results and any other information collected during the first failure may be insufficient to identify the cause of the suspected defect. More information may need to be collected to narrow the search for the cause of the defect, which can be accomplished only if the test program can repeat the defective behavior. If the defective behavior cannot be repeated, it may be very difficult to determine the cause of the defect.
Generally, a combination of an initial system state and the accumulated effect of a sequence of instructions may operate to exercise a defect. Accordingly, in the prior art, considerable effort is generally expended to carefully recreate the initial system state and to repeat, as precisely as possible, the test program stimulus to the system once the initial state is established. This approach is generally both computationally demanding and time consuming.
Generally, once a defect is flagged, traces may be collected employing specialized hardware to determine the cause of the defect. Traces generally involve collecting data pertaining to a number of different system conditions in order to track down the cause of the defect. However, collecting such traces employing a mechanical connection or other hardware arrangement is generally tedious, time consuming, expensive, unreliable, and limited to physically accessible signals.
In the prior art, repeatable system behavior may be determined by comparing the clock cycle by clock cycle behavior within each system component and on each of the system buses connecting the system components within the same timing window in two or more instances of execution of a test program. Where the described clock cycle by clock cycle behavior is identical between the runs, or instances of test program execution, a determination may be made that there is system repeatability. However, comparing clock cycle by clock cycle behavior for all the system components and buses connecting these components is tedious, computationally expensive, impractical, time consuming, and limited to physically accessible signals. An approach for determining system repeatability is needed which is less demanding.
In the prior art, built-in chip self-test has used a technique to condense output vector information. In built-in self-test, some means is generally provided to generate a fixed input vector test sequence. A short linear feedback shift register is generally used to calculate a syndrome for the output vector. If the resulting syndrome matches a fixed, previously calculated expected syndrome, then the chip generally passes the built-in self-test.
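By way of illustration only, the syndrome comparison described above may be sketched as follows. The register width, the tap positions, and the names lfsr_signature and passes_self_test are assumptions made for this sketch and do not describe any particular chip.

```python
def lfsr_signature(bits, taps=(16, 14, 13, 11), width=16):
    """Condense a stream of output bits into a width-bit syndrome.

    A serial-input linear feedback shift register: each incoming bit is
    XORed with the tapped register bits and shifted into the register.
    The tap positions are an illustrative choice, not a required one.
    """
    mask = (1 << width) - 1
    state = 0
    for bit in bits:
        feedback = bit & 1
        for tap in taps:
            feedback ^= (state >> (tap - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return state


def passes_self_test(output_bits, expected_syndrome):
    """Pass only if the computed syndrome equals the stored expected one."""
    return lfsr_signature(output_bits) == expected_syndrome


# Hypothetical output vector from a known-good part, and one with a flipped bit.
good_output = [1, 0, 1, 1, 0, 0, 1, 0] * 4
expected = lfsr_signature(good_output)
faulty_output = list(good_output)
faulty_output[5] ^= 1
```

Because the register update is linear and, with the top bit tapped, invertible, a single flipped bit in the output stream always changes the syndrome in this sketch. Multiple errors can in principle cancel, which is the usual cost of condensing the output vector.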
Therefore, it is a problem in the art that determining system repeatability by comparing clock cycle by clock cycle behavior for all system components and buses is tedious, time consuming, and impractical.
It is a further problem in the art that to obtain useful information about a defect, an initial system state must generally be carefully duplicated in order to cause the defective behavior to reoccur.
It is a still further problem in the art that recreating an initial system state is difficult and time consuming.
These and other objects, features, and technical advantages are achieved by a system and method which employ a plurality of counters to represent the system behavior. Preferably, a strategically selected group of operating parameters is tracked by a plurality of counters, thereby establishing a counter state to represent the system behavior. Preferably, repeatability of the counter state practically assures repeatability of the system behavior. Preferably, substantial economy of human and computational effort is obtained by condensing system information into a group of carefully selected numerical system parameters which are tracked by a set of counters. The counters may be used to practically measure the degree of system repeatability. In a system with at least occasional system repeatability, the inventive mechanism may be further employed to find defects in system behavior.
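One way the counter state might be used to measure a degree of repeatability is sketched below. The metric shown (the fraction of runs whose counter state matches the most frequently observed state), the function name degree_of_repeatability, and the counter names are assumptions made for illustration; the description above does not prescribe a particular formula.

```python
from collections import Counter


def degree_of_repeatability(counter_states):
    """Fraction of runs whose counter state matches the most common state.

    Each element of counter_states is a dict mapping a counter name to the
    value read at the end of one run of the same test program.
    """
    if not counter_states:
        raise ValueError("at least one run is required")
    # Dicts are unhashable, so freeze each snapshot into a sorted tuple.
    frozen = [tuple(sorted(state.items())) for state in counter_states]
    most_common_count = Counter(frozen).most_common(1)[0][1]
    return most_common_count / len(frozen)


# Hypothetical counter names and values, for illustration only.
runs = [
    {"cache_misses": 120, "bus_retries": 4},
    {"cache_misses": 120, "bus_retries": 4},
    {"cache_misses": 120, "bus_retries": 4},
    {"cache_misses": 121, "bus_retries": 5},
]
score = degree_of_repeatability(runs)
```

In this sketch, three of the four runs share one counter state, so the score is 0.75. Comparing a handful of counter values per run stands in for comparing every signal on every clock cycle.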
In a preferred embodiment, the inventive mechanism repeatedly runs a varying test program on a computer system and compares emulated and actual system results for a series of runs of the varying test program in order to identify program runs in which the emulated and actual system results differ, thereby indicating the existence of a possible defect. Where there is a discrepancy between the emulated and actual program results, the inventive mechanism preferably flags a possible defect and attempts to recreate the failure in order to collect more information about the possible defect. Thus, the inventive mechanism preferably first establishes a mechanism for practically assuring that a set of counters effectively represents the system behavior. Thereafter, the counters are preferably relied upon to effectively represent the system behavior. Accordingly, when defective behavior is encountered during operation of a test program, the counters will preferably be employed to indicate whether this defective behavior is repeatable during subsequent runs of the test program. Use of the counters preferably provides considerable computational economy over the use of clock cycle by clock cycle behavior for the purpose of identifying repeatable system conditions including defective behavior.
In a preferred embodiment of the present invention, a single test program may be varied by employing a pseudo-random number seed which varies the values of data, data storage locations accessed, types of access, and program branching decisions within the test program. Successive runs of the test program are preferably continuously varied by employing a new seed for each run of the test program. The inventive mechanism preferably records the seeds employed so as to enable a repetition of a particular sequence of test program runs where circumstances warrant.
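The seed-driven variation described above may be sketched, purely as an illustration, as follows. The operation format, the parameter defaults, and the names generate_test and next_run are hypothetical and are not taken from the embodiment.

```python
import random


def generate_test(seed, length=8, memory_size=16):
    """Build one variant of the test program from a pseudo-random seed.

    The seed determines the data values, the storage locations accessed,
    the types of access, and the branching decisions.
    """
    rng = random.Random(seed)  # private generator, so the seed alone fixes the test
    operations = []
    for _ in range(length):
        operations.append({
            "access": rng.choice(["load", "store"]),
            "address": rng.randrange(memory_size),
            "value": rng.randrange(256),
            "take_branch": rng.random() < 0.5,
        })
    return operations


def next_run(seed_source, seed_log):
    """Draw a fresh seed, record it, and return the seed with its test."""
    seed = seed_source.getrandbits(32)
    seed_log.append(seed)
    return seed, generate_test(seed)


seed_log = []
seed_source = random.Random(2024)  # arbitrary starting point for the sketch
for _ in range(3):
    next_run(seed_source, seed_log)

# Replaying a recorded seed regenerates exactly the same test variant.
replayed = generate_test(seed_log[1])
```

Because each variant depends only on its seed, recording the seeds is sufficient to repeat any earlier sequence of runs without storing the generated programs themselves.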
In a preferred embodiment, where a discrepancy is detected between emulated and actual test program results, the inventive mechanism will preferably attempt to reproduce the discrepancy to determine whether a persistent defect exists, or whether a transient effect produced the failure. Generally, even a persistent defect may not reappear merely by re-executing the failing run in the sequence since many program runs may have contributed to the initial system state prior to the failing run. Accordingly, the inventive mechanism may select a seed farther back in the history of test program runs as the starting seed to initiate a sequence of further test program runs in order to reproduce the failure.
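A minimal sketch of this search through the seed history is given below. It assumes a caller-supplied function, here called run_sequence, which executes the test program for each seed in order and reports whether the final run shows a discrepancy. That function, the name reproduce_failure, and the toy stand-in used in the example are hypothetical.

```python
def reproduce_failure(seed_log, failing_index, run_sequence, max_lookback=None):
    """Find how many preceding runs must be replayed to reproduce a failure.

    Starts with the failing run alone, then begins one run farther back in
    the seed history on each attempt. Returns the number of preceding runs
    that had to be included, or None if the failure never reappeared.
    """
    limit = failing_index if max_lookback is None else min(max_lookback, failing_index)
    for lookback in range(limit + 1):
        start = failing_index - lookback
        if run_sequence(seed_log[start:failing_index + 1]):
            return lookback
    return None


def toy_run_sequence(seeds):
    """Toy stand-in for real hardware: the last run fails only when the two
    runs before it have left the system in the required state."""
    return list(seeds[-3:]) == [3, 5, 7]


history = [9, 1, 3, 5, 7]  # hypothetical recorded seeds; the run at index 4 failed
needed = reproduce_failure(history, 4, toy_run_sequence)
```

In the toy example the failing run alone does not reproduce the failure, nor does replaying one preceding run; replaying two preceding runs does, so needed is 2. A None result would be consistent with a transient effect, although it does not by itself prove that no persistent defect exists.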
In a preferred embodiment, where the inventive mechanism identifies a test program associated with a particular seed where a failure occurs repeatedly, further analysis may be conducted to track down the defect causing the failure. A mechanism may be provided so that selected internal system node values may be sampled at selected clock cycles during successive instances of execution of the failing run in order to build a trace of system activity. Such a trace may be very helpful in providing information to track down and correct a system defect.
In a preferred embodiment, the inventive mechanism employs a set of counters measuring a variety of system parameters in order to assess the system behavior during a test program run. The type and number of the system parameters measured by the counters are preferably selected, based on empirical data, to assure that repeatability of counter measurements practically assures repeatability of system behavior. The number and type of system parameters needed to measure system repeatability effectively may vary depending on the test program and on the system configuration. In one exemplary system configuration and test program combination, eight counters were found to be sufficient to measure system repeatability effectively. It will be appreciated that fewer or more than eight counters may be employed where other system configurations and other test programs are employed. The inventive mechanism may be employed to find functional bugs as well as electrical bugs. Generally, electrical bugs are apt to be more transient due to exercising metastable behavior and are therefore less repeatable than functional bugs. The inventive mechanism may be employed in systems with inherent probabilistic non-repeatability, such as systems containing a metastable bus synchronizer between independent clock domains.
In an alternative embodiment of the present invention, the repeatability measurement may be used to determine whether a test program is repeatable. Preferably, the test program is run on a system which has already been found to be reasonably repeatable in order to determine how repeatable the test program is. In this case, the system is being treated as correct, and the program is being tested for repeatability. This approach may be employed to improve the accuracy and reliability of the test program software.
Therefore, it is an advantage of a preferred embodiment of the present invention that system information is condensed into a manageable number of parameters logged in counters to practically measure repeatability of the system behavior.
It is a further advantage of a preferred embodiment of the present invention that repeatability of a failure during a particular test program run may be enabled by re-executing a selected number of test program runs preceding the failing run.
It is a still further advantage of a preferred embodiment of the present invention that the initial state of the system need not be fully recreated in order to achieve repeatability.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. The novel features which are believed to be characteristic of the invention, both as to its organization and method of operation, together with further objects and advantages will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is given for the purpose of illustration and description only and is not intended as a definition of the limits of the present invention.