1. Field of the Invention
This invention relates to validation testing of large electronic systems with embedded software and, more particularly, to a method of automatically generating test equipment control programs which sequentially test the most likely failure cases for newly developed systems.
2. History of the Prior Art
The testing of large electronic systems with embedded software is a costly and time consuming operation. The main goal of such testing is essentially risk management. System quality is defined in terms of the impact of system faults or errors on the expected behavior of the system. Measured system behavior is contrasted with the expected system behavior as defined in the user's requirements specification. Traditionally, the cost of such testing is high because of the requirement for expensive test plants and the time needed to manually run the test cases.
The development effort for large electronic systems with embedded software is organized around various phases of the development process. Generally, the development process is partitioned into a number of serially occurring, basically mutually exclusive phases. The scope of activity that occurs at each phase varies depending upon whether the software system is entirely new or is a modification to an ongoing and evolving system.
During an initial phase, which might be characterized as a "conceptualization" phase, generic requirements are produced wherein the high-level functionality of the overall system is specified. It is during this phase that the ultimate users of the system elucidate their requirements and objectives to the system analysts. Various alternative solutions aimed at achieving the specified objectives may be proposed, and the analysts select the most viable solution and then generate the requirements.
A second phase, the "implementation" phase, commences when the requirements are further analyzed and parsed to obtain a refined system definition. The system is now viewed as comprising an arrangement of stand-alone, but interdependent modules. These modules are usually developed separately by different groups or individuals. The development effort at this juncture comprises coding of the modules in a specific computer language, such as the "C" language, and then testing the execution of the individual modules.
A third phase, called "integration", is initiated by combining appropriate modules into their respective subsystems, thereby forming the overall system. Subsystem testing may also be effected to ensure that the modules are properly linked and compatible.
A fourth phase, called "system test", begins when the overall system is handed off to the testers for validation testing. The testers have the task of troubleshooting the integrated system before its release to the end users. It is well established that the cost of correcting software defects after a software system has reached the end user is significantly greater than correcting defects found before the system reaches that stage. This has led to an increased emphasis on reduction of errors during system development, primarily by testing. The objective of the testers, therefore, is to locate as many potential failures as possible. A "failure" is a discrepancy between what was intended to be implemented and what the system actually does as revealed through testing. In a large system, system testers are faced with the problem of how to choose test cases effectively given practical limitations on the number of test cases that can be selected and tested.
The ability to determine the true quality of a system through traditional system testing is hindered by human language limitations and software characteristics. System requirements are written in human language, which is not mathematical, and therefore often has internal contradictions and a lack of completeness. The quality determination is also hindered by software characteristics which make it difficult to apply well known principles of quality control used broadly in industry for hardware quality testing. For example, since program code shows no degradation over time, it does not fit into the normal concept of mean-time-between-failures as used with hardware components.
Additionally, control of the process during conventional software development tends to be somewhat subjective in nature because, unlike the case of traditional hardware development, there are no sophisticated, objective control procedures that pervade or encompass all phases of the development. In effect, there are no universally applicable methods or techniques that can quickly and accurately perform detailed measurements and report on results within acceptable time limits and resource usage.
As noted above, during validation testing, the objective of the testers is to locate as many potential failures as possible. In very large systems, where operating history affects current system behavior and where system functionality is very complex, the number of possible uses and system responses is so large that it is impossible to run all possible cases. Thus it is not possible to prove that a large and complex system is entirely free of errors. Measures of the percentage of the functionality covered by a test set, i.e., the Coverture Grade, have been defined according to different concepts related to the software implementation, such as the number of internal states, or of pathways between them, in the control flow exercised during the test.
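The Coverture Grade described above can be illustrated with a minimal Python sketch, assuming that the system's control flow is modelled as a set of (source, target) state transitions and that each test run is logged as the sequence of states it visited. The function name `coverage_grade` and the telephone-call states are hypothetical, chosen only for illustration.

```python
def coverage_grade(transitions, test_runs):
    """Return the percentage of modelled transitions exercised by the test runs.

    transitions: set of (source_state, target_state) pairs (hypothetical model)
    test_runs: list of state sequences recorded during the tests
    """
    exercised = set()
    for run in test_runs:
        # Each consecutive pair of states in a run is one traversed transition.
        exercised.update(zip(run, run[1:]))
    covered = exercised & transitions
    return 100.0 * len(covered) / len(transitions)


transitions = {("Idle", "Dialing"), ("Dialing", "Connected"),
               ("Dialing", "Idle"), ("Connected", "Idle")}
runs = [["Idle", "Dialing", "Connected", "Idle"]]
print(coverage_grade(transitions, runs))  # 75.0
```

The single run exercises three of the four transitions; the abandoned-call transition from Dialing back to Idle is never taken, so the grade is 75%.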
Because it is not possible to run all possible test cases, economic factors normally determine the end of the testing cycle. Several stopping criteria based on statistical analysis of the test results have been proposed, but they have not been adopted on an industrial scale because of their inherent inaccuracies.
The selection of test cases is another important consideration which affects how efficiently tests find system errors. Currently, test cases are selected either from implementation parameters or from the tester's own personal experience. When test cases are selected from implementation parameters, they run the program through all of its states or branches, and in this instance it is possible to automate the test generation and execution process.
The selection of test cases from implementation parameters suffers from a lack of consistency between the implementation parameters and the actual future use of the system. Such is the case, for example, when all the branches are tested at least once (i.e., 100% branch coverage). Because of program loops and data corruption, it may be necessary to traverse a combination of branches a number of times before an error manifests itself. Thus, testing every branch at least once does not guarantee the detection of all errors, and estimations of system quality derived from such test cases are often inaccurate. On the other hand, the selection of test cases from the personal experience of the tester suffers from a lack of consistency from one test case to another, resulting in a quality estimation of unknown accuracy. Additionally, testing on the basis of personal experience does not provide a systematic method for automating test cases. In either event, the test must still be performed manually or with automated equipment which is manually programmed.
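The following minimal Python sketch illustrates the point about loops. The function `load_level` and its lookup table are hypothetical: two tests together take both branches of the `if` statement, so branch coverage is 100%, yet the defect is exposed only when the same branch is taken three times in succession.

```python
LEVEL_NAMES = ["idle", "low", "high"]  # hypothetical lookup table


def load_level(events):
    """Map a sequence of events to a load level name (contains a deliberate defect)."""
    level = 0
    for event in events:
        if event == "inc":
            level += 1      # defect: no upper bound on level
        else:
            level = 0
    return LEVEL_NAMES[level]  # IndexError once level reaches 3


# These two tests together take both branches of the 'if' (100% branch
# coverage), and both pass.
assert load_level(["inc"]) == "low"
assert load_level(["inc", "reset"]) == "idle"

# Only repeating the 'inc' branch three times exposes the defect.
try:
    load_level(["inc", "inc", "inc"])
except IndexError:
    print("defect found only by repeated traversal of the same branch")
```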
Some testers have turned to Usage Models in an attempt to select more meaningful test cases for validation testing. FIG. 1 is a high-level functional block diagram illustrating the utilization of a Usage Model for validation testing in an existing system. A software CASE tool, such as the Software Usage Modeling and Integrated Testing (SUMIT) tool, is capable of automating or partially automating some of the steps in the validation process. The role of Usage Models in validation testing is described in "Software Certification: An Engineer's Guide to Preparing and Performing Statistical Tests for Software," by Software Engineering Technology, Inc., Sep. 24, 1993.
Still referring to FIG. 1, the inputs to the validation process 1 for a given software system 10 are the specification document 11 for the software, the software's input domain 12, and some knowledge of how the user community 13 is expected to interact with the software. The system specification 11 and the input domain 12 are used as the basis to construct a usage model structure 14 that describes the interaction of the user community 13 with the software. The usage model structure 14, along with pertinent data collected from the user community 13, is used to estimate usage model statistics 15. The usage model structure 14 and the usage model statistics 15 are then combined, and the combined usage model is subjected to rigorous analysis 16 to determine its ability to accurately characterize actual usage patterns.
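As one illustration of estimating usage model statistics 15, the sketch below assumes that the data collected from the user community 13 takes the form of logged state sequences, and estimates each transition probability as the relative frequency of that transition among all transitions leaving the same state. The log format, state names and function name are assumptions made for illustration; the referenced guide may prescribe other estimation procedures.

```python
from collections import Counter, defaultdict


def estimate_transition_probabilities(usage_logs):
    """Estimate usage-model transition probabilities by relative frequency.

    usage_logs: list of state sequences recorded from the user community
    Returns {source_state: {target_state: probability}}.
    """
    counts = defaultdict(Counter)
    for log in usage_logs:
        # Count every observed transition src -> dst.
        for src, dst in zip(log, log[1:]):
            counts[src][dst] += 1
    probabilities = {}
    for src, targets in counts.items():
        total = sum(targets.values())
        probabilities[src] = {dst: n / total for dst, n in targets.items()}
    return probabilities


logs = [["Start", "Dial", "Talk", "Hangup"],
        ["Start", "Dial", "Hangup"],
        ["Start", "Dial", "Talk", "Hangup"]]
model = estimate_transition_probabilities(logs)
print({dst: round(p, 3) for dst, p in model["Dial"].items()})  # {'Talk': 0.667, 'Hangup': 0.333}
```

Two of the three logged calls proceed from Dial to Talk, so that transition receives probability 2/3.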
Markov chain theory, described in more detail below, is used to derive a finite state machine and determine the probabilities of transitioning between the various states as an input to the usage model analysis 16. The analysis 16 is iteratively performed until an acceptable model is obtained. Such a model is called a verified usage model 18. From the verified usage model 18, sample usage sequences 19 are generated and then converted into test cases 21. The test cases 21 are executed on the software 22 during test execution 23 in order to achieve a certified software system 10.
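The generation of sample usage sequences 19 from a verified usage model 18 can be sketched as a random walk through the Markov chain: starting in the initial state, each next state is drawn according to the transition probabilities of the current state until the terminal state is reached. The model below and the function `generate_usage_sequence` are hypothetical, and a fixed random seed is used only so that runs are repeatable.

```python
import random

# Hypothetical Markov usage model: {state: {next_state: probability}}.
model = {
    "Start": {"Dial": 1.0},
    "Dial": {"Talk": 0.7, "Hangup": 0.3},
    "Talk": {"Talk": 0.2, "Hangup": 0.8},
}


def generate_usage_sequence(model, start, end, rng, max_steps=100):
    """Random walk from start to end, choosing each step by transition probability."""
    state = start
    sequence = [state]
    for _ in range(max_steps):
        if state == end:
            break
        targets = list(model[state].keys())
        weights = list(model[state].values())
        state = rng.choices(targets, weights=weights, k=1)[0]
        sequence.append(state)
    return sequence


rng = random.Random(42)
for _ in range(3):
    # Each generated sequence would then be converted into one test case.
    print(generate_usage_sequence(model, "Start", "Hangup", rng))
```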
When Usage Models are utilized for the selection of test cases, the number of possible uses is enormous. Since, for economic reasons, all possible uses cannot be tested, randomly sampled uses may be tested until a preset statistical limit is reached. To provide sufficient precision, however, the sample size must be larger than the test set size generated by current methods.
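The size of the required sample can be made concrete with one commonly used stopping rule, offered here as an assumption rather than as the rule used by any particular tool: if n randomly sampled uses all execute without failure, a per-use reliability of at least R can be claimed with confidence C when R^n <= 1 - C, so n = ceil(ln(1 - C) / ln R). The sketch below evaluates this bound; the function name is hypothetical.

```python
import math


def required_failure_free_tests(reliability, confidence):
    """Number of consecutive failure-free random tests needed to claim, with the
    given confidence, that per-use reliability is at least `reliability`.

    Uses the zero-failure bound: reliability**n <= 1 - confidence.
    """
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


print(required_failure_free_tests(0.99, 0.95))   # 299
print(required_failure_free_tests(0.999, 0.95))  # 2995
```

Under this rule, demonstrating 99% per-use reliability at 95% confidence requires 299 failure-free tests, and 99.9% requires 2995, which shows how quickly the sample size grows with the required precision.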
Other software testing methods have also been developed as shown in the following U.S. patents.
U.S. Pat. No. 5,293,323 to Doskocil et al. discloses a method of fault diagnosis during system operation using a process called Diagnostics by Confidence Measure Assessment (DCMA). A confidence measure is provided for each system test failure assessment, both by repeated testing of a single source and corroboration processing of many test results from different sources. Since this persistent measurement testing focuses on a known fault, it is not suited for use in broad validation testing of newly developed systems.
U.S. Pat. No. 4,991,176 to Dahbura et al. discloses a test method in which a system implementation is modelled as a finite state machine, and a minimum cost test sequence is generated to cover every state transition of the finite state machine. However, modelling of the system implementation does not enable the identification and testing of those failure cases which are most likely to occur during system use. Therefore, those failures which most greatly affect the system reliability and failure rate may not be tested, and an inconsistent and possibly inaccurate test result is achieved.
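For comparison, the sketch below generates a test sequence that traverses every transition of a finite state machine at least once. It uses a simple greedy strategy (repeatedly walking the shortest path to the nearest untraversed transition) and is not the minimum-cost construction disclosed by Dahbura et al.; the state machine and function name are hypothetical, and the machine is assumed to be strongly connected with every state listed as a key. The resulting sequence is driven purely by the structure of the implementation, with no notion of how likely each path is in actual use.

```python
from collections import deque


def greedy_transition_tour(fsm, start):
    """Return a state sequence that traverses every transition of `fsm` at least once.

    fsm: {state: [next_state, ...]}, assumed strongly connected.
    Greedy, so the tour is NOT guaranteed to be minimum-cost.
    """
    untested = {(s, t) for s, targets in fsm.items() for t in targets}
    tour = [start]
    state = start
    while untested:
        # Breadth-first search for the closest state with an untraversed outgoing edge.
        parents = {state: None}
        queue = deque([state])
        found = None
        while queue:
            s = queue.popleft()
            pending = [t for t in fsm[s] if (s, t) in untested]
            if pending:
                found = (s, pending[0])
                break
            for t in fsm[s]:
                if t not in parents:
                    parents[t] = s
                    queue.append(t)
        if found is None:
            raise ValueError("remaining transitions are unreachable")
        # Walk back from the found source state to the current state.
        path = []
        s = found[0]
        while s != state:
            path.append(s)
            s = parents[s]
        tour.extend(reversed(path))
        tour.append(found[1])
        untested.discard(found)
        state = found[1]
    return tour


fsm = {"Idle": ["Dialing"], "Dialing": ["Connected", "Idle"], "Connected": ["Idle"]}
tour = greedy_transition_tour(fsm, "Idle")
print(tour)  # ['Idle', 'Dialing', 'Connected', 'Idle', 'Dialing', 'Idle']
```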
U.S. Pat. No. 4,870,575 to Rutenberg discloses a test method which performs a fault tree analysis with respect to the contents of a dynamic "stack of contradiction parameters" and then superimposes modified hardware and software fault trees onto each other. Thus, Rutenberg is designed to specifically test hardware-software interactions. Moreover, Rutenberg, like Dahbura above, utilizes a test method in which a system implementation is modelled rather than the system usage. Therefore, Rutenberg suffers from the same shortcoming as Dahbura, i.e., those failures which most greatly affect the system reliability and failure rate may not be tested, and a less than optimum test result is achieved.
U.S. Pat. No. 5,272,704 to Tong et al. discloses a method of generating nodes and branches of a diagnostic tree using a candidate generator, constraint propagator and best measurement generator, along with a model of a system to be diagnosed. Once again, Tong utilizes a test method in which the system implementation is modelled rather than the system usage. Tong is also directed to diagnosing a known fault and, like Dahbura and Rutenberg, may fail to test those failures which most greatly affect the system reliability and failure rate. Thus, Tong also provides a less than optimum test result.
It would be a distinct advantage to have a validation test method which formalizes the expected use of the system, defines the system behavior, and introduces statistical measurements to identify the most likely system failures during actual system use, and automatically generates control programs which direct associated test equipment to sequentially test the predicted system failures in the order in which they are most likely to occur.
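A minimal sketch of the ordering described above, assuming a Markov usage model with known transition probabilities: a best-first search enumerates complete usage sequences from the start state to the end state in decreasing order of probability, so the first k results are the k most likely uses. This illustrates only the ranking idea; the model, function name and length bound are hypothetical, and it is not the test-program generator of the invention.

```python
import heapq

# Hypothetical Markov usage model: {state: {next_state: probability}}.
model = {
    "Start": {"Dial": 1.0},
    "Dial": {"Talk": 0.7, "Hangup": 0.3},
    "Talk": {"Talk": 0.2, "Hangup": 0.8},
}


def most_likely_sequences(model, start, end, k, max_len=10):
    """Return up to k (sequence, probability) pairs, most probable first."""
    # Min-heap keyed on negative probability, so the most probable prefix pops first.
    heap = [(-1.0, [start])]
    results = []
    while heap and len(results) < k:
        neg_p, seq = heapq.heappop(heap)
        state = seq[-1]
        if state == end:
            results.append((seq, -neg_p))
            continue
        if len(seq) >= max_len:
            continue
        for nxt, p in model.get(state, {}).items():
            heapq.heappush(heap, (neg_p * p, seq + [nxt]))
    return results


for seq, p in most_likely_sequences(model, "Start", "Hangup", k=3):
    print(round(p, 3), seq)
```

Because every transition probability is at most 1, extending a sequence can never increase its probability, which is why completed sequences leave the heap in non-increasing order of probability.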