The present invention relates to a method and apparatus for testing a digital computer having, inter alia, a processor capable of executing instructions and a memory for storing the instructions to be executed by the processor. More particularly, the invention concerns a method and apparatus for checking the respective states of the control signals produced by the processor upon fetching and/or execution of each instruction.
Digital computers, which are realized in integrated circuit form, may be tested in one of two ways:
(1) Off-line, non-concurrent testing, which may be performed either initially or periodically. This testing requires taking the processor out of service, comprehensively testing it, and finally returning it to service.
(2) On-line, concurrent testing; that is, real-time monitoring of processor operation while it is performing actual applications.
(a) detecting when an operation code for an instruction to be executed by the processor is fetched from the memory;
(b) monitoring the control signals, which appear at the control signal outputs of the processor, during the fetching and/or execution of each instruction and producing a first, so-called "signature code"; that is, a unique code which corresponds to each unique set of control signals;
(c) receiving the operation code of the instruction to be executed by the processor and producing a second signature code which corresponds to what the first signature code would be if the processor were operating properly; and
(d) comparing the first and second signature codes and producing an output signal indicative of error if they are unequal.
(a) a control device for detecting when an operation code for an instruction to be executed by the processor is, or is to be, fetched from the memory;
(b) a first device, such as a parallel loaded linear feedback shift register ("LFSR"), connected to the control signal outputs of the processor and responsive to the control device, for producing a first signature code in dependence upon the control signals;
(c) a second device, such as a read-only memory (ROM) or programmable logic array (PLA), coupled to the memory and responsive to the control device, for producing a second signature code in dependence upon the operation code of the instruction to be executed by the processor; and
(d) a third device, such as an equality comparator, connected to the first and second devices, for comparing the first and second signature codes and producing an output signal indicative of error if they are unequal.
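By way of illustration only, the signature comparison described above may be sketched in software as follows. The sketch models the first device as a parallel loaded LFSR and the second device as a look-up table standing in for a ROM or PLA. The register width, the feedback polynomial, the operation codes and the control-signal words are all assumptions chosen for the example; none of them is taken from this specification.

```python
# Illustrative sketch only. WIDTH, TAPS, the opcodes and the control words
# below are hypothetical values, not parameters given in the specification.
WIDTH = 8
MASK = (1 << WIDTH) - 1
TAPS = 0x1D  # assumed feedback polynomial x^8 + x^4 + x^3 + x^2 + 1


def lfsr_step(state, control_word):
    """One clock of a parallel loaded LFSR: shift, feed back, XOR in the inputs."""
    msb = (state >> (WIDTH - 1)) & 1
    state = (state << 1) & MASK
    if msb:
        state ^= TAPS
    return state ^ (control_word & MASK)


def signature(control_words, seed=0):
    """Compress a sequence of control-signal words into one signature code."""
    state = seed
    for word in control_words:
        state = lfsr_step(state, word)
    return state


# Hypothetical control-signal sequences of a fault-free processor, per opcode.
GOLDEN_SEQUENCES = {
    0x3E: [0b10100001, 0b00100110],
    0xC3: [0b10100001, 0b01100010, 0b01100010],
}

# Stand-in for the ROM or PLA: opcode -> expected (second) signature code.
EXPECTED_SIGNATURE = {op: signature(seq) for op, seq in GOLDEN_SEQUENCES.items()}


def check_instruction(opcode, observed_control_words):
    """Return True (error) if the first and second signature codes are unequal."""
    first = signature(observed_control_words)
    second = EXPECTED_SIGNATURE[opcode]
    return first != second


if __name__ == "__main__":
    good = GOLDEN_SEQUENCES[0x3E]
    faulty = [good[0] ^ 0b00000100, good[1]]  # one control line stuck wrong
    print(check_instruction(0x3E, good))
    print(check_instruction(0x3E, faulty))
```

In this sketch the look-up table is derived from the assumed fault-free sequences; in hardware, the corresponding values would be fixed in the ROM or PLA ahead of time.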
The time required to comprehensively test a device in an off-line, non-concurrent mode increases as the number of gates, G, within the device increases. The test time has been empirically shown to be proportional to G^x, where x depends on the logic structure and 1 < x < 2. Furthermore, given a constant percentage of fault coverage, the number of untested cells grows linearly with G. For example, a 99% test of a 1,000 logic cell device leaves 10 cells untested, whereas a 99% test of a 50,000 cell device (e.g., a 16 bit microprocessor) leaves 500 cells untested and takes between 50 and 2,500 times as long to test. In addition, the probability of "soft" or intermittent failures, due to whatever cause, within the device increases with increasing cell density.
At present, the so-called 8 bit generation of microprocessors represents a stable and mature technology that has been used in many products for several years. State-of-the-art products are now being designed with 16 bit microprocessors. The production, installation and maturing of products of this technology may now be considered to be in late childhood to early adolescence. The generation beyond the 16 bit processor is also emerging with the recent development of 32 bit integrated circuits.
The differences between 8, 16 and 32 bit microprocessors go much deeper than a comparison of the width of their data buses. The real basis of comparison is the level of integration of functions and processing capabilities included within the device, coupled with the number, size and density of its internal logic cells.
As we move from 8 bit microprocessor technology, through 16 bit, and on to 32 bit technology, it becomes more and more desirable to perform on-line testing, and less and less desirable to conduct off-line or periodically scheduled tests of these devices. This is because comprehensive off-line test procedures become too long and would consume too much processing time. Furthermore, the nature of soft failures is such that, even if the processor passes an off-line test, there is no guarantee that it will not be subject to a soft failure upon its return to service. Conversely, a soft failure occurring during off-line testing could indicate that a good device is bad.
The classical solution to on-line testing is to add redundant systems, where the results of two or more processors or processor boards or sub-systems are compared. However, just as device manufacturers cannot justify expenditures in silicon acreage above 10 to 15% for fault tolerance and testability, multiple redundancy at the board or sub-system level can only be economically justified in critical applications.
Thus, it would be desirable to provide a method and apparatus which effects some reasonable degree of on-line or concurrent verification coupled with a comprehensive off-line test capability for manufacturing and field repair testing. Ideally, the apparatus should ultimately be packaged in one, and certainly no more than two, relatively inexpensive integrated circuits. It should interface in a simple and straightforward manner to the microprocessor signals and buses, and its presence should neither compromise system performance nor place an unreasonable extra burden on designers and programmers. In summary, a desirable solution would be one or two low cost chips that may be directly coupled to the microprocessor to be tested in a completely system-transparent fashion.
While systems of this general type for on-line, concurrent self testing of microprocessors are known, prior systems are normally incapable of testing a so-called "test kernel" of the computer under test; that is, the portion of the computer that must be fault free in order for the self-test to operate in a meaningful fashion. The test kernel includes not only the microprocessor itself, but also the read-only memory (ROM) that contains the self-test program, and the intra-board data, address and control buses, bus drivers and multiplexers. Faults within the kernel lead to erroneous and unpredictable system behavior and render the self-test useless. In a complete system, kernel faults are potentially dangerous in that they cause the microprocessor to follow a sequence of events that may be a radical departure from that specified by its program. Thus, they can lead to such dangerous and disastrous situations as the destruction of valuable memory files, the invalidation of secure data in communications systems, or, in the case of control systems, physical harm to operators, machinery or processes.