1. Field of the Invention
This invention relates generally to digital data processing systems and specifically to the on-line testing of processor elements within data processing systems. More particularly, it relates to the diagnostic testing of a processor element in a floating point accelerator portion of a data processing system. This testing is carried out without impairing the response or execution time of the system and adds virtually no cost to the system; it uses system resources that otherwise would be idle while the tests are running. In addition, live data is used to provide test sequence operands, to enhance detection of data-dependent errors.
2. Description of the Prior Art
A digital data processing system generally includes three basic elements, namely, a memory element, an input/output element, and a processor element, all interconnected by one or more busses. The memory element stores data in addressable storage locations. This data includes both operands and operations for processing the operands. The processor element causes data to be transferred to or fetched from the memory element, interprets the incoming data as either operations or operands, and processes the operands in accordance with the operations. The results are then stored in addressed locations in the memory element. An input/output element also communicates with the memory element and the processor element in order to transfer data into the system and to obtain the processed data from it. The input/output element normally operates in accordance with control information supplied to it by the processor element. The input/output element may include, for example, printers, teletypewriters, keyboards, and video display terminals, and may also include secondary data storage devices such as disk drives or tape drives.
Data processing (i.e., computer) systems frequently utilize multiple processor elements to execute or supervise the various operations and tasks they are designed to perform. For example, separate processor elements are sometimes provided to carry out input/output operations, to control peripheral devices, and to perform other separable tasks. Further, actual data processing functions may be divided among multiple processors as well. Sometimes a special type of processor element, termed a floating point accelerator, is provided for performing floating point arithmetic calculations. Floating point accelerators are specially designed to increase the speed with which floating point calculations may be performed; when a floating point operation is to be performed, it is executed in or by the floating point accelerator rather than in another processor.
Users and designers of data processing systems demand highly reliable and accurate operation. For this reason, error detecting and correcting mechanisms are provided throughout modern data processing systems. However, such mechanisms generally cannot detect or correct data which is erroneous but not logically corrupted. One place where data having these characteristics can originate is in the execution of arithmetic operations and, in particular, floating point operations. For this reason, it has long been a practice in the data processing industry for computer programmers to build into programs that use floating point operations steps for checking the results of such operations to be sure that those results fall within the range of numerical values in which correct results would occur. Thus, in a payroll calculation program for factory workers whose take-home pay might be expected to fall within the predetermined range of $300-$500 per week, the program might be provided with instructions to check the calculation to be certain that no payroll check is written for more than some preset limit, such as the aforesaid $500. Of course, the input data used by the payroll program also could be checked similarly to verify that all parameters have values within expected ranges (ensuring, for example, that no worker is paid for putting in an impossible two hundred hour work week). Once a floating point error is detected as having occurred, diagnostic measures must then be employed to analyze the error and locate its cause. If the cause is an intermittent or "soft" failure, this may be difficult to do.
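The range-checking practice described above may be sketched as follows. This is an illustrative example only; the payroll figures, limits, and function names are hypothetical and are not drawn from any particular program.

```python
# Illustrative sketch of application-level range checking of floating
# point results, as in the payroll example above. All values are
# hypothetical.

EXPECTED_MIN = 300.0  # lowest plausible weekly take-home pay
EXPECTED_MAX = 500.0  # preset upper limit; no check written above this

def check_hours(hours):
    """Check an input parameter before the calculation is performed."""
    if not (0.0 <= hours <= 168.0):  # a week contains only 168 hours
        raise ValueError(f"implausible hours worked: {hours}")
    return hours

def compute_pay(hours, rate):
    """Perform the floating point calculation, then range-check it."""
    pay = check_hours(hours) * rate
    if not (EXPECTED_MIN <= pay <= EXPECTED_MAX):
        raise ValueError(f"pay {pay:.2f} outside expected range")
    return pay
```

A check of this kind detects only results that fall outside the expected range; as the text notes, it does not locate the cause of the error, which is why further diagnostic measures must be employed once an error is detected.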
Another approach to verification of floating point operations, usable with time-sharing systems, is to assign to one system "user" the task of doing nothing but running a sequence of floating point diagnostic operations all the time. To be effective, however, this technique generally requires that such operations be performed with known data so that the actual results may be compared against expected results. Some errors may be data-dependent, though, in which event the selected tests may not detect such errors unless the operand data is varied from time to time. Further, many erroneous floating point operations may be executed between the time a failure takes place in a floating point accelerator and the time the next diagnostic operation capable of detecting the failure is run. Indeed, so as not to increase significantly the overhead of operating the system and so as not to slow down the response time for other users, it is necessary and intended that the diagnostic operations occupy the system's processor elements only a small fraction of the time. But this means that floating point processor failures may cause erroneous results before being detected by the diagnostics.
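The dedicated-diagnostic-user approach may be sketched as follows. The operand table and function names are hypothetical; the point illustrated is that a fixed table of known operands, while permitting actual results to be compared against expected results, may fail to expose data-dependent errors unless the operands are varied.

```python
# Illustrative sketch of a diagnostic "user" that repeatedly executes
# floating point operations on known operands and compares the actual
# results against precomputed expected results. All names and values
# are hypothetical.

import operator

# Each case: (operation, operand a, operand b, expected result).
# Because some failures are data-dependent, a fixed table such as this
# may miss errors that only particular operand patterns would expose.
DIAGNOSTIC_CASES = [
    (operator.add, 1.5, 2.25, 3.75),
    (operator.mul, 0.5, 8.0, 4.0),
    (operator.sub, 10.0, 4.5, 5.5),
]

def run_diagnostics(cases=DIAGNOSTIC_CASES):
    """Return the failing cases; an empty list means all results match."""
    failures = []
    for op, a, b, expected in cases:
        if op(a, b) != expected:
            failures.append((op.__name__, a, b, expected))
    return failures
```

In a time-sharing system, a loop of this kind would be scheduled only a small fraction of the time so as not to degrade response for other users, which is precisely why many erroneous operations may be executed between the onset of a failure and its detection.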