1. Field of the Invention
The present invention relates generally to digital computers that are resistant to otherwise disrupting events such as the occurrence of nuclear radiation. More specifically, the present invention relates to an apparatus for circumventing the effects of such a disrupting event without increasing the cost of software associated with operation of the computer and without affecting the data throughput performance of the computer.
2. Prior Art
There are numerous apparatus and techniques known in the art for circumventing the detrimental effects of a disrupting event, such as nuclear radiation, on the operation of a digital computer. Irrespective of the circumvention technique used, the circumvention mode enables a digital computer to resume normal operations after being interrupted temporarily by environmental conditions that exceed the design operating limits of the computer circuits. Hence, circumvention techniques permit potentially hazardous conditions to be circumvented by an interruption followed by a resumption of the normal computer modes of operation. Ordinarily such an interruption is initiated by a signal from a detector that is sensitive to the particular environmental conditions to be circumvented. The action taking place in the computer during the interrupt state is simply a wait operation, which lasts until the adverse operating conditions produced by the hostile environment no longer exist. A recovery mode is then initiated to return the computer to a normal mode of operation. The functions performed during the interrupt state are controlled by signals from such a detector, while the functions performed during the recovery mode are controlled predominantly by software using data read from the computer's memory unit. During the recovery mode, the computer uses a previously established valid starting point, or roll-back point, in the recovery process.
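The sequence of modes described above (normal operation, an interrupt state held while the detector signal is present, then a recovery mode) can be sketched as a small state machine. This is an illustrative model only: the names `Mode`, `next_mode` and `simulate_modes` are hypothetical, each list entry stands for one time step, and recovery is assumed to complete in a single step. A detector signal is assumed to force the wait state even during recovery.

```python
from enum import Enum, auto
from typing import List


class Mode(Enum):
    NORMAL = auto()
    INTERRUPT = auto()  # wait state, held while the detector signal is present
    RECOVERY = auto()   # software-controlled return to the roll-back point


def next_mode(mode: Mode, detector_active: bool) -> Mode:
    """Advance the circumvention modes by one (hypothetical) time step."""
    if detector_active:
        return Mode.INTERRUPT  # a detector signal always forces the wait state
    if mode is Mode.INTERRUPT:
        return Mode.RECOVERY   # hostile conditions have cleared: start recovery
    return Mode.NORMAL         # recovery (assumed one step) or normal operation


def simulate_modes(detector_trace: List[bool]) -> List[Mode]:
    """Return the mode after each step for a trace of detector readings."""
    mode = Mode.NORMAL
    history = []
    for active in detector_trace:
        mode = next_mode(mode, active)
        history.append(mode)
    return history
```

For a detector trace of `[False, True, True, False, False]`, the model stays in the interrupt state for as long as the signal persists, passes through recovery once it clears, and then returns to normal operation.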
It is generally well known that memory devices used in a computer capable of circumventing a hostile environment such as nuclear radiation must be capable of inhibiting extraneous signals that might otherwise cause stored data to be altered. These extraneous signals must be inhibited from affecting the contents of memory cells even if the normal reading and writing circuits exhibit faulty behavior due to environmental stress. However, although preventing damage to static stored data is a requirement, present component technology offers no memory device that can prevent the loss of data being written at the time of the environmental interference.
It is also well known that to permit circumvention of such environmental interference, at least one valid starting point must exist at all times during the normal operation of the object program. This valid starting point enables the recovery routine, after the interference, to return to the object program and thereafter maintain the functional integrity of the computer system. Unfortunately, because present technology cannot prevent the loss of data being written at the time of the interference, it is difficult to establish a valid starting point in a computer capable of such circumvention without a substantial penalty in hardware cost, software cost, performance, or some combination of these factors. As used herein, the "interference" and the "circumvention" thereof are substantially concurrent events related as cause and effect.
One common method of circumventing interference events is by way of a software circumvention program. A software circumvention approach also defines roll-back points. In a circumvention, the program returns to the latest roll-back point and restarts computer operation from that point, subsequent to the interference. The manner in which software circumvention is implemented can be illustrated by the simple update equation: A=A+X. In a conventional machine this could be coded as:
    load A
    add X
    store A

If interference occurs on the last statement (store A), the variable A can be scrambled, in which case all record of A is lost and the system cannot recover. A circumvention "hard" program for the above example becomes:

    load A
    add X
    store B
    PTS
    load B
    store A

where PTS (Program Triple Store) establishes a roll-back point.
The instruction Program Triple Store (PTS) is a special instruction that stores the program count, the contents of a status register and the contents of an index register, in identical form, in three distinct locations in the main memory unit. Writing this information sequentially into three separate locations permits an algorithm to recover after circumvention of a disrupting event. The algorithm compares the contents of each location with the contents of the other two until it becomes evident which one or more of the three special storage locations hold assured valid contents. Valid contents are a program count, status register contents and index register contents that could not have been affected by the disrupting event and that provide a means for resuming normal operations after such an event.
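The comparison algorithm is not spelled out above. One plausible rule can be sketched under stated assumptions. In this sketch, interference scrambles only the location being written at that moment, the remaining writes never happen, and all three copies agreed before the PTS began. The snapshot layout, the `SCRAMBLED` marker and the `TripleStore` class are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical snapshot layout: (program count, status register, index register).
Snapshot = Tuple[int, int, int]
SCRAMBLED: Snapshot = (-1, -1, -1)  # stand-in for a word altered by interference


@dataclass
class TripleStore:
    copies: List[Snapshot]  # the three special storage locations

    def pts(self, snap: Snapshot, interrupt_at: Optional[int] = None) -> None:
        """Write snap into the three locations one after another.

        If interrupt_at is 0, 1 or 2, the write to that location is scrambled
        and the remaining writes never happen (the computer enters its wait state).
        Assumes all three copies agreed before this call.
        """
        for i in range(3):
            if i == interrupt_at:
                self.copies[i] = SCRAMBLED
                return
            self.copies[i] = snap

    def recover(self) -> Snapshot:
        """Pick a snapshot that was completely written before the interference."""
        c1, c2, c3 = self.copies
        if c1 == c2:
            return c1  # no interference, or interference while writing copy 3
        if c2 == c3:
            return c2  # interference while writing copy 1: previous point intact
        return c1      # interference while writing copy 2: copy 1 is complete
```

Under these assumptions, recovery always returns either the new roll-back point or the previous one, and never a scrambled value. Interference on the first write falls back to the old point. Interference on the second or third write keeps the new point, because the first copy was completed.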
If the circumvention occurs on the store B statement, the program just recomputes B. If the circumvention occurs on the store A statement, the program recovers the value to be stored in A from location B. One problem with the software approach of circumvention is the difficulty of ascertaining that the final code is in fact "hard". It is desirable to limit the number of PTS statements because they represent overhead in execution time and memory. However, limiting the number of roll-back points opens the program to numerous subtle failure modes involving branches to subroutines, interrupts, and multiple circumvention.
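The recovery behavior of the two A=A+X listings can be checked with a toy single-accumulator simulator. This is a sketch, not a model of any particular machine. The `execute` function and the instruction encoding are hypothetical. A scrambled word is modelled as `None`, and interference is assumed to lose the accumulator. The roll-back point is assumed to be the program start until a PTS runs, and an interrupted PTS is treated as not having taken effect.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[str]]  # (opcode, memory address or None)

CONVENTIONAL: List[Instr] = [("load", "A"), ("add", "X"), ("store", "A")]
HARD: List[Instr] = [("load", "A"), ("add", "X"), ("store", "B"),
                     ("PTS", None), ("load", "B"), ("store", "A")]


def execute(program: List[Instr], memory: Dict[str, Optional[int]],
            fail_at: Optional[int] = None) -> Dict[str, Optional[int]]:
    """Run the program; interference strikes instruction index fail_at once."""
    mem = dict(memory)
    rollback = 0          # assumed initial roll-back point: program start
    pc = 0
    acc: Optional[int] = None
    struck = False
    while pc < len(program):
        op, addr = program[pc]
        if pc == fail_at and not struck:
            struck = True
            if op == "store":
                mem[addr] = None  # the word being written is scrambled
            acc = None            # accumulator contents are lost
            pc = rollback         # recovery: return to the latest roll-back point
            continue
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = None if acc is None or mem[addr] is None else acc + mem[addr]
        elif op == "store":
            mem[addr] = acc
        elif op == "PTS":
            rollback = pc + 1     # resume just after this PTS
        pc += 1
    return mem
```

With A = 5 and X = 3, the hard program leaves A = 8 wherever the interference strikes. The conventional program loses A when the interference hits its store A. The hard version executes three more instructions than the original (store B, PTS, load B).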
In practice, software program circumvention involves coding the basic program, making the code "hard" by the addition of roll-back points, and then optimizing the code to minimize the overhead for roll-back points. When all this is done it is then usually necessary to verify that the program is "hard". Such verification ordinarily requires a large amount of manual intervention for test definition and evaluation.
There is also a throughput penalty associated with the use of PTS instructions beyond the direct penalty of executing the added PTS instructions themselves. For example, in the second of the above instruction sequences, the store B and load B instructions are overhead since they do not appear in the original code. Accordingly, the software technique for establishing a roll-back point by means of Program Triple Store increases software costs while decreasing data throughput performance.
Another possible software technique for establishing roll-back points is the use of Image Store instructions (IMAST). An IMAST instruction copies the contents of the register file of the central processing unit, together with one other word containing the program counter and status bits, into one of two alternate storage blocks in the main memory unit, the block being selected by a hard pointer PTR.
When an IMAST instruction is executed, the central processing unit image (contents of file registers, program counter and status register) is written into the block indicated by the hard pointer PTR. When all the contents of the file registers, program counter and status register are written sequentially into one of the two blocks, the hardened pointer PTR is changed to point to the other block storage location. Recovery is accomplished subsequent to an environmental interference by loading the registers of the central processing unit, CPU, from the block not pointed to by the pointer PTR.
If circumvention takes place while the computer is responding to an IMAST instruction by storing its CPU image into image B, then the recovery routine uses image A to restore the CPU registers. If circumvention occurs while the pointer PTR is being changed, then the status of the pointer PTR will be indeterminate. However, it would make no difference whether the pointer PTR points to image A or to image B since both image blocks would contain a consistent image and would, thus, lead the central processing unit, CPU, and computer to a successful recovery.
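The IMAST double-block scheme can be sketched as follows, assuming both blocks initially hold consistent images. Interference during the image write is assumed to leave that block inconsistent and PTR unchanged. An indeterminate PTR is assumed to settle on one of the two blocks. The `ImageStore` class, the image layout and the `interrupt` parameter are hypothetical and serve only to illustrate the cases described above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical CPU image: (register file contents, program counter, status bits).
Image = Tuple[Tuple[int, ...], int, int]
SCRAMBLED: Image = ((), -1, -1)  # stand-in for a partially written image


@dataclass
class ImageStore:
    blocks: List[Image]  # image A is blocks[0], image B is blocks[1]
    ptr: int = 0         # hard pointer PTR: the block the next IMAST writes into

    def imast(self, image: Image, interrupt: Optional[str] = None,
              ptr_if_indeterminate: int = 0) -> None:
        """Write image into blocks[ptr], then switch PTR to the other block.

        interrupt="write": interference while the image is being written.
        interrupt="flip": interference while PTR is being changed; the value
        PTR is left holding is indeterminate, so the caller chooses it.
        """
        if interrupt == "write":
            self.blocks[self.ptr] = SCRAMBLED  # this image is not consistent
            return                             # PTR is never changed
        self.blocks[self.ptr] = image
        if interrupt == "flip":
            self.ptr = ptr_if_indeterminate
            return
        self.ptr = 1 - self.ptr

    def recover(self) -> Image:
        """Reload the CPU from the block NOT pointed to by PTR."""
        return self.blocks[1 - self.ptr]
```

If the interference occurs while image B is being written, recovery reloads image A. If it occurs while PTR is changing, either pointer value leads to a complete image, either the new one or the previous one. In this model the scrambled block is never selected.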
The IMAST technique for establishing roll-back points also requires a programmer to insert a special instruction to create each roll-back point.