Digital data processing systems typically include a data processor having a characteristic logical address space, a limited amount of primary memory directly accessible within a physical address space, a much larger amount of secondary memory accessible only with the help of one or more peripheral controllers, and any of a number of customary input/output devices. In systems which include a data processor having a particularly large logical address space, the user may decide that his application is so time-critical as to justify providing an equivalent amount of relatively expensive primary memory. More often, however, the user will choose to use these funds to provide a much larger amount of the less expensive secondary memory, and accept the time penalty associated with swapping portions of his programs/data between the primary and secondary memories as they are required by the processor. In general, the efficiency of the swapping operations depended upon the judicious segmentation of the application programs by a talented programmer into a series of interrelated but somewhat autonomous overlays. To alleviate the problem of finding or developing such experienced programmers, and the expense inherent in perfecting large segmented programs, supervisor programs were developed which allowed each application program to pretend that it had direct access to the full logical address space of the processor, regardless of whether the corresponding physical address space was presently assigned to the program or even actually present in primary memory. Such "virtual memory" supervisor programs typically relied upon associative memory mapping hardware to detect accesses by the currently executing program outside the boundaries of the portion(s) of the physical address space assigned to the program.
In response to such "faults", the processor would store some necessary state information before branching to a fault handling portion of the supervisor program, which recognized the "virtual" access and, if appropriate, loaded the required program code/data from secondary memory into primary memory. If desired, the supervisor could move some of the program code/data from the primary memory to the secondary memory to make room for the new code/data. Typically, the supervisor program would then reexecute the particular instruction which the processor was executing when the fault occurred. Just how much information had to be stacked, and the mechanism employed by the supervisor program to prepare the processor to reexecute the "faulted" instruction, varied from machine to machine.
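The fault-and-reexecute cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not any particular machine's mechanism: the names `ToyMachine`, `AccessFault`, `handle_fault` and `execute` are hypothetical, pages stand in for whatever unit is swapped, and the first-in, first-out choice of which page to move out is an assumption made purely to keep the example short.

```python
class AccessFault(Exception):
    """Raised when the program touches a page that is not in primary memory."""

    def __init__(self, page):
        super().__init__(f"page {page} is not resident")
        self.page = page


class ToyMachine:
    """Hypothetical model: a small primary memory backed by secondary memory."""

    def __init__(self, frames, secondary):
        self.frames = frames        # number of pages primary memory can hold
        self.primary = {}           # page -> contents currently resident
        self.secondary = secondary  # page -> contents held on secondary memory
        self.fault_count = 0

    def read(self, page):
        # Stands in for the mapping hardware detecting an out-of-bounds access.
        if page not in self.primary:
            raise AccessFault(page)
        return self.primary[page]

    def handle_fault(self, page):
        # Supervisor: make room if necessary, then load the required page.
        self.fault_count += 1
        if len(self.primary) >= self.frames:
            victim = next(iter(self.primary))  # oldest loaded page (assumed FIFO)
            self.secondary[victim] = self.primary.pop(victim)
        self.primary[page] = self.secondary[page]

    def execute(self, page):
        # Reexecute the faulted access until it completes.
        while True:
            try:
                return self.read(page)
            except AccessFault as fault:
                self.handle_fault(fault.page)
```

In this model the whole access is simply repeated after the fault is fixed, which is only safe because `read` has no side effects. Real instructions that partially complete before faulting are what make restart difficult.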
In some designs, the processor simply stored the contents of the various user registers, the instruction register, the program counter and the current status information, just as if an interrupt had occurred. The supervisor program had to "back up" the program counter, if necessary, to find out what instruction the processor had been executing, and then reconfigure the registers and status bits to approximate as closely as possible the state of the processor when the faulted instruction was originally started. Even in systems where the processor instruction set was relatively regular and predictable, the burden placed on the supervisor program was far from insubstantial. In more complex systems, this approach was often impossible to implement.
When the burden on software became insurmountable, additional hardware was added to keep track of the instruction execution sequence by "marking" the completion of each step in the sequence. When a fault occurred, the mark information was stacked together with the register and status information. The supervisor program still had to determine which instruction the processor was executing at the time of the fault, and later instruct the hardware to reexecute that instruction. Now, however, the supervisor program could supply the "old" mark information to the hardware. As the hardware proceeded through each step in the execution sequence, marking its progress as always, additional control circuitry would compare the "current" mark information with the "old" mark information. If the control circuitry determined that a particular step had already been performed before the fault occurred, it would suppress only the consequences of that step, and then allow the execution sequence to continue. Once the "current" and "old" mark information coincided, indicating that the processor had reached the step where the fault had occurred, the control circuitry ceased interfering in the actual performance of the succeeding steps in the execution sequence. In this manner, the burden of restarting a faulted instruction was shared between the software and the hardware. Of course, it was still the responsibility of the supervisor program to fix the underlying cause of the fault before attempting to restart the faulted instruction.
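The mark comparison described above can be illustrated with a small Python model. It is a sketch under stated assumptions rather than a description of any actual hardware: an instruction is modeled as a list of step functions, the mark is simply the count of completed steps, and the names `run_instruction`, `old_mark` and `fault_at` are hypothetical.

```python
def run_instruction(steps, state, old_mark=0, fault_at=None):
    """Run the steps in order, marking each completed step.

    Steps below old_mark were already performed before the fault, so their
    consequences are suppressed. Returns ("fault", mark) or ("done", mark).
    """
    mark = 0
    for index, step in enumerate(steps):
        if fault_at is not None and index == fault_at:
            return "fault", mark   # the mark is stacked with the other state
        if index >= old_mark:
            step(state)            # perform the step normally
        mark = index + 1           # progress is marked either way
    return "done", mark


# Hypothetical three-step instruction: advance a pointer, fetch, accumulate.
def bump_pointer(state):
    state["pointer"] += 1

def load_operand(state):
    state["operand"] = state["memory"][state["pointer"]]

def accumulate(state):
    state["acc"] += state["operand"]

steps = [bump_pointer, load_operand, accumulate]
state = {"pointer": 0, "operand": 0, "acc": 10, "memory": [5, 7, 9]}

# First attempt faults on the operand fetch, after the pointer was advanced.
first = run_instruction(steps, state, fault_at=1)

# After the supervisor fixes the fault, the instruction is reexecuted with the
# old mark, so the pointer is not advanced a second time.
second = run_instruction(steps, state, old_mark=first[1])
```

Restarting the same instruction from the beginning without the old mark would advance the pointer twice and fetch the wrong operand, which is the kind of error the mark information is meant to prevent.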
There is no inherent limitation in the virtual memory concept which restricts its use to single processor systems. In fact, multi-processor systems have been proposed where a fault encountered by one processor generates an interrupt to a parallel processor. Upon responding to the interrupt, the latter processor will attempt to fix the problem which caused the other processor's fault. Meanwhile, the faulted processor is simply kept waiting for the fault to be resolved. If and when the fault is successfully resolved by the other processor, the faulted processor goes on its way without ever being aware that the access fault occurred. Note that the supervisor program of the processor which assumes the task of fixing the faults requires no information on the instruction being executed by the faulted processor. It will, however, need access to the specifics of the logical address which was faulted, and some information about the address space of the program which encountered the fault. Such information can be easily latched during the course of each bus cycle so that it will be available when a fault occurs. Besides requiring at least two processors and additional latch and interrupt generation hardware, this virtual memory technique forces the faulted processor to wait until the other processor has corrected the fault, thus tying up both processors during each fault resolution.
In multiprocessing systems, it is generally desirable that any processor in the system be able to execute any program awaiting execution. This could include resuming execution of a program which has been temporarily suspended because of an interrupt or time-sharing constraints. As long as the several processors have the same instruction set, there is no hardware limitation which prevents such an arrangement. A problem arises when this technique is extended to include resuming execution of a program which has been suspended due to a fault condition in the course of executing an instruction. In order to properly resume execution of such a suspended program, the processor attempting to do so must execute the same instruction set in the same sequence as the processor which was originally executing the program. Otherwise, there is no assurance that the faulted instruction will be properly completed. While the supervisor of each processor can attempt to detect such incompatibilities, the same supervisor program may be simultaneously executing on several processors and must therefore rely upon the integrity of a memory-based resource database for information on processor characteristics. In such software-controlled systems, a substantial risk still exists that an incompatible processor resumption of a faulted program will go undetected.
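A software-only compatibility check of the kind described above might look like the following Python sketch. Everything here is an assumption made for illustration: the function name `may_resume`, the record fields `faulted_mid_instruction` and `suspended_on`, and the idea of representing each processor's characteristics as a tuple in a shared table called `resource_db`.

```python
def may_resume(suspended, candidate_id, resource_db):
    """Decide whether a candidate processor may resume a suspended program.

    suspended    -- hypothetical record describing the suspended program
    candidate_id -- identifier of the processor that wants to resume it
    resource_db  -- shared, memory-based table: processor id -> characteristics
    """
    # A program suspended between instructions can resume on any processor
    # that shares the instruction set.
    if not suspended["faulted_mid_instruction"]:
        return True

    origin = resource_db.get(suspended["suspended_on"])
    candidate = resource_db.get(candidate_id)
    if origin is None or candidate is None:
        return False  # unknown processor: refuse rather than guess

    # A faulted instruction may only be completed by a processor whose
    # recorded characteristics match those of the original processor.
    return origin == candidate
```

The check is only as trustworthy as the table it reads. If an entry in `resource_db` is stale or corrupted, the function will approve an incompatible resumption and nothing in the software will notice, which is the risk identified above.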