The present invention involves electronic data processing, and more specifically concerns apparatus and methods for enabling a vector processor to employ virtual addressing while overlapping the execution of multiple instructions in a single instruction stream.
Many very high-performance supercomputers employ vector processing, sometimes called SIMD (single instruction, multiple data) architecture. Vector processing operates on many data elements in parallel with little instruction overhead. It is particularly advantageous in a RISC (reduced instruction set computer) environment, because multiple copies of the RISC general-purpose registers, sometimes called architectural or architected registers, are easily integrated into a chip.
On the other hand, vector-processing systems have heretofore been confined essentially to the use of real addresses for operands. Almost every other processor now employs virtual addressing, in which a virtual address in a program is translated at run time into a different real address for accessing memory. Translation occurs on the fly, using a high-speed lookaside buffer that holds the translation-table entries for frequently used blocks of addresses, so that the translator rarely needs to access the full translation table in main memory.
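The translation scheme described above can be illustrated with a minimal sketch. The page size, the buffer capacity, the oldest-entry eviction policy, and the names `Translator`, `PageFault`, and `table_walks` are all assumptions chosen for illustration; none of them is taken from the system described here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class PageFault(Exception):
    """Raised when a virtual page has no entry in the translation table."""


class Translator:
    """Hypothetical address translator with a small lookaside buffer."""

    def __init__(self, page_table, buffer_capacity=4):
        self.page_table = page_table      # full table: virtual page -> real page
        self.buffer = {}                  # lookaside buffer: recently used entries
        self.buffer_capacity = buffer_capacity
        self.table_walks = 0              # counts accesses to the full table

    def translate(self, virtual_address):
        virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
        if virtual_page not in self.buffer:
            # Buffer miss: consult the full translation table.
            self.table_walks += 1
            if virtual_page not in self.page_table:
                raise PageFault(virtual_page)
            if len(self.buffer) >= self.buffer_capacity:
                # Evict the oldest entry (dicts preserve insertion order).
                self.buffer.pop(next(iter(self.buffer)))
            self.buffer[virtual_page] = self.page_table[virtual_page]
        return self.buffer[virtual_page] * PAGE_SIZE + offset
```

In this sketch, repeated references to the same page are served from the buffer, so `table_walks` grows only when a page is referenced for the first time or after its entry has been evicted.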
Although the hit rate for virtual-address translation is high, the penalty for a page fault is much higher in a vector processor than it is in a scalar architecture. Loading a new page is time-consuming in either case, but in a scalar processor the occurrence of a page fault can be detected a short time after commencing a load or store operation. Determining whether a vector load or store will produce a page fault, however, requires far more time. In a processor that operates on 64 vector elements simultaneously, a page fault can occur upon loading or storing any of the 64 elements, because the elements can reside in different memory pages. This would require a vector processor with virtual addressing to stop executing other instructions while all 64 elements are processed for every memory-reference instruction. For this reason, the overhead of using virtual addressing in a vector-processing architecture cannot be countenanced. In a pure RISC system, all operations take place within a relatively small set (from 16 to 256 or more) of architectural general-purpose registers, except for a very small number of memory-reference instructions which do nothing more than load an operand from memory to a register or store a register's contents to memory.
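The reason a vector memory reference cannot be cleared of page faults early can be seen in a short sketch. It assumes a 4096-byte page and a constant stride between elements; the function names `pages_touched` and `first_faulting_element` are hypothetical and serve only to illustrate the point.

```python
PAGE_SIZE = 4096      # assumed page size in bytes (illustrative only)
VECTOR_LENGTH = 64    # number of elements per vector, as in the text


def pages_touched(base, stride, n=VECTOR_LENGTH, page_size=PAGE_SIZE):
    """Return the set of virtual pages referenced by an n-element vector access."""
    return {(base + i * stride) // page_size for i in range(n)}


def first_faulting_element(base, stride, resident_pages,
                           n=VECTOR_LENGTH, page_size=PAGE_SIZE):
    """Return the index of the first element whose page is not resident,
    or None if every element can be accessed without a fault."""
    for i in range(n):
        if (base + i * stride) // page_size not in resident_pages:
            return i
    return None
```

With an 8-byte stride starting at address 0, all 64 elements fall within one page. With a stride equal to the page size, each element falls in a different page, so the absence of a fault is known only after the address of the last element has been checked.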
Some processors overlap the execution of multiple instructions for higher speed. However, if any of the overlapped instructions is a memory-reference instruction, the machine must roll back all subsequent instructions if a page fault occurs during the loading or storing of any of the vector elements. It can do this by saving the machine state, including the contents of all registers, updating the lookaside buffer with the new page addresses, successfully reloading all the vector elements, and then redoing the subsequent instructions. However, if one of the speculatively executed overlapped instructions has changed the contents of a register needed for the reexecution of another of the rolled-back instructions, that instruction cannot be redone properly. The straightforward solution is to prohibit overlapped instructions following any memory-reference instruction. However, overlapping would then produce little or no benefit.
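The reexecution hazard described above can be illustrated with a small sketch. The two-instruction sequence, the register names `r1` and `r2`, and the helper `run` are hypothetical; they stand in for instructions that were overlapped with a vector load that later faulted.

```python
def run(program, registers):
    """Execute each (destination, operation) pair in order on a register dict."""
    for destination, operation in program:
        registers[destination] = operation(registers)
    return registers


# Hypothetical instructions that follow a faulting vector load.
subsequent = [
    ("r2", lambda r: r["r1"] * 2),  # reads r1
    ("r1", lambda r: 0),            # overwrites r1
]

initial = {"r1": 5, "r2": 0}

# Intended result: the sequence executes exactly once.
expected = run(subsequent, dict(initial))

# Speculative execution completes, the fault is then detected, and the
# sequence is redone on registers that the speculation already changed.
after_speculation = run(subsequent, dict(initial))
redone = run(subsequent, after_speculation)
```

Here `expected` leaves `r2` equal to 10, whereas `redone` leaves `r2` equal to 0, because the first instruction is reexecuted after the second has already overwritten `r1`.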