Many modern data processing systems use multiple levels of storage for data and instructions. Cache memories are small, high-speed memories located between the larger, slower main memory and the central processing unit (CPU). A cache is loaded from main memory, and if the data required by the CPU is resident in the cache, the CPU can access it very rapidly. If the required information is not resident in the cache, it must be located in main memory and transferred to the cache, together with other information that is likely to be needed soon.
Virtual memory systems may be similar in concept but are generally employed for other purposes. In a virtual memory system a main memory is provided, and there may or may not be a cache in the memory-to-CPU access path. Because a CPU in a typical virtual memory system can directly address only a finite amount of memory, a much larger additional memory, frequently a magnetic disk, may be required to accommodate a very large program or database.
As with the cache memory, data is loaded from the disk into main memory so that it is quickly accessible to the CPU. If the required data is not resident in main memory, a disk access is made to move a block of data into main memory.
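The block-fill behavior common to both levels (cache over main memory, main memory over disk) can be sketched as a small Python model. It is a minimal illustration under stated assumptions: the block size, the `BlockMemory` class and its `read` method are hypothetical names chosen here, not features of any particular machine.

```python
# Hypothetical model: BLOCK_WORDS and the dict-based fast memory are
# illustrative assumptions, not taken from any particular machine.
BLOCK_WORDS = 4  # words moved per transfer (assumed)

class BlockMemory:
    """Fast memory backed by a larger, slower store; a miss loads a whole block."""

    def __init__(self, backing):
        self.backing = backing  # list of words held in the slow store
        self.resident = {}      # block number -> list of resident words
        self.transfers = 0      # number of block transfers (misses) so far

    def read(self, addr):
        block = addr // BLOCK_WORDS
        if block not in self.resident:
            # Miss: move the whole block, including neighboring words
            # that are likely to be needed soon.
            start = block * BLOCK_WORDS
            self.resident[block] = self.backing[start:start + BLOCK_WORDS]
            self.transfers += 1
        # Hit (or just filled): serve the word from fast memory.
        return self.resident[block][addr % BLOCK_WORDS]
```

With `BlockMemory(list(range(100, 116)))`, reading address 5 transfers block 1 (words 4 through 7), so a following read of address 6 is served without another transfer.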
Depending on the architecture of the data processing system, and specifically on the length of the instructions it executes, a block transfer (either from the disk of a virtual memory machine into its main memory, or from main memory into the cache of a cache-based machine) may split an instruction across the boundary of the transferred block, so that only part of the instruction actually becomes resident in the cache or main memory. For example, if a processing system has a sixteen-bit data word and its instruction types range in length from 16 bits (one word) to, say, 48 bits (three words), a block transferred from disk into main memory may contain only part of a 32-bit or 48-bit instruction. When the CPU attempts to execute this incomplete instruction, an error condition is detected, forcing the machine to execute exception-handling routines to recover from the error.
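A check for the boundary case described above might look like the following sketch. It assumes, for illustration only, a 256-word transfer block, word addressing, and that the instruction's length in words is already known; `BLOCK_WORDS`, `resident_words` and `is_complete` are hypothetical names.

```python
BLOCK_WORDS = 256  # words per transferred block (assumed for illustration)

def resident_words(addr, length_words, resident_blocks):
    """Count how many leading words of an instruction are resident.

    addr:            word address of the instruction's first 16-bit word
    length_words:    instruction length in words (1, 2 or 3 for 16/32/48 bits)
    resident_blocks: set of block numbers currently in memory
    """
    count = 0
    for a in range(addr, addr + length_words):
        if a // BLOCK_WORDS not in resident_blocks:
            break  # this word lies in a block that was not transferred
        count += 1
    return count

def is_complete(addr, length_words, resident_blocks):
    """True only if every word of the instruction is resident."""
    return resident_words(addr, length_words, resident_blocks) == length_words
```

For instance, with only block 0 (words 0 through 255) resident, a three-word instruction starting at word 254 has two resident words and is incomplete, while one starting at word 253 is complete.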
In some CPU organizations, entire instructions are loaded into the instruction execution unit or the instruction decoder, so an invalid instruction can be readily identified. In a CPU that employs pipelined instruction fetching, however, an incomplete transfer may not be detected immediately when the instruction is fetched. In such a pipelined system, while the instruction execution unit is executing the current instruction, the next instruction to be executed is concurrently being fetched from main memory. Where the data path between main memory and the instruction execution unit is narrower than the widest possible instruction (e.g. a 16-bit data path and a 48-bit instruction), the first 16-bit word of an incomplete instruction may be fetched and placed at the top of the pipeline, then advanced through the pipeline while the second 16-bit word is fetched; execution of the first word may then begin at the same time that a fetch is attempted for the missing third 16-bit word.
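The timing problem can be made concrete with a simplified cycle-by-cycle model. The model is hypothetical: it assumes one 16-bit word is fetched per cycle and a fixed two-cycle delay (`EXEC_DELAY`) before the first word starts executing; real pipelines differ, and `pipeline_trace` is an illustrative name.

```python
EXEC_DELAY = 2  # cycles from fetching word 1 to starting its execution (assumed)

def pipeline_trace(addr, length_words, is_resident):
    """Cycle-by-cycle events for one instruction fetched over a 16-bit path.

    Hypothetical model: word i of the instruction is fetched in cycle i, and
    execution of word 1 starts EXEC_DELAY cycles after it was fetched,
    without waiting for the instruction's later words.
    """
    events = []
    fetched = 0
    for i in range(length_words):
        if not is_resident(addr + i):
            events.append((i, f"fetch word {i + 1}: not resident"))
            break
        events.append((i, f"fetch word {i + 1}"))
        fetched += 1
    if fetched > 0:
        # The naive pipeline starts executing word 1 regardless of
        # whether the remaining words arrived.
        events.append((EXEC_DELAY, "begin executing word 1"))
    # sorted() is stable, so events in the same cycle keep recording order.
    return sorted(events, key=lambda e: e[0])
```

For a three-word instruction at word 254 with only words below 256 resident, the trace shows the fetch of the missing third word and the start of execution of the first word falling in the same cycle (cycle 2).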
One possible way to avoid beginning execution of an incomplete instruction is to determine the size of the instruction when its first 16-bit word is fetched. If it is a multiple-word instruction, all of its words are fetched immediately to ensure that they are resident in memory, and a fault or error condition is generated at once if the instruction is incomplete. In some instances, however, it is desirable to defer taking the fault until the first word of the invalid instruction is actually ready to execute, rather than taking it when that word is loaded into the head of the pipeline. For example, if the immediately preceding instruction (i.e. the instruction now being executed) causes a branch to another portion of the program, the invalid instruction being concurrently fetched may never be needed, since the program will continue with a different instruction stream. It is therefore desirable to defer the fault for as long as possible, usually to the point at which the invalid instruction is ready to execute or has just begun execution.
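The difference between taking the fault at prefetch time and deferring it can be sketched as below. This is a simplified model of the two timing policies discussed above, not the validation mechanism of the invention; the function name, the per-word validity flags and the outcome strings are illustrative assumptions.

```python
def next_instruction_outcome(prev_branches, words_valid, policy):
    """Outcome for the instruction prefetched behind the one now executing.

    prev_branches: True if the executing instruction branches elsewhere.
    words_valid:   one flag per word of the prefetched instruction
                   (False means that word was not resident).
    policy:        "eager" (fault at prefetch) or "deferred" (fault only
                   when the instruction is about to execute).
    """
    if policy not in ("eager", "deferred"):
        raise ValueError(f"unknown policy: {policy!r}")
    incomplete = not all(words_valid)
    if policy == "eager" and incomplete:
        return "fault at prefetch"         # taken even if a branch follows
    if prev_branches:
        return "discarded (branch taken)"  # prefetched words never execute
    if incomplete:
        return "fault at execution"        # deferred to the latest point
    return "executed"
```

The deferred policy only changes the outcome when the preceding instruction branches: the incomplete prefetch is then discarded without any fault being taken, whereas the eager policy would already have faulted.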
Accordingly, it is an object of the instant invention to provide a prefetch validation mechanism, and a method for validating prefetched instructions, such that when an instruction is invalid, the instruction execution unit takes the fault at the latest possible point in the execution of the instruction sequence.
It is a further object of this invention to provide a more efficient instruction execution technique in a pipelined CPU.