This invention relates to digital computers having cache memories in which information is exchanged between a main memory and a high speed processor via a high speed cache. A cache memory is a small, fast memory disposed between a slower main memory and a high speed processor. In the past, cache memories have been used to store frequently used portions of main memory for high speed access by an associated high speed processor. The cache memory has previously been intended only to contain an unmodified subset of the contents of a main memory. In other words, the contents of the main memory and the cache memory have differed, if at all, only temporarily due to write operations performed by the processor to the cache memory. Such write operations are either immediately or eventually reflected in write operations to the main memory.
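The two cases described above, in which a processor write reaches main memory either immediately or eventually, can be illustrated with a minimal sketch. The class name `ToyCache`, its methods, and the use of a dictionary to stand in for main memory are hypothetical and chosen only for illustration; they are not taken from any particular prior-art design.

```python
class ToyCache:
    """Toy cache in front of a dict that stands in for main memory (hypothetical)."""

    def __init__(self, main_memory, write_through):
        self.main_memory = main_memory      # slower backing store
        self.write_through = write_through  # True: update main memory on every write
        self.lines = {}                     # address -> cached value
        self.dirty = set()                  # addresses written but not yet copied back

    def read(self, address):
        # On a miss, copy the value from main memory into the cache.
        if address not in self.lines:
            self.lines[address] = self.main_memory[address]
        return self.lines[address]

    def write(self, address, value):
        self.lines[address] = value
        if self.write_through:
            # Immediately reflected in main memory.
            self.main_memory[address] = value
        else:
            # Eventually reflected: remember that main memory is stale.
            self.dirty.add(address)

    def flush(self):
        # Copy every modified cache entry back to main memory.
        for address in self.dirty:
            self.main_memory[address] = self.lines[address]
        self.dirty.clear()
```

With `write_through=False`, the cache and main memory differ between a call to `write` and the next call to `flush`, which corresponds to the temporary difference noted above.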
High performance processors such as parallel processors and reduced instruction set computer (RISC) processors have as a purpose the most rapid and efficient execution of predefined instructions. An instruction is a form of digital information defining an operation and the operands on which the operation is performed by a processor. The execution of an instruction is the carrying out of the operation by the processor. Decoding of an instruction involves determining, from the bits of information defining the instruction, which operation is to be performed on which operands. Decoding of an instruction is required to produce control bits, which are the values provided to control points of the processor.
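By way of illustration only, decoding as described above can be sketched as the extraction of bit fields from an instruction word followed by a lookup of control bits. The 16-bit layout, the opcode assignments, and the control bit names below are all hypothetical and do not describe any particular processor.

```python
# Hypothetical 16-bit instruction layout (an assumption for illustration):
#   bits 15-12: opcode
#   bits 11-8:  destination register
#   bits 7-4:   first source register
#   bits 3-0:   second source register
OPCODES = {0x0: "ADD", 0x1: "SUB", 0x2: "LOAD", 0x3: "BRANCH"}

# Hypothetical control bits provided to control points of the processor.
CONTROL_BITS = {
    "ADD":    {"alu_enable": 1, "alu_subtract": 0, "mem_read": 0, "pc_load": 0},
    "SUB":    {"alu_enable": 1, "alu_subtract": 1, "mem_read": 0, "pc_load": 0},
    "LOAD":   {"alu_enable": 0, "alu_subtract": 0, "mem_read": 1, "pc_load": 0},
    "BRANCH": {"alu_enable": 0, "alu_subtract": 0, "mem_read": 0, "pc_load": 1},
}


def decode(word):
    """Determine the operation, the operands, and the control bits for one word."""
    opcode = (word >> 12) & 0xF
    if opcode not in OPCODES:
        raise ValueError(f"undefined opcode {opcode:#x}")
    operation = OPCODES[opcode]
    return {
        "operation": operation,
        "dest": (word >> 8) & 0xF,
        "src1": (word >> 4) & 0xF,
        "src2": word & 0xF,
        "control": CONTROL_BITS[operation],
    }
```

For example, under this assumed layout `decode(0x1234)` identifies a `SUB` operation with destination register 2 and source registers 3 and 4. Every bit of the opcode field must be examined before the control bits are known, which is the source of the decoding delay.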
A finite amount of time is required to decode an instruction prior to the execution of the instruction. In the case of branch instructions, a finite amount of time is required to compute the address of the target instruction. In the past, it has been necessary that the instruction cycle be sufficiently long to examine all the bits of an instruction in order to determine which operation is to be performed and which instruction is to be fetched next. Low performance computers, that is, computers in which all stages of instruction processing are performed sequentially with no overlap of stages between instructions, must be provided with sufficient latency time to perform all portions of the instruction, including the decoding of the instruction. In higher performance computers, such as a computer using a pipelined processor wherein stages of an instruction cycle may overlap, a stage of an instruction which relies on the completion of a stage of a previous instruction must be delayed. This may result in interlocking the pipeline and the loss of one or more otherwise useful machine cycles. What is needed is a technique for overcoming such an inefficiency.
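The cost of such delays can be estimated with a simple cycle count. The sketch below assumes, purely for illustration, that every stage takes exactly one machine cycle and that each interlock adds whole lost cycles; the function names and parameters are hypothetical.

```python
def sequential_cycles(num_instructions, num_stages):
    """Cycles needed when no stages overlap between instructions."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles needed in an idealized pipeline, plus cycles lost to interlocks.

    The first instruction takes num_stages cycles to complete; each later
    instruction completes one cycle after its predecessor unless stalled.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1) + stall_cycles
```

Under these assumptions, five instructions of three stages each take 15 cycles when processed sequentially and 7 cycles in a pipeline with no interlocks. If a branch forces the pipeline to wait two cycles for the target address, the count rises to 9, so two of the nine cycles perform no useful work.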