The present invention relates generally to computer systems, and more specifically to a high-performance pipeline employing a write queue to improve processor throughput.
A multiple-address per instruction computer is one that accesses more than one memory location during the execution of a single machine instruction. A single-address instruction accesses only one memory location during execution. A read from and a write to the same physical or logical address count as two memory accesses. Thus, a single-address instruction can read from or write to a memory location, but not both. A two-address instruction can, for example, read from a memory location, increment the value, and write the new value back to the same address. A three-address instruction can, for example, read the values in two memory locations, add them, and write the result into a third memory location. Both memory reads can be performed at the same time by using a dual-port memory; however, a read and a write cannot be performed simultaneously. Such multiple-address instructions are common in many computer architectures, including LISP machines such as the CADR and the Texas Instruments Explorer (TM).
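The access-counting rules above can be illustrated with a small Python sketch. It is a hypothetical model, not part of the described machine: the `Instruction` record, the `memory_phases` helper, and the addresses 100 to 102 are invented for illustration. It assumes that up to two reads share one memory phase on a dual-port memory, that a write cannot overlap any read, and that only one write is performed per phase.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Instruction:
    """Hypothetical record of the data-memory addresses one instruction touches."""
    name: str
    reads: list[int]   # addresses read during execution
    writes: list[int]  # addresses written during execution

    def memory_accesses(self) -> int:
        # A read from and a write to the same address count as two accesses.
        return len(self.reads) + len(self.writes)


def memory_phases(instr: Instruction, read_ports: int = 2) -> int:
    """Minimum number of data-memory phases the instruction needs.

    Assumes up to `read_ports` reads can share one phase (a dual-port
    memory by default), that a write cannot overlap any read, and that
    only one write is performed per phase.
    """
    return ceil(len(instr.reads) / read_ports) + len(instr.writes)


single = Instruction("single-address read", reads=[100], writes=[])
incr = Instruction("two-address increment", reads=[100], writes=[100])
add3 = Instruction("three-address add", reads=[100, 101], writes=[102])

for instr in (single, incr, add3):
    print(instr.name, instr.memory_accesses(), memory_phases(instr))
```

Under these assumptions, the three-address add makes three accesses but needs only two memory phases on a dual-port memory (one shared read phase, one write phase), and three phases if `read_ports=1`.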
The pipelines presently in use are typified by the CADR pipeline illustrated in FIG. 1. This is a basic two-deep control pipeline with a third stage consisting of an invisible write. The control pipeline consists of the first two stages, shown as, for example, clock (CK) cycles one and two for the first instruction (I1).
The data pipeline consists of stages two and three. After the instruction has been fetched (in the previous cycle), its operand is read from memory, or both operands are read if a dual-port memory is used with a three-address instruction, during the first half of the second stage. The instruction is executed (EXE) during the second half of the second stage, and the result is written to memory during the second half of the third stage. Because the memory write cannot take place during the memory read of the following instruction, which occupies the first half of the third stage, the data to be written must be held temporarily in a latch and written to memory during the execution half-cycle of the following instruction.
FIG. 1 shows the complete execution of four multiple-address instructions, I1-I4, and the overlap of the instructions as they move through the pipeline. I1 is fetched from the instruction memory during the first clock cycle, and its memory reads and execution take place during the second cycle. I2 is fetched during the second cycle, and its read and execute portions occur during clock cycle 3. The I1 write takes place during the execution portion of I2, in the second half of the third clock cycle. Instructions I3 and I4 follow the same pattern. As FIG. 1 shows, the instruction memory need be accessed only once per cycle, while the data memory must be both read and written during every cycle. The ALU, which operates only during the EXE portion of each instruction, must complete its operation in one half of a clock cycle and is idle during the other half. FIG. 1 also shows that the time required to execute an instruction, or a microinstruction in the case of a microcoded machine, is at least the sum of the times required for a memory read and an ALU operation.
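The FIG. 1 timing can be sketched in Python as a schedule of half-cycle slots. This is an illustrative model only: the functions `cadr_schedule` and `render` and the slot numbering are hypothetical, and showing the fetch across its entire clock cycle is an assumption about how the figure is drawn.

```python
# Half-cycle slots: slot 2*(c-1) is the first half of clock cycle c,
# and slot 2*(c-1)+1 is its second half.

def cadr_schedule(n):
    """Return {instruction: {slot: phase}} for n instructions in the
    three-stage CADR-style pipeline described for FIG. 1 (hypothetical model)."""
    sched = {}
    for k in range(1, n + 1):
        base = 2 * (k - 1)  # instruction k is fetched in clock cycle k
        sched[f"I{k}"] = {
            base: "FETCH", base + 1: "FETCH",  # fetch shown across cycle k (assumption)
            base + 2: "READ",                  # first half of cycle k+1
            base + 3: "EXE",                   # second half of cycle k+1
            base + 5: "WRITE",                 # second half of cycle k+2, from the latch
        }
    return sched


def render(sched):
    """Format the schedule as a text timing chart, one column per half-cycle."""
    last = max(max(phases) for phases in sched.values())
    cycles = last // 2 + 1
    lines = ["     " + "".join(f"{'CK' + str(c):<12}" for c in range(1, cycles + 1))]
    for name, phases in sched.items():
        cells = [phases.get(s, "") for s in range(2 * cycles)]
        lines.append(f"{name:<5}" + "".join(f"{cell:<6}" for cell in cells))
    return "\n".join(lines)


print(render(cadr_schedule(4)))
```

In this model the WRITE of each instruction lands in the same half-cycle as the EXE of the next instruction, and no two data-memory operations (READ or WRITE) share a slot, which matches the latch arrangement described for FIG. 1.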
It will be apparent that several resources are idle for part of each instruction. Assuming that the instruction memory is as fast as the data memory, an instruction fetch can be completed in approximately one-half of a clock cycle, leaving the instruction memory idle for the remaining half. The data memory is generally fully occupied, since reads are performed in the first half of each clock cycle and writes in the second half. The ALU, however, is idle 50% of the time, because it is used only in the EXE portion of each instruction.
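The utilization figures above can be checked with a reservation table, a common way of analyzing pipeline resource usage. The Python sketch below is a hypothetical model: the table contents, the `utilization` function, and the 100-instruction run are assumptions, with each operation taken to last one half-cycle and the fetch placed in the first half of its cycle.

```python
from fractions import Fraction

# Reservation table for one instruction: half-cycle offsets, measured from the
# start of its fetch cycle, at which each resource is busy (assumed layout).
RESERVATION = {
    "instruction memory": [0],  # fetch, first half of cycle k
    "data memory": [2, 5],      # read (cycle k+1, 1st half), write (cycle k+2, 2nd half)
    "ALU": [3],                 # EXE (cycle k+1, 2nd half)
}
ISSUE_INTERVAL = 2  # a new instruction starts every clock cycle (two half-cycles)


def utilization(reservation, interval, n_instr=100):
    """Fraction of half-cycles each resource is busy once the pipeline is full."""
    busy = {res: set() for res in reservation}
    for i in range(n_instr):
        start = i * interval
        for res, offsets in reservation.items():
            for off in offsets:
                slot = start + off
                if slot in busy[res]:
                    raise ValueError(f"structural hazard on {res} at slot {slot}")
                busy[res].add(slot)
    # Measure only a steady-state window, away from pipeline fill and drain.
    lo, hi = 10 * interval, (n_instr - 10) * interval
    return {res: Fraction(sum(lo <= s < hi for s in slots), hi - lo)
            for res, slots in busy.items()}


for res, frac in utilization(RESERVATION, ISSUE_INTERVAL).items():
    print(f"{res}: {float(frac):.0%}")
```

Under these assumptions the data memory is busy in every half-cycle, while the instruction memory and the ALU are each busy in only half of them.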
It would be desirable for an instruction pipeline to utilize the ALU fully in order to increase processor throughput. Since each processor activity (instruction fetch, memory read, ALU operation, and memory write) can individually be performed within one-half clock cycle, a pipeline that kept the ALU 100% utilized would allow the clock frequency to be doubled, thereby doubling the instruction execution rate of the machine.
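The doubling argument can be written out as a short derivation. Assume, as a simplification suggested by the text rather than a measured figure, that fetch, memory read, ALU operation, and memory write each take roughly $\tau$, one half of the CADR clock cycle, and that one instruction completes per clock cycle in both pipelines. With $T$ denoting the clock period and $R = 1/T$ the instruction execution rate:

```latex
\begin{aligned}
T_{\mathrm{CADR}} &\ge t_{\mathrm{read}} + t_{\mathrm{ALU}} \approx 2\tau,\\
T_{\mathrm{new}} &= \max\left(t_{\mathrm{fetch}},\, t_{\mathrm{read}},\, t_{\mathrm{ALU}},\, t_{\mathrm{write}}\right) \approx \tau,\\
\frac{R_{\mathrm{new}}}{R_{\mathrm{CADR}}} &= \frac{1/T_{\mathrm{new}}}{1/T_{\mathrm{CADR}}} = \frac{T_{\mathrm{CADR}}}{T_{\mathrm{new}}} \approx 2.
\end{aligned}
```

The derivation holds only if no stage must wait for a busy resource; in particular, the data memory must still serve one read and one write for every instruction.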
Therefore, in order to provide a system that more fully utilizes its various portions and provides increased throughput, a computer system according to the present invention comprises an instruction pipeline having separate fetch, memory read, instruction execute, and memory write stages. A write queue having multiple locations is provided, and the results of multiple-address instructions are written to the queue. Whenever the data memory is not being used by the read stage of a following instruction, a value stored in the queue is written to memory. Memory read addresses are compared with the addresses of the values held in the write queue so that, when a match occurs, the system reads the most recent value, which is still in the write queue, rather than the stale value in memory.
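The write-queue behavior summarized above can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not the claimed hardware: the `WriteQueue` class, the `step` function, the `(reads, dest, op)` instruction tuple, the queue depth of 4, the default value 0 for unwritten addresses, and the policy of retiring exactly one entry per read-free cycle are all hypothetical. A real design would stall the pipeline when the queue is full; here that case simply raises an error.

```python
from collections import deque


class WriteQueue:
    """Hypothetical model of a multi-entry write queue in front of data memory."""

    def __init__(self, memory, capacity=4):
        self.memory = memory      # dict mapping address -> value (the data memory)
        self.capacity = capacity  # assumed queue depth
        self.entries = deque()    # pending (address, value) writes, oldest first

    def enqueue(self, address, value):
        # Results are queued instead of being written to memory immediately.
        if len(self.entries) >= self.capacity:
            raise OverflowError("write queue full; a real pipeline would stall")
        self.entries.append((address, value))

    def read(self, address):
        # Compare the read address with every queued write address, newest
        # first, so the most recent value wins over older entries and memory.
        for queued_addr, value in reversed(self.entries):
            if queued_addr == address:
                return value
        return self.memory.get(address, 0)  # unwritten addresses read as 0 (assumption)

    def drain_one(self):
        # Retire the oldest pending write while the data memory is free.
        if self.entries:
            address, value = self.entries.popleft()
            self.memory[address] = value


def step(queue, instr):
    """One pipeline cycle for a hypothetical instruction (reads, dest, op)."""
    reads, dest, op = instr
    operands = [queue.read(a) for a in reads]
    if not reads:
        queue.drain_one()  # read stage leaves the data memory idle this cycle
    if dest is not None:
        queue.enqueue(dest, op(*operands))
```

For example, with `mem = {0: 5, 1: 7}` and `q = WriteQueue(mem)`, the instruction `([0, 1], 2, lambda a, b: a + b)` queues the value 12 for address 2, and a following `([2], 2, lambda a: a + 1)` reads 12 from the queue even though memory has not yet been updated. Two read-free cycles, `([], None, None)`, then retire both entries in order, leaving 13 in memory. Scanning the queue from newest to oldest is what ensures that the later of two queued writes to the same address is the one returned.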
The novel features which characterize the present invention are defined by the appended claims. The foregoing and other objects and advantages of the invention will hereinafter appear, and for purposes of illustration, but not limitation, a preferred embodiment is shown in the drawings.