The present invention relates to an information or data processor that processes and executes instructions on a pipeline basis. More particularly, the invention relates to an information or data processor (simply called an information processor hereunder) for processing information or data in an environment where short and long latency processes are mixed.
The biggest constraint on instruction control in an information processor that processes programmed instructions on a pipeline basis is that the instructions must be executed in a sequence consistent with the instruction execution order designated by the software.
Typical pipeline-processing information processors perform what may be termed basic pipeline processing, in which each instruction is completed in about three to five cycles. In this type of processing, the execution of each instruction takes only one cycle (called short latency processing hereunder).
Not all instructions lend themselves to short latency processing; some are complicated and take several cycles to execute (called long latency processing hereunder). Typical examples of such complicated or time-consuming instructions are divisions and main memory access operations. Their presence makes it difficult to meet the above consistency requirement at all times.
Various methods have been proposed for satisfying the above constraint regardless of whether pipeline processing is short or long, i.e., irrespective of whether short or long latency processes are being carried out.
The simplest way to satisfy the requirement of instruction execution sequence consistency is to keep all subsequent instructions from proceeding to execution whenever an instruction performs a process other than a basic pipeline process (this scheme is called the interlock method).
The following series of instructions will be discussed below as an example:

FDIV r6, r7, r8 (1)
FADD r1, r2, r3 (2)
FSUB r3, r4, r5 (3)
If the instruction (1) turns out to have a long execution cycle (i.e., a long latency process instruction) when executed, then the interlock method causes the instruction (2) and all subsequent instructions to be interlocked.
Alternatively, if instructions (2) and (3) are independent of the outcome of instruction (1), they may be executed before instruction (1) completes, thereby enhancing execution performance. This method uses a detector that checks, during execution of instruction (1), whether the subsequent instructions depend on its result. Subsequent instructions found to be independent of the ongoing instruction are executed without delay. When a subsequent instruction is found to depend on the outcome of the ongoing instruction, processing proceeds only up to that instruction, which is interlocked until the result becomes available. Both methods thus ensure that instructions are processed in a consistent sequence.
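To make the contrast between the two methods concrete, the following Python sketch reports which instruction in the example sequence is the first to stall behind instruction (1). It is an illustration under stated assumptions, not the processor's actual logic: the `Instr` type and both function names are hypothetical, the first operand is assumed to be the destination, and issue is assumed to be strictly in order, so everything behind a stalled instruction waits too. In this example the operand convention does not matter, because instructions (2) and (3) share no register with instruction (1).

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dst: str                 # assumption: first operand is the destination
    srcs: Tuple[str, ...]


PROGRAM = [
    Instr("FDIV", "r6", ("r7", "r8")),  # (1) long latency
    Instr("FADD", "r1", ("r2", "r3")),  # (2)
    Instr("FSUB", "r3", ("r4", "r5")),  # (3)
]


def first_stalled_interlock(program: List[Instr], long_idx: int) -> Optional[int]:
    """Interlock method: the instruction right after the long-latency one
    stalls, and every later instruction stalls behind it."""
    nxt = long_idx + 1
    return nxt if nxt < len(program) else None


def first_stalled_dependency(program: List[Instr], long_idx: int) -> Optional[int]:
    """Dependency check: later instructions keep issuing in order until one
    reads or overwrites the long-latency destination; that one stalls."""
    busy = program[long_idx].dst
    for i in range(long_idx + 1, len(program)):
        if busy == program[i].dst or busy in program[i].srcs:
            return i
    return None  # no dependency found: nothing has to wait


print(first_stalled_interlock(PROGRAM, 0))   # 1 (0-based index of instruction (2))
print(first_stalled_dependency(PROGRAM, 0))  # None
# A hypothetical fourth instruction reading r6 would be the first to stall:
print(first_stalled_dependency(PROGRAM + [Instr("FMUL", "r9", ("r6", "r1"))], 0))  # 3
```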
Of particular importance regarding instruction dependency is a possible conflict between general registers. There are two representative ways to detect a register conflict. One way is to compare the register numbers held for the instructions in each pipeline with the register numbers designated by instructions about to be subjected to pipeline processing. The other way is to use what is known as a scoreboard.
A scoreboard comprises bits, each associated with a register number and indicating whether that register is used by an instruction currently in pipeline processing; setting means for setting a bit to 1; and resetting means for resetting it to 0. A register conflict is detected by checking whether the scoreboard bit corresponding to a register designated by an instruction about to enter pipeline processing is 1 or 0.
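A scoreboard of this kind can be sketched with a single integer used as a bit vector, one bit per register number. The class and method names, and the use of plain integer register numbers, are assumptions for illustration only.

```python
from typing import Iterable


class Scoreboard:
    """Busy bits kept in one int: bit n == 1 means register n is in use by
    an instruction still in pipeline processing (hypothetical sketch)."""

    def __init__(self) -> None:
        self.mask = 0

    def set_bit(self, reg: int) -> None:
        """Setting means: called when the instruction enters the pipeline."""
        self.mask |= 1 << reg

    def reset_bit(self, reg: int) -> None:
        """Resetting means: called when the instruction's result is written."""
        self.mask &= ~(1 << reg)

    def conflicts(self, regs: Iterable[int]) -> bool:
        """Conflict check for an instruction about to enter the pipeline."""
        return any((self.mask >> r) & 1 for r in regs)


sb = Scoreboard()
sb.set_bit(6)                   # e.g. FDIV writing r6 enters the pipeline
print(sb.conflicts([1, 2, 3]))  # False: no busy register is used
print(sb.conflicts([9, 6]))     # True: r6 is still busy
sb.reset_bit(6)
print(sb.conflicts([9, 6]))     # False
```

A single bit test per operand replaces the per-register comparators of the register-number method, at the cost of keeping the bits in step with everything that enters or leaves the pipeline.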
The method of interlocking instructions upon comparison of register numbers, i.e., a first conventional method, is discussed illustratively in Japanese Patent Laid-Open No. Hei 5-298091. The publication discloses an information processor in which, while a load instruction is waiting for data to arrive from memory, arithmetic instructions following the load instruction are executed before the load instruction terminates, provided that no register conflict exists between the load instruction and those instructions.
The information processor according to the first conventional method includes a register that holds a load instruction for as long as that instruction is being processed. A comparator compares the register number designated by the held load instruction with the register numbers designated by instructions about to be subjected to pipeline processing. The comparison reveals whether a register conflict can occur.
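The register-number comparison of the first conventional method can be modeled roughly as follows. This is a hypothetical sketch: the class, its methods and the number of holding registers are assumptions, and each `==` test stands in for one hardware comparator. It also shows what happens when the only holding register is already occupied by an outstanding load.

```python
from typing import Iterable, List, Optional


class LoadHoldRegisters:
    """Each holding register keeps the destination register number of an
    in-progress load; incoming instructions are compared against all of them."""

    def __init__(self, count: int = 1) -> None:
        self.slots: List[Optional[int]] = [None] * count

    def accept_load(self, dst: int) -> bool:
        """Store dst in a free holding register. False means none is free,
        so the load (and everything behind it) must be interlocked."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = dst
                return True
        return False

    def load_done(self, dst: int) -> None:
        """Free the holding register once the load data has arrived."""
        self.slots[self.slots.index(dst)] = None

    def conflicts(self, regs: Iterable[int]) -> bool:
        """One comparison per (holding register, operand register) pair."""
        regs = list(regs)
        return any(s is not None and s == r for s in self.slots for r in regs)


h = LoadHoldRegisters(count=1)
print(h.accept_load(4))     # True: load into r4 is now outstanding
print(h.conflicts([1, 2]))  # False: may run ahead of the load
print(h.conflicts([4, 5]))  # True: reads r4, must wait
print(h.accept_load(7))     # False: holding register full, interlock
h.load_done(4)
print(h.accept_load(7))     # True
```

Raising `count` allows more loads to be outstanding, but the number of comparisons per incoming instruction grows with it, which is the hardware cost noted as the second disadvantage below.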
The scoreboard-based method, i.e., a second conventional method, is disclosed illustratively in Japanese Patent Laid-Open No. Hei 5-108348. When a load instruction causes a cache miss (a long latency process), this method allows subsequent arithmetic instructions to be executed concurrently so as to minimize their unnecessary waiting time. According to the second conventional method, a load instruction is entered into the scoreboard when its pipeline processing starts and is removed from the scoreboard when its execution ends. The bit representing the register used by the instruction currently in pipeline processing is set to 1, and the bit settings are checked to detect a register conflict between the load instruction and subsequent instructions.
A main memory access operation, which is a typical long latency process, will now be described.
The main memory access operation typically takes about 400 ns to complete. If one machine cycle is 10 ns, the latency involved amounts to 40 cycles. This poses a bottleneck in terms of computer (i.e., information processor) performance.
Methods have been proposed for speeding up the main memory access operation, primarily by subjecting the access operation to pipeline processing.
One such method, i.e., a third conventional method, is memory access pipeline processing using a memory made up of a plurality of banks interleaved in units of words. Such a memory, designed to improve main memory performance, is called an interleaving memory. The method is discussed illustratively by John L. Hennessy and David A. Patterson in "Computer Architecture," Chapter 8, "Designing Memory Hierarchy."
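Word-unit interleaving is commonly realized as low-order interleaving, in which the bank number is the word address modulo the number of banks. The sketch below assumes four banks; the constant and function names are illustrative only and do not come from the cited text.

```python
NUM_BANKS = 4  # assumed bank count


def bank_of(word_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Bank that holds a word under low-order interleaving."""
    return word_addr % num_banks


def offset_in_bank(word_addr: int, num_banks: int = NUM_BANKS) -> int:
    """Position of the word inside its bank."""
    return word_addr // num_banks


# Consecutive words fall in different banks, so a sequential stream can
# overlap accesses; words 0 and 4 share bank 0 and would conflict.
print([bank_of(a) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(offset_in_bank(4))               # 1
```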
The purpose of setting up a plurality of memory banks is twofold: to permit continuous access operations and to allow a plurality of independent access operations.
However, accesses to independent memory banks can give rise to a bank conflict: while a given memory bank is being accessed, a request to access the same bank may arrive. The resulting bank conflict causes the later request to that bank to wait in an input buffer.
For example, suppose that there are access requests 1, 2, 3 and 4 and that access requests 1 and 2 cause a bank conflict. In this case, the access requests are serviced and completed in the order 1, 3, 4, 2.
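The reordering in this example can be reproduced with a small cycle-count model. All of the following are assumptions for illustration: requests are issued one per cycle, requests 1 and 2 map to bank 0 while requests 3 and 4 go to banks 1 and 2, each access keeps its bank busy for 4 cycles, and a request to a busy bank waits while requests to other banks proceed. The function name `simulate` is hypothetical.

```python
from typing import Dict, List, Tuple

ACCESS_CYCLES = 4  # assumed cycles a bank stays busy per access


def simulate(requests: List[Tuple[int, int]]) -> Dict[int, int]:
    """requests: (request_id, bank) in issue order, one issued per cycle.
    Returns the cycle at which each request completes."""
    bank_free_at: Dict[int, int] = {}
    done_at: Dict[int, int] = {}
    for issue_cycle, (req_id, bank) in enumerate(requests):
        # A request to a busy bank waits; other banks are unaffected.
        start = max(issue_cycle, bank_free_at.get(bank, 0))
        done_at[req_id] = start + ACCESS_CYCLES
        bank_free_at[bank] = done_at[req_id]
    return done_at


done = simulate([(1, 0), (2, 0), (3, 1), (4, 2)])
print(done)                                  # {1: 4, 2: 8, 3: 6, 4: 7}
print(sorted(done, key=lambda r: done[r]))  # [1, 3, 4, 2]
```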
In other words, the interleaving memory does not preserve the original order of processing requests at its output: requests may complete in a different order from that in which they were issued, and the number of cycles each request takes may vary.
Thus, an information processor with an interleaving memory (based on the third conventional method) requires suitable arrangements to ensure that the sequence of instruction execution responses from the memory controller matches the sequence of instruction execution requests from the instruction processing unit. Specifically, the memory controller has an output buffer that retains the results of access requests 3 and 4 once they complete, so that after access request 2 has been executed, access requests 3 and 4 are sent in that order to the instruction processing unit.
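The output buffer's task of restoring request order can be sketched as follows: a response is forwarded only after it has completed and every earlier request has been forwarded. The completion cycles are assumed values for illustration (request 2 finishing last because of its conflict with request 1), the function name is hypothetical, and the model ignores transfer bandwidth, so several responses may leave in the same cycle. The `held` values are the cycles that completed responses 3 and 4 spend waiting, which is the source of the overhead discussed as the fourth disadvantage below.

```python
from typing import Dict, List


def in_order_release(issue_order: List[int], done_at: Dict[int, int]) -> Dict[int, int]:
    """Forward each response no earlier than its completion and no earlier
    than the response of the request issued before it."""
    release_at: Dict[int, int] = {}
    last = 0
    for req in issue_order:
        last = max(last, done_at[req])
        release_at[req] = last
    return release_at


DONE_AT = {1: 4, 2: 8, 3: 6, 4: 7}  # assumed completion cycles
release = in_order_release([1, 2, 3, 4], DONE_AT)
held = {r: release[r] - DONE_AT[r] for r in release}
print(release)  # {1: 4, 2: 8, 3: 8, 4: 8}
print(held)     # {1: 0, 2: 0, 3: 2, 4: 1}
```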
One disadvantage of the first conventional method, i.e., a first disadvantage of the prior art, is that when the holding register is already occupied by an ongoing load instruction, an incoming load instruction (a long latency process) causes all subsequent instructions to be interlocked (a holding-register conflict). Pipeline processing then cannot continue; in other words, whether pipeline processing can proceed depends on the status of the holding registers.
Another disadvantage of the first conventional method, i.e., a second disadvantage of the prior art, is that where a plurality of holding registers are used so that numerous instructions can be pipeline-processed, a plurality of comparators are needed to compare the register number held in each of those registers. The result is a considerable increase in hardware scale.
One disadvantage of the second conventional method, i.e., a third disadvantage of the prior art, is that consistent pipeline processing control requires complicated control logic. Because entries are made into the scoreboard as soon as instructions are input to the pipeline, the scoreboard status must be controlled by monitoring each pipeline process, especially when a branch instruction or interrupt handling disrupts pipeline processing and nullifies the load instruction.
One disadvantage of the third conventional method, i.e., a fourth disadvantage of the prior art, is excessive overhead. With an interleaving memory in use, the information processor based on the third conventional method keeps access requests 3 and 4 in its output buffer even though their servicing has been completed. If a subsequent instruction in pipeline processing has a register conflict in the instruction processing unit with access request 3 or 4, that instruction remains interlocked for the additional cycles the memory controller spends restoring the proper sequence of instruction execution. This excess interlock increases overhead.