This invention relates to binary data processors which employ two or more parallel pipelines, and, more specifically, to execution stages thereof which allow for parallel execution of data dependent instructions.
A multi-staged pipeline is commonly used in a single integrated circuit chip microprocessor to process programmed instructions by advancing them one after the other through its serially connected pipeline stages. That is, a different step of the processing of an instruction is accomplished at each stage of the pipeline. For example, one important stage generates from the instruction and other data to which the instruction points, such as data stored in registers on the same chip, an address of the location in memory where an operand is stored that needs to be retrieved for processing. A next stage of the pipeline typically reads the memory at that address in order to fetch the operand and make it available for use within the pipeline. A subsequent stage typically executes the instruction with the operand and any other data pointed to by the instruction. The execution stage includes an arithmetic logic unit (ALU) that uses the operand and other data to perform a calculation, such as addition, subtraction, multiplication, or division, or a logical combination that is specified by the instruction. The result is then, in a further stage, written back into either the memory or into one of the registers. As one instruction is moved along the pipeline, another is right behind it so that, in effect, a number of instructions equal to the number of stages in the pipeline are optimally being simultaneously processed.
More recently, two parallel pipelines have being used. Two instructions may potentially be processed in parallel as they move along the two pipelines. When some interdependency exists between two successive instructions, however, they often cannot be started along the two pipelines at the same time. One such interdependency is where the second instruction requires for its execution the result of the execution of the first instruction. For example, one instruction can call for an operand retrieved from memory to be added to an operand in a register, with the result written back to the same location in memory. The next instruction could then call for a third operand to be subtracted from that result, requiring that same memory location to again be accessed and its data read as part of processing the second instruction. The second instruction must then be held from moving along the stages of the second pipeline until the first instruction has been executed by the first pipeline and the result stored in memory. Only then is one operand required by the second instruction available for retrieval. This obviously slows down the throughput of the processor by not using the parallelism that is provided by the two pipelines.
To overcome this disadvantage, two instructions having a certain types of data dependency have been suggested to be executed simultaneously in a single ALU that has a third input port. In the example of the preceding paragraph, all three operands necessary to execute both instructions would be inputted to the enlarged ALU at one time. Its data output then provides the result of execution of the two instructions. There is then no need to store the intermediate result of the execution of the first instruction. Indeed, this intermediate result is not even calculated. The parallelism provided by two pipelines is then fully utilized to process two successive data dependent instructions.
However, the carry bit output of the enlarged ALU is not usually correct for its data output. Therefore, separate logic is usually provided to determine the carry bit, with a disadvantage of utilizing more space on the integrated circuit and consuming more power. Therefore, it is a primary object of the present invention to provide a technique and circuit implementations thereof that provide the value of such a carry bit with the utilization of fewer components.
It is another object of the present invention to provide an improved technique for determining the value of a carry bit for data resulting from simultaneously executing two arithmetically data dependent instructions.
It is a more general object of the present invention to improve and simplify the simultaneous execution of data dependent instructions in a processor.