The present disclosure is generally directed to techniques for increasing instruction issue rate and reducing latency and, more specifically, to techniques for increasing instruction issue rate and reducing latency for an out-of-order processor by instruction chaining and collision avoidance.
Modern processors often use out-of-order (OoO) sequencing of instructions to improve performance. In modern multithreaded processors, instructions with different execution latencies are issued out of order from an instruction queue, and their results are written back into a register file that has a limited number of writeback ports; the issue sequence must therefore reflect these writeback resource constraints. In particular, if an instruction that completes after ‘n’ execution cycles is issued, and another instruction that completes after ‘m’ (<‘n’) cycles is issued n-m cycles later, both instructions would write back into the register file in the same cycle and, thus, collide.
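The timing condition can be checked with a few lines of arithmetic. The following Python sketch assumes, for illustration only, a single writeback port, fixed latencies, and writeback in cycle `issue_cycle + latency`; the helper names `writeback_cycle` and `collides` are hypothetical. With n = 4 and m = 1, issuing the short instruction n - m = 3 cycles after the long one makes both writebacks land in cycle 4.

```python
# Minimal sketch: two instructions collide on a single register-file
# writeback port when they write back in the same cycle.
# Assumptions (illustrative only): one writeback port, fixed latencies,
# and writeback occurring in cycle issue_cycle + latency.

def writeback_cycle(issue_cycle: int, latency: int) -> int:
    """Cycle in which an instruction writes its result back."""
    return issue_cycle + latency


def collides(issue_a: int, lat_a: int, issue_b: int, lat_b: int) -> bool:
    """True if both instructions need the writeback port in the same cycle."""
    return writeback_cycle(issue_a, lat_a) == writeback_cycle(issue_b, lat_b)


n, m = 4, 1  # long and short latencies, with m < n
long_issue = 0

# Issuing the short instruction n - m cycles later makes both finish together.
print(collides(long_issue, n, long_issue + (n - m), m))  # True
# A different offset leaves the port free for each writeback.
print(collides(long_issue, n, long_issue + 1, m))  # False
```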
A first conventional approach delays issuance of the short-latency instruction whenever another instruction with a ‘k’-cycle latency, where ‘k’ is not equal to ‘m’, is ready for issuance. A second conventional approach treats the short-latency instruction as an instruction with an ‘n’-cycle latency, which eliminates the writeback resource conflict. A disadvantage of the first conventional approach is that older short-latency instructions cannot issue, since longer-latency instructions receive a higher priority, resulting in a stall condition. A disadvantage of the second conventional approach is that back-to-back latency increases, which may reduce both throughput and power efficiency.
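The cost of the second conventional approach shows up most clearly in a chain of dependent short-latency instructions. The Python sketch below assumes, for illustration only, that each instruction can issue in the cycle its producer writes back, so a chain of k instructions finishes k times the latency after the first one issues; the function `chain_completion` is hypothetical. Treating m = 1 instructions as if they had latency n = 4 stretches a three-instruction chain from 3 cycles to 12.

```python
# Sketch (illustrative assumptions): a chain of dependent instructions in
# which each one issues as soon as its predecessor's result is written
# back. Compares the native short latency m with the second conventional
# approach, which treats every short instruction as if it had latency n.

def chain_completion(num_ops: int, latency: int, start: int = 0) -> int:
    """Cycle in which the last instruction of a dependent chain writes back."""
    cycle = start
    for _ in range(num_ops):
        cycle += latency  # the next instruction waits for this result
    return cycle


n, m = 4, 1
ops = 3
native = chain_completion(ops, m)  # chain runs at its true latency
padded = chain_completion(ops, n)  # every op padded to the long latency
print(native, padded)  # 3 12
```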
U.S. Pat. No. 7,478,225 discloses an apparatus and method to support pipelining of variable latency instructions in a multithreaded processor. In one embodiment, a processor may include instruction fetch logic configured to issue a first and a second instruction from different ones of a plurality of threads during successive cycles. The processor may also include first and second execution units, configured respectively to execute shorter-latency and longer-latency instructions and to write shorter-latency or longer-latency instruction results, respectively, to a result write port during a first or a second writeback stage. The first writeback stage may occur fewer cycles after instruction issue than the second writeback stage. The instruction fetch logic may be further configured to guarantee result write port access by the second execution unit during the second writeback stage by preventing the shorter-latency instruction from issuing during any cycle for which the first writeback stage would collide with the second writeback stage.
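The issue-blocking condition described above can be illustrated with a generic writeback-port reservation check. This Python sketch is not the mechanism of U.S. Pat. No. 7,478,225, which is summarized here only at the level of its abstract; it assumes a single result write port, ignores threads, and uses a hypothetical `WritebackPort` class that records the cycles already claimed for writeback. A shorter-latency instruction is refused in any cycle where its writeback would land on a cycle already reserved by a longer-latency instruction.

```python
# Minimal sketch (hypothetical names, illustrative assumptions): a single
# result write port tracked as a set of reserved writeback cycles. An
# instruction may issue in a cycle only if the port is free in the cycle
# it would write back; threads and pipeline details are ignored.

class WritebackPort:
    def __init__(self) -> None:
        self.reserved: set[int] = set()  # cycles already claimed for writeback

    def can_issue(self, cycle: int, latency: int) -> bool:
        """True if issuing now would not collide with a reserved writeback."""
        return cycle + latency not in self.reserved

    def issue(self, cycle: int, latency: int) -> None:
        """Reserve the writeback cycle, or fail on a collision."""
        wb = cycle + latency
        if wb in self.reserved:
            raise ValueError(f"writeback collision in cycle {wb}")
        self.reserved.add(wb)


LONG, SHORT = 4, 1
port = WritebackPort()
port.issue(0, LONG)              # long instruction writes back in cycle 4
print(port.can_issue(3, SHORT))  # False: would also write back in cycle 4
print(port.can_issue(2, SHORT))  # True: writes back in cycle 3
```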
U.S. Patent Application Publication No. 2011/0087866 discloses a multi-threaded microprocessor. The multi-threaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages, each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. Thus, later instructions may execute before a given instruction completes, which may improve overall performance of the executing thread. Unfortunately, the problem of instructions executing in parallel that must write back to the register file at the same time is not overcome without delaying instructions in one way or another, which degrades the overall performance of the processor.