The present invention relates to processors that have two or more instruction execution circuits.
A computer processor typically has different instruction execution circuits for executing different instructions. For example, floating point division and reciprocal square root instructions can be executed by one circuit, and multiplication instructions by another circuit.
The results of instructions executed by different circuits are written to some destination, for example, to a memory or a register file. The results from different circuits may become available for writing to their respective destinations in the same clock cycle. Therefore, some processors provide a separate write port for each circuit so that all the circuits can write their results to the destinations in the same clock cycle.
However, providing multiple write ports for a memory or a register file reduces its speed and increases its size and complexity. Therefore, separate write ports are sometimes not provided, and a single write port is instead shared by different instruction execution circuits. When one circuit writes its result, the other circuit is stalled. For example, if circuits C1 and C2 have results available for writing in the same clock cycle, circuit C1 writes its result, and circuit C2 is stalled. Then circuit C2 writes its result, and circuit C1 is stalled. Then circuit C1 writes its next result, and circuit C2 is stalled again. Thus, as long as both circuits have results available for writing, they use the write port in a round-robin fashion.
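By way of illustration only, the round-robin sharing described above may be modeled in software as follows. This is a minimal sketch, not a description of any particular processor: the function name `simulate_round_robin`, the tuple layout of the log, and the simplifying assumption that every listed result is already available for writing from the first cycle are all hypothetical and introduced here only for the example.

```python
from collections import deque


def simulate_round_robin(results_c1, results_c2):
    """Model two circuits, C1 and C2, sharing a single write port.

    Assumption (not from the text): every result in each list is already
    available for writing, so a circuit competes for the port until its
    list is exhausted.

    Returns a list of (cycle, writer, result, stalled) tuples, where
    `stalled` names the circuit that had to wait in that cycle, or None.
    """
    queues = {"C1": deque(results_c1), "C2": deque(results_c2)}
    last_writer = "C2"  # chosen so that C1 wins the first conflict
    log = []
    cycle = 0
    while queues["C1"] or queues["C2"]:
        ready = [name for name in ("C1", "C2") if queues[name]]
        if len(ready) == 2:
            # Both circuits have a result: the one that did not write
            # most recently gets the port, and the other is stalled.
            writer = "C1" if last_writer == "C2" else "C2"
            stalled = "C2" if writer == "C1" else "C1"
        else:
            # Only one circuit has a result, so there is no conflict.
            writer = ready[0]
            stalled = None
        log.append((cycle, writer, queues[writer].popleft(), stalled))
        last_writer = writer
        cycle += 1
    return log


if __name__ == "__main__":
    for entry in simulate_round_robin(["a1", "a2"], ["b1", "b2"]):
        print(entry)
```

In this model, two results from each circuit take four clock cycles to write, and in every cycle where both circuits have a result available one of them is stalled. This stalling is the throughput cost of sharing the write port.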
It is desirable to improve write port sharing techniques to increase processor throughput.