Pipelined processor data paths often stage (temporarily store) the results of information processing for some number N of cycles before “retiring” them to an architectural register file.
Traditionally, this “staging” is implemented via a series of N storage elements, indexed i = 1 ... N (hereafter referred to collectively as a “bypass register file”). Every cycle, the data from storage element N may be written (retired) to the architectural register file, the data from every other storage element i is copied to storage element i+1, and a new result (if any) is written into storage element 1. Each result is therefore copied N times before retiring, and on every successive cycle a given result resides in a different physical location.
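The per-cycle behavior described above can be illustrated with a minimal software sketch. It is a behavioral model only, not a hardware description; the class name `BypassRegisterFile`, the method `clock`, and the representation of a result as a `(register, value)` tuple are assumptions introduced for illustration and do not appear in the original text.

```python
class BypassRegisterFile:
    """Behavioral sketch of an N-stage shifting bypass register file.

    Hypothetical model: list index 0 stands for storage element 1 (newest
    result) and index N-1 stands for storage element N (about to retire).
    """

    def __init__(self, n_stages):
        self.stages = [None] * n_stages  # each entry: (register, value) or None
        self.arch = {}                   # architectural register file

    def clock(self, new_result=None):
        """Advance one cycle; return the result that retired, if any."""
        # Storage element N retires to the architectural register file.
        retiring = self.stages[-1]
        if retiring is not None:
            reg, value = retiring
            self.arch[reg] = value
        # Every other element i is copied to element i+1, and the new
        # result (if any) is written into element 1.
        self.stages = [new_result] + self.stages[:-1]
        return retiring


brf = BypassRegisterFile(3)
brf.clock(("r1", 10))   # result enters storage element 1
brf.clock()             # moves to storage element 2
brf.clock()             # moves to storage element 3
brf.clock()             # retires to the architectural register file
```

In this sketch the same result occupies a different list position on each cycle, which is the property that forces the control logic to track where each in-flight result currently resides.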
When the processing element wishes to read the latest value of an architectural register, control logic needs to determine whether the most recent result for that register resides in the architectural register file or the bypass register file and, if the latter, in which of the N stages of the bypass register file. The control logic then generates control signals to cause the data path to deliver (“forward”) the most recent copy of the desired register to the processing element.
A traditional implementation of the control logic keeps a list of the register specifiers for all currently in-flight register writes, compares (via associative lookup) the register specifier of each new read request against all of those writes, and picks (via a prioritizer circuit) the most recent match for forwarding. This traditional implementation is power-, area-, and wiring-intensive, and does not scale well as the pipeline length and the number of functional units in the processor increase.
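The associative lookup and prioritization described above can be sketched in software as follows. This is an illustrative model under stated assumptions rather than an actual circuit: the function name `read_with_forwarding`, the `(register, value)` entry format, and the returned `(source, stage, value)` tuple are hypothetical, and the list `stages` is assumed to be ordered from the most recent write (stage 1) to the oldest (stage N).

```python
def read_with_forwarding(stages, arch, reg):
    """Return (source, stage_number, value) for the latest copy of `reg`.

    stages: list of (register, value) tuples or None, newest first
            (hypothetical representation of the in-flight writes).
    arch:   dict modeling the architectural register file.
    """
    # Associative lookup: compare the read specifier against every
    # in-flight write, producing one match bit per stage.
    matches = [entry is not None and entry[0] == reg for entry in stages]

    # Prioritizer: the lowest-numbered matching stage holds the most
    # recent write, so it wins.
    for i, hit in enumerate(matches):
        if hit:
            return ("bypass", i + 1, stages[i][1])

    # No in-flight write matched; read the architectural register file.
    return ("arch", None, arch.get(reg))


stages = [("r2", 7), ("r1", 20), ("r1", 10)]
arch = {"r1": 1, "r3": 3}
print(read_with_forwarding(stages, arch, "r1"))  # ('bypass', 2, 20)
print(read_with_forwarding(stages, arch, "r3"))  # ('arch', None, 3)
```

The sketch makes the scaling concern visible: every read performs one comparison per in-flight write, so the number of comparators grows with both the number of stages and the number of read requests serviced per cycle.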