Pipelined processor data paths often stage (temporarily store) the results of information processing for some number N of cycles before “retiring” them to an architectural register file.
Traditionally, this “staging” is implemented via a series of N storage elements, indexed i = 1 ... N (hereafter referred to as a “bypass register file”). Every cycle, the data from storage element number N may be written (retired) to the architectural register file, the data from each other storage element i is copied to storage element (i+1), and a new result (if any) is written into storage element number 1. That is, results in the bypass register file are physically shifted through the storage elements. After its initial write into storage element number 1, each result is therefore copied N times before it is retired (N-1 shifts plus the final write to the architectural register file), which increases power usage.
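The shifting behavior described above can be illustrated with a minimal software model. This is only a sketch under stated assumptions, not a description of any particular hardware: the names `Result`, `ShiftingBypassFile`, `cycle`, and `copies` are hypothetical, and the model counts one copy for every shift between storage elements and one for the final write to the architectural register file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    """Hypothetical staged result: destination register number and value."""
    reg: int
    value: int


class ShiftingBypassFile:
    """Illustrative model of a traditional shifting bypass register file."""

    def __init__(self, n_stages: int) -> None:
        # stages[0] models storage element 1, stages[-1] models element N.
        self.stages: List[Optional[Result]] = [None] * n_stages
        self.arch: Dict[int, int] = {}  # architectural register file
        self.copies = 0                 # data copies made after the initial write

    def cycle(self, new_result: Optional[Result] = None) -> None:
        # Retire the contents of storage element N, if any.
        oldest = self.stages[-1]
        if oldest is not None:
            self.arch[oldest.reg] = oldest.value
            self.copies += 1
        # Copy element i into element i+1, starting from the oldest end.
        for i in range(len(self.stages) - 1, 0, -1):
            self.stages[i] = self.stages[i - 1]
            if self.stages[i] is not None:
                self.copies += 1
        # Write the new result (if any) into storage element 1.
        self.stages[0] = new_result


# Usage: with N = 4, one result is shifted three times and then retired.
bypass = ShiftingBypassFile(4)
bypass.cycle(Result(reg=3, value=42))
for _ in range(4):
    bypass.cycle()
```

In this model a single result passing through four storage elements accounts for four copies, matching the N copies per result noted above.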
The processor's functional units must be able to read the most recent value of a register from either the architectural register file or any of the locations in the bypass register file. Each operand read must therefore select among N+1 locations, which takes a large number of wires and wire tracks in the processor's core.
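The N+1-way selection can be sketched in software as a priority search. This is a hedged illustration only: the function name `read_operand` and the `(destination register, value)` tuple layout are assumptions made for the example, and real hardware would perform the comparisons in parallel with a multiplexer rather than in a loop.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entry layout: (destination register number, value).
Entry = Optional[Tuple[int, int]]


def read_operand(reg: int, stages: List[Entry], arch: Dict[int, int]) -> int:
    """Return the most recent value of `reg`.

    `stages` holds the N bypass locations, youngest first (stages[0] models
    storage element 1). `arch` models the architectural register file, which
    is the (N+1)-th candidate location.
    """
    for entry in stages:  # youngest matching entry wins
        if entry is not None and entry[0] == reg:
            return entry[1]
    return arch[reg]      # no in-flight result: fall back to retired state


# Usage: register 3 has two in-flight results; the youngest one is selected.
stages = [(3, 7), None, (3, 5), (1, 9)]
arch = {1: 0, 2: 8, 3: 1}
latest_r3 = read_operand(3, stages, arch)
```

Every one of the N bypass locations, plus the architectural register file, is a candidate source for each operand, which is where the wiring cost comes from.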