Register files or, simply, registers are well-known small, fast local storage arrays. A typical n by m register file includes storage latches arranged in n rows, each row being m bits wide, e.g., a single byte, a word or multiple words. Register files include, for example, first-in, first-out (FIFO) or serial shift registers and first-in, last-out (FILO) or push/pop registers. A FIFO may be a circulating shift register, for example, or a multi-port register with at least one input port and at least one output port. Additionally, such multi-port registers are typically used to improve processor performance, e.g., in processor data queues or as pipeline registers.
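The difference between FIFO and FILO ordering can be illustrated with a minimal software sketch. The class names, the depth parameter and the overflow behavior below are hypothetical and chosen only for illustration; they model the ordering of entries, not the latch-level structure of an actual register file.

```python
from collections import deque


class FifoRegister:
    """Hypothetical FIFO model: entries leave in the order they arrived."""

    def __init__(self, depth):
        self.depth = depth          # number of rows (n); an assumption of this sketch
        self.entries = deque()

    def write(self, word):
        if len(self.entries) == self.depth:
            raise OverflowError("FIFO full")
        self.entries.append(word)

    def read(self):
        return self.entries.popleft()   # oldest entry comes out first


class FiloRegister:
    """Hypothetical FILO (push/pop) model: the newest entry leaves first."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def push(self, word):
        if len(self.entries) == self.depth:
            raise OverflowError("FILO full")
        self.entries.append(word)

    def pop(self):
        return self.entries.pop()       # newest entry comes out first


fifo, filo = FifoRegister(4), FiloRegister(4)
for word in (0x11, 0x22, 0x33):
    fifo.write(word)
    filo.push(word)

print(hex(fifo.read()))  # 0x11
print(hex(filo.pop()))   # 0x33
```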
In a state of the art pipeline structure, synchronous logic is segmented, with a pipeline register between segments or stages. So, in a pipeline processor, for example, a processor clock clocks pipeline registers distributed at strategic locations throughout the processor logic. Ideally, data latched in one pipeline stage propagates to, and arrives at, the next stage just as it is clocked into that next stage. So, pipeline registers act as boundaries between data units traversing the pipeline stages. Thus, for an N segment pipeline, N data units may be traversing the pipeline with one data unit in each segment. Also ideally, the logic delay through the N stages is N clock periods, i.e., the time each data unit spends in the pipeline is no more than necessary to propagate through the logic. So, ideal registers do not add path delay that detracts from overall performance.
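The ideal behavior described above, with one data unit per stage advancing on each clock, can be sketched in software as follows. The function name run_pipeline and the three example stage functions are assumptions made for illustration; the sketch models only the clocked hand-off between stages and assumes zero register delay.

```python
def run_pipeline(stage_functions, inputs):
    """Clock a hypothetical N-stage pipeline, one loop iteration per clock.

    regs[i] models the pipeline register feeding stage i. None marks an
    empty stage, so None cannot be used as a data value in this sketch.
    """
    n = len(stage_functions)
    regs = [None] * n
    outputs = []
    # n extra clocks with no new input flush the last data units out.
    for item in list(inputs) + [None] * n:
        # The unit in the last stage completes on this clock.
        if regs[-1] is not None:
            outputs.append(stage_functions[-1](regs[-1]))
        # Every other unit advances one stage, back to front.
        for i in range(n - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stage_functions[i - 1](prev) if prev is not None else None
        regs[0] = item
    return outputs


# Three illustrative stages: add 1, double, subtract 3.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(stages, [1, 2]))  # [1, 3]
```

In this model each data unit emerges N clocks after it enters, which corresponds to the ideal case in which the registers themselves contribute no delay.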
In practice, however, registers add to path delay, regardless of the register type (FIFO or FILO) or its use, e.g., whether as local storage or as a pipeline boundary. Consequently, in a pipeline circuit, for example, the logic between pipeline registers cannot use the full clock period at any given clock frequency. Instead, the time available for logic propagation between registers is reduced by the register delay, where the register delay is the time through the registers, i.e., the time in and out of a register. So, the register delays reduce the time available for logic in each stage.
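The per-stage relationship described above amounts to subtracting the register delay from the clock period. The following is a minimal sketch; the function name and the numeric values (a 1000 ps clock period and a 150 ps register delay) are hypothetical and are not taken from any particular design.

```python
def logic_budget_ps(clock_period_ps, register_delay_ps):
    """Return the time left for logic in one stage, in picoseconds."""
    budget = clock_period_ps - register_delay_ps
    if budget <= 0:
        raise ValueError("register delay consumes the entire clock period")
    return budget


# Illustrative numbers only: 1 GHz clock, 150 ps in and out of a register.
print(logic_budget_ps(1000, 150))  # 850
```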
Further, the register delay is cumulative because it is encountered at each stage. For a pipeline circuit with 10 pipeline stages, for example, the 10 additional register delays may add one or more clock cycles to the time each data unit requires to traverse the pipeline, which is also known as the latency. Typically, designers reduce the logic between stages, with a corresponding increase in the overall number of stages, to accommodate these register delays. Each additional stage increases the circuit complexity without adding to the chip function, while consuming valuable circuit area or real estate and so reducing logic density. Further, each additional stage increases chip power, again without adding to the function, and so reduces chip efficiency. Of course, these problems diminish as the register delays are reduced relative to other path logic delays.
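The cumulative effect can be worked through with a short calculation. The sketch below assumes, purely for illustration, a block of logic with 10000 ps of total delay, a 1000 ps clock period and a 150 ps register delay, and further assumes the logic can be divided evenly among stages; the function name is hypothetical.

```python
def stages_needed(total_logic_ps, clock_period_ps, register_delay_ps=0):
    """Number of pipeline stages needed to fit the logic (ceiling division)."""
    usable_ps = clock_period_ps - register_delay_ps
    if usable_ps <= 0:
        raise ValueError("register delay consumes the entire clock period")
    return (total_logic_ps + usable_ps - 1) // usable_ps


ideal = stages_needed(10000, 1000)        # ideal registers with no delay
actual = stages_needed(10000, 1000, 150)  # each register costs 150 ps

print(ideal)           # 10
print(actual)          # 12
print(actual - ideal)  # 2 extra stages, i.e., 2 extra clock cycles of latency
```

Under these assumed numbers, the register delays total more than one clock period across the original 10 stages, so two stages must be added, each contributing area and power without adding function.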
Thus, there is a need for improved register performance.