Some embodiments of the present invention are generally related to microprocessors, and more particularly, to register files.
A register system is a key component of a microprocessor. The register system should be responsive and able to deliver data quickly, yet be large enough to support a high level of instruction level parallelism (ILP).
Register file accesses often require multiple cycles of latency because of the manner in which they are addressed. Typically, register files are accessed through address decoding logic, or “ports”, that can be costly in terms of die area and power consumption. Furthermore, microprocessor designers may include additional storage structures in a microprocessor datapath, such as a register cache, which can typically be accessed faster than the register file because of its smaller size. Accordingly, data storage structures such as register caches can be used to supplement the storage space and performance needs of some prior art microprocessor architectures.
Because data writes can also require multiple processor cycles to complete, data to be written to the register file is often stored in a memory buffer, known as a writeback queue, after it has been issued from the processor core logic. Accordingly, data can be held temporarily in the writeback queue until it can be stored in the register file, provided the queue is deep enough.
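By way of illustration only, the buffering behavior of a writeback queue can be sketched in Python as follows. The class name `WritebackQueue`, the `depth` and `write_ports` parameters, and the use of a dictionary as the register file are assumptions made for this sketch and do not describe any particular prior art implementation.

```python
from collections import deque


class WritebackQueue:
    """Toy model: holds results until the register file can accept them."""

    def __init__(self, depth):
        self.depth = depth        # hypothetical queue capacity
        self.entries = deque()    # pending (register, value) pairs, oldest first

    def push(self, reg, value):
        """Accept a result from the core logic; return False if the queue is full."""
        if len(self.entries) >= self.depth:
            return False          # no room: the producer would have to wait
        self.entries.append((reg, value))
        return True

    def drain(self, register_file, write_ports):
        """Write up to `write_ports` pending entries into the register file."""
        written = 0
        while self.entries and written < write_ports:
            reg, value = self.entries.popleft()
            register_file[reg] = value
            written += 1
        return written
```

In this sketch, a queue of depth 2 accepts two results and refuses a third until `drain` has moved at least one entry into the register file.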
Similarly, some prior art datapaths can hold data temporarily in a bypass cache before the data is returned to the functional units. A bypass cache and its associated logic can be used in prior art processor datapaths for data that is to be reused immediately by subsequent operations after being generated by the processor core logic, instead of, or in addition to, storing this data in the register file. Typically, bypass caches return data directly to the functional units of a processor, such as the execution units, whereas writeback queues return data to the register file of the datapath, from which it can be accessed by the functional units.
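The two return paths described above can be contrasted in a minimal Python sketch. The function names `dispatch_result` and `read_operand`, the `reuse_soon` flag, and the dictionary and list containers are hypothetical and serve only to illustrate the distinction. The sketch models the "in addition to" case, in which a result placed in the bypass cache is also queued for the register file.

```python
def dispatch_result(reg, value, bypass_cache, writeback_queue, reuse_soon):
    """Route a freshly produced result along the two paths."""
    if reuse_soon:
        bypass_cache[reg] = value         # path 1: straight back to the functional units
    writeback_queue.append((reg, value))  # path 2: eventually written to the register file


def read_operand(reg, bypass_cache, register_file):
    """Return (value, source), preferring a just-produced value in the bypass cache."""
    if reg in bypass_cache:
        return bypass_cache[reg], "bypass"
    return register_file[reg], "register_file"
```

Here a value that was just produced is read from the bypass cache, while an older value is read from the register file.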
FIG. 1 illustrates a portion of a prior art processor datapath in which a micro-operation (uop) windowing mechanism (“non-data capture window”) provides uops to the processor core logic functional units for execution. The executed uops may access data via a register file and register cache structure. Specifically, data to be used by uops executed by the functional units is stored in the register file, to the extent bandwidth and space are available, and then passed to the reservation station for use by the functional units. Alternatively, data can be accessed from the register cache if it is available there, which is typically faster than accessing the data from the register file.
Data stored in the register cache can be accessed by the functional units directly. Typically, the register cache contains a copy of the data stored in the register file.
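The lookup order described for FIG. 1 can be sketched as follows. The cycle counts, the function name `fetch`, and the policy of copying a value into the register cache after a miss are assumptions chosen for illustration and are not taken from the figure.

```python
REGISTER_CACHE_LATENCY = 1  # assumed cycle count, illustrative only
REGISTER_FILE_LATENCY = 3   # assumed cycle count, illustrative only


def fetch(reg, register_cache, register_file):
    """Return (value, cycles): use the register cache on a hit, else the register file."""
    if reg in register_cache:
        return register_cache[reg], REGISTER_CACHE_LATENCY
    value = register_file[reg]
    register_cache[reg] = value  # keep a copy for later reads (fill policy is an assumption)
    return value, REGISTER_FILE_LATENCY
```

Under these assumed latencies, the first read of a register costs three cycles and a repeated read of the same register costs one.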
Data returned by the functional units to the register file may be temporarily stored in the writeback queue or bypass cache until the data is needed by the functional units (in the case of a bypass cache) or until bandwidth/space is available in the register file (in the case of the writeback queue). If space or bandwidth is not available in the register file, the processor will stall until the register file is available, thereby incurring processor performance penalties.
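The stall behavior can be approximated with a simple cycle-by-cycle model. The function `count_stall_cycles` and its parameters are hypothetical. The sketch assumes that the register file drains a fixed number of queued writes per cycle and that a cycle in which new results do not fit in the queue is retried on the next cycle.

```python
def count_stall_cycles(results_per_cycle, write_ports, queue_depth):
    """Count cycles in which the core cannot hand off its results.

    results_per_cycle: number of results the core wants to write in each step.
    """
    if write_ports < 1 or max(results_per_cycle, default=0) > queue_depth:
        raise ValueError("need write_ports >= 1 and bursts no larger than the queue")
    pending = 0  # entries waiting in the writeback queue
    stalls = 0
    i = 0
    while i < len(results_per_cycle):
        pending -= min(pending, write_ports)  # register file accepts some queued writes
        produced = results_per_cycle[i]
        if pending + produced > queue_depth:
            stalls += 1                       # no room: the core waits and retries
        else:
            pending += produced
            i += 1
    return stalls
```

For the input `[2, 2, 2]` with a queue depth of 2, this model reports two stall cycles with one write port and none with two, which illustrates why limited register file bandwidth translates into lost cycles.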
However, bypass caches and writeback queues can be costly in terms of die area and power consumption. Furthermore, as microprocessors increase in operand size and speed, the demand on the register file increases as well. To keep up with processor performance demands, register files and/or their associated register caches must expand, thereby incurring power and die area penalties. Accordingly, designers are often faced with having to sacrifice power and die area for more register file performance.