1. Field of the Invention
The present invention relates to processor architectures, in particular of the type currently referred to as “pipeline” architectures.
2. Description of the Related Art
One of the main effects of the introduction of the pipelining technique is the modification of the relative timing of instructions resulting from the overlapping of their execution. This overlapping introduces conflicts, or hazards, due both to data dependences (data hazards) and to modifications of the control flow (control hazards). In particular, such conflicts emerge when the passage of instructions through the pipeline modifies the order of read/write accesses to operands with respect to the natural order of the program (i.e., with respect to the sequential execution of instructions in non-pipelined processors).
In this connection, useful reference may be made to J. Hennessy and D. A. Patterson, “Computer Architecture: A Quantitative Approach,” Morgan Kaufmann Publishers, San Mateo, Calif., Second Edition, 1996.
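By way of illustration only, the data hazards mentioned above may be sketched as follows for a pair of instructions taken in program order. The classification into read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) cases follows the usual textbook terminology; the function name `classify_hazards` and the representation of an instruction as a `(dest, sources)` pair are assumptions made for this example and do not appear in the cited works.

```python
def classify_hazards(first, second):
    """Return the set of potential data hazards between two instructions.

    Each instruction is a hypothetical (dest, sources) pair, where dest is a
    register name (or None if nothing is written) and sources is a tuple of
    register names. `first` precedes `second` in program order.
    """
    dest1, sources1 = first
    dest2, sources2 = second
    hazards = set()
    # RAW: the second instruction reads a register the first one writes.
    if dest1 is not None and dest1 in sources2:
        hazards.add("RAW")
    # WAR: the second instruction writes a register the first one reads.
    if dest2 is not None and dest2 in sources1:
        hazards.add("WAR")
    # WAW: both instructions write the same register.
    if dest1 is not None and dest1 == dest2:
        hazards.add("WAW")
    return hazards


# Example: r1 = r2 + r3 followed by r4 = r1 + r5 is a RAW dependence on r1.
add = ("r1", ("r2", "r3"))
use = ("r4", ("r1", "r5"))
print(classify_hazards(add, use))
```

Whether a potential hazard of this kind actually causes a conflict depends on the pipeline organization, since only an overlap that reorders the accesses with respect to program order is harmful.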
The set of problems linked in particular to data hazards may be solved at a hardware level with the technique currently referred to as “forwarding” (or also “bypassing,” and sometimes “short-circuiting”). This technique uses the interstage registers of the pipeline architecture for forwarding the results of an instruction Ii, produced by one stage of the pipeline, directly to the inputs of the previous stages of the pipeline in order to be used in the execution of instructions that follow Ii. A result may therefore be forwarded from the output of one functional unit to the inputs of another unit that precedes it in the flow along the pipeline, and likewise starting from the output of one unit to the inputs of the same unit.
In order to ensure this forwarding mechanism, it is necessary to provide, in the processor, the required forwarding paths and the control of these paths. The forwarding technique may require a specific path starting from any register of the pipeline structure to the inputs of any functional unit, as in the case of the architecture known as “DLX,” to which reference is made in the text cited previously.
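By way of illustration only, the control of the forwarding paths may be sketched as an operand selector placed at the input of a functional unit. The sketch assumes a five-stage organization with two interstage registers downstream of the execution stage, here called `ex_mem` and `mem_wb`; these names, the `InterstageReg` type, and the function `select_operand` are hypothetical and are not taken from the cited text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InterstageReg:
    """Hypothetical pipeline register holding one in-flight result."""
    dest: Optional[int]  # destination register number, None if no write pending
    value: int = 0


def select_operand(src: int, reg_file: List[int],
                   ex_mem: InterstageReg, mem_wb: InterstageReg) -> int:
    """Return the value of source register `src` for the execution stage.

    The most recent in-flight result takes priority, so the EX/MEM register
    is checked before the MEM/WB register; the register file is used only
    when no in-flight instruction writes `src`.
    """
    if ex_mem.dest is not None and ex_mem.dest == src:
        return ex_mem.value   # forwarded from the immediately preceding instruction
    if mem_wb.dest is not None and mem_wb.dest == src:
        return mem_wb.value   # forwarded from the instruction before that
    return reg_file[src]      # no hazard: read the committed value


reg_file = [0, 10, 20, 30]
newest = InterstageReg(dest=1, value=42)
older = InterstageReg(dest=1, value=7)
print(select_operand(1, reg_file, newest, older))  # newest in-flight value wins
```

The priority given to the EX/MEM register reflects the requirement that the consumer observe the value produced by the most recent writer in program order.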
Data bypassed to the functional units of the early pipeline stages are nevertheless normally stored in the register file (RF) during the last pipeline stage (i.e., the so-called “write-back stage”) for subsequent use in the program being executed. Processors that use the forwarding technique achieve substantial improvements in terms of performance owing to the elimination of stall cycles introduced by data hazards.
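By way of illustration only, the elimination of stall cycles may be sketched with a simplified timing model. The model is an assumption made for this example: a five-stage pipeline (fetch, decode, execute, memory, write-back) executing single-cycle arithmetic instructions only, with a register file written in the first half of a cycle and read in the second half. Under these assumptions a dependent instruction may be decoded no earlier than three cycles after its producer when forwarding is absent, and one cycle after when forwarding is present. The function name `count_stall_cycles` is hypothetical.

```python
def count_stall_cycles(program, forwarding):
    """Count stall cycles caused by RAW dependences in a simplified pipeline.

    `program` is a list of hypothetical (dest, sources) pairs in program order.
    Assumed model: five stages, arithmetic instructions only, split-cycle
    register file, one instruction decoded per cycle at most.
    """
    # Minimum distance, in cycles, between the decode stage of a producer
    # and the decode stage of an instruction that reads its result.
    min_gap = 1 if forwarding else 3
    decode_cycle = []   # cycle in which each instruction is decoded
    last_writer = {}    # register -> index of the latest instruction writing it
    for index, (dest, sources) in enumerate(program):
        earliest = decode_cycle[-1] + 1 if decode_cycle else 0
        for src in sources:
            if src in last_writer:
                producer = last_writer[src]
                earliest = max(earliest, decode_cycle[producer] + min_gap)
        decode_cycle.append(earliest)
        if dest is not None:
            last_writer[dest] = index
    if not program:
        return 0
    # Without stalls the last instruction would be decoded in cycle len - 1.
    return decode_cycle[-1] - (len(program) - 1)


program = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r1", "r5")),   # r4 = r1 + r5  (depends on r1)
    ("r6", ("r1", "r4")),   # r6 = r1 + r4  (depends on r1 and r4)
]
print(count_stall_cycles(program, forwarding=False))
print(count_stall_cycles(program, forwarding=True))
```

The model deliberately omits load instructions and control hazards, for which forwarding alone does not remove every stall.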
The main problems linked to the forwarding mechanism in the sphere of processors, and in particular in the sphere of the so-called “very-long-instruction-word” (VLIW) processors, have been investigated in studies such as A. Abnous and N. Bagherzadeh, “Pipelining and Bypassing in a VLIW Processor,” IEEE Trans. on Parallel and Distributed Systems, Vol. 5, No. 6, June 1994, pp. 658-663, and H. Corporaal, “Microprocessor Architectures from VLIW to TTA,” John Wiley and Sons, England.
The above works analyze the advantages in terms of performance of various bypassing schemes, in particular as regards their effectiveness in solving data hazards in both four-stage and five-stage pipeline architectures.
The idea of exploiting register values that are bypassed during pipeline stages has been combined with the introduction of a small register cache with the purpose of improving performance, as is described in the work by R. Yung and N. C. Wilhelm, “Caching Processor General Registers,” ICCD '95. Proceedings of IEEE International Conference on Computer Design, 1995, pp. 307-312. In this architecture, referred to as “Register Scoreboard and Cache,” pipeline operands are supplied either by the register cache or by the bypass network.
In the work by L. A. Lozano and G. R. Gao, “Exploiting Short-lived Variables in Superscalar Processors,” MICRO-28, Proceedings of 28th Annual IEEE/ACM International Symposium on Microarchitecture, 1995, pp. 292-302, a scheme is proposed for superscalar processors which comprises an analysis carried out by the compiler and an extension of the architecture in order to avoid definitive writes to the RF (commits) of the values of variables which are bound to be short-lived and which, consequently, do not require long-term persistence in the RF. The advantages provided by this solution have been assessed by the authors mainly in terms of reduction of the write ports to the RF and of reduction in the number of transfers from registers to memory required, so as to achieve improvements in execution time. The work referred to reports the improvements linked to this solution in terms of performance, without any consideration, however, of the effects in terms of power absorption.
The concept of avoiding the presence of information without any useful value (dead-value information) in the RF is analyzed in the work by M. M. Martin, A. Roth, and C. N. Fischer, “Exploiting Dead Value Information,” MICRO-30, Proceedings of 30th Annual IEEE/ACM International Symposium on Microarchitecture, 1997, pp. 125-135. The values in the registers are considered useless or “dead” when they are not read before being overwritten. The advantages of this solution have been studied in terms of reduction in RF size and elimination of unnecessary save/restore instructions from the execution stream at procedure calls and across context switches.
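By way of illustration only, the definition of a “dead” value given above (a value that is not read before being overwritten) may be sketched as a scan over an instruction trace. The function name `find_dead_writes` and the `(dest, sources)` representation of an instruction are assumptions made for this example; the sketch does not reproduce the mechanism of the cited work.

```python
def find_dead_writes(trace):
    """Return the indices of instructions whose result is never read
    before the same register is overwritten.

    `trace` is a list of hypothetical (dest, sources) pairs in execution
    order. A value still live at the end of the trace is not reported,
    since the trace alone cannot show that it will never be read.
    """
    pending = {}   # register -> [index of last writer, has it been read?]
    dead = []
    for index, (dest, sources) in enumerate(trace):
        # Reads are processed first, so "r1 = r1 + 1" reads the old r1.
        for src in sources:
            if src in pending:
                pending[src][1] = True
        if dest is not None:
            if dest in pending and not pending[dest][1]:
                dead.append(pending[dest][0])
            pending[dest] = [index, False]
    return dead


trace = [
    ("r1", ()),        # 0: r1 written, then overwritten at 1 without a read
    ("r1", ()),        # 1: r1 written, read at 2
    ("r2", ("r1",)),   # 2: reads r1
    ("r1", ()),        # 3: r1 overwritten after having been read
]
print(find_dead_writes(trace))
```

In this trace only the write performed by instruction 0 is reported, since its value is replaced by instruction 1 before any instruction reads it.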
As has been shown in works such as A. Chandrakasan and R. Brodersen, “Minimizing Power Consumption in Digital CMOS Circuits,” Proc. of IEEE, 83(4), pp. 498-523, 1995, and K. Roy and S. C. Prasad, “Low-power CMOS VLSI Circuit Design,” John Wiley and Sons, Inc., Wiley-Interscience, 2000, reduced power absorption constitutes an increasingly important requirement for processors of the embedded type. Low-power-absorption techniques are widely used in the design of microprocessors in order to meet the stringent constraints in terms of maximum power absorption and operating reliability, while leaving the processing speed unaltered.
The majority of low-power-absorption techniques developed for digital CMOS circuits aim at reducing switching power, which represents the most significant contribution to the global power budget. For high-performance processors, low-power-absorption solutions aim at reducing the effective capacitance C_EFF of the processor nodes being switched.
The parameter C_EFF of a node is defined as the product of the load capacitance C_L and the switching activity α of the node. In digital CMOS processors it is possible to obtain considerable economy in terms of power absorption by minimizing the transition activity of high-capacitance buses, such as data-path buses and input/output buses. Another significant component of the power budget in modern processors is represented by multi-port RF accesses and other on-chip cache accesses.
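By way of illustration only, the definition of effective capacitance as the product of load capacitance and switching activity may be sketched numerically. The switching-power expression used below (effective capacitance multiplied by the square of the supply voltage and by the clock frequency, with the switching activity taken as the average number of power-consuming transitions per clock cycle) is the expression commonly used for CMOS circuits; it is an assumption of this example and is not stated in the present text. The function names and the numerical values are hypothetical.

```python
def effective_capacitance(load_capacitance_f, switching_activity):
    """Effective capacitance of a node: load capacitance times activity."""
    return load_capacitance_f * switching_activity


def switching_power(load_capacitance_f, switching_activity, vdd_v, clock_hz):
    """Switching power of a node under the assumed CMOS expression
    P = C_EFF * VDD^2 * f (not given in the source text)."""
    c_eff = effective_capacitance(load_capacitance_f, switching_activity)
    return c_eff * vdd_v ** 2 * clock_hz


# Hypothetical bus node: 2 pF load, 1.8 V supply, 100 MHz clock.
baseline = switching_power(2e-12, 0.25, 1.8, 100e6)
reduced = switching_power(2e-12, 0.125, 1.8, 100e6)
print(f"baseline: {baseline:.3e} W, halved activity: {reduced:.3e} W")
```

Since the expression is linear in the switching activity, halving the activity of a node halves its contribution to switching power, which is why high-capacitance buses are a natural target for activity reduction.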