1. Field of the Invention
The present invention relates to functional units and registers for processing data in a microprocessor, and more particularly, to a microprocessor with clusters and register files which are associated with each other to enhance the efficiency of data processing therein.
2. Description of the Related Art
A microprocessor in an electronic system generally contains multiple functional units and multiple registers for data processing. Each functional unit executes instructions and writes the resulting data into the pertinent register(s) in a register file. A functional unit may be any data computation unit, such as an arithmetic logic unit (ALU), an adder unit, a floating point unit, a load store unit, etc.
Since the functional units in a microprocessor may dispatch data to a register file in the same cycle, the register file should have as many write ports as there are functional units to satisfy the “peak data write requirement”, which arises when all the functional units generate data to be written into the register file in the same cycle. Thus, as the number of functional units in a microprocessor increases, the number of write ports of the register file should also increase to satisfy the peak data write requirement.
An increase in the number of ports in a register file increases both the area required to implement the register file and the time required to access data in the register file. For example, in a data write mode, the number of write ports in a register file determines the number of data values (or the amount of data) that can be simultaneously written into the register file.
Referring to FIG. 1, there is provided a block diagram illustrating a register file and functional units in a typical microprocessor. The microprocessor 10 may have “n” functional units FU1-FUn, all of which can produce data in the same cycle. In this case, to satisfy the peak data write requirement, the microprocessor 10 should have a register file 12 with the same number of write ports WP1-WPn as that of the functional units FU1-FUn, i.e., “n” write ports.
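The peak data write requirement described above can be illustrated with a minimal sketch. The trace format and the function name `required_write_ports` are assumptions introduced here for illustration only and do not appear in FIG. 1.

```python
def required_write_ports(writes_per_cycle):
    """Return the number of write ports needed so that no write ever waits.

    writes_per_cycle: list of ints, one entry per cycle, giving how many
    functional units produce a result in that cycle (hypothetical format).
    """
    # The register file must be sized for the worst cycle, not the average.
    return max(writes_per_cycle, default=0)


# Assumed example: n = 4 functional units, all producing data in cycle 2.
trace = [1, 2, 4, 1, 0, 2]
ports = required_write_ports(trace)  # sized by the single peak cycle
```

In this sketch a single peak cycle dictates the port count, even though most cycles use fewer ports.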
When a microprocessor is required to have more functional units, the number of write ports of the register file in the microprocessor must also be increased. Such an increase in the number of write ports adversely affects the size and speed of the microprocessor.
To overcome such problems in conventional microprocessors, a register file in a microprocessor may be designed to have fewer write ports than there are functional units. In such processors, it is necessary to arbitrate among the functional units for the write ports of the register file. In other words, an arbitration unit is required to manage data communication between the functional units and the write ports of the register file.
In an arbitration process, a functional unit should first send a request signal to an arbitration unit to write data into a register file. The arbitration unit receives all request signals from the functional units and then grants certain functional units access to the write ports in accordance with an arbitration logic. Then, the functional units whose requests have been granted may proceed to write data into the register file, and the other functional units, whose requests have not been granted, should request access again in the next cycle.
In a microprocessor adopting the arbitration technique, each functional unit should send an access request and wait for the grant, which causes additional delay in the data processing of the microprocessor. For example, the cycle time of the microprocessor may be increased by the time period required for the arbitration process. Also, the arbitration process may degrade the performance of the microprocessor by forcing functional units to stall when no write port is free.
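The request and grant procedure described above, and the stalls it can cause, may be sketched as follows. This is only an illustrative model: the fixed-priority policy (lower-numbered functional unit wins), the trace format, and the name `simulate_arbitration` are assumptions, since no particular arbitration logic is specified here.

```python
def simulate_arbitration(requests, num_write_ports):
    """Simulate an arbiter sharing a few write ports among functional units.

    requests: list of sets; requests[c] holds the ids of the functional units
        that raise a new write request in cycle c (hypothetical format). A
        stalled unit is assumed not to raise a second request while waiting.
    Returns (grants, stall_cycles): grants[c] lists the units granted in
    cycle c, and stall_cycles counts unit-cycles spent waiting for a port.
    """
    if num_write_ports < 1:
        raise ValueError("at least one write port is required")
    pending = set()
    grants = []
    stall_cycles = 0
    cycle = 0
    while cycle < len(requests) or pending:
        if cycle < len(requests):
            pending |= requests[cycle]
        # Assumed policy: fixed priority, lowest unit id is served first.
        granted = sorted(pending)[:num_write_ports]
        pending -= set(granted)
        # Units left in 'pending' must request access again next cycle.
        stall_cycles += len(pending)
        grants.append(granted)
        cycle += 1
    return grants, stall_cycles


# Four functional units, two write ports: the peak cycle causes stalls.
grants, stalls = simulate_arbitration([{1, 2, 3, 4}, {1}, set()], 2)
```

With four write ports the same trace completes without any stall, which is the trade-off between port count and arbitration delay discussed above.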
Another example of a conventional approach in this area can be found in “The Multi-cluster Architecture: Reducing Cycle Time Through Partitioning” by K. I. Farkas et al., pp. 149-159, MICRO-30, December 1997. In this reference, architected registers are partitioned for the purpose of decoupling clusters and reducing the number of read and write ports of a register file. In this technique, data read and write operations can be performed efficiently only between particular register files and the functional units associated with them. This technique is described below with reference to FIG. 2.
In FIG. 2, the first and second functional units FU1, FU2 are associated with the first and second register files RF1, RF2, respectively. The first register file RF1 has architected registers r0-r15, and the second register file RF2 has architected registers r16-r31. The first functional unit FU1 has efficient access to the architected registers r0-r15 in the first register file RF1, and the second functional unit FU2 has efficient access to the architected registers r16-r31 in the second register file RF2. For example, the efficient access may be accomplished when instruction “r7←r11+r12” is dispatched to the first functional unit FU1, and instruction “r17←r23+r31” is dispatched to the second functional unit FU2.
However, this technique has drawbacks for instructions such as “r7←r11+r31” dispatched to the first functional unit FU1. In this case, to obtain the contents of the architected register r31, the first functional unit FU1 should have access to the second register file RF2. The access path between the first functional unit FU1 and the second register file RF2 is so slow that the performance of the microprocessor may be severely degraded.
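The register partitioning of FIG. 2 and the cross-file access problem can be sketched as below. The helper names `home_register_file` and `remote_operands` are hypothetical, and the ASCII `<-` stands in for the arrow used in the instructions above.

```python
import re

REGS_PER_FILE = 16  # r0-r15 in RF1 and r16-r31 in RF2, as in FIG. 2


def home_register_file(reg):
    """Map a register name to its register file, e.g. 'r7' -> 1, 'r31' -> 2."""
    return int(reg[1:]) // REGS_PER_FILE + 1


def remote_operands(instruction, functional_unit):
    """List registers used by the instruction that live outside the register
    file associated with the given functional unit (1 = FU1, 2 = FU2)."""
    regs = re.findall(r"r\d+", instruction)
    return [r for r in regs if home_register_file(r) != functional_unit]


local_fu1 = remote_operands("r7 <- r11 + r12", 1)   # all operands in RF1
local_fu2 = remote_operands("r17 <- r23 + r31", 2)  # all operands in RF2
slow_fu1 = remote_operands("r7 <- r11 + r31", 1)    # r31 must come from RF2
```

Any non-empty result marks an instruction that must use the slow path between a functional unit and the other register file.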
Another problem in the microprocessor in FIG. 2 is that computation of the microprocessor may be distributed unevenly. In other words, if the program being executed in the microprocessor uses mostly architected registers r0-r15 of the first register file RF1, the computation for the program is not evenly distributed and the registers r16-r31 in the second register file RF2 are not utilized.
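The uneven distribution described above can be made concrete by counting register references per register file. This is a rough sketch under the FIG. 2 partitioning (r0-r15 and r16-r31); the function name `register_file_usage` and the sample program are assumptions for illustration.

```python
import re


def register_file_usage(program, regs_per_file=16):
    """Count register references per register file (1 = RF1, 2 = RF2).

    program: list of instruction strings using registers r0-r31 only.
    """
    usage = {1: 0, 2: 0}
    for instruction in program:
        for number in re.findall(r"r(\d+)", instruction):
            usage[int(number) // regs_per_file + 1] += 1
    return usage


# Assumed sample program that uses mostly r0-r15.
program = ["r1 <- r2 + r3", "r4 <- r1 + r5", "r6 <- r4 + r17"]
usage = register_file_usage(program)
```

A heavily skewed count indicates that one register file, and the functional unit associated with it, carries most of the work while the other sits largely idle.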
Therefore, a need exists for a microprocessor having fewer write ports in a register file than functional units, without problems such as the performance delay or degradation caused by the arbitration process, data access through slow paths, uneven distribution of computation, etc.