In order to achieve higher performance, modern computer systems are beginning to issue more than one instruction for each processor clock cycle. Each instruction includes a single operation code (opcode) specifying its function, as well as one or more operands specifying the addresses of data. The data addresses can be memory addresses or register addresses. Computers that can issue more than one instruction for each clock cycle are called superscalar computers.
Traditionally, because of the complexity of superscalar computers, the number of instructions that can be issued per processor cycle has been relatively small, e.g., two to four instructions per cycle. Furthermore, the number of different types or classes of instructions that can be executed concurrently may be limited. For example, a triple-issue processor might be able to concurrently issue an arithmetic instruction, a memory reference instruction, and a branch instruction. However, such a traditional superscalar processor cannot concurrently issue three memory reference instructions.
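The class restriction in the triple-issue example can be sketched as a per-cycle slot check. This is a minimal illustration, assuming one issue slot per class; the names `SLOT_LIMITS` and `can_issue_together` and the class labels are hypothetical, not taken from any particular processor.

```python
from collections import Counter

# Hypothetical slot limits for the triple-issue example: at most one
# arithmetic, one memory-reference, and one branch instruction per cycle.
SLOT_LIMITS = {"arith": 1, "mem": 1, "branch": 1}


def can_issue_together(classes, limits=SLOT_LIMITS):
    """Return True if the given instruction classes fit the per-cycle issue slots."""
    counts = Counter(classes)
    # A class with no slot at all (limits.get(...) == 0) can never issue.
    return all(counts[c] <= limits.get(c, 0) for c in counts)


print(can_issue_together(["arith", "mem", "branch"]))  # True
print(can_issue_together(["mem", "mem", "mem"]))       # False
```

Three memory references fail the check because they all compete for the single memory-reference slot, even though the total group size is still three.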
Each instruction may include source and destination operands. The operands can specify the addresses of the data manipulated by the instructions. While the instructions execute, the data are stored in high-speed registers that are part of the processor. Usually, registers that share a common architecture are organized into sets of registers known as register files.
A processor may be equipped with separate floating-point and fixed-point (integer) register files. Ports are used to read and write the register files. By restricting the number and type of instructions that can issue concurrently, the access paths, or "ports," of the registers can be simplified. For example, if only one fixed-point arithmetic instruction and only one fixed-point load/store instruction can issue concurrently, then at most three read (output) ports and two write (input) ports are required to access the fixed-point registers.
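The port count in this example follows from summing the register reads and writes of every instruction that may issue in the same cycle. The sketch below assumes that a fixed-point arithmetic instruction reads two source registers and writes one result, and that the load/store slot is occupied by a load that reads one base register and writes one result; these per-instruction needs and the names `PORT_NEEDS` and `ports_required` are illustrative assumptions.

```python
# Assumed register-port needs per instruction, as (reads, writes):
# a fixed-point arithmetic op reads two sources and writes one result;
# a load reads one base register and writes the loaded value.
PORT_NEEDS = {
    "fixed_arith": (2, 1),
    "fixed_load": (1, 1),
}


def ports_required(issue_group, needs=PORT_NEEDS):
    """Return (read_ports, write_ports) needed if the whole group issues in one cycle."""
    reads = sum(needs[op][0] for op in issue_group)
    writes = sum(needs[op][1] for op in issue_group)
    return reads, writes


print(ports_required(["fixed_arith", "fixed_load"]))  # (3, 2)
```

Allowing a second arithmetic instruction to issue alongside the first would raise the requirement to four read ports and two write ports, which is why restricting the mix of issuable instructions keeps the port count low.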
As superscalar processors are designed with larger issue widths, more ports to the register files may be required. Increasing the number of ports consumes area on the semiconductor die used for the circuits of the processor, and the amount of circuitry can grow faster than linearly with the number of ports. In addition, as the number of ports increases, access latencies can also increase.
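One way to see the faster-than-linear growth is a simplified first-order model in which each added port contributes one wordline and one bitline to every register bit cell, so both the cell's width and height grow roughly linearly with the port count p, and its area roughly as p squared. The sketch below assumes that model and a five-port baseline (three read, two write); the function name and baseline are hypothetical, and real layouts add fixed overheads the model ignores.

```python
def relative_cell_area(ports, base_ports=5):
    """Approximate register bit-cell area relative to a base_ports design.

    Assumes a simplified model where cell width and height each scale
    linearly with the number of ports, so area scales as ports ** 2.
    """
    return (ports / base_ports) ** 2


for p in (5, 10, 20):
    print(p, relative_cell_area(p))
# 5 1.0
# 10 4.0
# 20 16.0
```

Under this model, quadrupling the port count multiplies the cell area by sixteen, and the longer wordlines and bitlines are also a plausible source of the increased access latency.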
One approach to avoiding the disadvantages of a large multiported register file is to provide multiple copies of the various register files, one copy for each possible data path. The number of read (output) ports required for each register file can then be reduced. However, having multiple copies of the register files increases the complexity of write accesses: data stored in one copy of the register file must be duplicated in the other copies. This requires additional write (input) ports, and hence the total number of ports increases. Duplicating the register files also increases the chip area.
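The trade-off of the replicated approach can be modeled with a toy register file in which each data path reads only its own copy while every write is broadcast to all copies. The class `ReplicatedRegisterFile`, the function `total_ports`, and the example figures (twelve read ports and six write ports, e.g. six instructions each needing two reads and one write) are hypothetical and serve only to illustrate the port arithmetic.

```python
class ReplicatedRegisterFile:
    """Toy model of one register-file copy per data path (illustrative only)."""

    def __init__(self, num_regs, num_copies):
        self.copies = [[0] * num_regs for _ in range(num_copies)]

    def read(self, path, reg):
        # Each data path reads only its own copy, so read ports are split among copies.
        return self.copies[path][reg]

    def write(self, reg, value):
        # Every result must be written into all copies to keep them identical,
        # so each copy needs write ports for results produced on any path.
        for copy in self.copies:
            copy[reg] = value


def total_ports(reads, writes, copies):
    """Reads are divided among the copies, but every copy needs all the write ports."""
    return reads + copies * writes


rf = ReplicatedRegisterFile(num_regs=32, num_copies=3)
rf.write(5, 42)
print([rf.read(p, 5) for p in range(3)])             # [42, 42, 42]
print(total_ports(12, 6, 1), total_ports(12, 6, 3))  # 18 30
```

With three copies, each copy needs only four read ports, but it still needs all six write ports, so the total grows from 18 to 30 ports even though each individual file is smaller.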
Therefore, it is desirable to provide means and methods that increase the number of instructions concurrently issued by a superscalar processor without substantially increasing the complexity of the interconnects to the registers that store the data manipulated by the executing instructions.