1. Technical Field
The present invention relates generally to an improved data processing system and method. More specifically, the present invention provides an apparatus and method for speeding up access time of a large register file with wrap capability.
2. Description of Related Art
The basic structure of a conventional computer system includes one or more processing units connected to various input/output devices for the user interface (such as a display monitor, keyboard and graphical pointing device), a permanent memory device (such as a hard disk, or a floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random access memory or RAM) that is used by the processor(s) in carrying out program instructions. Computer processor architectures have evolved from the now widely accepted reduced instruction set computing (RISC) configurations to so-called superscalar computer architectures, wherein multiple and concurrently operable execution units within the processor are integrated through a plurality of registers and control mechanisms.
An illustrative embodiment of a conventional processing unit is shown in FIG. 1, which depicts the architecture for a PowerPC™ microprocessor 12 manufactured by International Business Machines Corporation. Microprocessor 12 operates according to reduced instruction set computing (RISC) and is a single integrated circuit superscalar microprocessor. The system bus 20 is connected to a bus interface unit (BIU) 30 of microprocessor 12. Bus 20, as well as various other connections described, includes more than one line or wire, e.g., the bus could be a 32-bit bus.
BIU 30 is connected to an instruction cache 32 and a data cache 34. The output of instruction cache 32 is connected to a sequencer unit 36. In response to the particular instructions received from instruction cache 32, sequencer unit 36 outputs instructions to other execution circuitry of microprocessor 12, including six execution units, namely, a branch unit 38, a fixed-point unit A (FXUA) 40, a fixed-point unit B (FXUB) 42, a complex fixed-point unit (CFXU) 44, a load/store unit (LSU) 46, and a floating-point unit (FPU) 48.
The inputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 also receive source operand information from general-purpose registers (GPRs) 50 and fixed-point rename buffers 52. The outputs of FXUA 40, FXUB 42, CFXU 44 and LSU 46 send destination operand information for storage at selected entries in fixed-point rename buffers 52. CFXU 44 further has an input and an output connected to special-purpose registers (SPRs) 54 for receiving and sending source operand information and destination operand information, respectively. An input of FPU 48 receives source operand information from floating-point registers (FPRs) 56 and floating-point rename buffers 58. The output of FPU 48 sends destination operand information to selected entries in rename buffers 58.
Microprocessor 12 may include other registers, such as configuration registers, memory management registers, exception handling registers, and miscellaneous registers, which are not shown. Microprocessor 12 carries out program instructions from a user application or the operating system, by routing the instructions and data to the appropriate execution units, buffers and registers, and by sending the resulting output to the system memory device (RAM), or to some output device such as a display console.
A high-level schematic diagram of a typical general-purpose register 50 is further shown in FIG. 2. GPR 50 has a block 60 labeled “MEMORY_ARRAY—80×64,” representing a register file with 80 entries, each entry being a 64-bit wide word. Blocks 62a (WR0_DEC) through 62d (WR3_DEC) depict address decoders for each of the four write ports 64a–64d. For example, decoder 62a (WR0_DEC, or port 0) receives the 7-bit write address wr0_addr<0:6> (write port 64a). The 7-bit write address for each write port is decoded into 80 select signals (wr0_sel<0:79> through wr3_sel<0:79>). Write data inputs 66a–66d (wr0_data<0:63> through wr3_data<0:63>) are 64-bit wide data words belonging to ports 0 through 3 respectively. The corresponding select line 68a–68d for each port (wr0_sel<0:79> through wr3_sel<0:79>) selects the corresponding 64-bit entry inside array 60 where the data word is stored.
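The write path described above can be illustrated with a minimal behavioral sketch in Python. It is not a description of the actual circuit: the function names `decode_address` and `write_port` are hypothetical, and the treatment of addresses 80 through 127 (which fit in 7 bits but correspond to no entry) is an assumption, since the text does not say how they are handled.

```python
NUM_ENTRIES = 80            # entries in memory array 60
ADDR_BITS = 7               # width of wrN_addr<0:6>
WORD_MASK = (1 << 64) - 1   # each entry is a 64-bit word


def decode_address(addr):
    """Decode a 7-bit address into 80 one-hot select lines."""
    if not 0 <= addr < (1 << ADDR_BITS):
        raise ValueError("address must fit in 7 bits")
    # Assumption: addresses 80..127 assert no select line at all.
    return [1 if entry == addr else 0 for entry in range(NUM_ENTRIES)]


def write_port(array, addr, data):
    """Store a 64-bit word in the entry selected by one write port."""
    select = decode_address(addr)
    for entry, line in enumerate(select):
        if line:
            array[entry] = data & WORD_MASK


array = [0] * NUM_ENTRIES
write_port(array, 5, 0xDEADBEEF)   # port 0 writes entry 5
```

Seven address bits are needed because 80 entries exceed the 64 that six bits could address; at most one of the 80 select lines is asserted for any address.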
There are five read ports in this particular prior art GPR. Read ports 70a–70e (0 through 4) are accessed through read decoders 72a–72e (RD0_DEC through RD4_DEC), respectively. Select lines 74a–74e (rd0_sel<0:79> through rd4_sel<0:79>) for each decoder are generated as described for the write address decoders above. Read data for each port 76a–76e (rd0_data<0:63> through rd4_data<0:63>) follows the same format as the write data. The data to be read is driven by the content of the entry selected by the corresponding read select line.
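As a rough illustration of the read path, the following Python sketch models five read ports that each decode their own address and drive out the selected entry. The names `decode` and `read_ports` are hypothetical, and the AND-OR style of selection is an assumption made for the sketch; the text only states that the selected entry drives the read data.

```python
NUM_ENTRIES = 80      # entries in the register file
NUM_READ_PORTS = 5    # read ports 0 through 4


def decode(addr):
    """One-hot decode of a read address into 80 select lines."""
    return [1 if entry == addr else 0 for entry in range(NUM_ENTRIES)]


def read_ports(array, rd_addrs):
    """Return the data word driven on each of the five read ports."""
    if len(rd_addrs) != NUM_READ_PORTS:
        raise ValueError("expected one address per read port")
    outputs = []
    for addr in rd_addrs:
        select = decode(addr)          # rdN_sel<0:79>
        data = 0
        # Only the selected entry contributes to the output word.
        for word, line in zip(array, select):
            if line:
                data |= word
        outputs.append(data)           # rdN_data<0:63>
    return outputs


array = list(range(NUM_ENTRIES))       # entry i holds the value i
result = read_ports(array, [0, 79, 5, 5, 42])
```

Two ports may read the same entry in the same access, as ports 2 and 3 do here, because each port has its own decoder and select lines.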
Register files, such as the one described above, are a common type of storage circuitry used in modern day state-of-the-art microprocessors. For example, in the complex architecture of present-day microprocessors, an instruction buffer is used to store instructions coming out of the instruction cache, e.g. instruction cache 32, and may consist of a number of register file cells. For large register file arrays having multiple simultaneous reads and writes, it becomes extremely difficult to meet the cycle timing constraint to perform the decoding of the address lines and the reading of the register file array before having to store the result in the next stage latch. If the register file has wrap capability, i.e. once a last entry of a group or sub-array of the register file is accessed the next access goes back to the first entry of that group/sub-array, it becomes very complex and is almost impossible to work with all of the write addresses and read addresses for decoding. Instead, only the starting address is used for decoding purposes. However, this adds more time to the critical timing path due to the necessity to include additional circuitry to handle the wrap condition when only the starting address is used for decoding.
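The wrap condition can be sketched as follows. This Python fragment assumes a hypothetical group size of 8 entries and a hypothetical function name `wrapped_addresses`; neither comes from the text. It only shows which entries are touched when several consecutive accesses begin at a starting address and wrap within that address's group.

```python
GROUP_SIZE = 8   # assumed entries per group/sub-array (not from the text)


def wrapped_addresses(start, count, group_size=GROUP_SIZE):
    """List the entries touched by `count` consecutive accesses that
    begin at `start` and wrap within the group containing `start`."""
    base = (start // group_size) * group_size   # first entry of the group
    offset = start - base                       # position inside the group
    # After the last entry of the group, the next access returns to `base`.
    return [base + (offset + i) % group_size for i in range(count)]


seq_a = wrapped_addresses(6, 4)    # [6, 7, 0, 1]
seq_b = wrapped_addresses(14, 3)   # [14, 15, 8]
```

In this model the modulo step stands in for the additional wrap-handling logic that is needed when only the starting address is decoded, which is the logic said to lengthen the critical timing path.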
Therefore, it would be beneficial to have an improved apparatus and method for accessing a large register file having wrap capability. More specifically, it would be beneficial to have an apparatus and method for accessing a large register file having wrap capability that does not add time to the critical timing path.