1) Field of the Invention
The present invention relates to a register file that is mounted in a processor such as a microprocessor or CPU and includes a plurality of register arrays used for storing intermediate results of calculations, constants, and so forth. In particular, the present invention relates to a register file having a multiport configuration, in which a plurality of read ports and a plurality of write ports are provided so that a plurality of read accesses and a plurality of write accesses can be made independently and concurrently through these ports.
2) Description of the Related Art
As shown in FIG. 7, a register file 100 with a typical multiport configuration includes register arrays 101 forming a word width n (the number of words: for example, n=32, 64, 128, . . . ), and each register array 101 can store one word having a bit width m (the number of bits: for example, m=16, 32, . . . ). That is, the main body (register portion) of the register file 100 includes cell arrays arranged in an m by n rectangle.
Further, the register file 100 has three read ports 110X to 110Z, and four write ports 120A to 120D. Through these ports 110X to 110Z and 120A to 120D, three read accesses and four write accesses can be made independently and concurrently.
The register file 100 includes read decoders 130X to 130Z, which decode externally supplied read addresses Rx to Rz, respectively, to select the words to be read from the read ports 110X to 110Z. Each of the read decoders 130X to 130Z puts the register array 101 specified by its decoding result into a read state and sends the data (word) stored in that register array 101 to the corresponding read port 110X to 110Z.
The read ports 110X to 110Z are respectively provided with sense amplifiers 111. Signals read from the register arrays 101 are sent to the sense amplifiers 111 through unillustrated bit lines (data lines). Subsequently, the signals are amplified by the sense amplifiers 111 up to a level at which digital signal processing can be performed.
In addition, the register file 100 includes write decoders 140A to 140D, which decode externally supplied write addresses Wa to Wd, respectively, to specify in which of the register arrays 101 the data input from the write ports 120A to 120D are to be written. Each of the write decoders 140A to 140D puts the register array 101 specified by its decoding result into a write state, and the data from the write ports 120A to 120D are stored in those register arrays 101.
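The read and write paths of FIG. 7 can be summarized with a small behavioral model. This is a minimal sketch, not the circuit itself: the class name `MultiportRegisterFile`, the `cycle` method, and two behaviors that the text does not specify (reads in a cycle see the contents from before that cycle's writes, and when two write ports target the same word the later port in A-to-D order wins) are hypothetical assumptions made only for illustration. Each decoder is modeled as a one-hot word select.

```python
from typing import List, Sequence, Tuple


class MultiportRegisterFile:
    """Behavioral sketch of register file 100 (FIG. 7): n words of m bits,
    three read ports (X, Y, Z) and four write ports (A, B, C, D).

    Hypothetical model. Assumed, not stated in the text: reads see the
    contents from before the same cycle's writes, and if several write
    ports target one word, the later port (A..D order) wins.
    """

    def __init__(self, n_words: int, bit_width: int,
                 n_read: int = 3, n_write: int = 4) -> None:
        self.n = n_words
        self.m = bit_width
        self.n_read = n_read
        self.n_write = n_write
        self.words = [0] * n_words

    def decode(self, address: int) -> List[int]:
        """One-hot word select, standing in for a read or write decoder."""
        if not 0 <= address < self.n:
            raise ValueError(f"address {address} out of range")
        return [1 if i == address else 0 for i in range(self.n)]

    def cycle(self, read_addrs: Sequence[int],
              writes: Sequence[Tuple[int, int]]) -> List[int]:
        """Serve up to n_read reads and n_write (address, data) writes."""
        if len(read_addrs) > self.n_read or len(writes) > self.n_write:
            raise ValueError("more accesses than ports")
        # Read ports: each decoder places exactly one word in the read state.
        results = []
        for addr in read_addrs:
            select = self.decode(addr)
            results.append(sum(w for w, s in zip(self.words, select) if s))
        # Write ports: each decoder places one word in the write state;
        # data wider than m bits is truncated to the word's bit width.
        mask = (1 << self.m) - 1
        for addr, data in writes:
            select = self.decode(addr)
            for i, s in enumerate(select):
                if s:
                    self.words[i] = data & mask
        return results


rf = MultiportRegisterFile(n_words=32, bit_width=16)
rf.cycle([], [(3, 0xABCD), (7, 0x1234)])   # two concurrent writes
print(rf.cycle([3, 7, 0], []))             # three concurrent reads -> [43981, 4660, 0]
```

The one-hot decode is deliberately literal so that the model mirrors the "put the specified register array into a read/write state" wording rather than simply indexing a list.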
Meanwhile, year after year, ever higher performance has been demanded of processors such as microprocessors that incorporate a register file. Operating frequencies are therefore being raised and the amount of handled data is steadily increasing, which in turn increases the required capacity of the register file.
However, in the register file 100 having the configuration shown in FIG. 7, when the number of register arrays 101 is increased to, for example, 1,024 (1,024 words) to increase the amount of handled data, there is a problem in that a delay arises during read access due to the loads on the bit lines extending from the register arrays 101 to the read ports 110X to 110Z.
That is, no significant delay arises for the register arrays 101 positioned near the sense amplifiers 111 in the read ports 110X to 110Z. On the other hand, the register arrays 101 positioned toward the write ports 120A to 120D in FIG. 7 are separated from the sense amplifiers 111 by considerably long physical distances (bit-line lengths).
Hence, it takes a long time for the extremely low-level signals read from these register arrays 101 to travel along the bit lines to the sense amplifiers 111, be amplified there, and then be sent on to, for example, flip-flops in the next stage. As a result, this delay may reduce the performance of the whole logic unit.
In view of this, a register file 200 employing a column-row read/write system, as shown in FIG. 8, may be used.
As in the register file 100 shown in FIG. 7, the register file 200 shown in FIG. 8 has n register arrays 201 with a bit width m. In the register file 200, however, every four register arrays 201 are aligned horizontally (in the lateral direction of FIG. 8), which reduces the word width of the register file 200 to a quarter (n/4) of the word width of the register file 100. The main body (register portion) of the register file 200 therefore includes cell arrays arranged in an (m by 4) by (n/4) rectangle. That is, the register file 200 is divided laterally into four columns, each with the bit width m, and longitudinally (in the longitudinal direction of FIG. 8) into n/4 rows.
Further, the register file 200 has three read ports 210X to 210Z, and four write ports 220A to 220D. Through these ports 210X to 210Z and 220A to 220D, three read accesses and four write accesses can be made independently and concurrently.
The register file 200 includes row decoders 230X to 230Z and column decoders 231X to 231Z, which decode externally supplied read addresses Rx to Rz (for example, 5-bit addresses for n=32), respectively, to select the words to be read from the read ports 210X to 210Z. The register file 200 also includes 4 to 1 multiplexers 232X to 232Z.
Each of the row decoders 230X to 230Z selects one row from among the n/4 rows according to the high-order bits (for example, the three high-order bits) of the corresponding read address Rx to Rz, and puts the four register arrays 201 in that row into a read state, thereby sending the data (words) stored in those register arrays 201 to the corresponding one of the 4 to 1 multiplexers 232X to 232Z.
Each of the column decoders 231X to 231Z selects one column from among the four columns according to the low-order bits (for example, the two low-order bits) of the corresponding read address Rx to Rz, and sends 4-bit column indicating information to the corresponding one of the 4 to 1 multiplexers 232X to 232Z.
Each of the 4 to 1 multiplexers 232X to 232Z passes the column portion indicated by the column indicating information from the corresponding column decoder 231X to 231Z, and sends the data from that column to the corresponding read port 210X to 210Z.
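The read path of FIG. 8 can be sketched as follows. This is a minimal illustration under stated assumptions: word address a is taken to be stored at row a >> 2 and column a & 3, so that the high-order bits select the row and the two low-order bits select the column, and the function names (`split_address`, `column_decode`, `mux4`, `read_fig8`) and the list-of-rows storage are hypothetical. The row decoder is modeled as fetching the whole selected row, the column decoder as a 4-bit one-hot code, and the 4 to 1 multiplexer as passing only the word whose select bit is set.

```python
from typing import List, Sequence, Tuple

COLUMNS = 4  # four register arrays side by side in each row (FIG. 8)


def split_address(address: int) -> Tuple[int, int]:
    """High-order bits select the row; the two low-order bits the column."""
    return address >> 2, address & 0b11


def column_decode(col: int) -> List[int]:
    """4-bit column indicating information, one bit per column (one-hot)."""
    return [1 if c == col else 0 for c in range(COLUMNS)]


def mux4(row_words: Sequence[int], col_select: Sequence[int]) -> int:
    """4 to 1 multiplexer: pass only the word of the selected column."""
    if sum(col_select) != 1:
        raise ValueError("column select must be one-hot")
    return next(w for w, s in zip(row_words, col_select) if s)


def read_fig8(rows: List[List[int]], address: int) -> int:
    row, col = split_address(address)
    row_words = rows[row]  # row decoder: all four arrays in the row are read
    return mux4(row_words, column_decode(col))


# n = 32 words arranged as n/4 = 8 rows of four columns.
words = [0x100 + i for i in range(32)]
rows = [words[r * COLUMNS:(r + 1) * COLUMNS] for r in range(len(words) // COLUMNS)]
print(split_address(0b10110))          # -> (5, 2)
print(hex(read_fig8(rows, 0b10110)))   # -> 0x116
```

With a 5-bit address, this split gives exactly the three row bits and two column bits mentioned in the example above; other address-to-cell mappings would work equally well as long as the decoders agree on them.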
The read ports 210X to 210Z are respectively provided with sense amplifiers 211 identical to those described above. Signals read from the register arrays 201 are sent to the sense amplifiers 211 through unillustrated bit lines (data lines). Subsequently, the signals are amplified by the sense amplifiers 211 up to a level at which digital signal processing can be performed.
In addition, the register file 200 includes write decoders 240A to 240D, which decode externally supplied write addresses Wa to Wd, respectively, to specify in which of the register arrays 201 the data input from the write ports 220A to 220D are to be written. Each of the write decoders 240A to 240D puts the register array 201 specified by its decoding result (the register array 201 positioned in a predetermined column and a predetermined row) into a write state, and the data from the write ports 220A to 220D are stored in those register arrays 201.
In the above register file 200, the longest physical distance from a register array 201 to the sense amplifier 211 can be reduced to a quarter of the longest distance in the register file 100 shown in FIG. 7. When the register file 200 has a capacity of, for example, 1,024 words, its word width is 256 words, and the physical distance from any register array 201 to the sense amplifier 211 corresponds to at most 256 words.
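The distance comparison above is simple arithmetic. A minimal sketch, assuming a power-of-two capacity of 1,024 words and counting bit-line length in word rows (a unit chosen here only for illustration; the function name `longest_bit_line` is hypothetical):

```python
def longest_bit_line(n_words: int, columns: int) -> int:
    """Word rows between the farthest register array and the sense amplifier.

    columns=1 models FIG. 7 (one word per row); columns=4 models FIG. 8.
    Assumes n_words is a multiple of columns.
    """
    if n_words % columns:
        raise ValueError("n_words must be a multiple of columns")
    return n_words // columns


n = 1024
print(longest_bit_line(n, 1))   # FIG. 7: 1024 rows
print(longest_bit_line(n, 4))   # FIG. 8: 256 rows, a quarter as long
```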
Therefore, even when the number of register arrays 201 is increased to increase the amount of handled data, the register file 200 overcomes the above problem of delay caused by the loads on the bit lines during read access.
However, in the register file 200 shown in FIG. 8, though the word width can be reduced to a quarter, the bit width increases fourfold.
In recent years, the data bus width of high-performance microprocessors (the number of bits in a single word) has increasingly been expanded (to, for example, 64 bits or 128 bits) as part of performance improvement. This expansion greatly increases the bit width (to, for example, 256 bits or 1,024 bits) of the register file 200 shown in FIG. 8, which lengthens the decode lines extending from the row decoders 230X to 230Z to the cell arrays. These long decode lines cause a delay, resulting in a reduction in performance.
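The cost shifts to the decode lines because every row now holds four words. A rough sketch, assuming each row-decode line runs across all cells of its row and that its length scales with the number of cells it spans (uniform cell pitch is an assumption, not something the text states; `cells_per_decode_line` is a hypothetical name):

```python
def cells_per_decode_line(bus_width: int, columns: int) -> int:
    """Cells along one row, i.e. the span of one row-decode line."""
    return bus_width * columns


for m in (16, 32, 64, 128, 256):
    fig7 = cells_per_decode_line(m, 1)
    fig8 = cells_per_decode_line(m, 4)
    print(f"m={m:>3}: FIG. 7 row = {fig7:>4} cells, FIG. 8 row = {fig8:>4} cells")
```

Read together with the bit-line comparison, this shows the trade-off: folding the array into four columns divides the bit-line length by four but multiplies the decode-line span by four.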
As stated above, in the register file 100 shown in FIG. 7, the sense amplifiers 111 of the read ports 110X to 110Z are not necessarily located near every register array 101. Hence, when the word width is expanded, the delay due to the loads on the bit lines reduces performance. In the register file 200 shown in FIG. 8, on the other hand, the delay due to the loads on the bit lines can be overcome, but the delay due to the long decode lines reduces performance. In either case, the bit lines or the decode lines must be made longer as the number of words increases, lengthening the delay time. Consequently, rapid access becomes increasingly difficult to realize.
Further, when the read addresses Rx to Rz in the register file 100 shown in FIG. 7 are, for example, 5 bits wide, each of the read decoders 130X to 130Z in most implementations has a two-stage configuration: a three-input NAND gate and a two-input NAND gate in the first stage, and a NOR gate receiving the outputs of the two NAND gates in the second stage. Naturally, as the number of bits of the read addresses Rx to Rz increases, each of the read decoders 130X to 130Z requires more gate stages.
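The two-stage decoder described above can be checked at the gate level. This is a minimal sketch, assuming that both the true and complemented forms of each address bit are available to the gates (inverters are not counted as a stage) and using a hypothetical `decode5` function. For each of the 32 word lines, a three-input NAND examines the three high-order address bits, a two-input NAND examines the two low-order bits, and a NOR of the two NAND outputs drives the word line high only when every input literal matches.

```python
from typing import List


def nand(*inputs: int) -> int:
    return 0 if all(inputs) else 1


def nor(*inputs: int) -> int:
    return 0 if any(inputs) else 1


def literal(bit: int, want: int) -> int:
    """Address bit in true form (want=1) or complemented form (want=0)."""
    return bit if want else 1 - bit


def decode5(address: int) -> List[int]:
    """5-bit to 32-line decoder: NAND3 + NAND2, then NOR2, per word line."""
    a = [(address >> k) & 1 for k in range(5)]  # a[0] is the LSB
    lines = []
    for word in range(32):
        w = [(word >> k) & 1 for k in range(5)]
        hi = nand(literal(a[4], w[4]), literal(a[3], w[3]), literal(a[2], w[2]))
        lo = nand(literal(a[1], w[1]), literal(a[0], w[0]))
        lines.append(nor(hi, lo))  # 1 only when both NAND outputs are 0
    return lines


out = decode5(0b10110)
print(out.index(1), sum(out))   # -> 22 1 (selected line, number of active lines)
```

Widening the address beyond 5 bits would require either wider NAND gates or additional gate levels to combine the partial decodes, which is the growth in gate stages that the text goes on to discuss.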
However, as the number of gate stages in each of the read decoders 130X to 130Z increases, the read decoders become larger and decoding at data readout takes longer, which reduces performance. Therefore, it has been desired to reduce the number of gate stages in the read decoders 130X to 130Z so as to achieve faster decoding at data readout.