1. Field of the Invention
The present invention relates to a memory device having a plurality of read ports, and more particularly, to a memory device which can drive multiple read ports simultaneously without affecting the data stored therein.
2. Description of the Prior Art
FIG. 1 illustrates a floating point data path of a conventional floating point processor. As illustrated, the floating point data path includes a register file 100 comprising a plurality of registers for storing input data such as operands received from the processor's data cache, operand alignment circuits 102 and 104, and floating point processors such as floating point ALU (FALU) 106, floating point multiplier (FMULT) 108 and floating point divide/square root circuit 110. Typically, the floating point multiply, divide, add and load or store functions are performed by sequentially executing separate instructions (i.e., only one add or one multiply is performed at a time). Recently, however, floating point processors have been designed to allow concurrent execution of the floating point multiply, divide, add and load or store instructions, thereby significantly increasing the processing efficiency of the floating point processor. For such floating point processors it has become desirable that the register file 100 have a plurality of read ports and a plurality of write ports to facilitate the concurrent processing.
DeLano et al. describe in an article entitled "A High Speed Superscalar PA-RISC Processor", Proceedings of the Compcon Spring 1992, Digest of Papers, San Francisco, Calif., Feb. 24-28, 1992, a floating point processor comprising such a floating point data path. The register file 100 of DeLano et al.'s floating point processor has 32 64-bit registers (4 registers are reserved for floating point exception data) and 5 read ports and 3 write ports to allow concurrent execution of a multiply, an add and a load or store. It was the goal of the present inventors to design a register file 100 for such a floating point processor which can dump stored data to five or more read ports simultaneously without disturbing the state of the register file as a result of the capacitance on the output lines.
In designing such a register cell, it is desirable that the speed of the circuit be maximized while the chip area of the register cell is minimized. If the register file 100 is small, the individual RAM cells (which may comprise simple cross-coupled inverters) of the register file 100 may be made large and powerful. However, as the number of RAM cells and the number of read ports increase, the speed of the cell decreases and the cost of the RAM cells increases. For example, if there are 2,048 RAM cells in the register file 100, and each RAM cell has 5 read ports, the size of the read port dominates the size of the RAM cell. It has thus been desirable to design the read ports so that they are small, have a sufficiently small output delay, and are capable of dumping multiple read ports simultaneously without disturbing the contents of the RAM cell. It has also been desirable for the read ports to present a small capacitive load to the RAM cell so that the setup time of the write ports (which may comprise simple transfer gates) is not degraded. The known memory cells do not meet these needs.
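The scale of the problem can be illustrated with a short calculation. The sketch below assumes that the 2,048 RAM cells of the example correspond to 32 registers of 64 bits each, as in the DeLano et al. register file; the function name is hypothetical and is used only for illustration.

```python
def read_port_count(num_registers, bits_per_register, read_ports_per_cell):
    """Return (RAM cells, read port circuits) for a register file.

    Hypothetical helper: one RAM cell per stored bit and one read port
    circuit per cell per read port are assumed.
    """
    cells = num_registers * bits_per_register
    return cells, cells * read_ports_per_cell


cells, ports = read_port_count(32, 64, 5)
print(cells, ports)  # 2048 10240
```

Under these assumptions, each transistor added to a read port is repeated more than ten thousand times across the register file, which is why the size of the read port dominates the size of the RAM cell.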
For example, a register cell 100 having the simplest form of a read port for the RAM cell has a transfer gate as illustrated in FIG. 2. As shown, each write port of the register cell 100 consists of a simple transfer gate (200, 202 or 204) which receives an input (ina, inb, inc) from, for example, FALU 106, FMULT 108 or from the data cache and transfers the respective inputs to node N1 when their corresponding WRITE signals (wra, wrb, wrc) from a write address decoder are high. The respective inputs are then stored in RAM cell 206. As shown, each RAM cell 206 typically comprises cross-coupled inverters comprised of PFETs 208 and 212 and NFETs 210 and 214. The output of the RAM cell 206 may then be output to the appropriate output line (OUT1, OUT2, OUT3) via respective transfer gates 216, 218 or 220 in response to a READ signal (READ1, READ2, READ3).
In the configuration of FIG. 2, the weak inverters of the RAM cell 206 must drive the relatively large capacitance of each of the output ports simultaneously. However, the inverters of the RAM cell 206 typically do not have enough current capability to drive multiple output ports simultaneously. In addition, since charge sharing from output busses can upset the values stored in the RAM cell 206, such a configuration is generally unsuitable for use with multiple read ports. For example, the capacitance on the output bus could drive a value to the RAM cell 206 unless PFETs 208 and 212 are large enough to drive all outputs simultaneously. For this reason, in order for the RAM cell 206 to drive multiple read ports, the inverters of the RAM cell 206 must be relatively large and thus take up a relatively large area on the chip substrate.
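The charge sharing effect may be estimated with a simple charge conservation model. The following is a minimal sketch only: it assumes ideal capacitors, ignores the restoring current of the inverters and the resistance of the transfer gates, and uses hypothetical capacitance and voltage values that do not appear in the original description.

```python
def shared_node_voltage(c_node, v_node, busses):
    """Voltage after a storage node is connected to several output busses.

    c_node, v_node: capacitance and initial voltage of the storage node.
    busses: list of (capacitance, initial voltage) pairs, one per bus.
    Charge is conserved: V = (sum of C*V) / (sum of C).
    """
    total_charge = c_node * v_node + sum(c * v for c, v in busses)
    total_cap = c_node + sum(c for c, _ in busses)
    return total_charge / total_cap


# Hypothetical values: a 5 fF node storing 0 V, three 100 fF busses at 3.3 V.
v = shared_node_voltage(5.0, 0.0, [(100.0, 3.3)] * 3)
print(f"{v:.2f} V")
```

With these illustrative values the storage node is pulled most of the way to the bus voltage, which is the upset condition described above unless the inverters can supply enough current to hold the stored value.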
FIG. 3 illustrates a register cell 100 similar to that illustrated in FIG. 2 except that an output inverter is disposed between the RAM cell 206 and each output port so as to render the register cell 100 suitable for use with a small number of read ports by decoupling the RAM cell 206 from the outputs. Generally, this inverter (transistors 300, 302; transistors 304, 306; or transistors 308, 310) provides the current required to drive the read port capacitance. In addition, since the storage node nin in this configuration is buffered from the output by the inverters, charge sharing is not a problem as in the configuration of FIG. 2.
However, the read port configuration of the register cell 100 of FIG. 3 also has several disadvantages. First, each read port places an additional load on the RAM cell 206, which increases the setup time required to write the RAM cell 206. Second, there are two pulldown transistors (e.g., 300, 302) driving the output load. As a result, in order to minimize the output delay, each of the pullup or pulldown transistors (300, 302 or 304, 306 or 308, 310) must be twice as wide as would be necessary if a single pullup or pulldown transistor were used in the absence of transfer gates 216, 218 or 220. This increases the capacitance on both the READ line and on the inverters of the RAM cell 206. The capacitance on the read port is also increased due to the diode and gate overlap capacitances of the larger output transistors. Of course, additional area is also required for the larger transistors.
FIG. 4 illustrates a register cell 100 based on that shown in FIG. 3 except that the read ports are precharged. In this configuration, the PFETs of the buffer inverters are removed since the RAM cell 206 is not required to pull up the precharged output. Instead, the read port need only pull down the output line using transistors 400, 402, or 404 for a low output. During operation, when the READ line (READ1, READ2, READ3) is true and the RAM cell 206 is storing a "0" (nin=1), the read port is discharged. Otherwise, the read port remains precharged. However, this configuration also has many of the same disadvantages recited above with respect to the configuration of FIG. 3.
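The read behavior of the precharged configuration can be summarized by a small behavioral model. This is a logic-level sketch only, assuming that the node nin holds the complement of the stored bit (as implied by "storing a '0' (nin=1)") and that the output line has been precharged high before the read; the function name is hypothetical and no timing or electrical effects are modeled.

```python
def precharged_read(stored_bit, read):
    """Logic level on a precharged output line after the read phase.

    stored_bit: 0 or 1, the value held in the RAM cell.
    read: True when the READ line for this port is asserted.
    Assumes nin is the complement of the stored bit and the line starts high.
    """
    nin = 1 - stored_bit
    output = 1                  # line precharged high
    if read and nin == 1:
        output = 0              # pulldown path discharges the line
    return output


for bit in (0, 1):
    print(bit, precharged_read(bit, True), precharged_read(bit, False))
```

In this model an asserted READ line yields the stored bit on the output, while an unselected port simply leaves its output line precharged.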
Another implementation of a multiple read port register file known to the present inventors actually replicates the entire register array once for each read port. This avoids the problems associated with dumping multiple read ports of registers simultaneously but at a tremendous area penalty, for not only is the dump circuit replicated a plurality of times, but the address decoders, RAM cells, sensing amplifiers and write ports are replicated as well.
Accordingly, an improved register file is desired which may support a plurality of read ports for dumping data simultaneously without disturbing the state of the values stored in the RAM cells while also providing a small setup time and maximum speed and using a small chip area. The present invention has been designed to meet these needs.