The present disclosure generally relates to microprocessors and microprocessor memory systems, and more specifically, to an apparatus and method for providing a multiple read port memory system with a single port memory cell.
Microprocessors use memory arrays such as register files to store data temporarily for a processing unit. To enable simultaneous access to a memory cell array, register files provide multiple read and/or write ports. Depending on the application, these so-called multi-port register files can be configured with eight or even more read ports.
FIG. 2 illustrates a block diagram of a conventional 1W4R (1 Write, 4 Read) register file 10. While a read address architecture is depicted, it is understood that a write word line configuration (not shown) uses a similar architecture as known in the art. In the read word line implementation, the 1W4R register file 10 provides four read address decoder elements DCD0, . . . , DCD3, labeled 15_0, . . . , 15_3 respectively, one for each of the four read ports of the 1W4R (1 Write, 4 Read) port bit cells 30_0, . . . , 30_3. Each respective decoder element 15_0, . . . , 15_3 receives a respective read enable bit (decoder selector signal) and respective read address bits (e.g., 2 bits), collectively at respective read address bit input lines 12_0, . . . , 12_3. When enabled, the decoder element 15_0, . . . , 15_3 generates, in response to the 2 read address inputs, respective parallel output read address decode bits 17 on a bus. In the implementation shown, each 2:4 read address decoder element 15_0, . . . , 15_3 provides an output of four read address decode bits 17. As further shown in FIG. 2, a corresponding clock control buffer device 20_0, . . . , 20_3 is provided to receive the four read address output bits 17 of the respective enabled decoder element 15_0, . . . , 15_3. The inset of FIG. 2 shows detailed processing at a clock control buffer element 20_3, where each received decoder parallel output bit 17 is combined, using an AND or similar logic gate 23, with a read control clock signal 25 to clock in the four parallel read address decode signals, referred to as RWL0, . . . , RWL3 22 in a read operation. A respective set 22_0, . . . , 22_3 of Read Word Line (RWL0, . . . , RWL3) signals is input to a respective 1W4R port bit cell 30_0, . . . , 30_3 to selectively read the data value stored therein.
Each Read Word Line signal 22 is received at a respective read port pass gate circuit to drive the corresponding output bit cell value (e.g., true or its complement) at a corresponding local bit line 37. Read output data on the local bit lines LBL0, . . . , LBL3 are output as register file 10 outputs RD0-RD3 via processing at a respective local receiver element 40 and a global receiver and output driver element 50.
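The decode and clock gating path described above can be sketched behaviorally as follows. This is a minimal illustration only, not the circuit of FIG. 2: the function names are hypothetical, and it assumes that the 2:4 decoder produces a one-hot output and that gate 23 is a plain AND of each decode bit with the read clock.

```python
def decode_2to4(enable: int, addr: int) -> list[int]:
    """Behavioral 2:4 decode (assumed one-hot); all outputs are 0 when disabled."""
    if not 0 <= addr <= 3:
        raise ValueError("addr must fit in 2 bits")
    return [1 if (enable and addr == i) else 0 for i in range(4)]


def clock_control_buffer(decode_bits: list[int], read_clk: int) -> list[int]:
    """AND each decode bit with the read clock to form RWL0..RWL3."""
    return [bit & read_clk for bit in decode_bits]


def read_word_lines(enable: int, addr: int, read_clk: int) -> list[int]:
    """Decoder followed by clock control buffer for one read port."""
    return clock_control_buffer(decode_2to4(enable, addr), read_clk)


# One read port: address 2 is selected only while the read clock is high.
rwl_clock_high = read_word_lines(enable=1, addr=2, read_clk=1)
rwl_clock_low = read_word_lines(enable=1, addr=2, read_clk=0)
```

In this model a word line is asserted only when the port is enabled, the address matches, and the read clock is high, which mirrors the gating role of the clock control buffer.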
A write word line implementation for writing data to a register file is also provided using a similar structure. In the case of write operations (not shown), the same structures are implemented, i.e., a write enable bit, and write address bits (not shown) are input to a respective write decoder (not shown) where the outputs are gated, using a clock (CLK) control buffer and bit cell to generate write bits, e.g., Write Word Line (WWL) bits for performing a 1W4R bit cell write operation.
Depending on the application, a multi-port register file can be configured with eight or more read ports.
FIG. 1 shows a detailed schematic diagram of a conventional 1W4R port bit cell circuit 30 (representing a single 1W4R circuit 30_0, . . . , 30_3 of FIG. 2). Each bit cell 30 includes one write port and four read ports and implements a single memory bit cell (single bit cell) 75 of a conventional 6-transistor memory bit cell design implementing a cross-coupled inverter configuration and includes a single read port.
As shown in FIG. 1, to write data to single bit cell 75, input Write Word Line (WWL) 29 receives a decoded write signal from write decode circuitry (not shown) to activate storing a data value at respective bit cell node 82 and complementary bit cell node 84 of the bit cell 75 in conjunction with data value inputs WBL0_t (e.g., write bit line 0 true data value) and WBL0_c (e.g., write bit line 0 complement data value). For example, a low or “0” value WWL signal may represent a bit cell hold operation, while a high or logic “1” value WWL signal may represent a bit cell write operation.
In FIG. 1, at each local bit line LBL0, . . . , LBL3 corresponding to local bit lines 37_0, . . . , 37_3, there is connected a pass gate selection circuit 90 comprising a serial configuration of parallel operated pull-down NMOS FET devices N0, N1. Data values at LBL0, . . . , LBL3 are read out under control of NMOS device N0 coupled to a respective read word line RWL0, . . . , RWL3, each of which receives decoded address signals to drive the respective read bit lines 37_0, . . . , 37_3 to their true (or complementary) values based on the data written to and stored at the bit cell nodes 82 (84). For each read port, a data value stored at a single bit cell node 82 or 84 is read out by a corresponding NMOS transistor device N1 whose gate is connected to the corresponding cell node. In the example circuit 30 of FIG. 1, a local bit line data value corresponding to a true value is read from read bit lines LBL1_t and LBL3_t of the 1W4R bit cell circuit, and its complement value is provided at complement read bit lines LBL0_c and LBL2_c. Read bit lines LBL0, . . . , LBL3 are usually pre-charged to high values, e.g., in a pre-charge phase using a local_prch signal 42 (i.e., local_prch=0), until the bit cell 75 drives the bit line high or low according to the stored bit cell data value in the evaluation phase of a read process (local_prch=1).
As many selection circuits (i.e., N0, N1 pass gates) can be added as read ports are needed. However, each added read port requires additional register file cell circuitry and wire lines, taking up much more chip area.
FIG. 3 depicts a further conventional circuit 125 for reading out data values from the 1W4R port bit cell circuit 30 of FIG. 1. This read process occurs in two stages via the local receiver circuits 40 and global receiver and output driver circuits 50. When selected, read data for each port is driven from the cell nodes to the local bit line (LBL0, . . . , LBL3) and fed to the corresponding local receiver 40_0, . . . , 40_3, which in turn drives the data on respective global bit lines GBL0, . . . , GBL3 to the global receiver 50, which provides the read data outputs RD0-RD3 in parallel. As known in the art, each local receiver 40_0, . . . , 40_3 includes an inverter as amplifier, pull-up transistor devices (pre-charge and keeper) 45, and an NMOS transistor device 46 in a pull-down configuration at the local receiver for driving the read local bit line data values on respective global bit lines GBL0, . . . , GBL3 for receipt at the global receiver 50.
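The hold and write behavior controlled by WWL can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the transistor-level circuit: the class name and attribute names are hypothetical, node 82 is modeled as the true node and node 84 as its complement, and the write bit lines are assumed to be driven to complementary values during a write.

```python
class BitCellModel:
    """Behavioral stand-in for bit cell 75 (cross-coupled inverters not modeled)."""

    def __init__(self, value: int = 0) -> None:
        self.node_t = value      # models true node 82
        self.node_c = 1 - value  # models complement node 84

    def apply(self, wwl: int, wbl_t: int, wbl_c: int) -> None:
        """WWL = 0 holds the stored value; WWL = 1 writes the bit line values."""
        if wwl == 0:
            return  # hold: write bit lines are ignored
        if wbl_t == wbl_c:
            raise ValueError("write bit lines must be complementary (assumption)")
        self.node_t, self.node_c = wbl_t, wbl_c


cell = BitCellModel(0)
cell.apply(wwl=0, wbl_t=1, wbl_c=0)  # hold: stored value stays 0
held_value = cell.node_t
cell.apply(wwl=1, wbl_t=1, wbl_c=0)  # write: stored value becomes 1
written_value = cell.node_t
```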
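The pre-charge and evaluation phases of a single read port can be sketched as below. This is a simplified logical model, not a description of the actual devices: the function name is hypothetical, the bit line is assumed to stay high unless both series devices N0 and N1 conduct, and the choice of which cell node gates N1 on a given port is an assumption made for the example.

```python
def read_local_bit_line(local_prch: int, rwl: int, gate_node: int) -> int:
    """Return the logical level of one local bit line.

    local_prch = 0: pre-charge phase, the bit line is held high.
    local_prch = 1: evaluation phase, the bit line is pulled low only when
    both series NMOS devices conduct (N0 via RWL, N1 via the cell node).
    """
    if local_prch == 0:
        return 1
    pull_down = rwl == 1 and gate_node == 1
    return 0 if pull_down else 1


# Stored value 1: true node = 1, complement node = 0.
node_t, node_c = 1, 0

# Assumed wiring: a "_t" port gates N1 with the complement node,
# so its bit line carries the true value after evaluation.
lbl_t = read_local_bit_line(local_prch=1, rwl=1, gate_node=node_c)

# Assumed wiring: a "_c" port gates N1 with the true node,
# so its bit line carries the complement value after evaluation.
lbl_c = read_local_bit_line(local_prch=1, rwl=1, gate_node=node_t)
```

Because each read port adds its own N0, N1 pair and its own bit line, the model also makes visible why every additional port adds devices and wiring to the cell.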
One drawback of the conventional multi-port register file architecture 10 of FIG. 2 is that the cell area and the wire pitch area typically increase linearly with the number of read ports. While growing cell height is not an issue, an increase in width is in general strictly limited due to a predefined standard cell pitch.
Furthermore, bit cell layout design becomes very challenging with an increasing number of ports. Because it is located in a very congested area, a multi-port bit cell is very likely to exhibit more crosstalk coupling between adjacent bit lines and word lines.
Further, with the additional loading on each of the storage nodes (true/comp), read/write access times increase with the number of ports.
Furthermore, as there is one decoder for each read port, the decoded address is combined with the read clock in the clock control buffer to generate the read word lines. As indicated in FIG. 2, wiring becomes increasingly complex in this particular region as the number of read ports increases.
It would be highly desirable to provide a more area efficient register file with multiple read ports, and a method for operating the register file, that avoids the drawbacks of the conventional multi-port cell architecture.