The recent tendency in semiconductor memories, such as DRAMs (Dynamic Random Access Memories), is towards more sophisticated functions, higher operating speed and larger capacity. In addition, the memory input/output data band width has also been appreciably improved with the introduction of architectures such as DDR (Double Data Rate)/DDR2/DDR3.
To improve memory input/output data band width, the amount of data that may be handled needs to be increased by improving memory READ or WRITE cycle time (tRC: ROW CYCLE TIME) or the number of simultaneous operations (parallel operations) in a memory or increasing the number of banks in the memory cell array. The number of simultaneous operations, or the number of parallel operations, needs to be increased by increasing the number of parallel lines.
In a well-known manner, the consumed power P may be approximated by the following equation (1):
$$P = n \times c \times f \times V^{2} \qquad (1)$$
In the equation (1), n is the number of elements, c is capacitance (output load capacitance charged/discharged by the elements), f is the operating frequency, and V is the operating voltage. The derivation of the equation (1) will now be explained briefly. The power P is an average of the power consumed when an element charges/discharges the output load capacitance (dynamic dissipation). With the operating frequency (in actuality, the toggle frequency) f and with the output load capacitance CL, the power may be expressed as the sum of the power when an output of an element Vout rises from Low (0V) to High (VDD) and the power when the output Vout falls from High (VDD) to Low (0V), and may be approximated by
$$
\begin{aligned}
P_d &= \frac{C_L}{t_p}\int_{0}^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{t_p}\int_{V_{DD}}^{0} (V_{DD}-V_{out})\,d(V_{DD}-V_{out}) \\
    &= \frac{C_L V_{DD}^{2}}{2t_p} + \frac{C_L V_{DD}^{2}}{2t_p} \\
    &= \frac{C_L V_{DD}^{2}}{t_p} = C_L V_{DD}^{2} f, \qquad \text{where } t_p = 1/f. \qquad (2)
\end{aligned}
$$
For n elements (n output lines), the equation (2) is multiplied by n and the capacitance load CL of each output is given a common value c to give the equation (1).
For example, if the data band width (transfer efficiency) is doubled by improving the operating frequency f, the power is also increased. In a memory cell array, it is desired not only to increase the data amount but also to reduce power consumption.
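As a numerical illustration of equation (1), the following minimal Python sketch evaluates P = n×c×f×V² and compares two operating frequencies. The function name `dynamic_power` and the line count, capacitance, frequency and voltage values are assumptions chosen only for illustration; they are not taken from the present description.

```python
def dynamic_power(n, c, f, v):
    """Approximate dynamic power per equation (1): P = n * c * f * V^2.

    n: number of elements (output lines)
    c: output load capacitance per element, in farads
    f: operating (toggle) frequency, in hertz
    v: operating voltage, in volts
    """
    return n * c * f * v ** 2


# Hypothetical values, for illustration only.
n_lines = 144    # number of parallel lines (assumed)
c_load = 1e-12   # 1 pF per line (assumed)
v_dd = 1.5       # operating voltage in volts (assumed)

p_base = dynamic_power(n_lines, c_load, 200e6, v_dd)  # f = 200 MHz
p_fast = dynamic_power(n_lines, c_load, 400e6, v_dd)  # f doubled to 400 MHz

# Ratio of the two powers; with all else fixed, P scales linearly with f.
print(p_fast / p_base)
```

Under these assumptions the power scales in direct proportion to f, which is why raising the operating frequency alone to gain band width also raises power consumption.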
In Patent Document 1, there is disclosed a memory system that supports multiple memory access latency times. FIG. 1 herein shows the configuration of the system disclosed in Patent Document 1 (FIG. 1 is cited from FIG. 2A of Patent Document 1). The configuration of FIG. 1 controls the access to memory devices in the memory system. The memory devices are classified into a group near to a memory controller 202 (latency time group 1) and another group remote from it (latency time group 2). The global access latency is reduced by assigning frequently accessed data to the group 1 and assigning other data to the group 2.
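The effect of such a grouping can be sketched as an access-weighted mean latency. The following Python sketch is an illustration under stated assumptions only: the weighted-mean model, the function name `average_latency`, the latency values and the access fractions are all hypothetical and are not taken from Patent Document 1.

```python
def average_latency(frac_group1, latency_group1, latency_group2):
    """Access-weighted mean latency over two latency groups.

    frac_group1: fraction of all accesses served by group 1 (0.0 to 1.0)
    latency_group1, latency_group2: latency of each group, in clock cycles
    """
    if not 0.0 <= frac_group1 <= 1.0:
        raise ValueError("frac_group1 must lie in [0, 1]")
    return frac_group1 * latency_group1 + (1.0 - frac_group1) * latency_group2


# Hypothetical latencies in clock cycles (group 1 is the near group).
near, far = 4, 6

uniform = average_latency(0.5, near, far)  # accesses spread evenly
skewed = average_latency(0.9, near, far)   # frequently accessed data placed in group 1

print(uniform, skewed)
```

In this simplified model, the more accesses are served by the near group, the closer the mean latency comes to the latency of group 1.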
FIG. 2 illustrates a memory configuration in the case that the configuration of FIG. 1 has been replaced by a state-of-the-art DRAM (FIG. 2 illustrates a reference case (prototype example) prepared by the present inventor).
Referring to FIG. 2, the memory (DRAM core) includes:
a memory cell array 1 which has a multiple-bank configuration and is composed of an array of a plurality of memory cells,
a row decoder (X DEC) 2 that decodes a row address to activate a selected word line,
a column decoder (Y DEC) 3 that decodes a column address to turn on a Y-switch of a selected column (bit line),
a sense amplifier/Y switch 4 that amplifies the potential on the bit line,
a data amplifier/write amplifier (WRITE AMP) 5 that amplifies read data amplified by the sense amplifier of the selected column to output the amplified data to the RWBS (read/write bus), and that drives write data from the RWBS,
a control circuit (Address Command Timing Controller) 6 that controls the address, command and the timing,
a data control circuit (Data, I/O and Data Mask) 7 that controls the data input/output function to or from a memory cell between a data terminal (not shown) connected to an internal data bus 9 and the RWBS (read/write bus) and that manages write mask control to the memory cell by a data mask signal from a data mask terminal (not shown),
an input (clock, address or command input) 8 to the DRAM core, and
an internal data bus 9 that inputs/outputs data to or from the DRAM core.
FIG. 3 illustrates a portion of a prototype arrangement (layout) of FIG. 2. FIG. 3 is also prepared by the present inventor to explain FIG. 2. Referring to FIG. 3, an area 10 in the memory cell array 1 represents an active area including memory cells being accessed. The reference numeral 11 denotes a memory array or a memory macro (a circuit block used in e.g., a system LSI) that constitutes a memory array basic unit. A memory array basic unit may be abbreviated to a basic unit. The control circuit (address command timing controller) 6 manages control via an address/command bus (ADDRESS/COM BUS), connected in common to the basic units 11 of the two memory cell arrays, to select the active area 10 to be accessed. The active area 10 is selected by an X decoder (XDEC) 2 that decodes an X-address (row address) of the address signal to activate the selected word line and by a column decoder (YDEC) 3 that decodes a column address to turn on a Y-switch of the selected column. Data (WRITE data and READ data) are inputted/outputted at the data control circuit (data I/O data mask) 7 and transferred via a read/write bus (RWBS) connected in common to the multiple memory array basic units 11. In FIG. 3, only by way of illustration, there are 36 data terminals (DQ terminals) connected to the internal data bus 9 that compose a data input to the DRAM core. A plurality of items of bit data at the DQ terminals are converted by e.g., the data control circuit 7 into parallel data which are then transferred in parallel to the read/write bus (RWBS). It is noted that the plurality of items of bit data are serially inputted/outputted bits corresponding to a burst length (the number of bit data that are able to be inputted/outputted in succession). This read/write bus (RWBS) extends across the multiple memory array basic units 11 and is connected in common to the data amplifier (Data AMP)/write amplifier (WRITE AMP) of each memory array basic unit 11.
With the burst length equal to 4, the read/write bus (RWBS) includes four data lines (I/O lines) per data terminal. Hence, with the 36 data terminals, there are provided 36×4=144 data lines (IO lines).
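The line count above follows directly from the number of data terminals and the burst length. A minimal Python sketch of the arithmetic is given below; the function name `rwbs_data_lines` is an assumption introduced only for this illustration.

```python
def rwbs_data_lines(num_data_terminals, burst_length):
    """Number of parallel data lines (IO lines) on the read/write bus.

    Each data terminal contributes one line per bit of the burst.
    """
    if num_data_terminals < 1 or burst_length < 1:
        raise ValueError("both arguments must be positive integers")
    return num_data_terminals * burst_length


# Values stated in the description: 36 data terminals, burst length 4.
lines = rwbs_data_lines(36, 4)
print(lines)  # 144
```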
The IO configuration in the memory cell array is either a hierarchical configuration (local IO line/main IO line) or a non-hierarchical configuration. In case the IO configuration is hierarchical, the main IO line connected to the data amplifier/write amplifier (WRITE AMP) is connected via a switch circuit, not shown, to a plurality of local IO lines. These local IO lines are selected by the column decoder (Y DEC) 3 and connected to a bit line of the column selected via the Y switch 4 set in an on state.
In READ operation, data read from a memory cell connected to a word line selected by the X-decoder 2 (set at High potential) is amplified by the sense amplifier 4. The data is then transferred, via the Y switch 4 of the selected column, set in an on state, to the local amplifier, and thence to the data amplifier (Data Amp) 5, and output onto the read/write bus RWBS. The data control circuit 7 converts the parallel bit data (data composed of a number of bits corresponding to the burst length) into serial data, which are then serially output at the data terminal connected to the internal data bus 9, in synchronization with a clock signal. Note that, in the DDR, the serial data are transferred in synchronization with rising and falling edges of the clock signal.
In WRITE operation, the bit data, serially delivered at the data terminal connected to the internal data bus 9, is converted into parallel data by the data control circuit 7 so as to be transferred on the RWBS. The bit data is amplified by the write amplifier (WRITE AMP) 5 and transferred via main IO line, IO line and the selected local IO line to the bit line of the selected column whose Y switch 4 has been turned on.
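The serial/parallel conversion performed by the data control circuit 7 in the READ and WRITE operations can be sketched as follows. This is a simplified software model under stated assumptions, not a description of the actual circuit: the function names `to_parallel` and `to_serial` and the ordering of bits on the parallel lines are hypothetical.

```python
def to_parallel(bursts):
    """WRITE direction: place per-terminal serial bursts onto parallel lines.

    bursts[t][k] is the k-th bit serially received at data terminal t.
    Returns one flat list of len(bursts) * burst_length bits.
    The bit ordering on the lines is an assumption of this sketch.
    """
    burst_length = len(bursts[0])
    if any(len(burst) != burst_length for burst in bursts):
        raise ValueError("every terminal must carry the same burst length")
    return [bit for burst in bursts for bit in burst]


def to_serial(parallel_bits, burst_length):
    """READ direction: split the parallel lines back into per-terminal bursts."""
    if len(parallel_bits) % burst_length != 0:
        raise ValueError("line count must be a multiple of the burst length")
    return [parallel_bits[i:i + burst_length]
            for i in range(0, len(parallel_bits), burst_length)]


# Two data terminals with burst length 4, kept small for brevity.
write_bursts = [[1, 0, 1, 1], [0, 0, 1, 0]]
bus = to_parallel(write_bursts)   # eight parallel lines
read_bursts = to_serial(bus, 4)   # back to two serial bursts

print(bus)
print(read_bursts == write_bursts)
```

With 36 terminals and a burst length of 4, the same model yields 144 parallel lines, matching the number of data lines of the read/write bus.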
The data is controlled by the address command timing controller 6 and read (READ) or written (WRITE) in the active area 10 in the selected memory cell array 1.
FIG. 4, which is prepared by the present inventor, illustrates a case 1 in which an active area remote from the address command timing controller 6 and the data IO 7 in FIG. 3 (active area 10-1) is selected, and a case 2 in which an active area near to them (active area 10-2) is selected.
FIG. 5, which is prepared by the present inventor, is a timing chart illustrating an access operation for each of cases 1 and 2 in FIG. 4. FIG. 5 schematically illustrates the relationship among a command (CMD), a clock signal (memory CLK), control delays (10-1 control delay and 10-2 control delay), time of selection of the active areas 10-1, 10-2 (10-1 selection time and 10-2 selection time) and output delays for the active areas 10-1 and 10-2 (10-1 output delay and 10-2 output delay), and α, β and θ. It is noted that the control delays (10-1 control delay and 10-2 control delay) are those for the active areas 10-1, 10-2 from the command input for the cases 1 and 2.
α is Row Cycle Time (tRC),
β is Row to Row Delay (tRRD),
γ is control delay or data delay (output delay), and
θ is READ Latency (latency).
γ includes time for the address command timing control circuit 6 (address command timing controller) and the data control circuit 7 to control the active area 10 of the memory cell array and delay time caused in transferring a data signal via read write data bus RWBS to the memory array basic unit. The output delay corresponds to time for data read from the active area 10 to be transferred via RWBS to the data control circuit 7.
α is a cycle relating to the memory cell array operation of the active cell area 10.
β is the time that elapses from the input of a command (CMD) until the input of the next command is enabled.
θ is the number of clock cycles from the input of the READ command until data is outputted at the data terminal DQ.
In an example of FIG. 5, it holds that
10-1 control delay>10-2 control delay, and
10-1 output delay>10-2 output delay.
The control delay and the output delay γ for the active areas 10-1 and 10-2 are each one clock cycle at the maximum, while tRC (α) is 6 cycles, so that α>>γ, that is, α is appreciably longer than γ. On the other hand, α≈θ, that is, α is about equal to the latency.
Note that increasing the data band width and improving the memory cycle are synonymous with improving the latency.
In the example of FIG. 5, the ratio of γ to α (time ratio: γ/α) is small. Hence, the delay of γ (control delay and output delay) as well as the power consumed in γ (control delay and output delay) is small as compared to the delay as well as the power in α.
However, if the number of parallel connections of IO in the memory cell array, for example, the number of data lines for parallel transfer of the read write bus, is increased, the ratio of γ to α will increase due to the increase in time for parallel conversion of bit data serially inputted from the data terminal. This leads to increased power consumed in γ.
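The time ratio γ/α can be made concrete with the cycle counts of the FIG. 5 example (γ of one clock cycle at the maximum, tRC of 6 cycles). In the Python sketch below, the function name `delay_ratio` and the second case, in which γ grows to two cycles, are assumptions introduced only for illustration.

```python
def delay_ratio(gamma_cycles, alpha_cycles):
    """Time ratio gamma/alpha, with both values expressed in clock cycles."""
    if alpha_cycles <= 0:
        raise ValueError("alpha_cycles must be positive")
    return gamma_cycles / alpha_cycles


ALPHA = 6  # tRC in clock cycles, as in the FIG. 5 example

ratio_fig5 = delay_ratio(1, ALPHA)   # gamma at its stated maximum of one cycle
ratio_wider = delay_ratio(2, ALPHA)  # hypothetical: gamma grows to two cycles

print(f"{ratio_fig5:.3f} {ratio_wider:.3f}")
```

With α held fixed, any growth in γ raises the ratio in direct proportion, so the share of delay and power attributable to γ grows accordingly.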
So far, the development in one aspect of the semiconductor memory has been centered on the architecture for reducing tRC(α) and β. Note that α=tRC (row cycle time) is an index for the cycle in which the memory cell array is actually in operation in accessing the memory cell. The memory input/output operating frequency f is determined by the number of data that is read out/written in parallel in one tRC (number of memory cells accessed).
FIG. 6, which is prepared by the present inventor for clarifying problematic points, illustrates a prototype example 1 (reference case). In FIG. 6, the number of data terminals (data terminals connected to the internal data bus 9) is 36, with the burst length BL being 4. In correspondence with BL=4, the read write bus (RWBS) is 4 bits. In correspondence with the 36 data terminals, there are 36×4=144 parallel data lines (IO lines), such that 144 data are written in or read from the active area. YDEC is a column decoder that decodes the column address of the address signal. It is noted that, in FIG. 6, those elements that are the same as or equivalent to those shown in FIGS. 3 and 4 are depicted by the same reference numerals. The YDEC may, of course, be provided within the memory array basic unit, as shown in FIGS. 3 and 4.
[Patent Document 1] JP Patent Kohyo Publication No. JP-P2008-500668A