In recent years, as semiconductor memory devices such as DRAM (Dynamic Random Access Memory) have advanced in functionality, operating speed, and storage capacity, DDR (Double Data Rate), DDR2, and DDR3 architectures have been introduced to significantly improve the data bandwidth of memory input and output.
In order to improve the data bandwidth of memory input and output, it is necessary to increase the amount of data that can be handled, by shortening the memory Read and Write cycle time (tRC), by increasing the number of operations executed in parallel in a memory (increasing the number of parallel data lines (I/O lines)), or by increasing the number of memory array banks.
As is well known, power P is approximated by Expression (1).
$$P \approx n \times c \times f \times V^{2} \qquad (1)$$
In Expression (1), n is the number of elements, c is a capacitance (output load capacitance charged or discharged by an element), f is an operating frequency, and V is an operating voltage. Here, the derivation of Expression (1) is outlined. The power P is an average of power consumed when an element charges or discharges the output load capacitance (dynamic dissipation). Assuming that the operating frequency (toggle frequency) is f and the output load capacitance is CL, the power is given by summing the power when element output Vout rises from Low (0V) to High (VDD) and the power when the output Vout drops from High (VDD) to Low (0V), and is approximated as follows (note that tp=1/f).
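$$
\begin{aligned}
P_d &= \frac{C_L}{\mathit{tp}} \int_{0}^{V_{DD}} V_{out}\, dV_{out} + \frac{C_L}{\mathit{tp}} \int_{V_{DD}}^{0} (V_{DD} - V_{out})\, d(V_{DD} - V_{out}) \\
    &= \frac{C_L V_{DD}^{2}}{2\,\mathit{tp}} + \frac{C_L V_{DD}^{2}}{2\,\mathit{tp}} = \frac{C_L V_{DD}^{2}}{\mathit{tp}} = C_L V_{DD}^{2} f \qquad (2)
\end{aligned}
$$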
With n elements, Expression (2) is multiplied by n, and with load capacitance CL of each element having a common value c, Expression (1) is obtained.
For example, in a case where the data bandwidth (transmission efficiency) is doubled by raising the operating frequency f, the power also increases in proportion to f according to Expression (1). With regard to a memory cell array, lower power consumption is desired at the same time as an increase in the amount of data.
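As a minimal illustration of Expression (1), the following Python sketch evaluates P ≈ n×c×f×V² and compares the estimate at a frequency f with the estimate at 2f. The function name and all numerical values are assumptions chosen only for illustration; they are not taken from any particular device.

```python
def dynamic_power(n, c, f, v):
    """Approximate dynamic power per Expression (1): P = n * c * f * V^2."""
    return n * c * f * v ** 2


# Hypothetical values, for illustration only.
n = 1_000     # number of elements
c = 10e-15    # output load capacitance per element [F]
f = 400e6     # operating frequency [Hz]
v = 1.5       # operating voltage [V]

p_base = dynamic_power(n, c, f, v)
p_double = dynamic_power(n, c, 2 * f, v)

print(f"P at f : {p_base * 1e3:.3f} mW")
print(f"P at 2f: {p_double * 1e3:.3f} mW")
print(f"ratio  : {p_double / p_base:.2f}")
```

With n, c, and V held fixed, the estimate scales linearly with f, which is why raising the operating frequency alone works against the goal of lower power consumption.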
It is to be noted that Patent Document 1 discloses a memory system that supports multiple memory access latency times. FIG. 1 shows a configuration of a system disclosed in Patent Document 1 (cited from FIG. 2A of Patent Document 1). The configuration controls access to a memory device in the memory system. A division is made into a memory device group (latency time group 1) near a memory controller 202 and a memory device group (latency time group 2) distant from the memory controller 202. By sorting data that is frequently accessed and data that is not frequently accessed into group 1 and group 2, respectively, overall access latency is reduced.
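The effect of sorting data into latency time groups can be sketched as a weighted average. The following Python sketch is not taken from Patent Document 1; the function name, the latency values (in clock cycles), and the access fractions are hypothetical and serve only to show why placing frequently accessed data in the near group lowers the average.

```python
def average_latency(near_fraction, latency_near, latency_far):
    """Average latency when near_fraction of all accesses hit the near group."""
    return near_fraction * latency_near + (1.0 - near_fraction) * latency_far


# Hypothetical latencies, in clock cycles.
LATENCY_GROUP_1 = 4   # group near the memory controller
LATENCY_GROUP_2 = 6   # group distant from the memory controller

# Without sorting, assume accesses are spread evenly over both groups.
unsorted = average_latency(0.5, LATENCY_GROUP_1, LATENCY_GROUP_2)

# With sorting, assume 90% of accesses target data placed in group 1.
sorted_by_frequency = average_latency(0.9, LATENCY_GROUP_1, LATENCY_GROUP_2)

print(f"unsorted: {unsorted:.2f} cycles")
print(f"sorted  : {sorted_by_frequency:.2f} cycles")
```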
FIG. 2 is a diagram representing a general memory configuration in a case where the configuration of FIG. 1 is replaced by a general DRAM (FIG. 2 is a diagram of a reference case made by the inventor of the present application).
As shown in FIG. 2, the memory (DRAM core) includes: a memory cell array 1 having a plurality of memory cells in an array form; a row decoder (X DEC) 2 that decodes a row address and activates a selected word line; a column decoder (Y DEC) 3 that decodes a column address and switches on a Y switch of a selected column (bit line); a sense amplifier/Y-switch 4 that amplifies the potential of a bit line; a data amplifier/write amplifier (Data Amp/Write Amp) 5 that amplifies read data amplified by a sense amplifier of a selected column, outputs the amplified read data to an RWBS (read/write bus), and drives write data from the RWBS; an address command timing controller 6 that controls address, command, and timing; an input/output function and data mask (Data I/O, Data Mask) 7, in which the input/output function inputs data to a memory cell and outputs data from a memory cell, between the read/write bus RWBS and a data terminal (not illustrated in the drawings) connected to an internal data bus 9, which is an input to the DRAM core, and the Data Mask performs control of a write mask to a memory cell by a data mask signal from a data mask terminal (not illustrated); an input (clock, address, command) 8 to the DRAM core; and the Internal Data Bus 9 that performs input of data to, and output of data from, the DRAM core.
FIG. 3 is a diagram for describing FIG. 2, and is a diagram showing an example of an arrangement (layout) of FIG. 2 (FIG. 3 is a diagram made by the inventors of the present application). In FIG. 3, an area 10 in the memory cell array 1 represents an active area including a memory cell that is to be accessed. Reference number 11 indicates a memory cell array or a memory macro (a circuit block used in a system LSI or the like), forming a basic unit. By controlling the basic unit 11 of the memory cell array by an ADDRESS/CMD BUS connected in common to basic units 11 of the memory cell array, the address command timing controller 6 selects the active area 10 that is to be accessed. Data (Write data/Read data) are input and output from the data I/O unit (Data I/O) 7, and are transferred by a read/write bus (RWBS) that is a bidirectional data bus connected in common to the plural memory cell array basic units 11. Although not limited thereto, in FIG. 3, there are 36 data terminals (DQ terminals) connected to the Internal Data Bus 9 forming data input of the DRAM core. Plural bit data (4 bits corresponding to a burst length) of respective data terminals are converted to parallel data in the data IO unit (Data I/O) 7, and are transferred to the read/write bus (RWBS). The read/write bus (RWBS) extends over the plural memory cell array basic units 11, and is connected in common to a data amplifier (Data Amp)/write amplifier (Write Amp) of each of the memory cell array basic units 11.
As an IO configuration in an array, either a hierarchical configuration (Local IO/Main IO) or a nonhierarchical configuration is adopted. In the case of the hierarchical configuration, the Main IO that is connected to a data amplifier/write amplifier (Data Amp/Write Amp) 4 is connected to a plurality of Local IOs via a switch circuit, which is not illustrated, and each Local IO is selected by the column decoder (Y DEC) 3 and is connected to a bit line of a selected column via a Y switch 5 that is in an ON state.
In a read operation, data read from a memory cell having a word line set to High is amplified by a sense amplifier 5, transmitted to a Local IO line via the Y switch 5, which is in an ON state, and furthermore is transmitted to the data amplifier (Data Amp) 4 via a Main IO line, and output to an RWBS. In the data IO unit 7, parallel data (data corresponding to burst length) is converted into serial data, and is output from a data terminal to the Internal Data Bus 9 in synchronization with a clock. In the DDR configuration, data is transferred in synchronization with rising and falling edges of a clock signal.
In a write operation, bit data serially supplied from a data terminal connected to the Internal Data Bus 9 is made parallel in the data IO unit 7, transferred to the RWBS, amplified by the write amplifier (Write Amp) 4, and transmitted to a bit line of a selected column with the Y switch 5 ON, via the Main IO line and selected Local IO line.
Data is controlled by the address command timing controller 6, and is read from or written to the active area 10 in the selected memory cell array 1.
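The serial/parallel conversion performed in the data IO unit 7 can be sketched as follows. This Python sketch assumes the FIG. 3 example of 36 data terminals and a burst length of 4 bits; the function names and the ordering of bits on the RWBS are hypothetical and are not specified in the description.

```python
NUM_DQ = 36        # data terminals, as in the FIG. 3 example
BURST_LENGTH = 4   # bits per data terminal per burst, as in the FIG. 3 example


def serial_to_parallel(bursts):
    """Write path: gather the serial burst of every terminal into one parallel word."""
    word = []
    for burst in bursts:
        if len(burst) != BURST_LENGTH:
            raise ValueError("each terminal must supply one full burst")
        word.extend(burst)
    return word


def parallel_to_serial(word):
    """Read path: split a parallel word back into one serial burst per terminal."""
    return [word[i:i + BURST_LENGTH] for i in range(0, len(word), BURST_LENGTH)]


# Arbitrary test pattern: one 4-bit burst per data terminal.
write_data = [[(dq + beat) % 2 for beat in range(BURST_LENGTH)]
              for dq in range(NUM_DQ)]

rwbs_word = serial_to_parallel(write_data)   # transferred over the RWBS
read_data = parallel_to_serial(rwbs_word)    # converted back on a read

print(len(rwbs_word))            # number of parallel bits per access
print(read_data == write_data)   # round trip preserves the data
```

Under these assumptions, one access moves NUM_DQ × BURST_LENGTH bits in parallel over the RWBS, so widening the parallel data path directly increases the amount of conversion work done in the data IO unit.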
FIG. 4 is a diagram showing a case 1 (active area 10-1) with a distant side selected, and a case 2 (active area 10-2) with a near side selected, as seen from the address command timing controller 6 and the data IO 7 in FIG. 3.
FIG. 5 is a timing chart (the diagram was made by the inventors of the present application) showing access operation in each of case 1 and case 2 of FIG. 4. For a command (CMD) and a clock (memory CLK), in case 1 and case 2, FIG. 5 schematically shows relationships of α, θ, β, and control delay corresponding to the active areas 10-1 and 10-2 (10-1 control delay, 10-2 control delay) from command input, selection time of the active areas 10-1 and 10-2 (10-1 selection time, 10-2 selection time), and output delay corresponding to the active areas 10-1 and 10-2 (10-1 output delay, 10-2 output delay).
    α is tRC (Row Cycle time),
    β is tRRD (Row to Row Delay),
    γ is control delay/data delay (output delay), and
    θ is read latency.
γ includes the setting time (control delay) of address/command and data for controlling an active area 10 of the memory cell array by the address command timing controller 6 and the data IO 7, and the delay time for transferring data to a memory cell basic unit via the RWBS.
α is cycle time related to a memory cell array operation of an active area.
β is time from input of one command (CMD) to when a subsequent command (CMD) can be input.
θ is the number of clocks (latency) from input of a Read command until data is output to a data terminal.
As shown in FIG. 5, α>>γ, that is, α is much longer than γ.
Furthermore, among α to θ, α has a time period approximately equivalent to the latency.
Increasing data bandwidth and improving memory cycle are synonymous with improving the latency θ.
In the example of FIG. 5, the ratio (time ratio) of γ to α is small. Therefore, the delay of γ (control delay/output delay) and the power consumed in γ (control delay/output delay) are small in comparison to the delay and power of α. However, if the number of parallel IO lines in the memory cell array increases, the ratio of γ to α becomes large and the power consumed by γ becomes large, due to, for example, an increase in the time required for parallel conversion of bit data serially input from the data terminal.
Heretofore, memory architectures have been developed with the aim of reducing α and β.
α=tRC (row cycle time) is an index indicating a cycle time during which the memory array actually operates to access a memory cell.
The operating frequency f of memory input and output is determined according to the number of data for which Read/Write is performed (the number of memory cells to be accessed), in a period of one tRC.
[Patent Document 1]
JP Patent Kohyo Publication No. JP2008-500668A