1. Field of the Invention
The present invention relates to a memory device and method of operating such a memory device, and in particular to the operation of memory devices of the type where a plurality of sub-arrays are provided to reduce the length of the bit lines within the memory device.
2. Description of the Prior Art
A typical memory device will have an array of memory cells arranged in a plurality of rows and a plurality of columns, and access circuitry will be provided in association with the memory array to enable individual memory cells within the array to be accessed for the purposes of writing data to, and/or reading data from, that memory cell.
There is an increasing demand for memory devices to be constructed which are smaller and consume less power than their predecessor designs, whilst retaining high performance. New technologies are being developed which allow a reduction in the size of the individual transistors making up each memory cell, and indeed the transistors making up the associated access control circuitry. However, as the memory cells decrease in size, the variation in behavior between individual memory cells tends to increase, and this can adversely affect predictability of operation. One particular issue that arises is that as the size of the transistors decreases, they exhibit more leakage current. Hence, the transistors making up each memory cell will contribute an increased leakage current onto the bit line or bit lines connected to those memory cells. The effect of this is that the leakage limits the maximum length of bit line which can be supported within the memory device whilst ensuring correct operation.
One way to seek to address this problem is to partition each column in the memory device into a plurality of separate, shorter columns in the vertical (column) direction, thus creating a plurality of sub-arrays in the bit line direction of the memory device. Each sub-array then needs to be provided with some local access circuitry (also referred to herein as local input/output (IO) circuitry) to enable data to be read from the sub-array (and if the memory cells can be re-written, to enable data to be written to the sub-array), with the various local access circuits then being connected to global access circuitry (also referred to herein as global IO circuitry) responsible for outputting data from the memory device (and optionally for receiving write data to be written into the memory device).
FIG. 1 illustrates a memory device 100 of the above type, where the memory array is divided up into a plurality of sub-arrays and associated local IO circuitry. Each sub-array and associated local IO circuitry may be constructed using the techniques of embodiments of the present invention. As shown in FIG. 1, a plurality of sub-array columns 130 are provided. Whilst in this illustrative embodiment six sub-array columns are shown, it will be appreciated that in a typical memory device there may be significantly more sub-array columns provided. Each sub-array column 130 is divided up into a plurality of sub-arrays 110, each sub-array 110 having associated local IO circuitry 120. In the illustrative example shown in FIG. 1, each sub-array column is divided up into four sub-arrays 110 and associated local IO circuits 120, but it will be appreciated that in a typical memory device there may be significantly more than four sub-array structures provided within each sub-array column 130.
By dividing each sub-array column 130 into a plurality of sub-arrays, the length of the bit lines provided within the memory device can be significantly reduced, when compared with a memory device where each column includes only a single memory array. This is particularly beneficial when using modern memory technologies such as 45 nm technology, where the individual transistors are very small, and leakage current is hence an issue. By keeping the bit line length relatively short, it can be ensured that the leakage current from memory cell transistors does not affect the correct operation of the memory device.
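As a rough illustration of the effect described above (the figures below are hypothetical, chosen only for illustration), partitioning a column into N sub-arrays divides by N the number of memory cell transistors leaking onto any one local bit line:

```python
# Hypothetical sketch: shortening bit lines by partitioning a column
# into sub-arrays. Figures are illustrative assumptions only.

ROWS_PER_COLUMN = 2048   # total rows (word lines) in the device (assumed)
NUM_SUB_ARRAYS = 4       # sub-arrays per sub-array column (as in FIG. 1)

# Without partitioning, every cell in the column leaks onto one long bit line.
rows_per_bit_line_monolithic = ROWS_PER_COLUMN

# With partitioning, each local bit line spans only one sub-array.
rows_per_bit_line_partitioned = ROWS_PER_COLUMN // NUM_SUB_ARRAYS

print(rows_per_bit_line_monolithic)   # 2048 cells on one long bit line
print(rows_per_bit_line_partitioned)  # 512 cells per short local bit line
```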
The memory device 100 has a global control block 140 which is used to control the operation of the global row decoder 160 and the global IO circuits 150. For a specified memory address, the global row decoder will be arranged to identify a word line within the memory device containing the addressed memory cell, and to issue an enable signal to that word line, enabling the addressed memory cell to be read from for a read operation, or to be written to for a write operation. Meanwhile, the global IO circuitry can identify, based on the address, the relevant column containing the addressed memory cell, and hence issue one or more control signals to the required local IO circuitry 120 to cause a read data value to be sensed and output to the global IO circuitry in the event of a read operation, or to cause write data to be input into the relevant column during a write operation. Hence, via the global row decoder 160, the global IO circuitry 150 and the relevant local IO circuitry 120, an addressed memory cell 170 can be accessed.
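One possible functional sketch of this address split is given below, using the worked figures that appear later in this description (512 word lines, a column multiplexer size of four). The exact field layout, with low-order bits selecting the column multiplexer input and high-order bits selecting the word line, is an assumption for illustration and not taken from the text:

```python
# Hypothetical address decode: the global row decoder consumes the
# high-order bits to enable one word line, while the global IO circuitry
# uses the low-order bits to steer the local column multiplexers.
# Field layout is an illustrative assumption.

WORD_LINES = 512   # rows in the array -> 9 row-address bits
MUX_SIZE = 4       # columns per column multiplexer -> 2 column-select bits

def decode(address):
    """Return (word_line, column_select) for a data-word address."""
    column_select = address % MUX_SIZE   # low bits: column mux select
    word_line = address // MUX_SIZE      # high bits: word line to enable
    return word_line, column_select

print(decode(0))     # (0, 0)
print(decode(2047))  # (511, 3) - last word line, last mux input
```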
Various circuitry is typically provided within the local IO circuitry, including column multiplexer circuitry for selecting a particular memory cell column within the associated sub-array, and sense amplifier circuitry for detecting the data value stored in the addressed memory cell within that selected column. If the memory cells can also be written to, the local IO circuitry will typically include write transistors to generate the required data value for storing in an addressed memory cell during a write operation.
The memory cells can take a variety of forms, for example ROM, DRAM or SRAM memory cells. Typically each memory cell stores a single bit data value, and accordingly if the data being accessed is a multi-bit data word (e.g. 32 bits, 64 bits, etc), it will be necessary to access multiple memory cells. In a typical design, column multiplexers will be provided corresponding to each bit of the data word, each column multiplexer being connected to the bit lines for a plurality of columns containing memory cells in which the associated bit of the data word can be stored. The memory array can hence be considered to be formed of a plurality of sections, one for each column multiplexer. Hence, by way of example, a memory array may have 512 word lines, a multiplexer size of four (meaning four columns are connected to each column multiplexer), and a data word size of 32 bits (meaning there are 32 column multiplexers, each column multiplexer being connected to a corresponding section of the memory array). Such a memory can hence store 2048 32-bit data words.
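The arithmetic of the worked example above (512 word lines, a multiplexer size of four, 32 column multiplexers) can be checked directly:

```python
# Capacity arithmetic for the worked example in the text. Each of the 32
# column multiplexers serves its own section of MUX_SIZE columns, and each
# section contributes one bit of every data word.

WORD_LINES = 512   # word lines in the array
MUX_SIZE = 4       # columns connected to each column multiplexer
WORD_BITS = 32     # bits per data word = number of column multiplexers

total_columns = MUX_SIZE * WORD_BITS      # 128 physical columns
total_cells = WORD_LINES * total_columns  # 65,536 single-bit memory cells
total_words = total_cells // WORD_BITS    # 32 cells consumed per data word

print(total_words)  # 2048 thirty-two-bit data words, as stated in the text
```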
As shown schematically by FIGS. 2A and 2B, whilst the local IO circuitry 205 may physically be placed to one side of the sub-array 200 (as shown in FIG. 2A), it can alternatively be provided at a central location 215 between two sub-array parts 210, 220, as shown in FIG. 2B. In this latter arrangement, the sub-array is considered to consist of both the first sub-array part 210 and the second sub-array part 220.
When adopting a memory design such as that shown schematically in FIG. 1, an issue that arises is how to efficiently output the read data detected by the local IO circuitry 120 to the global IO circuitry 150. As memory devices increase in size, the distances from the locally sensed output within the local IO circuitry to the global IO circuitry become larger, and accordingly it is important to provide a fast way to port the information from the local IO circuitry to the global IO circuitry. In older styles of memory device, bit lines ran the full length of the memory array, and a single sense amplifier provided at the bottom of the bit lines sensed the read data values. However, for memory devices arranged as shown in FIG. 1 and employing modern memory technologies such as 45 nm technology, the individual transistors making up the memory cells are no longer strong enough to drive their output over such long bit lines, and accordingly sense amplifier circuitry is typically provided locally as part of the local IO circuitry of the sub-array. A difficulty that then arises is how to route that locally sensed read data to the global IO circuitry.
One prior art technique used to seek to address this problem is shown schematically in FIG. 3. As shown in FIG. 3, each local sub-array 300, 320 has local column multiplexer/sense amplifier circuitry 310, 330 connected to the local bit lines 305, 325 running through the sub-array. The local sense amplifiers within the circuits 310, 330 amplify the detected read signal from the local bit lines, and propagate it onto the global bit lines 340, 350. To limit power consumption and gain speed, a global sense amplifier (typically with some associated latch circuitry) 360 is used to re-sense the read data value from the voltages on the global bit lines 340, 350. Hence, the outputs from the local sense amplifiers in this case generate a new “to-be-sensed” signal for routing to the global sense amplifier 360. Such a technique is discussed in the IBM paper entitled “A 450 ps Access-Time SRAM Macro in 45 nm SOI Featuring a Two-Stage Sensing-Scheme and Dynamic Power Management”, published at the 2008 IEEE International Solid-State Circuits Conference (ISSCC 2008), Feb. 5, 2008.
Such an approach is attractive since it is modular, and accordingly can still be used as the number of sub-arrays in each column of the memory device increases. However, a significant disadvantage of the approach is that the timing issues arising from sensing and then re-sensing a data value using a sequence of two separate sense amplifiers become very complex. It is difficult to time the two sense amplifiers in a precise manner without losing timing margins at both sensing locations. For each different design of memory device, the timing of the two sense amplifiers will need to be tuned having regard to the number of sub-arrays in each sub-array column. Such an approach will lead to the loss of timing margins, an increase in power consumption and an overall decrease in the speed of operation.
Another known prior art technique is a tree-based scheme such as that shown schematically in FIG. 4. In particular, FIG. 4 shows four sub-arrays and their associated local IO circuits 400, 410, 420, 430 arranged in a sub-array column, with the sub-array 430 being located physically closest to the global IO circuitry 470. Between the various sub-arrays, combining circuits 440, 450, 460 are provided, and these combining circuits are interconnected such that the read data output from any sub-array can be output to the global IO circuitry in a predetermined number of steps, namely three steps in the particular example shown in FIG. 4. However, the actual form of each combining circuit 440, 450, 460 will depend on where that combining circuit resides, and as more sub-arrays are added to a sub-array column, the form of the combining circuits will change. Hence, every combining stage needs its own particular configuration of logic gates to perform the required combination function, and the actual form of logic at each combining stage will depend on the number of sub-arrays in each sub-array column. Further, it can be seen that the routing paths rapidly become very complex, and the total distance involved between any particular sub-array and the global IO circuitry is in many cases more than the total height of the memory. Accordingly, this results in significant power consumption. Such an approach is useful in small memories such as cache memories, since, whilst such memories are small, this approach gives a relatively high speed solution, and does not exhibit the timing problems of the earlier-mentioned FIG. 3 approach. However, as the size of the memory increases, and in particular the number of sub-arrays in each sub-array column increases, then the performance rapidly decreases.
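A rough functional model of such a combining scheme is sketched below. This is an assumption for illustration, not the circuit of FIG. 4: it treats the three combining circuits as successive OR stages, so that only the addressed sub-array drives non-zero read data and the combining stages forward that single valid value to the global IO circuitry in a fixed number of steps:

```python
# Hypothetical functional model of combining stages between sub-arrays.
# Assumption: only the addressed sub-array drives non-zero data, so each
# combining stage can simply OR the forwarded value with the local output.

def combine_stages(sub_array_outputs):
    """OR-combine locally sensed outputs down a column toward global IO.

    Returns the value arriving at the global IO circuitry and the number
    of combining steps traversed.
    """
    value, steps = sub_array_outputs[0], 0
    for out in sub_array_outputs[1:]:
        value |= out   # one combining circuit per stage
        steps += 1
    return value, steps

# Four sub-arrays; only the third was addressed, sensing the value 0b1011.
print(combine_stages([0, 0, 0b1011, 0]))  # (11, 3) - three combining steps
```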
Accordingly, it would be desirable to develop an improved technique for routing the read data sensed by local access circuitry of a sub-array to the global access circuitry, and in particular to develop an approach which could be used irrespective of the number of sub-arrays in a sub-array column without exhibiting the timing issues associated with the prior art of FIG. 3, and without involving the complexities and re-design issues of the prior art of FIG. 4.