The present invention relates to high-performance semiconductor memory devices, and more particularly to memory devices having a multiple-level architecture.
Memory devices and logic circuits are two major types of circuit components used in integrated circuits (ICs). As IC manufacturing technologies progress, both the density and the performance of logic circuits have improved exponentially. Current art logic circuits operate at multiple GHz (billions of cycles per second), and each chip can have more than 100 million gates. The density of IC memory devices has also improved exponentially. Current art SRAM (static random access memory) can have 64 M bits per chip, while DRAM (dynamic random access memory) can have 256 M bits per chip. However, the performance of memory devices has improved at a much slower rate than that of logic circuits. Current art SRAM operates at 300 MHz (millions of cycles per second), while DRAM access time has stayed around 15-60 ns (nanoseconds) for many generations. This performance gap between logic and memory circuits creates a bottleneck in IC operation. The logic circuits are not able to operate at optimum speed because the supporting memory devices cannot provide data and instructions fast enough. To make matters worse, this performance gap grows larger as IC technology progresses. Memory bandwidth has been the limiting factor for most IC products.
The root cause of the performance problem of current art memory devices can be understood by examining their data access methods. FIG. 1 illustrates the basic structure of a memory device (101). This memory device contains m×n memory cells (103) connected by n horizontal word lines (WL1, WL2, . . . , WLj, . . . , WLn) and m vertical bit lines (BL1, BL2, . . . , BLi, . . . , BLm), where m and n are integers. Each bit line is connected to one sensing circuit (S1, S2, . . . , Si, . . . , Sm) for detecting the data stored in the memory cells. For many memory devices, each memory cell may have two or more bit lines, and the sensing circuits may need more than one input line. For simplicity, each bit line is represented by a single line in FIG. 1. To access the data in this memory device, one of the horizontal word lines (WLj) is activated by one decoder driver (105) in the word line address decoder (107). The row of memory cells connected to the activated word line (WLj) places data signals onto the vertical bit lines (BL1, BL2, . . . , BLi, . . . , BLm) according to their stored data. The sensing circuits (S1, S2, . . . , Si, . . . , Sm) determine the content of those activated memory cells and provide outputs to other devices. The word line driver (107) needs to drive m devices on the word line (WLj). Each bit line (BLi) is connected to n memory cells. When the memory array is very large (for example, m=n=4 K for a 16 M device), the loading on the word lines and bit lines is so large that it is very difficult to achieve high performance. Power consumption is another major problem. For each memory operation, one word line (WLj) and all the bit lines (BL1-BLm) are activated, so a large amount of power is consumed.
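The loading argument above can be illustrated with a simple counting model. The following is only a sketch under stated assumptions, not part of the described device: the function name `array_loading` is hypothetical, and the number of attached cells is used as a crude proxy for electrical loading, ignoring wire parasitics.

```python
def array_loading(m: int, n: int) -> dict:
    """Count the cells attached to each line of an array like that of FIG. 1.

    m is the number of bit lines and n is the number of word lines.
    Assumption: attached-cell count stands in for electrical loading.
    """
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive integers")
    return {
        "cells": m * n,                     # total memory cells in the array
        "cells_per_word_line": m,           # devices driven by one word line
        "cells_per_bit_line": n,            # cells attached to one bit line
        "bit_lines_active_per_access": m,   # every bit line carries a signal
    }


if __name__ == "__main__":
    k = 1024
    # The 16 M example from the text: m = n = 4 K.
    print(array_loading(4 * k, 4 * k))
```

For the 16 M example, each word line drives 4096 devices and each bit line carries 4096 cells, which is the source of the large loading described above.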
For each new generation of IC technology, the driving capability of the word line driver (107) is typically improved by 30%, and the dimension of the memory cell is typically reduced by 30% on each side, both of which are favorable factors for speed improvement. However, the required number of cells (m×n) typically increases by a factor of 2 on each side for each new generation. As a result, for each new generation of IC technology, the loading driven by each gate of a memory device decreases much less than the loading driven by each gate of a logic circuit, while the driving capability of each gate improves at a similar rate for both memory and logic circuits. This makes it very difficult to improve memory performance at the same rate as logic circuits.
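A back-of-the-envelope calculation using only the per-generation figures quoted above suggests why memory speed lags. The sketch below rests on assumptions that are not stated in the source: that the loading of a word line or bit line scales with its physical length (cells per line times cell dimension), and that delay scales with loading divided by driving capability. The function name `relative_line_delay` is hypothetical.

```python
def relative_line_delay(drive_gain: float = 1.30,
                        cell_shrink: float = 0.70,
                        cells_per_side_growth: float = 2.0) -> float:
    """Delay of one line relative to the previous technology generation.

    Assumptions (illustrative only): loading is proportional to line
    length, and delay is proportional to loading / driving capability.
    """
    # More cells per line, each cell smaller on a side.
    relative_loading = cells_per_side_growth * cell_shrink
    return relative_loading / drive_gain


if __name__ == "__main__":
    memory = relative_line_delay()
    # A wire whose cell count stays fixed, as a stand-in for logic wiring.
    logic = relative_line_delay(cells_per_side_growth=1.0)
    print(f"memory line delay ratio: {memory:.2f}")
    print(f"logic wire delay ratio:  {logic:.2f}")
```

Under these assumptions the memory line becomes about 1.4 times longer per generation, which cancels the 30% gain in driving capability, whereas a wire serving a fixed number of devices becomes markedly faster.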
A few current art methods have been implemented to reduce the memory performance problem. One popular method is to arrange memory devices in a multiple bank architecture as illustrated in FIG. 2(a). In this example, the memory device in FIG. 1 is divided into 4 independent banks. Each memory bank has a smaller memory array (201) that has m/2×n/2 memory cells. Each memory bank has its own sensing circuits (203) that sense m/2 bit lines, its own address decoder (205) that drives n/2 word lines, and its own controller (207) to control its activities. The individual operation within each bank should be faster than that of the large memory in FIG. 1 due to the smaller dimensions. However, the same data and control signals (209) need to go to all the banks, so we need a long routing channel (211) connecting all the banks. Operations required to control this routing channel (211) introduce additional delay. We can further divide the memory device into more banks (e.g. 16 banks) to make the operation in each individual bank faster, but that requires a much more complex routing channel, which causes more delay. Due to this limitation, the multiple bank architecture usually achieves limited improvement in performance. Meanwhile, the multiple bank architecture always introduces a significant cost penalty because each bank needs its own peripheral circuits.
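The four-bank division can be sketched as an address mapping. This is a minimal illustration, not the circuit of FIG. 2(a): the 2 by 2 bank arrangement, the bank numbering, and the function name `locate_cell` are assumptions made for the example.

```python
def locate_cell(row: int, col: int, m: int, n: int) -> tuple:
    """Map a global (word line, bit line) index to (bank, local_row, local_col).

    m is the number of bit lines and n the number of word lines of the
    undivided array. Assumption: four banks in a 2 x 2 arrangement, each
    holding n/2 word lines and m/2 bit lines; bank numbering is hypothetical.
    """
    if m % 2 or n % 2:
        raise ValueError("m and n must be even to split into four banks")
    if not (0 <= row < n and 0 <= col < m):
        raise ValueError("address out of range")
    half_rows, half_cols = n // 2, m // 2
    bank = 2 * (row // half_rows) + (col // half_cols)
    return bank, row % half_rows, col % half_cols


if __name__ == "__main__":
    # An 8 x 8 array split into four 4 x 4 banks.
    print(locate_cell(5, 6, m=8, n=8))
```

Each bank sees only half as many cells on every word line and bit line, but every access must first be steered to the correct bank, which corresponds to the routing channel overhead described above.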
Another popular method is to use a multiple level sensing architecture as illustrated in FIG. 2(b). In this example, the memory device in FIG. 1 is divided into 4 memory blocks (221). Each memory block has an m×n/4 memory array and m first level sensing circuits (US1, US2, . . . , USi, . . . , USm). The outputs of these first level sensing circuits can be placed onto second level bit lines (KBL1, . . . , KBLi, . . . , KBLm) through switches controlled by second level word lines (KWL1-KWL4). The second level bit lines are connected to the second level sensing circuits (KS1, . . . , KSi, . . . , KSm). This method improves first level sensing speed by reducing the first level bit line dimension, but second level sensing causes additional delay. The area penalty is usually significant due to the additional sensing circuits. There is no improvement in word line loading. To achieve a performance improvement, the timing improvement in the first level sensing must be larger than the added delay in the second level sensing. For that purpose, the driving capability of the first level sensing output needs to be much stronger than that of the memory cells. It is very difficult to increase the driving power of first level sensing because of the tight pitch layout problem. A prior art first level sensing circuit needs to follow the narrow pitch defined by the memory cells, which is typically so small that any increase in driving capability requires a significant area penalty. In reality, the multiple level sensing method in FIG. 2(b) achieves limited performance improvement because of the area penalty induced by the tight pitch layout. One method to reduce the tight pitch layout problem is to use a select switch before the first level sensing circuit as shown in FIG. 2(c). This method is usually called the “Y select” method in the IC industry because it requires a decoder at a boundary perpendicular to the word line decoders.
In this example, 4 nearby bit lines (BL1-BL4) are connected to 4 switches (S1-S4) that are controlled by 4 Y select signals (YS1-YS4). The common output (SBL) of those 4 switches is connected to the input of a sensor (SA). For each operation, one and only one of the 4 switches is activated, and the sensor (SA) senses the data on the selected bit line. Using this Y select switch, we need only 1 sensor for every 4 bit lines. Therefore, there is 4 times more area available to lay out the sensor. This method does not work for DRAM because the memory cells (241) connected to unused bit lines will lose their stored data. Therefore, the Y select method cannot be used for DRAM first level sensing. The Y select method works for SRAM, but the Y select switches occupy significant area, especially when we try to increase the number of bit lines connected to each sensing circuit. There is also a significant waste of power, because all the power used to drive the unused bit lines is wasted.
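The Y select behavior described above amounts to a 4-to-1 selection with exactly one switch closed. The sketch below is a behavioral illustration only, not a circuit model; the function names `y_select` and `sensors_needed` are hypothetical.

```python
def y_select(bit_lines, ys):
    """Return the value passed to the shared line SBL.

    bit_lines: values on the 4 grouped bit lines (BL1-BL4).
    ys: the 4 Y select signals (YS1-YS4); exactly one must be active.
    """
    if len(bit_lines) != 4 or len(ys) != 4:
        raise ValueError("expected 4 bit lines and 4 Y select signals")
    active = [i for i, s in enumerate(ys) if s]
    if len(active) != 1:
        raise ValueError("one and only one Y select signal must be active")
    return bit_lines[active[0]]


def sensors_needed(m: int, group: int = 4) -> int:
    """Number of sensors when every `group` bit lines share one sensor."""
    if m % group:
        raise ValueError("m must be a multiple of the group size")
    return m // group


if __name__ == "__main__":
    print(y_select([0, 1, 0, 0], [False, True, False, False]))
    print(sensors_needed(4096))
```

The three unselected bit lines in each group are still driven by their cells during an access even though their values are discarded, which reflects the wasted power noted above.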
A current art memory device typically uses all of the above methods. A typical DRAM usually contains 4 banks, each memory bank has two levels of sensing, and the second level sensing uses Y select. However, the above methods achieve limited performance improvement due to the limitations discussed in the above sections. Even with the help of all of the above methods, the performance gap between logic and memory ICs is still getting wider. It is therefore highly desirable to provide novel methods to further improve the performance of memory devices. It is also highly desirable to avoid the area and power penalties introduced by prior art methods.
Besides area and power penalties, another important penalty introduced by current art memory design is noise sensitivity. Because the bit line loading is typically very large, current art memory devices use small signal sense amplifiers as sensing circuits. The small signal sense amplifiers are able to determine the output data while the signals on a bit line pair are not yet fully developed. This capability improves performance significantly because we do not need to wait for fully developed signals. However, the small signal sensing and its associated control mechanism must be fully isolated from any noise sources. Therefore, a current art memory device must be carefully isolated from other types of circuits. FIG. 3 illustrates the floor plan of a typical current art IC that contains embedded memory and logic circuits. In this example, the IC contains one large memory module (301), one smaller memory module (309), random logic circuits (303), routing channels (305), and a register file (307). Current art memory modules can be easily recognized by their regular structures. All the circuits in a memory module, including the associated data and control signals, must be carefully isolated from other types of modules. The logic circuits (303), which can be recognized by their random wire connections, must be placed away from the memory modules (301, 309) because of noise considerations. Therefore, memory devices become communication barriers in the floor plan. Typically, we need large routing channels (305) for communication between those modules. Routing channels usually cannot go through memory modules because of noise considerations. Routing channels can go through memory modules only on high level metal layers, after the memory modules have been shielded by low level metal layers. Waste in area and power, and degradation in performance, are often caused by the communication barriers that memory modules create.
It is therefore highly desirable to reduce the noise sensitivity of memory devices for embedded applications, so that memory modules will no longer be communication barriers.
The primary objective of this invention is, therefore, to improve the performance of semiconductor memory devices. Another objective is to achieve performance improvement without significant penalties in area, power, and complexity. Another primary objective is to reduce the noise sensitivity of memory devices for better floor planning of embedded IC products.
These and other objects are accomplished by a semiconductor memory device according to the invention, which includes a novel multiple level memory architecture and a novel single-bit-line-write (SBLW) memory update mechanism.
According to the present invention as described herein, the following benefits, among others, are obtained.
(1) The performance of memory devices is improved by nearly one order of magnitude.
(2) Dramatic reduction in power consumption is achieved with performance improvement.
(3) Smaller memory area is also achieved due to better array efficiency.
(4) Simplification in memory design improves yield and reduces manufacturing complexity.
(5) Additional area saving and performance improvement are achieved due to simplification in supporting logic circuits.