FIG. 1A shows a block diagram of a conventional semiconductor memory architecture 10 commonly used to implement different types of memories, such as volatile memories (e.g., static random access memory (SRAM) and dynamic random access memory (DRAM)) and nonvolatile memories (e.g., read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), and Flash EPROM). As shown in FIG. 1A, such memories typically include an array 12 of 2^N rows of cells by 2^M columns of cells, where N and M represent the number of row and column addresses, respectively. A cell is selected from array 12 via row decoder 14 and column decoder 16. Row decoder 14 receives row addresses A0–AN for selecting one of the 2^N rows, and simultaneously, column decoder 16 receives column addresses AN+1–AN+M for selecting one of the 2^M columns. The selected cell is located at the intersection of the selected row (wordline) and the selected column (bitline).
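The row/column selection described above can be modeled as splitting a flat cell address into a row index and a column index, each of which a decoder turns into exactly one active line. The following is a minimal sketch, assuming N row-address bits and M column-address bits (so 2^N rows and 2^M columns) with the row bits in the low-order positions; the bit ordering and the function names `decode_address` and `one_hot` are hypothetical illustrations, not part of the described architecture.

```python
def decode_address(address: int, n_row_bits: int, m_col_bits: int) -> tuple[int, int]:
    """Split a flat cell address into (row, column) indices.

    Assumption: the low-order n_row_bits select the row (wordline) and the
    next m_col_bits select the column (bitline).
    """
    if not 0 <= address < 1 << (n_row_bits + m_col_bits):
        raise ValueError("address out of range for a 2**N x 2**M array")
    row = address & ((1 << n_row_bits) - 1)  # keep the N row-address bits
    col = address >> n_row_bits              # remaining M bits select the column
    return row, col


def one_hot(index: int, width_bits: int) -> list[int]:
    """Model a decoder output: exactly one of 2**width_bits lines is driven high."""
    lines = [0] * (1 << width_bits)
    lines[index] = 1
    return lines


if __name__ == "__main__":
    # Hypothetical 8-row by 4-column array (N = 3, M = 2).
    row, col = decode_address(0b10101, n_row_bits=3, m_col_bits=2)
    print(row, col, one_hot(row, 3), one_hot(col, 2))
```

The selected cell is the one where the single active wordline from `one_hot(row, N)` crosses the single active bitline from `one_hot(col, M)`.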
In a read operation, a signal representing the stored data is transferred from the selected cell to a sense amplifier in block 18 via column decoder 16. The sense amplifier amplifies the cell signal and transfers it to an output buffer (not shown), which in turn transfers it to IO pad 19 for external use. In a write operation, programming data is provided externally on IO pad 19 and is then transferred to the selected cell via a data IO circuit in block 18 and column decoder 16. Blocks 12, 16, 18 and IO pad 19 may be repeated a number of times depending on the desired IO data configuration (e.g., by-16 or by-32 data).
The address access time in a read operation (and in a write operation for SRAMs and DRAMs) typically consists of the time delays through an address buffer (not shown), row decoder 14, memory array 12, column decoder 16, the sense amplifier in block 18, and the output buffer (not shown). Depending on the memory density, the delay through the memory array is typically the largest of these delays, because of the RC time constant of the long wordlines and the high capacitance of the long bitlines. Thus, in a given process technology (e.g., 0.13 μm), array 12 is typically divided into two or more sub-arrays to achieve high speed, thereby reducing the length of the wordlines and/or bitlines. An example of such a memory configuration is shown in FIG. 1B.
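The reason shorter wordlines help so much can be seen from a first-order model: treating the access time as the sum of the serial stage delays, and the wordline as a uniform distributed RC line whose far-end Elmore delay is 0.5·(r·L)·(c·L), the wordline term grows with the square of its length. The sketch below assumes that simple model (no driver resistance, no bitline term), and its per-unit-length resistance and capacitance values and stage delays are hypothetical numbers chosen only for illustration.

```python
def wordline_elmore_delay(length_um: float, r_per_um: float, c_per_um: float) -> float:
    """Far-end Elmore delay (s) of a uniform distributed RC wordline.

    Total resistance r*L times total capacitance c*L, halved for a
    distributed line: the delay scales with length squared.
    """
    return 0.5 * (r_per_um * length_um) * (c_per_um * length_um)


def address_access_time(stage_delays: dict[str, float]) -> float:
    """Model the access time as the sum of the serial path delays (s)."""
    return sum(stage_delays.values())


if __name__ == "__main__":
    # Hypothetical wire parameters: 1 ohm/um and 0.2 fF/um.
    full = wordline_elmore_delay(4000.0, 1.0, 0.2e-15)     # undivided wordline
    quarter = wordline_elmore_delay(1000.0, 1.0, 0.2e-15)  # one quarter as long
    path = {
        "address_buffer": 0.3e-9,  # hypothetical stage delays
        "row_decoder": 0.5e-9,
        "wordline": full,
        "column_decoder": 0.3e-9,
        "sense_amplifier": 0.4e-9,
        "output_buffer": 0.5e-9,
    }
    print(f"wordline: {full * 1e9:.2f} ns -> {quarter * 1e9:.2f} ns")
    print(f"access time: {address_access_time(path) * 1e9:.2f} ns")
```

Under this model, dividing the wordline length by four reduces its RC delay by a factor of about sixteen, while the other stage delays are unchanged.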
In FIG. 1B, the memory array is divided into four sub-arrays 12-1, 12-2, 12-3, and 12-4, thus reducing the length of each wordline by a factor of four. However, such division of the array requires duplicating some of the circuit blocks that interface with the array. For example, multiple row decoders 14-1, 14-2, and 14-3 are needed, as shown. To reduce the bitline length by one half, each of sub-arrays 12-1 through 12-4 would need to be divided into two, with column decoder block 16 and block 18 (which includes the sense amplifiers and data I/O circuits) duplicated. Such duplication can cause an unnecessary increase in die size if not properly implemented. Further, for very high-performance (e.g., high-speed, low-power), high-density memories in which a large number of array divisions is used to meet the speed targets, the speed gains may diminish beyond a certain number of array divisions, while every additional level of array division would carry a large power penalty. This is because the extensive duplication of array-interface circuitry creates highly capacitive nodes in speed-sensitive circuit paths, and switching such high-capacitance nodes quickly requires large drivers that consume substantial dynamic power. This has substantially hindered the cost-effective development of high-speed, low-power, high-density memories for popular applications such as portable devices.
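The diminishing returns and power penalty described above can be illustrated with a toy model: splitting the wordline into k pieces shrinks its RC term roughly as 1/k², while the duplicated interface circuitry adds load to speed-critical nodes roughly in proportion to k, and the dynamic power of those nodes follows the standard switching relation P = α·C·V²·f. This is a minimal sketch under those assumptions only; the linear loading term, the function names, and every parameter value are hypothetical and are not taken from FIG. 1B.

```python
def access_time_ns(k: int, t_fixed: float, t_wl_full: float, t_load_per_div: float) -> float:
    """Toy access time (ns) for an array split into k sub-arrays along the wordline.

    t_fixed:        delay of stages unaffected by the split
    t_wl_full:      wordline RC delay of the undivided array, scaling as 1/k**2
    t_load_per_div: extra delay per division from driving duplicated
                    interface circuits on speed-critical nodes, scaling as k
    """
    return t_fixed + t_wl_full / k**2 + t_load_per_div * k


def dynamic_power_w(k: int, c_base_f: float, c_dup_f: float,
                    vdd_v: float, freq_hz: float, activity: float = 1.0) -> float:
    """Switching power alpha*C*V^2*f, with node capacitance growing per division."""
    c_total = c_base_f + c_dup_f * k
    return activity * c_total * vdd_v**2 * freq_hz


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the trade-off.
    for k in (1, 2, 4, 5, 8, 16):
        t = access_time_ns(k, t_fixed=2.0, t_wl_full=8.0, t_load_per_div=0.1)
        p = dynamic_power_w(k, c_base_f=1e-12, c_dup_f=0.5e-12, vdd_v=1.2, freq_hz=200e6)
        print(f"k={k:2d}  access={t:.3f} ns  power={p * 1e3:.3f} mW")
```

With these illustrative numbers the access time bottoms out at a modest number of divisions and then rises again, while the modeled power keeps climbing with every division.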
The conventional memory configurations of FIGS. 1A and 1B suffer from a number of other drawbacks. First, the address access time is non-uniform across the array, depending on both the access path (i.e., row or column) and the physical location of the cell in the array. Typically, the row access path is slower than the column access path because of the wordline RC delay in the row access path. Also, within the row access path, cells have different access times depending on their location along the row. For example, the cell located closest to the wordline driver has a faster access time than the cell located furthest from the wordline driver. These non-uniformities in address access time complicate both the design and the use of memories.
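The position dependence along a row follows from the same distributed RC picture: for a uniform wordline of length L driven from one end, the Elmore delay at distance x from the driver is r·c·(L·x − x²/2), which rises monotonically from zero at the driver to r·c·L²/2 at the far end. The sketch below assumes an ideal driver (zero output resistance) and uniform wire parameters; the function name and the numeric values are hypothetical.

```python
def elmore_delay_at(x_um: float, length_um: float, r_per_um: float, c_per_um: float) -> float:
    """Elmore delay (s) at distance x from the driver on a uniform RC wordline.

    The resistance between the driver and position s carries the charging
    current for all capacitance beyond s; integrating r * c * (L - s) from
    0 to x gives tau(x) = r * c * (L*x - x**2 / 2).
    """
    if not 0 <= x_um <= length_um:
        raise ValueError("x must lie on the wordline")
    return r_per_um * c_per_um * (length_um * x_um - x_um**2 / 2)


if __name__ == "__main__":
    L = 4000.0  # hypothetical wordline length in um
    for x in (0.0, L / 4, L / 2, L):
        tau = elmore_delay_at(x, L, 1.0, 0.2e-15)  # hypothetical 1 ohm/um, 0.2 fF/um
        print(f"cell at {x:6.0f} um: {tau * 1e9:.3f} ns")
```

In this model the cell nearest the driver sees almost no wordline delay, while the cell at the far end sees the full r·c·L²/2, which is the non-uniformity described above.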
Another drawback is the inefficient use of redundancy. Commonly, redundant blocks of rows and/or columns of cells are added to the array so that defective cells can be replaced with redundant cells. However, due to design constraints, a redundant block of rows or columns is often used to replace a row or column having only one or a few defective cells, resulting in inefficient use of the available redundant cells.
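The inefficiency can be quantified by counting how many redundant cells are consumed per defective cell actually repaired. The sketch below assumes the simplest policy, in which every row containing at least one defect is replaced by a full redundant row; the defect coordinates and the 1024-column row width are hypothetical, and real repair schemes (column replacement, block-granular replacement, mixed allocation) would be counted differently.

```python
from typing import Iterable, Tuple


def row_repair_cost(defects: Iterable[Tuple[int, int]], n_cols: int) -> Tuple[int, int, int]:
    """Return (defective cells, redundant rows used, redundant cells consumed).

    Assumes each row containing a defect is swapped for one full redundant row.
    """
    cells = set(defects)              # de-duplicate (row, col) defect coordinates
    rows = {row for row, _ in cells}  # every distinct defective row needs a spare
    return len(cells), len(rows), len(rows) * n_cols


if __name__ == "__main__":
    # Hypothetical defect map: two defects in row 5, one in row 77.
    bad, rows_used, consumed = row_repair_cost([(5, 10), (5, 900), (77, 3)], n_cols=1024)
    print(f"{bad} defective cells repaired using {rows_used} rows ({consumed} cells)")
```

Here three defective cells consume 2,048 redundant cells, so only a tiny fraction of the spare cells does useful work.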
Thus, a memory configuration is desirable that yields high speed and low power, uses redundancy more efficiently, provides a relatively uniform address access time for all memory cells, scales easily to higher memory densities with minimal speed and power penalties, and is independent of memory type.