Data storage Flash memories generally have uniform and small block sizes and often provide additional memory space for error correction codes (ECCs). For example, a NAND-type Flash memory typically provides 512 bytes of ECC memory (not useable for data) in every block containing 16 K bytes of data storage. These memories often have minimum pin counts with either serial or multiplexed interfaces and are often used in small Flash memory cards for portable applications. For long-term reliability, such memories typically provide real-time sector-mapping features similar to those found in Hard Disk Drives (HDDs). For example, NAND-type Flash memories, used in Smart Media Cards, specify that a small number of blocks may be invalid, due to one or more defective bits. The system design must be able to mask out the invalid blocks via address mapping. Such memories are sometimes called Mostly Good Memories (MGM) and typically guarantee a minimum of about 98% good bits.
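The ECC overhead and the invalid-block masking described above can be illustrated with a minimal sketch. The function name `build_block_map`, the block count, and the choice of invalid blocks are hypothetical, chosen only to show the idea of mapping logical block addresses onto good physical blocks; a real device identifies its invalid blocks in its own way.

```python
# Illustrative values only; the 512-byte and 16-Kbyte figures come from the text.
ECC_BYTES_PER_BLOCK = 512
DATA_BYTES_PER_BLOCK = 16 * 1024

# ECC overhead relative to the data area of one block (512 / 16384).
ecc_overhead = ECC_BYTES_PER_BLOCK / DATA_BYTES_PER_BLOCK


def build_block_map(total_blocks, invalid_blocks):
    """Return a list whose index is a logical block number and whose
    value is the good physical block that backs it (hypothetical helper)."""
    invalid = set(invalid_blocks)
    return [b for b in range(total_blocks) if b not in invalid]


# Assume a tiny 8-block device in which physical blocks 2 and 5 are invalid.
block_map = build_block_map(8, [2, 5])
print(block_map)      # [0, 1, 3, 4, 6, 7]
print(ecc_overhead)   # 0.03125
```

In this sketch, logical block 2 is served by physical block 3, so the system never addresses an invalid block.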
Compact Flash (CF) and Multi Media Cards (MMCs) typically use NOR-type Flash memories that provide features to improve endurance. For example, the memory may have an intelligent erase algorithm with verification to avoid over-stressing the oxide and improve program/erase (P/E) cycle endurance. Sector “tagging” (or sector hot counts) in such memories keeps track of P/E cycling history so that the erase voltage can be set for each sector individually, according to the sector's P/E cycling history. In addition, such Flash memories generally provide real-time replacement of a sector after the sector has reached a pre-set lifetime maximum number of P/E cycles.
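The sector hot counts and real-time sector replacement described above might be modeled as follows. This is only a sketch under stated assumptions: the class name `SectorTable`, the spare-sector pool, and the cycle limit are hypothetical and do not come from any particular device, and the per-sector erase-voltage adjustment is not modeled.

```python
class SectorTable:
    """Hypothetical model of per-sector hot counts with sector replacement."""

    def __init__(self, num_sectors, num_spares, max_cycles):
        total = num_sectors + num_spares
        # P/E cycle count ("hot count") for every physical sector.
        self.hot_count = [0] * total
        # Logical-to-physical sector map, initially the identity.
        self.remap = list(range(num_sectors))
        # Physical sectors held in reserve as replacements.
        self.free_spares = list(range(num_sectors, total))
        self.max_cycles = max_cycles

    def erase(self, logical):
        """Record one P/E cycle, replacing the sector if it is worn out.
        Returns the physical sector that was actually erased."""
        phys = self.remap[logical]
        if self.hot_count[phys] >= self.max_cycles:
            if not self.free_spares:
                raise RuntimeError("no spare sectors left")
            phys = self.free_spares.pop(0)
            self.remap[logical] = phys
        self.hot_count[phys] += 1
        return phys


# Two user sectors, one spare, and an artificially low limit of 3 cycles.
table = SectorTable(num_sectors=2, num_spares=1, max_cycles=3)
history = [table.erase(0) for _ in range(4)]
print(history)  # [0, 0, 0, 2]
```

The fourth erase of logical sector 0 is redirected to the spare (physical sector 2) because physical sector 0 has reached its assumed lifetime maximum.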
Flash memories for combined parameter, code, and data storage typically have asymmetric block sizes that are optimized to store different types of information. In general, these memories lack extra memory space for ECC and real-time sector-mapping capabilities but do provide parallel operations (POs) such as read-while-write (RWW) or read-while-erase (RWE), which allow a processor or another external device to write to or erase one part of the memory chip while reading from another part of the same memory chip. This capability increases system performance, and reduces overall system cost by eliminating additional memory chips such as system SRAM for program updates or EEPROM to store system parameters.
A 2001 ISSCC paper 2.3 titled “1.8v 64 Mbit 100 MHz Flexible Read While Write Flash Memory” (ISSCC Article), which is incorporated by reference in its entirety, describes a Flash memory device with flexible RWW capabilities based on a “Multiple Partition” architecture, which allows parallel operations for code and data. Intel's 1.8v Wireless Flash Memory Datasheet (28F640W18), which is also incorporated by reference in its entirety, further describes the memory described in the ISSCC Article.
FIG. 1 illustrates the layout of a 64-megabit device 100 shown in a die micrograph in the ISSCC paper. Memory device 100 has sixteen array planes 110-0 to 110-15, generically referred to as array planes 110. Each array plane 110 has the same storage capacity and specifically contains four megabits of storage. With the hardware partitioning of memory 100 into array planes 110, the user of memory 100 can initiate a write or erase operation in any one of the sixteen array planes 110 while simultaneously reading in any of the other fifteen array planes 110. In memory 100, the selected word and bit lines in the array plane selected for the write or erase operation are biased at the program or erase voltages, while the selected word and bit lines in the array plane selected for the read operation are biased at read voltages.
The erase, write, and read status of each array plane 110 is stored in an on-chip state-machine (not shown), and the user changes the erase, write, and read status of an array plane 110 by issuing commands to Flash memory 100. Memory 100 generally allows reading in one array plane while writing or erasing in another array plane. Table 1 shows some of the possible parallel operation scenarios for memory 100.
TABLE 1

First operation in      Allowed Parallel Operation in Another Array Plane
one array plane:        Read        Program     Erase
Idle                    X           X           X
Read                                X           X
Program                 X
Erase                   X
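Table 1 can be expressed as a small lookup. The sketch below assumes the X marks fall in the columns implied by the surrounding text, that is, a read in one array plane may run alongside a program or erase in another, and a program or erase may run alongside a read. The names `ALLOWED` and `can_run_in_parallel` are hypothetical.

```python
# Assumed reading of Table 1: key = first operation in one array plane,
# value = operations allowed at the same time in another array plane.
ALLOWED = {
    "idle": {"read", "program", "erase"},
    "read": {"program", "erase"},
    "program": {"read"},
    "erase": {"read"},
}


def can_run_in_parallel(first_op, second_op):
    """Return True if second_op may start in another array plane
    while first_op is in progress (hypothetical helper)."""
    return second_op in ALLOWED[first_op]


print(can_run_in_parallel("program", "read"))   # True  (read-while-write)
print(can_run_in_parallel("erase", "program"))  # False
```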
Memory 100 permits allocation of array planes 110 for specific purposes. An example allocation of the memory space uses four array planes 110 for an operating system (or Real Time OS), two array planes 110 for a boot sector, and the remaining ten array planes 110 for file management. With this allocation and the RWW capability, a CPU can simultaneously read code of the Real Time OS while writing or erasing data in the file management sections. The RWW architecture, in which a user can seamlessly access data across various array plane or partition boundaries, increases the overall system performance.
As shown in FIG. 1, array plane 110-0 is adapted for parameter storage, while fifteen array planes 110-1 to 110-15 are intended for main storage. More specifically, each of the 4-megabit array planes 110-1 to 110-15 contains eight 32-Kword “main” blocks, while the 4-megabit parameter array plane 110-0 contains eight 4-Kword “parameter” blocks and seven 32-Kword main blocks. Each 32-Kword or 4-Kword block is independently erasable as a block.
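The block counts above can be checked against the stated 4-megabit plane size with a short calculation. The sketch assumes 16-bit words, which the text does not state outright but which is the word width that makes the block sizes add up to the stated capacities; the function name `plane_blocks` and the ordering of blocks within a plane are hypothetical.

```python
KWORD = 1024          # words per Kword
WORD_BITS = 16        # assumed word width
MEGABIT = 1024 * 1024
NUM_PLANES = 16


def plane_blocks(plane):
    """Return the block sizes (in words) of one array plane.
    Plane 0 is the parameter plane; block ordering here is arbitrary."""
    if plane == 0:
        return [4 * KWORD] * 8 + [32 * KWORD] * 7
    return [32 * KWORD] * 8


plane_bits = [sum(plane_blocks(p)) * WORD_BITS for p in range(NUM_PLANES)]
total_blocks = sum(len(plane_blocks(p)) for p in range(NUM_PLANES))

print(plane_bits[0] // MEGABIT)    # 4
print(sum(plane_bits) // MEGABIT)  # 64
print(total_blocks)                # 135
```

Under this assumption, the parameter plane holds 15 blocks and every other plane holds 8, yet all sixteen planes store the same 4 megabits.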
The bulk of the storage capacity of memory 100 is in the main blocks and can store code or data. The parameter blocks in array plane 110-0 are smaller for more efficient storage of parameters because parameters generally come in smaller units and are more frequently updated. For example, in an MP3 player, data representing music comes in relatively large units that are efficiently stored in the main blocks, while control parameters such as directory information require less storage but are more frequently changed. A more conventional system using a data storage Flash memory would normally store parameters in a separate EEPROM to improve storage efficiency and allow access to parameters while accessing data. However, software techniques allow Flash memory 100 to emulate the word-rewrite functionality of EEPROMs. As a result, the asymmetrically blocked architecture enables code, parameters, and data integration within a single memory device.
Flash memories with similar parallel operation capabilities and asymmetric block architectures are described in the datasheet for the Simultaneous Operation Flash Memory (Am29DL323C) available from Advanced Micro Devices, Inc. and the datasheet for the Concurrent Flash (AT49BV1604 & AT49BV1614) available from Atmel, Inc. These datasheets are hereby incorporated by reference in their entirety.
The 32-megabit device of AMD is divided into two banks, with bank 1 containing 8 megabits and bank 2 containing 24 megabits. Bank 1 is further segmented into fifteen 32-Kword blocks and eight 4-Kword blocks, while bank 2 is segmented into forty-eight 32-Kword blocks. In an actual application, the user can structure bank 1 to store data and boot code, and bank 2 to store control code. The command sequence that tells bank 1 to program or erase data blocks resides as executable code in bank 2. While bank 1 is being programmed or erased, the system can continue to execute code from bank 2 to manage other system operations. Depending on system implementation, the CPU can also execute code from bank 1 and program or erase any of the blocks in bank 2.
A 16-megabit memory device from Atmel, Inc. has a bank containing 12 megabits and a bank containing 4 megabits and allows a read operation in one bank while the other bank performs a write operation. Furthermore, the device has 40 blocks including thirty 32-Kword main blocks, eight 4-Kword parameter blocks, and two 16-Kword boot blocks.
One of the disadvantages of asymmetric block architectures is the inability to lay out the arrays with symmetry and balance. For example, the 64-megabit memory of FIG. 1 has a layout in which array plane 110-0, which contains the parameter blocks, requires more integrated circuit area than does each of array planes 110-1 to 110-15. The parameter blocks contain less storage (i.e., fewer memory cells) than the main blocks do, and the parameter blocks require proportionally more overhead because of the need for a block select transistor per block. Block select transistors connect the local bit lines (within a block) to the global bit lines (across all blocks in the same array plane). For stacked-gate NOR Flash with negative-gate-channel-erase, additional overhead associated with the independent P-well inside a separate Deep N-well is required for each block. Since the width of array planes 110-1 to 110-8 on the left side of memory 100 is less than the required width of array plane 110-0, part of array plane 110-0 is on the right side of memory 100 with array planes 110-9 to 110-15. Peripheral circuitry 120 is around the blocks of array plane 110-0 that are on the right side of memory 100. Additionally, Flash memory 100 has nine array planes 110-0 to 110-8 on the left side and only seven array planes 110-9 to 110-15 on the right side.
Memories with asymmetric block architectures and array layouts such as illustrated in FIG. 1 have significant drawbacks. In particular, since block sizes are non-uniform and hardwired, these memories cannot provide complete flexibility in array partitioning. Only specific array planes are adapted for storage of parameter data. Therefore, these memories are unable to support all applications optimally and efficiently. For example, if an application's boot information occupies 16-K words (or four 4-Kword parameter blocks), memory 100 will have four 4-Kword parameter blocks and seven 32-Kword main blocks remaining in parameter array plane 110-0. Then, if this particular application requires a total of twelve individual 4-Kword blocks to store parameters (that need frequent updates), the application must use up all of the remaining 11 blocks in parameter partition 110-0, and one of the eight main blocks in one of the fifteen array planes 110-1 to 110-15. The remaining seven main blocks in the array plane containing one block of parameters cannot be effectively used to store data because parallel operations cannot simultaneously access parameters and data from the same array plane. Accordingly, the memory space in the seven main blocks of the array plane containing one block of parameters becomes (effectively) unusable.
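The allocation example above can be worked through numerically. The sketch below follows the numbers in the text and assumes 16-bit words, the word width that makes the stated block sizes add up to the stated capacities; all variable names are hypothetical.

```python
KWORD = 1024
WORD_BITS = 16            # assumed word width
MEGABIT = 1024 * 1024

PARAM_BLOCKS_IN_PLANE0 = 8    # 4-Kword parameter blocks in array plane 110-0
MAIN_BLOCKS_IN_PLANE0 = 7     # 32-Kword main blocks in array plane 110-0
MAIN_BLOCKS_PER_PLANE = 8     # 32-Kword main blocks in each other plane

boot_blocks = 4               # 16 Kwords of boot information
parameter_blocks_needed = 12  # individually erasable blocks for parameters

# Blocks left in the parameter plane once the boot information is stored.
left_in_plane0 = (PARAM_BLOCKS_IN_PLANE0 - boot_blocks) + MAIN_BLOCKS_IN_PLANE0

# Parameter blocks that must spill into a main array plane.
spilled = max(0, parameter_blocks_needed - left_in_plane0)

# Main blocks sharing a plane with spilled parameters become unusable for data.
stranded_blocks = (-spilled) % MAIN_BLOCKS_PER_PLANE
stranded_bits = stranded_blocks * 32 * KWORD * WORD_BITS

print(left_in_plane0)           # 11
print(spilled)                  # 1
print(stranded_blocks)          # 7
print(stranded_bits / MEGABIT)  # 3.5
```

Under these assumptions, a single spilled parameter block strands 3.5 megabits, or roughly 5.5% of the 64-megabit device.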
A memory with an asymmetric block architecture also requires more time to develop. In particular, an asymmetric block architecture requires additional time, manpower, and effort to lay out, simulate, and verify, and requires more effort and time to characterize and test the devices during prototyping and mass production.
Asymmetric array layout is also undesirable because an asymmetric layout generally uses integrated circuit area inefficiently, which results in a larger die size and greater manufacturing costs. The asymmetric layout of FIG. 1, for example, requires longer global I/O lines running vertically between the left and right portions of array plane 110-0 and requires additional column-related circuitry, such as sense amplifiers, column decoders, and column pass devices, as well as additional row decoders and drivers. The longer global I/O lines affect die size and performance.
Asymmetric layouts also suffer from non-uniform power and signal bussing, which causes memory cells to exhibit different characteristics and performance across the array. For example, the parameter blocks on the right may be more or less subject to noise from the periphery circuits.
In addition to problems with asymmetric layout, variation in the sizes of blocks has disadvantages. In particular, differences in memory cell characteristics or performance can arise from the differences in the sizes of p-wells. Having different block sizes with a negative gate erase process generally causes p-well sizes to vary since the p-well sizes are proportional to the block sizes. The substrate resistance can vary with the size of the p-wells and cause differences in the characteristics of memory cells in different blocks.
Redundancy implementation in an asymmetric block architecture is also more complex. In conventional Flash memory redundancy and repair schemes, a defective memory element (either a word line or a bit line) is identified during testing, disabled, and replaced by a spare memory element. As a result, whenever an incoming address matches the defective memory element's address, a redundancy circuit causes selection of the spare memory element instead of the defective memory element. Providing both word line-based and bit line-based redundancy provides the smallest granularity for defect replacement, but the circuit implementation can be very complex (requiring substantial complication of the decoders), requires substantial layout overhead, and adversely affects speed because of the additional circuitry required in the decoders. Partition-level redundancy represents the largest granularity and is not practical to implement. Block-level redundancy offers a compromise between partition-level redundancy and bit line or word line level redundancy, but block-level redundancy is not practical for a memory having asymmetric block sizes.
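The address-match behavior of a conventional redundancy circuit can be sketched in software as a lookup performed before decoding. This is a behavioral illustration only, not a circuit description; the function name `select_word_line`, the repair table, and the addresses are hypothetical.

```python
def select_word_line(address, repair_table):
    """Return ("spare", index) if the incoming word line address matches a
    recorded defective address, otherwise ("main", address)."""
    if address in repair_table:
        return ("spare", repair_table[address])
    return ("main", address)


# Assume testing found word lines 0x01A3 and 0x0F10 defective and
# assigned them spare word lines 0 and 1.
repair_table = {0x01A3: 0, 0x0F10: 1}

print(select_word_line(0x01A3, repair_table))  # ('spare', 0)
print(select_word_line(0x0200, repair_table))  # ('main', 512)
```

The comparison runs on every access, which is one way to see why adding such matching to both row and column decoders can affect speed.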