1. Field of the Invention
The present invention relates to a semiconductor memory device and, more particularly, to a data receiving circuit for a NAND flash memory.
2. Description of the Related Art
A flash memory can execute a large number of operations. Each operation is one of the commands defined in the command set and is requested by driving the flash memory pads. That is, for each command (e.g., read request, write request, etc.), there is a protocol describing how the pads must be driven (e.g., voltages, timing).
FIG. 1 illustrates a typical pinout 100 of a conventional flash memory device (e.g., a NAND pinout).
In the pinout 100 of the conventional flash memory device, the command interface acquires and recognizes commands by control signals (ALE, CLE), synchronization signals (CE#, WE#, RE#), and a data signal (IO) and “translates” this information into internal operations to be executed. Some of these operations start the execution (typically from a read only memory (ROM)) of an algorithm that controls a very complex sequence of signals, voltages and timings in order to ensure that the requirements are met. During algorithm execution the ready/busy signal (R/B#) is low and no other algorithm operations can be executed.
An important subset of all possible commands pertains to programming of a “portion” of a matrix (e.g., an array of non-volatile memory cells). A flash memory (e.g., flash NAND) is organized in one or more planes, each plane being made of blocks (minimum erasable unit), each block being divided into pages (maximum programmable area in a plane), and each page being made of bytes (in recent years, the number of bytes in a page has increased from 512 B to 8 KB).
An address used in a flash memory device is loaded by five Address Latch Enable (ALE) cycles. The first two cycles permit loading of the “column address” that identifies the position of a byte in the page space, and the remaining three loaded bytes (referred to as “row address”) allow the page inside the matrix organization to be identified.
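The split between column and row address cycles can be sketched as follows. This is a minimal illustration, not part of any specification: the function names are hypothetical, and the sketch assumes that the least significant byte is sent first within each group (C1 before C2, R1 before R2 before R3).

```python
def split_address(cycles):
    """Split five address-cycle bytes into (column, row).

    Assumption: the least significant byte comes first in each group.
    """
    if len(cycles) != 5:
        raise ValueError("expected exactly five address cycles")
    if any(not 0 <= b <= 0xFF for b in cycles):
        raise ValueError("each address cycle carries one byte")
    c1, c2, r1, r2, r3 = cycles
    column = c1 | (c2 << 8)              # position of a byte in the page
    row = r1 | (r2 << 8) | (r3 << 16)    # identifies the page in the matrix
    return column, row


def build_address(column, row):
    """Inverse of split_address: produce the bytes C1, C2, R1, R2, R3."""
    return [
        column & 0xFF,
        (column >> 8) & 0xFF,
        row & 0xFF,
        (row >> 8) & 0xFF,
        (row >> 16) & 0xFF,
    ]
```

For example, under the stated byte-order assumption, build_address(0x1234, 0x010056) yields the five cycles 0x34, 0x12, 0x56, 0x00, 0x01, and split_address recovers the original pair from them.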
A basic program command is a page program that is a sequence of command setup (a command is provided by the data signal (IO) and a command latch enable signal (CLE)), five address cycles, insertion of N bytes (data to be programmed at a selected page; this portion of command may be referred to as a “data-in” operation) and a confirm command that starts the execution of a programming algorithm.
FIG. 2 illustrates a timing diagram 200 for a standard Open NAND Flash Interface (ONFI) page program command. As illustrated in FIG. 2, in the page program command, command setup is 80 h, and command confirm is 10 h. Further, C1, C2 form the column address, and R1, R2, R3 form the row address. Other NAND specifications on the market may have a different setup/confirm code but the format is the same.
In recent memory devices (e.g., flash memory devices such as a NAND memory device), the page dimension has significantly increased, due to a need to enhance parallel programming. The greater the number of bytes in a page, the more time is required for the “data-in” phase. The only way to limit this time is to reduce the data cycle (e.g., the time to load a single byte into a buffer or random access memory (RAM)). Therefore, the duration of the data cycle (e.g., tWC in the timing diagram 200 in FIG. 2) is required to be shorter and shorter (e.g., less than 10 nanoseconds) in recent NAND flash specifications.
Program time, as illustrated below in Equation 1, is the sum of the time used for the “data-in” phase and the time spent during algorithm execution. It should be noted that the time contributions of command setup, confirm and address loading are very short and, therefore, have been ignored in Equation 1.
Tprogram = Tdata_in + Talgo_exec = N*tWC + Talgo_exec  (Equation 1)
where:
Talgo_exec = time to execute the algorithm from ROM after the command phase;
Tdata_in = time to preload into a buffer or RAM the data to be programmed;
N = maximum number of bytes that can be programmed in a page; and
tWC = time to load a single byte into the buffer or RAM.
Talgo_exec depends on the flash memory technology, and its typical value is in a range from 100 to 150 μs. Tdata_in depends on how many bytes are storable (programmable by the same algorithm) in a page and on tWC, which represents how fast each byte is provided to the flash memory.
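As a worked illustration of Equation 1, the following sketch evaluates Tprogram for two page sizes. The function names and the chosen values (tWC of 10 ns, Talgo_exec of 100 μs) are assumptions picked from the ranges mentioned in the text, not measured figures.

```python
def data_in_time_us(n_bytes, t_wc_ns):
    """Tdata_in = N * tWC, converted from nanoseconds to microseconds."""
    return n_bytes * t_wc_ns / 1000.0


def program_time_us(n_bytes, t_wc_ns, t_algo_exec_us):
    """Tprogram = Tdata_in + Talgo_exec (Equation 1), in microseconds."""
    return data_in_time_us(n_bytes, t_wc_ns) + t_algo_exec_us


# Illustrative values only: 512 B and 8 KB pages, tWC = 10 ns, Talgo_exec = 100 us.
for n_bytes in (512, 8192):
    total = program_time_us(n_bytes, t_wc_ns=10, t_algo_exec_us=100)
    print(f"N={n_bytes}: Tprogram = {total:.2f} us")
```

With these assumed values, the data-in phase grows from about 5 μs for a 512 B page to about 82 μs for an 8 KB page, so it becomes comparable to the algorithm execution time.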
As noted above, in recent years, N has gone from 512 B to 8 KB. As a consequence, a way to limit the growth of program time has been to decrease the duration of tWC and provide very aggressive setup and hold times for the data validity window. However, this makes it more difficult to design a “bug free” implementation of all kinds of program commands.
In addition, as inserted bytes are required to be processed by a particular flash logic (e.g., redundancy logic), the insertion of a pipeline architecture is inevitable and must be handled with aggressive timing. An important feature to take into account is that in flash memory specifications (e.g., NAND flash specifications), a dedicated bus for data-in does not exist. That is, the bus is the same (e.g., multiplexed inside the device) for the address, commands and data. This means that the architecture must classify data as either valid data or invalid data in a short time.
The main program operations that the data-in path architecture must be able to handle are 1) Page program, 2) Change Write Column (CWC), 3) Copyback Program, and 4) Multiplane (or interleaved) Page program or copyback program.
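Because one bus carries commands, addresses and data, each cycle has to be classified from the control signals. The sketch below is a simplified illustration with hypothetical names: it treats a cycle with CLE asserted as a command, a cycle with ALE asserted as an address, and a cycle with neither asserted as data. Treating both signals asserted together as invalid is an assumption of this sketch.

```python
from enum import Enum


class Cycle(Enum):
    COMMAND = "command"
    ADDRESS = "address"
    DATA = "data"
    INVALID = "invalid"


def classify_cycle(cle, ale):
    """Classify one bus cycle from the CLE and ALE levels sampled with it."""
    if cle and ale:
        return Cycle.INVALID   # assumption: both asserted is not a legal cycle
    if cle:
        return Cycle.COMMAND   # IO carries a command code
    if ale:
        return Cycle.ADDRESS   # IO carries one address byte
    return Cycle.DATA          # IO carries a data byte
```

A real data-in path has to make this decision within a single, very short cycle, which is the timing difficulty described above.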
Page Program
A Page Program (e.g., as illustrated in FIG. 2) refers to the standard situation described so far, namely loading into a buffer or RAM up to N bytes to be programmed. The typical format for this operation is:
Providing a single setup command cycle with a known code on IO (CLE+IO=80 h)
Providing 5 cycles to load page address (5×ALE+IO=ADD<i>)
Providing n data cycles to store D0 . . . D(n−1) bytes within the buffer (IO=BYTE <i>)
Providing a single confirm command cycle with a known code (CLE+IO=10 h)
After the confirm command cycle, the algorithm phase begins.
Other specification requirements that the architecture must manage are:
a) the starting address inside the page can be arbitrary;
b) the number of loaded bytes can be an arbitrary number K;
c) bytes are loaded to sequential addresses;
d) inside the data-in phase, “spurious” address cycles could be provided (and must be ignored); and
e) data-in cycles that exceed the addressable area must be ignored.
Change Write Column (CWC)
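The page program format can be sketched as a list of bus cycles. This is a minimal illustration with a hypothetical function name; the tuples simply label each cycle as a command (CLE), address (ALE) or data cycle, and the codes 80h and 10h are those given in the text.

```python
SETUP_PAGE_PROGRAM = 0x80
CONFIRM_PROGRAM = 0x10


def page_program_cycles(address_bytes, data):
    """Return the page program sequence as (cycle_type, value) tuples."""
    if len(address_bytes) != 5:
        raise ValueError("a page address takes five address cycles")
    cycles = [("CMD", SETUP_PAGE_PROGRAM)]              # CLE + IO = 80h
    cycles += [("ADDR", b) for b in address_bytes]      # 5 x ALE + IO
    cycles += [("DATA", b) for b in data]               # n data cycles
    cycles.append(("CMD", CONFIRM_PROGRAM))             # CLE + IO = 10h
    return cycles


# Two data bytes written starting at column 0 of the page selected by the row bytes.
sequence = page_program_cycles([0x00, 0x00, 0x01, 0x00, 0x00], [0xAA, 0xBB])
```

The resulting sequence has one setup cycle, five address cycles, two data cycles and one confirm cycle.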
FIG. 3 illustrates a timing diagram 300 for a Change Write Column (CWC) command from the ONFI specification. CWC makes it possible to change the address inside the page so that data can be loaded within the page in a non-sequential manner. This may be helpful because if only a few bytes are to be programmed into “far” address locations, then a sequential approach causes a high Tdata_in since it may be necessary to scan the entire addressable page space to reach the most distant address locations.
For instance, if it is required to load only two bytes, one at the first page address and the other at the last page address, and a standard page program command is used, then Tdata_in=N*tWC, where N is the whole page size and typical values cause Tdata_in of dozens of microseconds.
However, by using a CWC command in such a situation, Tdata_in=2*Tdcycle, where Tdcycle is the duration of a single data cycle (tWC in Equation 1). In this case, Tdata_in is less than one hundred nanoseconds. For this reason CWC is very important, and it is another situation that the data-in architecture has to manage.
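The comparison between the two approaches can be checked with simple arithmetic. The sketch below uses assumed values (an 8 KB page and a 10 ns data cycle) and a hypothetical helper; the optional overhead of three extra cycles for the CWC command and its two address cycles, each taken to last one data cycle, is an assumption added for illustration and is not counted in the text.

```python
def t_data_in_ns(data_cycles, t_cycle_ns, overhead_cycles=0):
    """Data-in time in nanoseconds for a given number of bus cycles."""
    return (data_cycles + overhead_cycles) * t_cycle_ns


PAGE_SIZE = 8192   # assumed page size in bytes
T_CYCLE_NS = 10    # assumed data cycle duration

sequential_ns = t_data_in_ns(PAGE_SIZE, T_CYCLE_NS)                # whole page scanned
cwc_ns = t_data_in_ns(2, T_CYCLE_NS)                               # two data cycles only
cwc_with_overhead_ns = t_data_in_ns(2, T_CYCLE_NS, overhead_cycles=3)

print(sequential_ns, cwc_ns, cwc_with_overhead_ns)
```

With these assumed values the sequential load takes 81920 ns, while the CWC-based load takes 20 ns, or 50 ns when the assumed command and address overhead is included.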
The typical format for the CWC operation is:
A setup command cycle inside the data-in phase of a page program, copyback or multiplane program (CLE+IO=85 h)
two address cycles to change column address (2×ALE+IO=ADD)
m data cycles to store m bytes within the buffer (DBUS=BYTE <i>)
It should be noted that no confirm command is required for this operation. Moreover, it can be repeated many times inside the same program operation.
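The behavior expected from the data-in buffer can be modeled with a column pointer that advances on every data cycle and is reloaded by CWC. The class below is a simplified behavioral sketch, not a circuit description: its names are hypothetical, the preset value 0xFF and the low-byte-first column order are assumptions, and bytes arriving beyond the page are silently ignored.

```python
class DataInBuffer:
    """Behavioral model of a page buffer filled during the data-in phase."""

    PRESET = 0xFF  # assumption: preset value of an unloaded byte

    def __init__(self, page_size, start_column=0):
        self.page_size = page_size
        self.data = [self.PRESET] * page_size
        self.column = start_column       # arbitrary starting address in the page

    def load_byte(self, value):
        """One data cycle: store the byte, then move to the next column."""
        if 0 <= self.column < self.page_size:
            self.data[self.column] = value & 0xFF
        # Bytes beyond the addressable area are ignored, but the pointer still moves.
        self.column += 1

    def change_write_column(self, c1, c2):
        """CWC (85h plus two address cycles): jump to a new column."""
        self.column = c1 | (c2 << 8)     # assumption: low byte first


buf = DataInBuffer(page_size=8192)
buf.load_byte(0x11)                      # first page address
buf.change_write_column(0xFF, 0x1F)      # column 0x1FFF, the last byte of the page
buf.load_byte(0x22)                      # last page address
buf.load_byte(0x33)                      # beyond the page: ignored
```

In this usage only three data cycles occur, yet the first and the last byte of the page are both loaded, and no confirm command is needed for the column change itself.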
Copy-Back Program
FIG. 4 illustrates a timing diagram 400 for a Copy-back program with CWC. This command has been introduced to decrease the time required to “copy” a page from its original page address to another page of the same plane, a typical operation in a flash memory. To do this, first a read from the original page occurs and the page information is stored into the buffer (or RAM). Then, by using the copy-back program, a re-programming to a final new address is executed.
Due to technology shrinking, this operation can be affected by some failing bits. Thus, the operation is often achieved by a readout of the buffer and correction of failing bits by using an error correction code (ECC) algorithm. This is a typical case in which only a few bytes stored in the buffer have to be changed, which is the reason why CWC is often used inside the copy-back program.
The typical format for the copy-back program is:
Setup command cycle with a known code on IO (CLE+IO=85 h)
five ALE cycles to load page address (5×ALE+IO=ADD<i>)
k data cycles to store k bytes within the buffer (IO=BYTE <i>)
Confirm command cycle with a known code (CLE+IO=10 h)
Unlike the page program, after setup, the buffer does not need to be preset because the buffer stores valid information (deriving from previous “read for copy-back”).
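The difference in buffer handling between the two commands can be sketched as follows. The function names are hypothetical and the preset value 0xFF is an assumption; the point is only that a page program starts from a preset buffer, whereas a copy-back program starts from the data left by the previous read.

```python
PRESET_VALUE = 0xFF  # assumption: value written by the buffer preset


def start_data_in(buffer, copy_back):
    """Return the buffer contents at the start of the data-in phase."""
    if copy_back:
        return list(buffer)                  # keep the data read for copy-back
    return [PRESET_VALUE] * len(buffer)      # page program: buffer is preset


def apply_bytes(buffer, start_column, new_bytes):
    """Store new_bytes sequentially, ignoring bytes beyond the buffer."""
    for offset, value in enumerate(new_bytes):
        column = start_column + offset
        if 0 <= column < len(buffer):
            buffer[column] = value & 0xFF
    return buffer


read_back = [0x10, 0x20, 0x30, 0x40]         # toy 4-byte "page" from a prior read
copy_back_buffer = apply_bytes(start_data_in(read_back, copy_back=True), 2, [0x99])
page_program_buffer = apply_bytes(start_data_in(read_back, copy_back=False), 2, [0x99])
```

In the copy-back case only the corrected byte changes and the other three bytes survive, while in the page program case the untouched bytes hold the preset value.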
Multiplane Page Program or Multiplane Copy-Back Program
FIG. 5 illustrates a timing diagram 500 for a multiplane (e.g., interleaved) page program or a multiplane copy-back program.
High parallelism is one of the most important features of memory devices (e.g., NAND devices) and, therefore, multiplane program requirements have also been introduced into specifications (e.g., NAND specifications). In the case of a multiplane program, an object is to minimize Tprogram by permitting an algorithm to “work” on a double page space. If the multiplane program is not supported, then Tprogram from Equation 1 is required to be doubled.
However, if a multiplane program is supported, then Tprogram is given by Equation 2:
Tprogram = 2*Tdata_in + Talgo_exec = 2*N*tWC + Talgo_exec  (Equation 2)
Talgo_exec is about the same in both a single-plane and a multiplane page program. The multiplane program can be extended to the copy-back program by permitting two pages of different planes to be moved into two different destination pages. The format of the command for the multiplane copy-back program is exactly the same as the format in the timing diagram 500 illustrated in FIG. 5, except that a different code is used in the setup command.
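The benefit of the multiplane program can be illustrated by comparing a doubled Equation 1 with Equation 2. The values below (8 KB pages, tWC of 10 ns, Talgo_exec of 100 μs) are assumptions chosen from the ranges in the text, the function names are hypothetical, and the short tIPBSY wait between the two planes is neglected.

```python
def two_single_plane_programs_us(n_bytes, t_wc_ns, t_algo_exec_us):
    """Two separate page programs: twice Tprogram from Equation 1."""
    return 2 * (n_bytes * t_wc_ns / 1000.0 + t_algo_exec_us)


def multiplane_program_us(n_bytes, t_wc_ns, t_algo_exec_us):
    """One multiplane program over two pages (Equation 2)."""
    return 2 * n_bytes * t_wc_ns / 1000.0 + t_algo_exec_us


separate = two_single_plane_programs_us(8192, 10, 100)
combined = multiplane_program_us(8192, 10, 100)
saving = separate - combined   # equals one Talgo_exec
```

Under these assumptions the saving is one full algorithm execution, about 100 μs, because both data-in phases are still required but the algorithm runs only once.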
The typical format for the multiplane page program is:
setup command cycle with a known code on IO (CLE+IO=80 h)
five ALE cycles to load page address (5×ALE+IO=ADD<i>)
Na data cycles to store Na bytes within the left buffer (IO=BYTE <i>)
First phase confirm command cycle with a known code (CLE+IO=11 h)
Waiting for a tIPBSY before providing next phase setup command
Second plane setup command cycle with a known code on DBUS (CLE+DBUS=81 h)
five ALE cycles to load initial page address in the right plane (5×ALE+DBUS=ADD<i>)
Nb data cycles to store Nb bytes within the buffer (IO=BYTE <i>)
Confirm command cycle with a known code (CLE+DBUS=10 h)
After the last confirm command cycle, the algorithm phase begins.
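The two-phase format above can be sketched as a single list of bus cycles. As before, this is only an illustration with a hypothetical function name; the codes 80h, 11h, 81h and 10h are those given in the format, and the tIPBSY wait is represented by a placeholder entry rather than a real delay.

```python
def multiplane_page_program_cycles(addr_a, data_a, addr_b, data_b):
    """Return the multiplane page program sequence as (cycle_type, value) tuples."""
    if len(addr_a) != 5 or len(addr_b) != 5:
        raise ValueError("each page address takes five address cycles")
    cycles = [("CMD", 0x80)]                      # first plane setup
    cycles += [("ADDR", b) for b in addr_a]
    cycles += [("DATA", b) for b in data_a]       # Na data cycles
    cycles.append(("CMD", 0x11))                  # first phase confirm
    cycles.append(("WAIT", "tIPBSY"))             # placeholder for the busy time
    cycles.append(("CMD", 0x81))                  # second plane setup
    cycles += [("ADDR", b) for b in addr_b]
    cycles += [("DATA", b) for b in data_b]       # Nb data cycles
    cycles.append(("CMD", 0x10))                  # final confirm starts the algorithm
    return cycles


sequence = multiplane_page_program_cycles(
    [0, 0, 0, 0, 0], [0x01, 0x02],
    [0, 0, 1, 0, 0], [0x03],
)
```

Only the final 10h confirm starts the programming algorithm, so the single Talgo_exec covers both planes.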
The multiplane copy-back program is similar to the multiplane page program, except that the preset is not generated after setup in the multiplane copy-back program. For both the multiplane page program and the multiplane copy-back, CWC must be supported.