1. Field of the Invention
The present invention relates to a method of storing data into flash memory in a Database Management System (DBMS)-independent manner using the page-differential.
2. Description of the Related Art
Flash memory is classified into two types—NAND type and NOR type—according to the structure of a memory cell. The former is suitable for storing data, and the latter for storing code. In the present invention, the term ‘flash memory’ means the NAND-type flash memory that is widely used in flash-based storage systems.
FIG. 1 is a diagram showing the structure of flash memory. The flash memory is composed of Nblock blocks, and each block is composed of Npage pages. A page is the smallest unit of reading or writing data. A block is the smallest unit of erasing data. Each page is composed of a data area for storing data and a spare area for storing auxiliary information such as page status, bad block identification, and error correction code (ECC).
In order to read or write data from or into flash memory, there are three types of operations: read, write and erase.
Read operation: return all bits in a specific page
Write operation: change bits selected in a specific page from 1 to 0
Erase operation: set all bits in a specific block to 1
Operations in flash memory differ considerably from those on a magnetic disk. First, all the bits in flash memory are initially set to 1. Therefore, the write operation in flash memory means selectively changing some bits in a specific page from 1 to 0. Next, the erase operation in flash memory means changing all the bits in a specific block back to 1. Each block can sustain only a limited number of erase operations, about 100,000. When more than about 100,000 erase operations are performed on a block, its data may become unreliable.
Due to the restriction of write and erase operations, a write operation is generally preceded by an erase operation in order to overwrite a page. After all the bits in a block are changed to 1 by performing an erase operation, some bits in a page are changed to 0 by performing a write operation. Further, the erase operation is performed on a much larger unit than the write operation. That is, the write operation is performed on a page, whereas the erase operation is performed on a block. Detailed methods of overwriting a page are determined according to the page update method employed.
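For illustration only, the read, write, and erase semantics described above can be sketched with a toy model. The class name `FlashBlock`, the 8-bit page width, and the 4-page block are assumptions chosen to keep the example small; they do not reflect any real device. A write is modeled as a bitwise AND, since it can only change bits from 1 to 0.

```python
class FlashBlock:
    """Toy model of one flash block; each page is an int used as a bit vector."""

    def __init__(self, n_pages=4, page_bits=8):
        self.all_ones = (1 << page_bits) - 1
        self.pages = [self.all_ones] * n_pages  # erased state: every bit is 1
        self.erase_count = 0

    def read(self, page_no):
        # Read: return all bits in the page
        return self.pages[page_no]

    def write(self, page_no, data):
        # Write: can only change bits from 1 to 0, modeled as a bitwise AND
        self.pages[page_no] &= data & self.all_ones

    def erase(self):
        # Erase: set all bits in the whole block back to 1
        self.pages = [self.all_ones] * len(self.pages)
        self.erase_count += 1


block = FlashBlock()
block.write(0, 0b10101010)
first = block.read(0)

# Overwriting without an erase: bits that are already 0 cannot return to 1
block.write(0, 0b11110000)
corrupted = block.read(0)

# Erasing first makes the overwrite succeed (other pages in the block are lost too)
block.erase()
block.write(0, 0b11110000)
clean = block.read(0)

print(bin(first), bin(corrupted), bin(clean))
```

In this sketch the second write yields the AND of the old and new contents rather than the new contents, which is why an overwrite is generally preceded by an erase of the whole block.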
Flash memory is also classified into two types, Single Level Cell (SLC) type and Multi-Level Cell (MLC) type, according to the capacity of a memory cell. The former is capable of storing one data bit per cell, whereas the latter is capable of storing two (or more) data bits per cell. Therefore, MLC-type flash memory has larger capacity than SLC-type flash memory and is predicted to be widely used in high-capacity flash storage. Table 1 summarizes the parameters and values of MLC-type flash memory. The data area of a page is 2,048 bytes, and a block has 64 pages. In addition, the access time of operations increases in the order of read, write, and erase operations. The read operation is about 7.9 times faster than the write operation, which is about 1.6 times faster than the erase operation.
TABLE 1
Parameters and values of flash memory (Samsung K9L8G08U0M 4 GB MLC NAND)

| Symbols | Definitions | Values |
|---|---|---|
| Nblock | The number of blocks | 32,768 |
| Npage | The number of pages in a block | 64 |
| Sblock | The size of a block (bytes) (= Npage × Spage) | 135,168 (= 64 × 2,112) |
| Spage | The size of a page (bytes) (= Sdata + Sspare) | 2,112 (= 2,048 + 64) |
| Sdata | The size of data area in a page (bytes) | 2,048 |
| Sspare | The size of spare area in a page (bytes) | 64 |
| Tread | The required time for reading a page (μs) | 120 |
| Twrite | The required time for writing a page (μs) | 950 |
| Terase | The required time for erasing a block (μs) | 1500 |
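As a quick check of the figures in Table 1, the derived sizes and the speed ratios quoted in the text can be recomputed from the base parameters. The constant names below are illustrative spellings of the table's symbols.

```python
# Base parameters taken from Table 1
N_BLOCK = 32_768      # number of blocks
N_PAGE = 64           # pages per block
S_DATA = 2_048        # data area per page (bytes)
S_SPARE = 64          # spare area per page (bytes)
T_READ_US = 120       # time to read a page (microseconds)
T_WRITE_US = 950      # time to write a page (microseconds)
T_ERASE_US = 1_500    # time to erase a block (microseconds)

# Derived values
s_page = S_DATA + S_SPARE                 # size of a page including spare area
s_block = N_PAGE * s_page                 # size of a block including spare areas
data_capacity = N_BLOCK * N_PAGE * S_DATA # total data-area capacity in bytes

write_vs_read = T_WRITE_US / T_READ_US    # how many times slower a write is than a read
erase_vs_write = T_ERASE_US / T_WRITE_US  # how many times slower an erase is than a write

print(s_page, s_block, data_capacity // 2**30)
print(round(write_vs_read, 1), round(erase_vs_write, 1))
```

The data-area capacity works out to 4 GiB, and the two ratios round to 7.9 and 1.6, matching the values stated above.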
Hereinafter, in order to reduce ambiguity in the present invention, a physical page and a logical page are distinguished from each other. Pages in memory are called logical pages, and pages in flash memory are called physical pages. Further, for convenience of description, it is assumed that the size of a logical page is equal to that of a physical page.
There have been a number of methods of storing updated pages into flash memory for flash-based storage systems. In the present invention, these methods are called page update methods. Page update methods are divided into two categories: page-based and log-based.
The page-based methods store a logical page into a physical page. When an updated page needs to be reflected into flash memory (for example, when the updated page is swapped out from the DBMS buffer to the database), the whole logical page is written into a physical page. When a logical page is recreated from flash memory (for example, when a page is read into the DBMS buffer), the logical page is read from a physical page. These methods can be implemented in a middle layer called the Flash Translation Layer (FTL). Thus, they are loosely coupled with the storage system. The FTL maintains logical-to-physical address mapping between logical pages and physical pages, as shown in FIG. 2. The FTL can be implemented as hardware in a controller residing in Solid State Disks (SSDs), or can be implemented as software in an operating system (OS) for embedded boards.
Page-based methods are classified into two schemes—in-place update and out-place update—depending on whether or not a logical page is always written into the same physical page. When a logical page needs to be reflected into flash memory, the in-place update method overwrites the page into the specific physical page that was read, and the out-place update method writes the page into a new physical page.
Since a write operation cannot change bits in a page to 1, the in-place update method performs the following four steps when overwriting the logical page m1 read from the physical page p1 in the block b1 into the same physical page p1.
First, all pages, except p1, in the block b1 are read from the block b1.
Second, an erase operation is performed on the block b1.
Third, the logical page m1 is written into the physical page p1.
Fourth, all the pages read at the first step (that is, all pages except p1) are written back into their corresponding pages in the block b1.
Therefore, since the in-place update method causes one erase operation and multiple read and write operations whenever a logical page needs to be reflected into flash memory, it suffers from severe performance problems and is rarely used in flash-based storage systems.
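The four steps above can be sketched as follows to make the cost concrete. This is a simplified illustration, not the implementation of any particular system: the block is modeled as a Python list, the function names are hypothetical, every other page in the block is assumed to hold valid data, and the timings are the Table 1 values.

```python
T_READ_US, T_WRITE_US, T_ERASE_US = 120, 950, 1500  # from Table 1
N_PAGE = 64                                         # pages per block, from Table 1


def in_place_update(block, target, new_content):
    """Overwrite block[target] in place and count the flash operations used."""
    counts = {"read": 0, "write": 0, "erase": 0}

    # Step 1: read all pages in the block except the target page
    saved = {i: block[i] for i in range(len(block)) if i != target}
    counts["read"] += len(saved)

    # Step 2: erase the whole block
    for i in range(len(block)):
        block[i] = None
    counts["erase"] += 1

    # Step 3: write the updated logical page into the target physical page
    block[target] = new_content
    counts["write"] += 1

    # Step 4: write the saved pages back into their original positions
    for i, content in saved.items():
        block[i] = content
        counts["write"] += 1

    return counts


def cost_us(counts):
    """Total time in microseconds for the counted operations."""
    return (counts["read"] * T_READ_US
            + counts["write"] * T_WRITE_US
            + counts["erase"] * T_ERASE_US)


block = [f"old-{i}" for i in range(N_PAGE)]
counts = in_place_update(block, target=1, new_content="m1-updated")
in_place_cost = cost_us(counts)
print(counts, in_place_cost)
```

Under these assumptions, a single page update costs 63 reads, 1 erase, and 64 writes, or 69,860 μs, compared with 950 μs for one page write alone.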
In order to solve the problem of the in-place update method, the out-place update method, when the logical page m1 needs to be reflected into flash memory, writes m1 into a new physical page p2 and then marks the physical page p1 as obsolete. When free pages in flash memory are insufficient, obsolete pages are converted into free pages through garbage collection.
The out-place update method is widely used in flash-based storage systems because an erase operation does not occur whenever a logical page needs to be reflected into flash memory. FIGS. 3A and 3B illustrate an example of the out-place update method. FIG. 3A illustrates the logical page m1 read from the physical page p1 in the block b1. FIG. 3B illustrates the updated logical page m1 and the process of writing it into the physical page p2, where p1 is an original page read, and p2 is a new page written.
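A minimal sketch of the out-place update method with a logical-to-physical mapping table is given below. The class `OutPlaceFTL`, its method names, and the three page states are hypothetical names introduced for illustration. Garbage collection is deliberately omitted, so the sketch simply fails when no free page remains.

```python
FREE, VALID, OBSOLETE = "free", "valid", "obsolete"


class OutPlaceFTL:
    """Toy translation layer: each write goes to a fresh physical page."""

    def __init__(self, n_physical_pages):
        self.data = [None] * n_physical_pages
        self.status = [FREE] * n_physical_pages
        self.mapping = {}  # logical page id -> physical page number

    def write(self, logical_id, content):
        # Pick the first free physical page; raises ValueError if none is left
        # (a real system would run garbage collection here instead)
        new_p = self.status.index(FREE)
        self.data[new_p] = content
        self.status[new_p] = VALID

        # Mark the previously mapped physical page, if any, as obsolete
        old_p = self.mapping.get(logical_id)
        if old_p is not None:
            self.status[old_p] = OBSOLETE

        self.mapping[logical_id] = new_p
        return new_p

    def read(self, logical_id):
        # A logical page is always recreated from exactly one physical page
        return self.data[self.mapping[logical_id]]


ftl = OutPlaceFTL(n_physical_pages=4)
first_page = ftl.write("m1", b"original")   # initial placement of m1
second_page = ftl.write("m1", b"updated")   # update goes to a new physical page
print(first_page, second_page, ftl.status, ftl.read("m1"))
```

No erase occurs on the update path; the cost is deferred to whenever obsolete pages must be reclaimed.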
Log-based methods generally store a logical page into multiple physical pages. When a logical page is updated, the update logs of logical pages are first collected into a write buffer in memory. Here, an update log represents the changes in a page resulting from a single update command. Further, when the buffer is full, the buffer is written into a physical page. Therefore, when a logical page is frequently updated, the update logs of the page are stored into multiple physical pages. Accordingly, when a logical page is recreated, multiple physical pages may be read and merged. In addition, the log-based methods are tightly coupled with the storage system because the storage system must be modified to be able to identify the update logs of a logical page.
Log-based methods include the Log-Structured File System (LFS), the Journaling Flash File System (JFFS), Yet Another Flash File System (YAFFS), and In-Page Logging (IPL). LFS, JFFS, and YAFFS write the update logs of a logical page into arbitrary log pages in flash memory, whereas IPL writes the update logs into specific log pages. IPL divides the pages in each block into a fixed number of original pages and log pages. This method writes the update logs of a logical page only into the log pages of the block containing the original page of that logical page. Therefore, when a logical page is recreated, IPL reads the original page and only the log pages in the same block. When there is no free log page in the block, IPL merges the original pages and log pages in the block, and writes the merged pages into pages in a new block (this process is called ‘merging’).
Compared to other log-based methods, IPL improves read performance by reducing the number of log pages to read from flash memory when a logical page is recreated, because merging keeps the number of log pages from growing without bound. Since IPL inherits the advantages and disadvantages of the log-based methods except for the effect of merging, it has performance similar to that of other log-based methods.
FIGS. 4A to 4E illustrate an example of the log-based methods.
FIG. 4A illustrates the logical pages m1 and m2 in memory.
FIG. 4B illustrates the update logs q1 and q2 of the logical pages m1 and m2, and the process of writing q1 and q2 into flash memory. After q1 and q2 are written into the write buffer, the content in the buffer is written into the log page p3. Therefore, the update logs q1 and q2 are collected into the same log page p3.
FIG. 4C illustrates the update logs q3 and q4 of the logical pages m1 and m2, and the process of writing q3 and q4 into flash memory. After q3 and q4 are written into the write buffer, the content in the buffer is written into the log page p4. Therefore, the update logs q3 and q4 are collected into the same log page p4.
FIG. 4D illustrates the logical page m1 recreated from flash memory.
FIG. 4E illustrates the process of creating the logical page m1 in FIG. 4D. Here, the logical page m1 is recreated by merging the original page p1 with the update logs q1 and q3 of m1 read from the log pages p3 and p4, respectively.
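The scenario of FIGS. 4A to 4E, in which two logical pages are each updated twice and one of them is then recreated, can be sketched roughly as follows. The class `LogBasedStore`, the representation of an update log as an (id, offset, bytes) tuple, and the two-logs-per-page buffer size are all assumptions made for illustration. A real system would also index which log pages hold logs for a given logical page rather than scanning them all.

```python
class LogBasedStore:
    """Toy log-based store: update logs are buffered, then flushed to log pages."""

    def __init__(self, originals, logs_per_page=2):
        self.originals = dict(originals)  # logical id -> original page content
        self.log_pages = []               # flushed log pages (lists of update logs)
        self.buffer = []                  # in-memory write buffer
        self.logs_per_page = logs_per_page

    def update(self, logical_id, offset, new_bytes):
        # Each update command appends one update log to the write buffer
        self.buffer.append((logical_id, offset, new_bytes))
        if len(self.buffer) == self.logs_per_page:
            # Buffer full: write its content into one log page
            self.log_pages.append(self.buffer)
            self.buffer = []

    def recreate(self, logical_id):
        """Merge the original page with its update logs; count physical pages read."""
        page = bytearray(self.originals[logical_id])
        pages_read = 1  # the original page
        for log_page in self.log_pages:
            relevant = [log for log in log_page if log[0] == logical_id]
            if relevant:
                pages_read += 1
            for _, offset, new_bytes in relevant:
                page[offset:offset + len(new_bytes)] = new_bytes
        # Logs not yet flushed are applied from memory at no flash read cost
        for lid, offset, new_bytes in self.buffer:
            if lid == logical_id:
                page[offset:offset + len(new_bytes)] = new_bytes
        return bytes(page), pages_read


store = LogBasedStore({"m1": b"aaaaaaaa", "m2": b"bbbbbbbb"})
store.update("m1", 0, b"Q1")  # first update of m1
store.update("m2", 0, b"Q2")  # first update of m2; buffer flushed to a log page
store.update("m1", 4, b"Q3")  # second update of m1
store.update("m2", 4, b"Q4")  # second update of m2; buffer flushed again
content, pages_read = store.recreate("m1")
print(content, pages_read)
```

Recreating m1 here touches three physical pages (the original page and two log pages), whereas a page-based method would read one.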
Compared with the log-based methods, the page-based methods write not only the changed parts but also the unchanged parts of a page. In contrast, the log-based methods write only update logs. Therefore, when updates do not frequently occur, the page-based methods have worse write performance than the log-based methods. The page-based methods write an updated page into flash memory only when the updated page needs to be reflected into flash memory. However, the log-based methods write update logs into the write buffer whenever a logical page is updated. When the buffer is full, the buffer is written into flash memory. Therefore, when updates frequently occur, a logical page is updated many times, and thus, the total size of the update logs of one page may be larger than the size of one page. In this case, the page-based methods have better write performance than the log-based methods.
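The write trade-off described above can be illustrated with simple arithmetic. The function name and the 128-byte size of one update log are hypothetical values chosen only to show where the break-even point falls; the 2,048-byte data area comes from Table 1.

```python
S_DATA = 2_048  # data area of one page in bytes, from Table 1


def bytes_written(n_updates, log_size):
    """Bytes written to flash for one logical page between two evictions.

    Page-based: the whole page is written once, however many updates occurred.
    Log-based: one update log of log_size bytes is written per update.
    """
    page_based = S_DATA if n_updates > 0 else 0
    log_based = n_updates * log_size
    return page_based, log_based


LOG_SIZE = 128  # hypothetical size of one update log in bytes

few = bytes_written(n_updates=4, log_size=LOG_SIZE)    # infrequent updates
many = bytes_written(n_updates=32, log_size=LOG_SIZE)  # frequent updates
break_even = S_DATA // LOG_SIZE                        # updates at which both are equal

print(few, many, break_even)
```

With these illustrative numbers, the log-based method writes less data below 16 updates per page, and the page-based method writes less above that point.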
Next, when a logical page is recreated from flash memory, the page-based methods require only a single read operation since the logical page is stored into one physical page. In contrast, the log-based methods require multiple read operations since a logical page is stored into multiple physical pages. Consequently, the page-based methods have better read performance than the log-based methods.
Finally, the page-based methods are DBMS-independent, whereas the log-based methods are DBMS-dependent.
TABLE 2
Comparison of page-based methods with log-based methods

| | Page-based methods | Log-based methods |
|---|---|---|
| The data to be written into flash memory | The whole page (changed and unchanged parts) | The update log (changed part only) |
| The time that writes the data into flash memory | When a page needs to be reflected into flash memory | When write buffer is full |
| The number of physical pages to be read when recreating a logical page | One page | Multiple pages |
| Architecture | Loosely coupled (DBMS-independent) | Tightly coupled (DBMS-dependent) |