The present invention generally relates to a storage device and a deduplication method, and in particular to a storage device and a deduplication method that can be suitably applied to a storage device using a flash memory as a storage medium.
Conventionally, storage devices have used random-access nonvolatile storage media, such as magnetic disks and optical disks, as their data storage media. Mainstream storage devices today comprise a plurality of small disk drives.
In addition, with the advancement of semiconductor technology in recent years, collectively erasable nonvolatile semiconductor memories have been developed. A flash memory is a representative example of such a nonvolatile semiconductor memory. A storage device that uses a flash memory as the storage medium is considered superior in life span, power consumption and access time to a storage device comprising numerous small disk drives.
This flash memory is briefly explained below. In a flash memory, a block is the unit of storage area in which data is collectively erased, and a page is the unit in which data is read and written. As described later, a single block contains a plurality of pages. Owing to its physical characteristics, a flash memory cannot directly overwrite stored data. Instead, when data stored in the flash memory is to be rewritten, the valid data stored in the block is first saved to another block, the block is then erased as a unit, and data is thereafter written into the erased block.
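To make the block/page distinction concrete, the rewrite sequence above can be sketched as a toy Python model: save the valid pages to another block, erase the original block as a unit, then write into it. The page count, the `Block` class and the `rewrite_page` function are hypothetical illustrations, not an actual flash controller interface.

```python
# Toy model; PAGES_PER_BLOCK, Block and rewrite_page are hypothetical.
PAGES_PER_BLOCK = 4


class Block:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None marks an erased page
        self.erase_count = 0

    def erase(self):
        """Erase the whole block at once: the only way to make pages writable again."""
        self.pages = [None] * PAGES_PER_BLOCK
        self.erase_count += 1

    def write_page(self, index, data):
        """Write one page; a page that already holds data cannot be overwritten."""
        if self.pages[index] is not None:
            raise ValueError("page already written; erase the block first")
        self.pages[index] = data


def rewrite_page(block, spare, index, data):
    """Rewrite one page of `block`, using `spare` (an erased block) as temporary storage."""
    for i, page in enumerate(block.pages):      # 1. save the other valid pages
        if page is not None and i != index:
            spare.write_page(i, page)
    block.erase()                               # 2. erase the block as a unit
    for i, page in enumerate(spare.pages):      # 3. write the saved data back...
        if page is not None:
            block.write_page(i, page)
    block.write_page(index, data)               # ...together with the new page
    spare.erase()                               # return the spare block to the erased state


blk, spare = Block(), Block()
for i, d in enumerate("abcd"):
    blk.write_page(i, d)
rewrite_page(blk, spare, 2, "X")
print(blk.pages)        # ['a', 'b', 'X', 'd']
print(blk.erase_count)  # 1
```

Even this single-page update costs a full block erase in the model, which is the overhead that the write algorithms discussed next aim to hide.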
Specifically, although a flash memory can change a bit from “1” to “0,” it cannot change a bit from “0” to “1.” Therefore, rewriting data in a flash memory requires erasing all of the data stored in the block, which returns every bit to “1.” Thus, rewriting data in a flash memory involves erasing data in block units. However, erasing one block of data takes roughly several tens of times longer than writing one page of data. Consequently, if one block were erased each time one page of data is rewritten, the data rewriting performance of the flash memory would be severely degraded. In other words, when a flash memory is used as the storage medium, data must be written using an algorithm that hides the time required to erase data from the flash memory.
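The one-way nature of programming can be illustrated at the bit level: programming behaves like a bitwise AND with the existing contents, while only an erase sets bits back to “1.” The sketch below assumes a hypothetical 8-bit page; `program` and `can_program_in_place` are illustrative names.

```python
# Hypothetical bit-level model of a tiny page (PAGE_BITS is an assumption):
# programming can only clear bits (1 -> 0); only a block erase sets them to 1.
PAGE_BITS = 8
ERASED = (1 << PAGE_BITS) - 1  # an erased page reads as all 1s (0b11111111)


def program(current: int, data: int) -> int:
    """Program a page: a bit may stay as it is or change from 1 to 0, never 0 to 1."""
    return current & data


def can_program_in_place(current: int, data: int) -> bool:
    """True if writing `data` over `current` needs no 0 -> 1 transition."""
    return (current & data) == data


page = program(ERASED, 0b10110101)              # first write after an erase succeeds
print(can_program_in_place(page, 0b10100101))  # True: only clears one more bit
print(can_program_in_place(page, 0b11110101))  # False: would need a 0 -> 1 change
```

Any update that needs a “0” turned back into “1” therefore forces an erase of the whole block, which is why erase cost dominates naive rewriting.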
In a standard data rewriting operation of a flash memory, new data is appended to an unused area instead of erasing data each time data is rewritten. However, if data is rewritten repeatedly, the unused area in the flash memory runs low, and the unnecessary data written into the flash memory must be erased to return the storage area to a reusable state. Thus, a block reclamation process (hereinafter referred to as “reclamation”), which copies only the valid data in a block containing old data to an unused area and erases the copy-source block to return it to a reusable state, is essential for high-speed data rewriting in a flash memory. Reclamation is executed on blocks containing a large amount of invalid data.
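Append-style writing followed by reclamation can be sketched as a small Python model. The geometry, first-fit placement, and choosing the block with the most invalid pages as the reclamation target are illustrative assumptions, and `Flash`, `write` and `reclaim` are hypothetical names.

```python
# Hypothetical log-structured model: the geometry and the victim rule
# (block with the most invalid pages) are assumptions for illustration.
FREE, VALID, INVALID = "free", "valid", "invalid"


class Flash:
    def __init__(self, num_blocks=3, pages_per_block=2):
        self.state = [[FREE] * pages_per_block for _ in range(num_blocks)]
        self.owner = [[None] * pages_per_block for _ in range(num_blocks)]
        self.where = {}                       # logical page -> (block, page) of valid copy
        self.erase_counts = [0] * num_blocks

    def _free_slot(self, exclude=None):
        for b, pages in enumerate(self.state):
            if b == exclude:
                continue
            for p, s in enumerate(pages):
                if s == FREE:
                    return b, p
        raise RuntimeError("unused area exhausted; reclamation required")

    def _place(self, lpn, slot):
        b, p = slot
        self.state[b][p] = VALID
        self.owner[b][p] = lpn
        self.where[lpn] = slot

    def write(self, lpn):
        """Append a new version of `lpn` to an unused page; nothing is erased here."""
        slot = self._free_slot()
        old = self.where.get(lpn)
        if old is not None:
            self.state[old[0]][old[1]] = INVALID  # the old copy becomes invalid data
        self._place(lpn, slot)

    def reclaim(self):
        """Copy valid pages out of the most-invalid block, then erase that block."""
        victim = max(range(len(self.state)),
                     key=lambda b: self.state[b].count(INVALID))
        if self.state[victim].count(INVALID) == 0:
            return None                           # nothing worth reclaiming
        spare = sum(s == FREE for b, pages in enumerate(self.state)
                    if b != victim for s in pages)
        if spare < self.state[victim].count(VALID):
            raise RuntimeError("not enough unused pages to copy valid data")
        for p, s in enumerate(self.state[victim]):
            if s == VALID:                        # copy only the valid data
                self._place(self.owner[victim][p], self._free_slot(exclude=victim))
        n = len(self.state[victim])
        self.state[victim] = [FREE] * n           # erase the copy-source block
        self.owner[victim] = [None] * n
        self.erase_counts[victim] += 1
        return victim


flash = Flash()
for lpn in (0, 1, 0, 2):   # the second write of page 0 invalidates its first copy
    flash.write(lpn)
print(flash.reclaim())     # 0: block 0 held the invalid copy
print(flash.where[1])      # (2, 0): page 1 was copied out before the erase
```

Only one erase is spent here, and only after several page writes have accumulated, which is how appending plus reclamation amortizes the block-erase cost.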
Meanwhile, a flash memory has a limit on the number of times data can be erased; for instance, an erase count of up to 100,000 times per block is guaranteed. A block whose erase count grows because data rewriting is concentrated on it eventually becomes unusable, since data can no longer be erased from it. Thus, when a flash memory is used as the storage medium, the erase counts must be leveled so that data erase processing does not become concentrated on a specific block.
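One simple leveling policy, assumed here for illustration rather than taken from the text, is to always allocate the free block with the lowest erase count. The Python sketch below compares the per-block erase counts when one piece of data is rewritten repeatedly with and without that policy; `pick_block` and `simulate` are hypothetical names, and the limit reuses the 100,000-erase example above.

```python
# Hypothetical wear-leveling policy: always allocate the free block with the
# lowest erase count. The policy and function names are illustrative assumptions.
ERASE_LIMIT = 100_000  # guaranteed erase count per block, from the example in the text


def pick_block(free_blocks, erase_counts):
    """Return the free block with the fewest erasures that is still under the limit."""
    usable = [b for b in free_blocks if erase_counts[b] < ERASE_LIMIT]
    if not usable:
        raise RuntimeError("no usable free block remains")
    return min(usable, key=lambda b: erase_counts[b])


def simulate(rewrites, num_blocks, leveling):
    """Count erasures per block when one piece of data is rewritten repeatedly.

    Simplification: every block is treated as free, and each rewrite costs one
    erase of the block that receives the data.
    """
    counts = [0] * num_blocks
    for _ in range(rewrites):
        if leveling:
            target = pick_block(range(num_blocks), counts)
        else:
            target = 0                 # every rewrite lands on the same block
        counts[target] += 1
    return counts


print(simulate(400, 4, leveling=False))  # [400, 0, 0, 0]
print(simulate(400, 4, leveling=True))   # [100, 100, 100, 100]
```

Without leveling, the first block would reach the erase limit after 100,000 rewrites; with the policy, the same wear is spread across all four blocks.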
In order to hide the data erase time and level the erase counts as described above, the flash memory module translates logical addresses into physical addresses when writing data. A flash memory module comprises one or more flash memory chips and a flash memory controller that controls the reading and writing of data from and into the flash memory chips. The flash memory controller translates between logical and physical addresses and, to supplement its address translation table, stores, for each physical block (the physical unit memory area in the flash memory), the logical address of the logical block (the logical unit memory area) associated with that physical block in a prescribed logical address storage area.
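A minimal Python sketch of this arrangement follows. It assumes block-level mapping, immediate erasure of superseded blocks, and lowest-erase-count allocation, all simplifications chosen for brevity. `PhysicalBlock`, `FlashController` and `rebuild_table` are hypothetical names, and rebuilding the table by scanning the per-block logical addresses is shown as one plausible use of that stored information, not as the patented procedure.

```python
class PhysicalBlock:
    def __init__(self):
        self.data = None
        self.logical_address = None  # models the prescribed logical address storage area
        self.erase_count = 0


class FlashController:
    """Hypothetical controller with block-level logical-to-physical mapping."""

    def __init__(self, num_blocks):
        self.blocks = [PhysicalBlock() for _ in range(num_blocks)]
        self.table = {}  # address translation table: logical -> physical block number

    def write(self, lba, data):
        """Write out of place into an erased block, then remap `lba` to it."""
        new = self._pick_free()
        blk = self.blocks[new]
        blk.data = data
        blk.logical_address = lba  # record the owning logical address in the block
        old = self.table.get(lba)
        self.table[lba] = new
        if old is not None:
            self._erase(old)       # simplification: erase the superseded block at once

    def read(self, lba):
        return self.blocks[self.table[lba]].data

    def rebuild_table(self):
        """Reconstruct the translation table from the addresses stored in the blocks."""
        return {b.logical_address: pbn for pbn, b in enumerate(self.blocks)
                if b.logical_address is not None}

    def _pick_free(self):
        free = [pbn for pbn, b in enumerate(self.blocks) if b.logical_address is None]
        if not free:
            raise RuntimeError("no free physical block")
        return min(free, key=lambda pbn: self.blocks[pbn].erase_count)  # level wear

    def _erase(self, pbn):
        blk = self.blocks[pbn]
        blk.data = None
        blk.logical_address = None
        blk.erase_count += 1


ctl = FlashController(4)
ctl.write(10, "A")
ctl.write(11, "B")
ctl.write(10, "C")                       # logical block 10 moves to a new physical block
print(ctl.read(10))                      # C
print(ctl.rebuild_table() == ctl.table)  # True
```

Because the logical address travels with the data, the mapping survives even if the in-memory table is lost, at the cost of a small reserved area in every physical block.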
Moreover, deduplication technology (also known as data duplication elimination technology) is attracting attention as a means of reducing the capacity cost of storage devices. Deduplication technology associates a plurality of logical blocks storing identical data with a single physical block storing that data, thereby reducing the storage capacity required for the data (refer to U.S. Pat. No. 6,928,526). Because deduplication also reduces the number of data rewrites, applying it to a storage device that uses a flash memory as the storage medium can prolong the life span of the flash memory.
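The many-to-one mapping can be sketched in Python as follows. The use of SHA-256 fingerprints (with a byte comparison to guard against collisions) to find identical data, the reference counting, and the `DedupStore` name are assumptions for illustration, not details taken from the cited patent. The `physical_writes` counter shows how duplicate data avoids additional physical writes.

```python
import hashlib


class DedupStore:
    """Hypothetical deduplicating block store: many logical blocks, one physical copy."""

    def __init__(self):
        self.physical = {}        # physical block number -> data
        self.refcount = {}        # physical block number -> referencing logical blocks
        self.by_hash = {}         # SHA-256 fingerprint -> physical block number
        self.mapping = {}         # logical block address -> physical block number
        self.next_pbn = 0
        self.physical_writes = 0  # number of times data was actually written

    def write(self, lba, data: bytes):
        fp = hashlib.sha256(data).hexdigest()
        pbn = self.by_hash.get(fp)
        if pbn is not None and self.physical[pbn] == data:
            self.refcount[pbn] += 1           # identical data: share the physical block
        else:
            pbn = self.next_pbn               # new data: write one physical block
            self.next_pbn += 1
            self.physical[pbn] = data
            self.refcount[pbn] = 1
            self.physical_writes += 1
            self.by_hash.setdefault(fp, pbn)
        self._unmap(lba)                      # drop the logical block's previous mapping
        self.mapping[lba] = pbn

    def read(self, lba):
        return self.physical[self.mapping[lba]]

    def _unmap(self, lba):
        old = self.mapping.pop(lba, None)
        if old is None:
            return
        self.refcount[old] -= 1
        if self.refcount[old] == 0:           # no logical block refers to it any more
            data = self.physical.pop(old)
            del self.refcount[old]
            fp = hashlib.sha256(data).hexdigest()
            if self.by_hash.get(fp) == old:
                del self.by_hash[fp]


store = DedupStore()
store.write(0, b"hello")
store.write(1, b"hello")      # duplicate: mapped to the existing physical block
store.write(2, b"world")
print(store.physical_writes)  # 2
```

Three logical writes cost only two physical writes here; on flash, each avoided write is also a page that never has to be invalidated and later reclaimed.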