With the development of information technology and the expansion of its fields of application, the amount of data handled by information processing systems has increased. Consequently, information processing systems in every field demand storage with a larger capacity and higher performance for storing this data.
For example, aggregating the data of a plurality of systems into a single storage has been used to make storage management more efficient. In this arrangement, the storage must provide both the capacity to store a large amount of data and the performance to process a large number of accesses from the plurality of systems at high speed.
<HDD>
The growing demand for storage capacity can be met by using, as a storage medium, an HDD (Hard Disk Drive) whose capacity has been increased. Further, by increasing the number of storage media that constitute the storage, the demand for storage capacity can be met flexibly.
<Cache>
A cache, for example, is employed to speed up access to a storage. The cache is constituted, for example, by a DRAM (Dynamic Random Access Memory) or the like. A storage cache temporarily holds a copy or an update of data stored in a storage medium such as an HDD. When data in the HDD or the like is held in the cache and an access from the host side to the storage results in a cache hit, the access is served without accessing the storage medium, so the number of accesses to the storage medium such as the HDD is reduced. As a result, access performance is improved. However, the cache has a smaller capacity than a storage medium such as an HDD. For this reason, when the amount of data being read or written exceeds the storage capacity of the cache, the improvement in access performance obtained by the cache is limited.
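The effect of cache capacity can be illustrated with a small simulation. This is a minimal sketch, assuming an LRU replacement policy and a cyclic scan over a working set; neither the policy nor the access pattern is specified above, and the names `LRUCache` and `medium_reads_for` are hypothetical. It counts how many reads must go to the storage medium.

```python
from collections import OrderedDict


class LRUCache:
    """Toy read cache (e.g. DRAM) in front of a slow storage medium (e.g. an HDD)."""

    def __init__(self, capacity, backing):
        self.capacity = capacity        # number of blocks the cache can hold
        self.backing = backing          # dict: block address -> data on the medium
        self.entries = OrderedDict()    # cached blocks, least recently used first
        self.medium_reads = 0           # accesses that had to reach the medium

    def read(self, addr):
        if addr in self.entries:        # cache hit: the medium is not accessed
            self.entries.move_to_end(addr)
            return self.entries[addr]
        self.medium_reads += 1          # cache miss: read from the medium
        data = self.backing[addr]
        self.entries[addr] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used block
        return data


def medium_reads_for(working_set, capacity, passes=3):
    """Scan `working_set` blocks cyclically `passes` times; return medium reads."""
    backing = {a: f"data{a}" for a in range(working_set)}
    cache = LRUCache(capacity, backing)
    for _ in range(passes):
        for a in range(working_set):
            cache.read(a)
    return cache.medium_reads


print(medium_reads_for(8, 16))   # working set fits: only the first pass misses
print(medium_reads_for(32, 16))  # working set exceeds capacity: every read misses
```

With 8 blocks and a 16-block cache, only the 8 first-pass reads reach the medium; with 32 blocks, a cyclic scan under LRU misses on all 96 reads. The cyclic scan is a worst case for LRU, chosen here only to make the capacity limit visible.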
<Striping>
Striping, for example, is employed as a means for improving access performance of a storage medium such as an HDD. In striping (also referred to as RAID 0 (Redundant Arrays of Inexpensive Disks 0)), data is divided into units of a predetermined size, and the divided data are distributed alternately across a plurality of HDDs (storage media) for writing or reading, thereby achieving a speed-up. Striping (RAID 0) requires two or more drives (HDDs). Striping (RAID 0) has no redundancy: if even one drive fails, for example, all the data in the disk array are lost. Access performance can be improved by increasing the number of drives in the striped (RAID 0) disk array. However, as the number of drives (HDDs) in the disk array increases, the failure rate of the array may also increase.
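Both points, the alternating distribution of data and the rising failure rate, can be made concrete. The sketch below assumes a fixed stripe unit of `stripe_blocks` blocks placed round-robin across drives, and independent drive failures with equal probability `p_drive`; both functions are hypothetical illustrations, not a specific RAID implementation.

```python
def stripe_location(lba, num_drives, stripe_blocks):
    """Map a logical block address to (drive index, block offset on that drive)."""
    stripe_unit = lba // stripe_blocks   # which stripe unit the block falls in
    within = lba % stripe_blocks         # position inside that stripe unit
    drive = stripe_unit % num_drives     # stripe units rotate across the drives
    row = stripe_unit // num_drives      # full rows of stripe units before this one
    return drive, row * stripe_blocks + within


def array_failure_probability(p_drive, num_drives):
    """RAID 0 loses data if any drive fails (independent failures assumed)."""
    return 1 - (1 - p_drive) ** num_drives


print(stripe_location(20, num_drives=4, stripe_blocks=16))  # (1, 4)
print(stripe_location(70, num_drives=4, stripe_blocks=16))  # (0, 22)
print(array_failure_probability(0.5, 2))                    # 0.75
```

Because every additional drive multiplies in another survival factor `(1 - p_drive)`, the array failure probability grows with the number of drives even though per-drive reliability is unchanged.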
<SSD>
Storing data in a high-speed storage medium is another means of improving access performance. An SSD (Solid State Drive), for example, is employed in a storage as a high-speed storage medium (storage device). Generally, an SSD comprises flash memory (a programmable non-volatile memory of the collectively erasable type), in particular NAND-type flash memory. Although an SSD has lower access performance than a DRAM, it provides faster access than an HDD. Although an SSD has a smaller storage capacity than an HDD, it can implement a storage medium with a larger capacity than a DRAM. Generally, an SSD is more expensive than an HDD, and the number of times data can be written to it is limited due to a characteristic of NAND-type flash memory.
In NAND-type flash memory, write and read accesses are each performed in units of a page (approximately 4 KB (kilobytes), for example, though not limited thereto). Generally, this page size is smaller than the block size (256 KB, for example), which is the unit of a collective erase operation. In NAND-type flash memory, to newly write data into a page (or block) into which data has already been written, the block containing the written data must first be collectively erased on a per-block basis.
When a write request such as a random write is made from a host to a storage that uses an SSD as its storage medium, and old data has already been written in the destination page, the data is written as follows:
(1) reading, into a DRAM or the like, each page (of 4 KB, for example) other than the page to be overwritten by the write request, from the block (of 256 KB = 64 pages, for example) corresponding to the logical address range of the write request;
(2) erasing the block;
(3) combining, on the DRAM or the like, the write data requested by the write request (the overwrite target page) with the pages other than the overwrite target that have been read into the DRAM or the like;
(4) writing the combined data back from the DRAM or the like into the block in the SSD, on a per-page basis; and
(5) updating the correspondence (address management information) between the page address (physical address) of the data written into the block in the SSD and the logical address used by the host to access that data.
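The read-modify-write sequence above can be sketched as a toy model. This is a minimal illustration, assuming a hypothetical `NandBlock` class in which a page can be programmed only once between block erasures and a block holds 64 pages, as in the example. The counters show that updating one page costs 63 page reads, 64 page writes, and one erase. The final address-management update is omitted because this toy writes the data back to the same physical pages.

```python
PAGES_PER_BLOCK = 64  # 256 KB block / 4 KB page, as in the example


class NandBlock:
    """Toy NAND block: a page may be programmed only once between erasures."""

    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK  # None means erased (writable)
        self.page_reads = self.page_writes = self.erases = 0

    def read(self, i):
        self.page_reads += 1
        return self.pages[i]

    def program(self, i, data):
        if self.pages[i] is not None:
            raise ValueError("page already written; the block must be erased first")
        self.page_writes += 1
        self.pages[i] = data

    def erase(self):
        self.erases += 1
        self.pages = [None] * PAGES_PER_BLOCK


def overwrite_page(block, target, new_data):
    """Update one page in place by read-modify-write."""
    # Read every page except the overwrite target into a DRAM-side buffer.
    buffer = [block.read(i) if i != target else None for i in range(PAGES_PER_BLOCK)]
    block.erase()                      # erase the whole block
    buffer[target] = new_data          # merge the new page on the DRAM side
    for i, data in enumerate(buffer):  # write back page by page
        if data is not None:
            block.program(i, data)


blk = NandBlock()
for i in range(PAGES_PER_BLOCK):
    blk.program(i, f"old{i}")
overwrite_page(blk, 5, "new5")
print(blk.page_reads, blk.page_writes - PAGES_PER_BLOCK, blk.erases)  # 63 64 1
```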
As described above, when a page containing already written data is overwritten in an SSD, the amount of data read and written generally becomes larger than the amount of data the host requested to write. That is, for a request to write one page of data (4 KB, for example) into an SSD, the pages other than the overwrite target (63 pages, for example) are read from the corresponding block (256 KB = 64 pages) in the SSD, and then a full block's worth of pages is written back on a per-page basis. In the SSD, whenever an already written page is updated (written), the following operations are performed:
(1) reading each page other than the overwrite target from the SSD, and transferring and writing it into the DRAM or the like;
(2) merging the data on the DRAM or the like;
(3) performing erasure on a per-block basis on the SSD; and
(4) reading and transferring the data from the DRAM or the like, and writing it into the SSD on a per-page basis.
Thus, the processing time required for a write access increases, and random write performance is significantly reduced.
<Append Scheme: SSD>
In an append scheme (append write scheme), write target data are written sequentially, in ascending order of pages for example, irrespective of the logical address specified by a write request from the host. When the host issues a write request that specifies, as its write destination address, a logical address previously specified as a write destination, the write target data is written into a new unwritten page in a block that has been erased, and the page previously written for that logical address is marked invalid. The new page into which the write target data has been written is then marked valid and associated with the logical address specified by the write request.
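The append behavior just described can be sketched with a logical-to-physical table. This is a minimal model, assuming a hypothetical `AppendLog` class with a single append point that consumes physical pages in ascending order; the logical addresses in the usage lines are arbitrary examples.

```python
class AppendLog:
    """Toy append-write store: each write goes to the next free physical page."""

    def __init__(self, num_pages):
        self.state = ["free"] * num_pages  # per physical page: free / valid / invalid
        self.data = [None] * num_pages
        self.l2p = {}                      # logical address -> physical page
        self.next_free = 0                 # append point, advancing in ascending order

    def write(self, logical, value):
        if self.next_free >= len(self.state):
            raise RuntimeError("no free page left; compaction is needed")
        old = self.l2p.get(logical)
        if old is not None:
            self.state[old] = "invalid"    # the old copy is invalidated, not overwritten
        phys = self.next_free
        self.data[phys] = value
        self.state[phys] = "valid"
        self.l2p[logical] = phys           # remap the logical address to the new page
        self.next_free += 1
        return phys

    def read(self, logical):
        return self.data[self.l2p[logical]]


log = AppendLog(8)
log.write(100, "a")
log.write(7, "b")
log.write(100, "a2")   # rewrite of logical 100 lands on physical page 2
print(log.state[:3])   # ['invalid', 'valid', 'valid']
print(log.read(100))   # a2
```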
The following describes an example of applying the append scheme to the SSD described above. In the append scheme, as the number of invalidated pages increases and the invalid pages become scattered at random, it becomes impossible to secure enough writable contiguous pages (storage regions); this is fragmentation. Therefore, compaction (garbage collection), or defragmentation, is performed at an appropriate timing. In compaction, the valid data stored in pages (at physical addresses) that have not been invalidated is collected from a block that includes invalidated pages and is moved, by copying for example, to a free block (an unused block that has been erased). The original block is then erased on a per-block basis, generating a new free block. FIGS. 22A to 22D are explanatory diagrams of compaction. In the example schematically illustrated in FIG. 22A, for one block spanning physical page addresses AA+0 to AA+63 (each page = 4 KB), the invalid pages are the 2 pages from address AA+2, the 4 pages from address AA+6, the 2 pages from address AA+12, and so on. The pages storing valid data a, c, and e are the pages at addresses AA+0, AA+4, AA+10, and so on. As illustrated in FIG. 22B, the valid data a, c, e, and so on at addresses AA+0, AA+4, AA+10, and so on are copied sequentially to a free block (with leading address BB+0). When the storage medium is an SSD, the block illustrated in FIG. 22A is collectively erased on a per-block basis after the copying has finished, yielding the free block illustrated in FIG. 22C (when the storage medium is an HDD, erasure is not necessary, and the block in FIG. 22A is managed as a free block once the copying is complete). Then, in the address conversion table that maps each logical address to a physical address, the following updates are performed, for example (refer to FIG. 22D):
(1) the physical address corresponding to logical address LAa of data a is updated from the original address AA+0 to address BB+0 of the new block;
(2) the physical address corresponding to logical address LAc of data c is updated from the original address AA+4 to address BB+2; and
(3) the physical address corresponding to logical address LAe of data e is updated from the original address AA+10 to address BB+4.
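The compaction of FIGS. 22A to 22D can be reproduced with a short sketch. It models only the first 14 pages of the block in FIG. 22A and assumes that the unlabeled valid pages at AA+1, AA+5, and AA+11 hold hypothetical data b, d, and f, which is consistent with c and e landing at BB+2 and BB+4. The numeric values chosen for the base addresses AA and BB, and the function `compact`, are likewise hypothetical.

```python
def compact(block, dst_base, l2p):
    """Copy valid pages of `block`, in order, into a free block starting at dst_base.

    block: list indexed by page offset; each entry is (logical_address, data) for a
           valid page, or None for a page holding no valid data.
    l2p:   address conversion table, logical address -> physical address (updated).
    Returns the contents of the destination block.
    """
    free_block = []
    for page in block:
        if page is None:
            continue                                 # invalid page: nothing to copy
        logical, _data = page
        l2p[logical] = dst_base + len(free_block)    # remap to the copied page
        free_block.append(page)                      # sequential copy (FIG. 22B)
    block[:] = [None] * len(block)                   # erase / mark free (FIG. 22C)
    return free_block


AA, BB = 0x1000, 0x2000  # hypothetical base addresses
layout = ["a", "b", None, None, "c", "d", None, None, None, None, "e", "f", None, None]
block = [None if n is None else (f"LA{n}", n) for n in layout]
l2p = {f"LA{n}": AA + off for off, n in enumerate(layout) if n is not None}

new_block = compact(block, BB, l2p)
print([hex(l2p[k]) for k in ("LAa", "LAc", "LAe")])  # ['0x2000', '0x2002', '0x2004']
```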
Though not limited thereto, the leading address (base address) used to specify an access by a logical address may be set at a page boundary, as with the physical address (when the page size is 4 KB, for example, a page boundary occurs every 4 KB).
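Page-boundary alignment reduces to simple integer arithmetic. The helpers below are hypothetical and assume byte addressing and the 4 KB page size used in the example.

```python
PAGE_SIZE = 4 * 1024  # 4 KB page, the example size used above


def page_of(byte_addr):
    """Split a byte address into (page number, offset within the page)."""
    return divmod(byte_addr, PAGE_SIZE)


def is_page_aligned(byte_addr):
    """True if the address lies exactly on a page boundary."""
    return byte_addr % PAGE_SIZE == 0


def align_down(byte_addr):
    """Round an address down to the start of its page."""
    return byte_addr - byte_addr % PAGE_SIZE


print(page_of(8193))          # (2, 1)
print(is_page_aligned(12288)) # True
print(align_down(5000))       # 4096
```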
<Append Scheme: HDD>
The above description has mainly concerned the append scheme when an SSD is employed as the storage medium (storage device). For an HDD or the like, there is, for example, the LFS (Log-structured File System) described in Non Patent Literature 1, Non Patent Literature 2, and so on, as an append-only logging scheme. In the LFS, write accesses are sped up by accessing the disk sequentially and asynchronously. Multiple file (data) changes are collected in a file cache (write buffer) and packed together. These changes are then written to the disk in sequential transfers (for example, written sequentially at a transfer speed close to the maximum bandwidth of the disk). For example, modified file data and file system metadata, such as directory blocks and inode blocks (an inode is a file system data structure in a UNIX (registered trademark) system that stores information about an object in the file system, such as a file or a directory), are packed together and written sequentially to the disk. In the LFS, data is added to the file system in the form of an append-only log.
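The buffering and packing of changes can be sketched as follows. This is a simplified illustration, assuming a hypothetical `SegmentWriter` that flushes whenever its buffer reaches a fixed segment size; real LFS implementations also write bookkeeping such as segment summary information, which is omitted here.

```python
class SegmentWriter:
    """Toy LFS-style writer: buffer changed blocks, then append them as one segment."""

    def __init__(self, segment_blocks):
        self.segment_blocks = segment_blocks  # blocks per sequential transfer (assumed)
        self.buffer = []                      # pending (kind, name, payload) records
        self.disk = []                        # the log: blocks are only ever appended
        self.flushes = []                     # (start, length) of each sequential write

    def modify(self, kind, name, payload):
        """Record a changed data, inode, or directory block in the write buffer."""
        self.buffer.append((kind, name, payload))
        if len(self.buffer) >= self.segment_blocks:
            self.flush()

    def flush(self):
        """Write all buffered blocks contiguously at the tail of the log."""
        if not self.buffer:
            return
        start = len(self.disk)
        self.disk.extend(self.buffer)
        self.flushes.append((start, len(self.buffer)))
        self.buffer = []


w = SegmentWriter(segment_blocks=4)
for kind, name in [("data", "f1"), ("inode", "f1"), ("dir", "/"),
                   ("data", "f2"), ("inode", "f2"), ("data", "f1")]:
    w.modify(kind, name, f"{kind}:{name}")
w.flush()
print(w.flushes)  # [(0, 4), (4, 2)]
```

Six scattered changes become two contiguous transfers instead of six separate writes; the segment size controls how much is batched per transfer.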
The append write scheme focuses specifically on improving random write performance. In an HDD, random access performance is lower than successive access performance. In a successive access, information is accessed in physically sequential order; a successive access is therefore also referred to as a sequential access. In a random access, by contrast, the accessed locations need not be physically sequential.
When the address specified by a write request is fixed (e.g., when the correspondence between physical addresses and logical addresses in the storage medium is fixed), a write to a non-contiguous storage region in the storage medium results in a low-speed random write.
In the append write scheme, the storage destination address (such as a physical page address) of data to be stored in the storage medium changes dynamically. When write data is written for the logical address specified by a write request, the address (such as a physical address) of the free region into which the write data has been written is associated with that logical address. By storing write data successively in the free region in this way, the occurrence of low-speed random writes is reduced.
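The contrast between a fixed mapping and the append write scheme can be quantified by counting physical discontinuities in a write stream. This sketch assumes that any write whose physical address does not immediately follow the previous one costs a seek; the initial positioning is not counted, and the request addresses and append point are arbitrary examples.

```python
def count_seeks(physical_sequence):
    """Count writes whose physical address does not follow the previous write."""
    return sum(1 for prev, cur in zip(physical_sequence, physical_sequence[1:])
               if cur != prev + 1)


def fixed_mapping(logicals):
    """Fixed correspondence: the physical address equals the logical address."""
    return list(logicals)


def append_mapping(logicals, append_point):
    """Append write: each write takes the next address of a contiguous free region."""
    l2p, physical = {}, []
    for i, la in enumerate(logicals):
        l2p[la] = append_point + i  # associate the logical address with the new location
        physical.append(l2p[la])
    return physical, l2p


requests = [907, 12, 550, 13, 300]          # a random write pattern
print(count_seeks(fixed_mapping(requests)))  # 4
phys, table = append_mapping(requests, append_point=1000)
print(count_seeks(phys), table[550])         # 0 1002
```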
In the append write scheme, random writes, which are disadvantageous for the performance of an HDD or the like, can be replaced by successive writes, which are advantageous. For this reason, as long as a region for appending can be secured on the storage medium, the append write scheme is not restricted by capacity in the way a cache is (where the performance improvement is limited when the amount of data read or written exceeds the storage capacity of the cache).
As mentioned above, the append write scheme aims to improve access performance by replacing low-speed random write accesses with successive write accesses to a contiguous free region serving as the append write destination. That is, the append write scheme can absorb random write accesses up to the storage capacity secured for appending. However, to reduce the frequency of random write accesses (that is, to improve access performance by increasing the number of sequential accesses), a large contiguous free region must be secured as the append write destination.
In the append write scheme, when data written to a storage medium such as an HDD is updated, the update data is written to a new storage region (free region) instead of overwriting the storage region where the data to be updated (old data) is stored. The storage region holding the old data is invalidated and becomes a storage region (unused region) available for storing new data. On the other hand, storage regions holding data that has not been updated remain valid. For this reason, as data updates continue, fragmentation of the free region (the unused region capable of storing new data) occurs. As contiguous free regions shrink due to this fragmentation, the frequency of random writes increases, and storage access performance is reduced. Therefore, fragmentation of the free region is eliminated (by de-fragmentation, garbage collection, or, in the LFS, segment cleaning). In eliminating fragmentation of the free region (de-fragmentation), valid data stored in the storage medium is read and moved, by copying for example, to a different region of the storage medium, thereby enlarging the contiguous free region. Since de-fragmentation involves read and write accesses to the storage medium, it reduces the access performance of ordinary accesses (data accesses performed on the system side that are unrelated to de-fragmentation). To avoid degrading the performance of ordinary accesses, de-fragmentation is performed, for example, under one of the following conditions:
(1) when no access to the storage medium is being performed; or
(2) when the storage medium is in a low-load state.
While write accesses to the storage continue without interruption, for example, a low-load state never occurs. If de-fragmentation were performed while write accesses continue, write access performance would be significantly reduced. For this reason, de-fragmentation is not performed while write accesses continue. However, if de-fragmentation is not performed, the free region becomes fragmented, making it difficult to secure an adequate contiguous free region. Data must then be stored in non-contiguous free regions, and as a result the access performance of the continuous write accesses is significantly reduced.
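The fragmentation effect described above can be measured by listing the lengths of contiguous free runs. In this sketch, a region is a string with `V` for a page holding valid data and `F` for a free page (including pages freed by invalidation); the functions `free_runs` and `defragment` are hypothetical, and `defragment` simply packs valid pages to the front to stand in for the copying step.

```python
def free_runs(page_states):
    """Return the lengths of maximal runs of free pages ('F') in a region."""
    runs, current = [], 0
    for s in page_states:
        if s == "F":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def defragment(page_states):
    """Move valid pages to the front (by copying), leaving one contiguous free run."""
    valid = "".join(s for s in page_states if s != "F")
    return valid + "F" * (len(page_states) - len(valid))


region = "VFVVFVVFVVFV"          # four updates have freed scattered single pages
print(free_runs(region))         # [1, 1, 1, 1]: 4 free pages, none contiguous
packed = defragment(region)
print(packed, free_runs(packed)) # VVVVVVVVFFFF [4]
```

The total free space is the same before and after; only its contiguity changes, and that contiguity is what the append write scheme depends on.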
Patent Literature 1 discloses a method for improving IOPS (I/O operations per second: the number of accesses that can be processed per second) performance when data is written, without performing garbage collection. The method comprises: a step of writing first data from a host to pages in Block 3 among Blocks 1 to 3 that constitute a recording medium; a step of writing, to a buffer, second data recorded in a page of Block 1, which has been selected based on non-use pages; a step of erasing the data recorded in the page of Block 1; and a step of writing the second data to the page of Block 1. A non-use page is a page in which no data is written, or a page containing data that has also been written to a different page. This method reduces the number of times data recorded in a block is erased, without performing garbage collection.
Patent Literature 2 (PTL 2) relates to a technology for allocating a free storage region of at least one storage apparatus to a virtual volume. Patent Literature 2 discloses a configuration for performing real region allocation that avoids fragmentation and improves the use efficiency of storage regions. In this configuration, for a request size specified by a management console, a virtualization switch extracts and allocates storage regions from the free storage region until the remaining size becomes smaller than a specified region size upper limit. When the remaining size becomes smaller than the region size upper limit, the virtualization switch extracts from the free storage region, and allocates for the remaining size, a storage region whose size is the smallest power of two not smaller than the remaining size. When the free storage region consists of a plurality of contiguous free regions, the virtualization switch selects the largest contiguous free region as the allocation target. In Patent Literature 2, a large contiguous free region is thus selected so that fragmentation occurs as little as possible.
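The allocation rule attributed to Patent Literature 2 can be sketched roughly as follows. This is an interpretation, not the patented implementation: it assumes that each full-size allocation is exactly `upper_limit` units, that free regions are `(start, length)` pairs, and that every allocation is carved from the front of the currently largest contiguous free region. All function names are hypothetical.

```python
def next_pow2(n):
    """Smallest power of two not smaller than n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def take_from_largest(free_regions, size):
    """Carve `size` units from the front of the largest contiguous free region."""
    idx = max(range(len(free_regions)), key=lambda i: free_regions[i][1])
    start, length = free_regions[idx]
    if length < size:
        raise ValueError("no contiguous free region is large enough")
    free_regions[idx] = (start + size, length - size)
    return (start, size)


def allocate(request, upper_limit, free_regions):
    """Allocate `request` units; chunk size equal to upper_limit is an assumption."""
    allocations, remaining = [], request
    while remaining >= upper_limit:   # allocate until the remainder drops below the limit
        allocations.append(take_from_largest(free_regions, upper_limit))
        remaining -= upper_limit
    if remaining > 0:                 # remainder: smallest power of two >= remaining
        allocations.append(take_from_largest(free_regions, next_pow2(remaining)))
    return allocations


free = [(0, 100), (200, 300)]
print(allocate(70, 32, free))  # [(200, 32), (232, 32), (264, 8)]
print(free)                    # [(0, 100), (272, 228)]
```

Rounding the remainder up to a power of two trades a little internal waste for region sizes that are easier to reuse, and always drawing from the largest region keeps smaller free regions from being split further.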
PTL 1: JP Patent Kokai Publication No. JP2010-237907A
PTL 2: JP Patent Kokai Publication No. JP2004-164370A
NPL 1: Rosenblum, Mendel and Ousterhout, John K. (June 1990). “The LFS Storage Manager”, Proceedings of the 1990 Summer USENIX, pp. 315-324.
NPL 2: Rosenblum, Mendel and Ousterhout, John K. (February 1992). “The Design and Implementation of a Log-Structured File System”, ACM Transactions on Computer Systems, Vol. 10, Issue 1, pp. 26-52.