Solid State Drives (SSDs) have a great advantage over hard disks in that they enable reading and writing of data at a higher speed while keeping the electric power consumption lower. Additionally, in recent years, since SSDs have become available at lower prices, SSDs have become more practical, and more and more storage devices using SSDs have been put into practical use. A large-scale storage device in which a plurality of SSDs are connected together is called an All Flash Array (AFA). A storage device provided with an AFA includes, for example, a Central Processing Unit (CPU) serving as an arithmetic processing unit, a Dual Inline Memory Module (DIMM) structured with a plurality of Dynamic Random Access Memories (DRAMs), and the SSDs.
Because AFAs use SSDs, there is a demand for reducing the number of times of data writing and for keeping the amount of saved data small. To meet the demand, AFAs use two techniques, namely, a duplicate elimination technique and a compression technique.
The duplicate elimination technique is a technique by which duplicate data is saved only in one location of an AFA, so that all the files having the data are caused to refer to the location. The compression technique is a technique by which, when data is written from a memory to an SSD, the data is compressed and saved, so that the compressed data is read from the SSD into a memory and decompressed into the memory. As for compressing methods, in many situations Lempel-Ziv (LZ)-based compressing methods are used by which a compressing process is performed while omitting mutually the same patterns by making a forward reference. When LZ-based compressing methods are used, it is possible to decompress the data at a high speed.
According to a duplicate elimination technique, an AFA manages pieces of data stored in the SSDs, by using meta data that stores therein information about the pieces of data stored in the SSDs. The meta data is kept in a memory at all times for the purpose of eliminating duplicates and performing compressing processes at a high speed.
For example, to realize a duplicate elimination process, the meta data has Physical Block Addresses (PBAs) of the pieces of data saved in the SSDs and hash values used for eliminating the duplicates. Further, in the situation where the data stored in the SSDs is managed by using the meta data, when the data size managed by each piece of meta data is too large or too small, performance is degraded.
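The relationship between the meta data, the hash values, and the PBAs described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of any actual AFA: the names MetaEntry and DedupStore, the use of SHA-256 as the hash, the reference counter, and the in-memory list standing in for the SSDs are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 8 * 1024  # management unit; 8 KB is the example size used in the text


@dataclass
class MetaEntry:
    """One piece of meta data (hypothetical layout)."""
    pba: int        # physical block address of the stored chunk
    digest: bytes   # hash value used for eliminating duplicates
    refs: int = 1   # number of logical chunks referring to this location (assumption)


class DedupStore:
    """Toy store: duplicate chunks are saved in only one location."""

    def __init__(self):
        self.by_digest = {}  # digest -> MetaEntry, kept in memory at all times
        self.blocks = []     # stands in for the SSDs; the list index plays the role of the PBA

    def write(self, chunk: bytes) -> int:
        digest = hashlib.sha256(chunk).digest()
        entry = self.by_digest.get(digest)
        if entry is not None:
            # Duplicate: nothing is written, the existing location is referred to.
            entry.refs += 1
            return entry.pba
        pba = len(self.blocks)
        self.blocks.append(chunk)
        self.by_digest[digest] = MetaEntry(pba=pba, digest=digest)
        return pba

    def read(self, pba: int) -> bytes:
        return self.blocks[pba]
```

In this sketch, writing the same 8-KB chunk twice returns the same PBA both times and stores the chunk only once, which is the effect the duplicate elimination technique aims for.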
For example, when the data size is small, because the number of pieces of meta data arranged in the DRAMs increases and because the volume in the memories occupied by the meta data also increases, the volume in the memories that is usable for other processes becomes smaller, which leads to a degradation of performance. On the contrary, when the data size is large, because the cache size in the memories also becomes large and because the volume in the memories is wastefully used by, for example, placing unused data in the DRAMs, the volume in the memories that is usable for other processes becomes smaller, which leads to a degradation of performance. For these reasons, it is desirable to determine the data size managed by the meta data on the basis of the number of pieces of meta data and the cache size. For example, the data size managed by the meta data may be 8 KB.
As explained above, AFAs perform the duplicate elimination processes in units each having the data size managed by the meta data. For example, when the data size managed by the meta data is 8 KB, the AFA performs the duplicate elimination processes in units of 8 KB.
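The dependence of the meta data volume on the managed data size can be illustrated with simple arithmetic. The sketch below is illustrative only: the 1-TiB capacity and the 64-byte size per piece of meta data are assumed values chosen for the example and do not come from the description above.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def metadata_entries(capacity_bytes: int, unit_bytes: int) -> int:
    """Number of pieces of meta data needed to cover the capacity."""
    return capacity_bytes // unit_bytes


def metadata_bytes(capacity_bytes: int, unit_bytes: int, entry_bytes: int) -> int:
    """Memory volume occupied by the meta data."""
    return metadata_entries(capacity_bytes, unit_bytes) * entry_bytes


if __name__ == "__main__":
    ENTRY_BYTES = 64  # assumed size of one piece of meta data
    for unit in (4 * KIB, 8 * KIB, 16 * KIB):
        used = metadata_bytes(TIB, unit, ENTRY_BYTES)
        print(f"unit {unit // KIB:2d} KiB -> {used / GIB:.0f} GiB of meta data per TiB")
```

Under these assumptions, halving the managed data size doubles the memory occupied by the meta data, which is why a size that is too small degrades performance.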
In contrast, for SSDs, because many data accesses are made in units of 4 KB in various application programs, the performance of commonly-used products is optimised for processes performed in units of 4 KB serving as a page size.
Further, a conventional technique is known by which data is divided into sections, so that pieces of data having mutually the same contents are compressed together in one piece as common data. Another conventional technique is also known, by which an image is compressed, as being divided into predetermined blocks, so that it is possible to partially restore the image at the time of restoration.
Patent Literature 1: Japanese Laid-open Patent Publication No. 2010-61518
Patent Literature 2: Japanese Laid-open Patent Publication No. 2003-319186
However, as explained above, there is a situation to consider where the management size of data used by an AFA and the unit size of data used by the SSDs are different from each other. In that situation, there is a possibility that, in response to a data read request, the AFA may read data in a wasteful manner. For example, when the AFA manages data in units of 8 KB, whereas data stored in the SSDs is accessed in units of 4 KB, the AFA reads and decompresses 8-KB data when receiving a request to read 4-KB data and responds with the designated data corresponding to 4 KB selected out of the decompressed data. Thus, the reading and decompressing of the remaining data corresponding to 4 KB is wasted. As explained herein, when the conventional compression technique of AFAs is used, the performance in the compressing and decompressing processes is degraded due to the inconsistency in sizes by which the data is handled. Thus, there is a possibility that the processing capability such as that expressed with an Input Output Per Second (IOPS) value may be lowered.
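The wasted work in the example above can be expressed as a small calculation. The following is a minimal sketch; the function names are hypothetical, and the model simply assumes that every management unit touched by a request is read and decompressed in full.

```python
KB = 1024


def bytes_processed(offset: int, length: int, unit: int) -> int:
    """Bytes read and decompressed when whole management units must be handled."""
    first_unit = offset // unit
    last_unit = (offset + length - 1) // unit
    return (last_unit - first_unit + 1) * unit


def bytes_wasted(offset: int, length: int, unit: int) -> int:
    """Bytes that are read and decompressed but not returned to the requester."""
    return bytes_processed(offset, length, unit) - length


if __name__ == "__main__":
    # A request to read 4 KB located in the second half of an 8-KB management unit.
    print(bytes_wasted(offset=4 * KB, length=4 * KB, unit=8 * KB))
    # The same request when data is managed in units of 4 KB.
    print(bytes_wasted(offset=4 * KB, length=4 * KB, unit=4 * KB))
```

With an 8-KB management unit, half of the data handled for each 4-KB request is discarded, whereas no data is discarded when the two sizes match.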
To cope with this situation, one possible method is to compress the 8-KB data by dividing the data into sections of 4 KB and to store the boundary in a memory. In that situation, in response to a request to read 4-KB data, it is possible to read and decompress only the 4-KB section that has been requested. For example, when 8-KB data is compressed altogether, the IOPS value for a request to read 4-KB data is 285 K IOPS. In contrast, when the data is compressed after being divided into sections of 4 KB, the IOPS value for a request to read 4-KB data is 460 K IOPS. However, when the data is compressed after being divided into sections of 4 KB, the compression ratio is lower than in the situation where the 8-KB data is compressed altogether. For this reason, the amount of data which the AFA is able to store therein becomes smaller.
Further, even by using the conventional technique by which data is divided into sections so that pieces of data having mutually the same contents are compressed together in one piece as common data, when the sizes by which the data is handled are inconsistent, there is a possibility that the compression ratio may be lowered because the data is handled in the same manner as in the situation where the data is compressed after being divided into sections of 4 KB. Further, even by using the conventional technique by which data is partially restored at the time of restoration, there is a possibility that the processing capability may be degraded and/or that the compression ratio may be lowered in the situation where the sizes by which the data is handled are inconsistent.
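The trade-off described above between compressing 8-KB data altogether and compressing it in sections of 4 KB can be tried with a short sketch. The assumptions are the following: Python's zlib module (DEFLATE, an LZ77-based method) stands in for the LZ-based compressor, the function names are hypothetical, and the test data is artificial, built so that the second 4-KB section repeats the first one.

```python
import random
import zlib

SECTION = 4 * 1024  # section size matching the 4-KB access unit


def compress_whole(chunk: bytes) -> bytes:
    """Compress the entire management unit in one piece."""
    return zlib.compress(chunk)


def compress_split(chunk: bytes):
    """Compress each 4-KB section independently and record the boundaries."""
    parts = [zlib.compress(chunk[i:i + SECTION]) for i in range(0, len(chunk), SECTION)]
    boundaries, pos = [], 0
    for part in parts:
        boundaries.append(pos)  # start offset of each compressed section
        pos += len(part)
    return b"".join(parts), boundaries


def read_section(blob: bytes, boundaries, index: int) -> bytes:
    """Decompress only the requested section, using the stored boundaries."""
    start = boundaries[index]
    end = boundaries[index + 1] if index + 1 < len(boundaries) else len(blob)
    return zlib.decompress(blob[start:end])


# Artificial 8-KB chunk: the second half repeats the first half, so a
# compressor that sees the whole chunk can refer back across the boundary.
rng = random.Random(0)
half = bytes(rng.randrange(16) for _ in range(SECTION))
chunk = half + half

whole = compress_whole(chunk)
blob, boundaries = compress_split(chunk)

if __name__ == "__main__":
    print("compressed altogether :", len(whole), "bytes")
    print("compressed in sections:", len(blob), "bytes")
```

For this data, the output compressed in sections is larger than the output compressed altogether, because each section is compressed without access to patterns in the other section. In return, read_section decompresses a single 4-KB section without handling the rest of the chunk.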