1. Field of the Invention
The present invention relates to a hierarchical storage control apparatus, a hierarchical storage control system, a hierarchical storage control method, and a program.
2. Description of the Related Art
Computer systems employ a hierarchical storage apparatus including an upper storage layer and a lower storage layer. The hierarchical storage apparatus employs a high-speed, small-capacity storage device as the upper storage layer and a low-speed, large-capacity storage device as the lower storage layer. By exploiting the spatial and temporal locality of access to the storage devices, the hierarchical storage apparatus achieves both fast access and a large storage capacity at a relatively low cost.
For example, a cache is used as the upper storage layer, and a hard disk drive (HDD) as the lower storage layer. Since the cache can be accessed faster than the HDD, using the cache as the upper storage layer is effective in hiding the slow access speed of the HDD.
When a storage apparatus incorporating a cache and an HDD receives from a host a write command to write data, the storage apparatus temporarily stores the requested data in the cache, and returns a response to the write command to the host. Thereafter, the storage apparatus writes the data from the cache into the HDD. When the storage apparatus receives from the host a read command to read data, if the requested data are present in the cache, then the storage apparatus returns the requested data from the cache to the host. Since the storage apparatus does not need to read the data from the HDD, the slow access speed of the HDD is concealed, and the access speed of the storage apparatus is increased.
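By way of illustration only, the write and read handling described above may be sketched as follows. The class name, the method names, and the use of dictionaries to stand in for the cache and the HDD are hypothetical and are not taken from any cited document. The handling of a read that misses the cache is likewise an assumption, since the description above covers only the case in which the requested data are present in the cache.

```python
class WriteBackStorage:
    """Hypothetical model of a storage apparatus having a cache and an HDD."""

    def __init__(self):
        self.cache = {}     # address -> data (fast upper storage layer)
        self.hdd = {}       # address -> data (slow lower storage layer)
        self.dirty = set()  # addresses cached but not yet written to the HDD
        self.hdd_reads = 0  # number of slow HDD reads, kept for illustration

    def write(self, address, data):
        # Store the data in the cache and respond to the host at once.
        self.cache[address] = data
        self.dirty.add(address)
        return "OK"

    def flush(self):
        # Thereafter, write the cached data into the HDD.
        for address in sorted(self.dirty):
            self.hdd[address] = self.cache[address]
        self.dirty.clear()

    def read(self, address):
        # Cache hit: the data are returned without accessing the HDD.
        if address in self.cache:
            return self.cache[address]
        # Cache miss (assumed behavior): read the HDD and cache the result.
        self.hdd_reads += 1
        data = self.hdd[address]
        self.cache[address] = data
        return data
```

In this sketch, a read that follows a write to the same address is served from the cache, so the counter `hdd_reads` remains zero.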
Generally, memory devices for use as caches have a higher price per unit capacity than HDDs. Therefore, caches should desirably have a high utilization efficiency. According to a proposed method of increasing the utilization efficiency of a cache, the cache is divided into a plurality of blocks, and duplicated caching is avoided in the blocks (see Document 1, JP-A No. 2007-41904).
Another storage apparatus includes an HDD whose storage area is divided into a plurality of blocks which have respective address spaces that are independent of one another. According to the SCSI (Small Computer System Interface) standards, the blocks are identified by LUNs (Logical Unit Numbers) that are identification numbers assigned to the respective blocks.
Generally, if the storage area of a storage apparatus includes a plurality of blocks, then different blocks may store the same data string. For example, if two hosts that operate under the same operating system have respective system images stored in different blocks of one storage apparatus, then most of the data strings regarding the operating system that are stored in those blocks are identical.
On the other hand, one block may store duplicates of the same data string. This occurs when files of the same contents are present in different directories in a file system that is configured in one block.
To deal with such a problem, there is known a technology for increasing the efficiency with which the storage area of an HDD is utilized by storing only one copy of a data string in the HDD, e.g., a technology known as de-duplication (see, for example, Non-patent document 1, Monthly magazine “Computer World” October 2007, IDG Japan, Oct. 1, 2007, pp. 98-103).
According to de-duplication, when data are to be stored in an HDD, it is determined whether the same data string is present at different addresses. If the same data string is present at different addresses, then only one copy of the data string is stored in the HDD.
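As a minimal sketch of the determination described above, the following example stores a data string only once even when it is written to different addresses. The class name and the method names are hypothetical. The use of a SHA-256 digest to detect identical data strings is an assumption made for brevity, since the cited technology is not described at that level of detail here, and the sketch does not reclaim a stored data string that is no longer referenced by any address.

```python
import hashlib


class DedupStore:
    """Hypothetical store that keeps one copy of each distinct data string."""

    def __init__(self):
        self.blocks = {}       # digest -> data string, stored only once
        self.address_map = {}  # address -> digest (management information)

    def write(self, address, data):
        # Assumption: identical data strings are detected by their digest.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            # The data string is not yet stored, so store it.
            self.blocks[digest] = data
        # Record which stored data string the address refers to.
        self.address_map[address] = digest

    def read(self, address):
        return self.blocks[self.address_map[address]]
```

The check made on every write, and the dependence of every read on `address_map`, correspond to the per-write overhead and to the management information that such a scheme entails.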
However, since the HDD stores only one copy of a data string for different addresses according to de-duplication, a fault in the HDD may result in a significant data loss. Furthermore, if the information for managing the de-duplicated data is lost, then that information is highly difficult to recover. Therefore, it is desirable not to use de-duplication from the standpoint of data availability.
According to de-duplication, moreover, the speed for writing data is low because each time data are written, the data that have already been stored are checked to see whether they include the same data string. It is thus desirable not to use de-duplication for systems in which access speed is of importance.
However, if de-duplication is not used and a plurality of identical data strings are stored in an HDD, then the utilization efficiency of a cache associated with the HDD is lowered. Because one area in the cache corresponds to only one area in the HDD, a plurality of identical data strings may be stored in the cache. As a result, the performance of the storage apparatus is lowered.
The above difficulty is not limited to the storage apparatus, but also applies to systems employing a general hierarchical storage apparatus. Specifically, as one area in the upper storage layer corresponds to only one area in the lower storage layer, when a plurality of identical data strings are present in the lower storage layer, a plurality of identical data strings may be present in the upper storage layer. Consequently, the utilization efficiency of the upper storage layer is lowered, causing a reduction in the system performance.
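The effect described above may be illustrated by the following minimal sketch, in which each area of the upper storage layer corresponds to exactly one address in the lower storage layer. The function name and the example addresses and data are hypothetical and serve only to show that identical data strings at different addresses occupy separate areas of the upper storage layer.

```python
def upper_layer_usage(lower_layer, cached_addresses):
    """Return (areas used, distinct data strings) for an upper storage layer
    in which one area corresponds to one lower-layer address."""
    # Each cached address occupies its own area, regardless of its content.
    upper_layer = {addr: lower_layer[addr] for addr in cached_addresses}
    return len(upper_layer), len(set(upper_layer.values()))


# Hypothetical lower storage layer: addresses 0 and 100 hold identical data.
lower = {0: b"os-image", 100: b"os-image", 200: b"user-data"}
used, distinct = upper_layer_usage(lower, [0, 100, 200])
```

Here three areas of the upper storage layer are consumed although only two distinct data strings are held, so one area is spent on a duplicate.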