Conventionally, to address the high I/O latency caused by random-access workloads in data centers or enterprises, an excessive number of disks may be deployed so that the larger number of heads reduces the chance of two consecutive reads landing on the same disk, which in turn improves access performance over the disks. However, such over-deployment has several drawbacks, e.g., more enclosures, more floor space, more power and cooling for system operation, and higher maintenance costs. Moreover, the system utilization rate may diminish as excess capacity is added.
Recently, a widely accepted solution leverages the fast read performance (random or sequential) of Solid State Drives (SSDs) alongside Hard Disk Drives (HDDs). A storage system using this solution is a hybrid storage. The SSD functions as a cache: only hot data (the most frequently used data) are temporarily stored in the SSD for access. Once the stored data are no longer “hot”, they are removed and the space is freed for other hot data. Most non-hot data are stored in the HDDs. Such a storage system has several benefits. First, the SSD cache can provide higher peak performance to meet peak demand. Second, the SSD cache can be shifted among co-located workloads in a virtual storage system environment. Third, pre-allocating hot data to the SSD cache enables an agile response to performance requirements. In the SSD cache shifting scenario, the SSD cache may release soon-to-be-unused data in advance so that it can be shifted to other workloads, should those workloads be predicted or arise periodically. Likewise, the SSD cache can serve an individual workload should its peak time also be predictable.
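The hot-data caching behavior described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the method of any particular storage product: the class name HotDataCache, its methods, and the use of per-window read counts to decide what is "hot" are all hypothetical choices made for illustration.

```python
from collections import Counter


class HotDataCache:
    """Toy model of an SSD cache that holds only the most frequently read blocks.

    All names here are hypothetical; "hot" is assumed to mean "most reads
    in the current observation window".
    """

    def __init__(self, capacity):
        self.capacity = capacity   # number of blocks the SSD cache can hold
        self.counts = Counter()    # reads per block in the current window
        self.cached = set()        # block IDs currently held on the SSD
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        """Record a read and report whether the SSD or the HDD served it."""
        self.counts[block_id] += 1
        if block_id in self.cached:
            self.hits += 1
            return "ssd"
        self.misses += 1
        return "hdd"

    def rebalance(self):
        """Keep the `capacity` hottest blocks on the SSD and start a new window.

        Blocks that are no longer among the hottest are dropped from the
        cache, which frees space for other hot data.
        """
        self.cached = {block for block, _ in self.counts.most_common(self.capacity)}
        self.counts.clear()

    def hit_ratio(self):
        """Fraction of all reads so far that were served from the SSD."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch, a caller issues read() for each request and calls rebalance() at the end of each observation window; a block that was hot in one window but is rarely read in the next is evicted at the following rebalance.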
Although caches, both SSDs and Random Access Memories (RAMs), have a great effect on storage performance when working together with HDDs, SSD caches do not follow traditional RAM caching principles, in that SSD reads are faster than writes and sequential reads/writes are faster than random reads/writes. Most important of all, an SSD wears out after a certain number of write cycles. Thus, controlling the conditions under which an SSD cache is applied is an art when running such hybrid storages.
Some prior art provides techniques addressing the above requirement. For example, US Patent Publication No. 20140244959 discloses a storage system having a control means for different types of storage. The storage system includes: an HDD storage; an SSD storage; and a storage controller that collects load information about the respective loads in a number of areas in the HDD storage. Based on the collected load information, the controller can select a candidate area in the HDD storage to be migrated and migrate the data in the selected candidate area to the SSD storage. The storage controller collects a count of input/output requests per unit time as the load information. It also selects the candidate area based on an average life expectancy derived from the duration time of that load. The average life expectancy may be further calculated by subtracting the elapsed time of the load in each of the areas in the HDD storage. Evidently, the choice of data to migrate to the SSD device is based on how long the load lasts. The system therefore excludes periodically repeated loads, which may not last for a long time but are requested in a periodic and frequent pattern. Serving such repeated loads from the HDD will inevitably harm the performance of the storage system.
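The limitation noted above can be made concrete with a small sketch. It is not the algorithm of the cited publication; it is a simplified stand-in that assumes an area is selected only when its I/O count per unit time stays at or above a threshold for a minimum number of consecutive sampling intervals. The function name select_candidates, its parameters, and the sample figures are all hypothetical.

```python
def select_candidates(samples, iops_threshold, min_duration):
    """Return the IDs of areas whose load lasted long enough to qualify.

    samples maps an area ID to a list of I/O counts per unit time.
    An area qualifies when its count stays at or above iops_threshold
    for at least min_duration consecutive intervals (an assumed rule).
    """
    selected = []
    for area, iops_series in samples.items():
        run = 0       # length of the current above-threshold run
        longest = 0   # longest above-threshold run seen so far
        for iops in iops_series:
            run = run + 1 if iops >= iops_threshold else 0
            longest = max(longest, run)
        if longest >= min_duration:
            selected.append(area)
    return selected


# Hypothetical load traces: one sustained load, one short but periodic load.
samples = {
    "area_sustained": [120, 130, 125, 140, 135, 128, 10, 5],
    "area_periodic": [150, 5, 5, 150, 5, 5, 150, 5],
}

candidates = select_candidates(samples, iops_threshold=100, min_duration=4)
```

Under these assumptions, only the sustained area is selected. The periodic area is never migrated, because each of its bursts lasts a single interval, even though its bursts recur regularly and reach a higher peak.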
US Patent Publication No. 20140258668 provides systems and methods for managing storage space in hybrid data storage systems. The systems and methods of that application intelligently allocate data between SSDs (or other relatively high-performance drives) and other data storage, such as HDDs, based on data source, data type, data function, or other similar parameters. The intelligent allocation between at least one SSD and at least one other drive type allows for greater system performance through efficient use of storage space. The application adaptively uses the fastest necessary connected storage options in a hybrid data storage set to maintain maximum performance while keeping rewrites of data to a minimum. Moreover, the allocation of data to storage in a hybrid data storage system may be controlled automatically or may be set specifically by users. Although many factors are taken into consideration in allocating data, the hit ratio of the SSD cannot be improved, which further limits the use of the SSD.
Therefore, in order to operate a hybrid storage well and effectively, a system that controls the use of an SSD cache along with HDDs is required. In particular, the system should be able to improve the hit ratio of the SSD and achieve higher performance.