The invention relates in general to the field of computerized methods for managing tiered storage systems, which involve determining how data are to be assigned across the storage tiers of a tiered storage system. The invention further concerns related storage systems and computer programs. In particular, it is directed to methods for designing or managing such systems, which take into account the additional workload necessary to archive and prefetch data.
Tiered storage typically relies on assigning different data to various types of storage media, in order to reduce the global storage cost and provide good data access performance. A tier is a homogeneous collection of storage devices of the same kind, all having similar, if not identical, storage characteristics. A tiered system typically involves two, three or more tiers of storage media, e.g., SSD devices, high-end disks, low-end disks, and tape drives. A tiered storage system usually relies on policies that assign the most frequently accessed data to high-performance storage tiers, whereas rarely accessed data are stored on low-performance (cheaper) storage tiers.
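By way of illustration only, the frequency-based policy described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not a method taken from any particular system: the function name `assign_by_access_frequency`, the tuple layouts, the capacity units and the first-fit rule are all hypothetical choices made for the example.

```python
def assign_by_access_frequency(objects, tiers):
    """Place the most frequently accessed objects on the fastest tiers.

    objects: list of (name, size, accesses_per_day) tuples (hypothetical layout).
    tiers:   list of (tier_name, capacity) tuples, ordered fastest first
             (hypothetical layout; sizes and capacities share one unit).
    Returns a dict mapping each object name to a tier name.
    """
    remaining = [capacity for _, capacity in tiers]
    placement = {}
    # Visit objects from most to least frequently accessed.
    for name, size, _freq in sorted(objects, key=lambda o: o[2], reverse=True):
        # First fit: take the fastest tier that still has room.
        for t, (tier_name, _) in enumerate(tiers):
            if size <= remaining[t]:
                remaining[t] -= size
                placement[name] = tier_name
                break
        else:
            raise ValueError(f"no tier has room for {name}")
    return placement


if __name__ == "__main__":
    objects = [("logs", 40, 2), ("index", 10, 900), ("db", 30, 300), ("archive", 100, 1)]
    tiers = [("ssd", 40), ("disk", 50), ("tape", 200)]
    print(assign_by_access_frequency(objects, tiers))
```

Such a policy considers only access frequency and capacity; it ignores request sizes and device load, which the following paragraphs identify as further determinants of performance.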
The read/write performance of a data storage system, typically estimated in terms of throughput or mean response time for a request, depends on the characteristics of the storage devices (e.g., latency and bandwidth), the nature of the input/output (I/O) workload (e.g., the frequency and size of I/O requests to each data object), and the strategy chosen for assigning data across the storage devices. Thus, for a given set of storage devices and a given I/O workload, the performance of the system is determined by the data assignment strategy. Improper data assignment can result in poor performance and waste of storage resources.
Tiered storage systems are known that ensure that the amount of data stored on each device of a tier is balanced. Other systems use iterative heuristic approaches to address this problem, using IOPS (I/Os per second) as a performance metric and assuming that each I/O request is of a fixed size. Other methods employ load-balancing mechanisms to ensure that the load on each storage device (the load being defined as the expected percentage of time a device is busy serving I/O requests) is the same across all devices. Many load-balancing algorithms exist, such as the Greedy-Scheduling algorithm and the Longest Processing Time (LPT) algorithm.
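For illustration, the LPT algorithm mentioned above can be sketched in Python as follows. The sketch assumes that each data object contributes a known, additive share of device busy time, expressed here as an integer percentage; this per-object load model, the function name `lpt_assign` and the data layout are assumptions introduced for the example and are not taken from any specific prior system.

```python
import heapq


def lpt_assign(loads, num_devices):
    """Balance load across devices with the Longest Processing Time rule.

    loads: dict mapping an object name to the load it induces on a device
           (assumed here to be an additive percentage of busy time).
    Returns (assignment, totals): objects per device and total load per device.
    """
    # Min-heap of (current_load, device_index): the least loaded device is on top.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_devices)}
    # LPT: consider objects in order of decreasing load and give each one
    # to the device that is currently least loaded.
    for name, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        current, d = heapq.heappop(heap)
        assignment[d].append(name)
        heapq.heappush(heap, (current + load, d))
    totals = {d: sum(loads[n] for n in names) for d, names in assignment.items()}
    return assignment, totals


if __name__ == "__main__":
    loads = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10}
    assignment, totals = lpt_assign(loads, num_devices=2)
    print(assignment)
    print(totals)
```

LPT is a heuristic: it aims at near-equal loads across devices but does not guarantee an optimal balance, and, like the other approaches discussed above, it only addresses the load placed on the devices.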