A virtualized cluster is a cluster of different storage nodes that together expose a single storage device. Input/output (I/O) operations sent to the cluster are internally re-routed to read and write data to the appropriate locations. In this regard, a virtualized cluster of storage nodes can be considered analogous to a collection of disks in a Redundant Array of Inexpensive Disks (RAID) configuration, since a virtualized cluster hides the internal details of the cluster's operation from initiators and presents a unified device instead.
In a virtualized cluster, which may have very large amounts of storage, the drives and RAID arrays constituting the storage hardware may not be homogeneous. A combination of less expensive, slower drives and more expensive, faster drives is often used to achieve a desired mix of performance and price. Such a heterogeneous storage system consists, therefore, of a plurality of sets of physical disks or logical disks, each set having different cost and performance parameters. Determining how the data being stored in the system should best be distributed among the various drives presents an interesting challenge. Generally, two major considerations bear on such a determination: maximizing performance and maximizing utilization of the most costly resources.
Just as the disk and logical disk components of a storage system may not be homogeneous, data accesses in the system may not be homogeneous. Generally, certain data may be accessed very frequently while other data may hardly ever be accessed. Moreover, some data may have been accessed frequently at some point in time, but may have been accessed less frequently in the recent past. It is typically desirable to host data that is accessed more frequently on the higher cost, higher performance storage devices. Conversely, data that is less frequently accessed may be relegated to the lower cost, lower performance storage devices. Such an arrangement may provide a storage system that puts the most costly resources to their highest and best use.
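By way of illustration only, and not limitation, the placement preference described above may be sketched as follows. The sketch assumes that each block carries a single access count and that each tier holds a fixed number of blocks; the function name place_by_access_frequency, its parameters, and the convention that tier 0 is the fastest and most costly tier are hypothetical and are not drawn from any particular implementation.

```python
from typing import Dict, List


def place_by_access_frequency(access_counts: Dict[str, int],
                              tier_capacities: List[int]) -> Dict[str, int]:
    """Map each block id to a tier index (hypothetical: tier 0 is fastest).

    access_counts   -- accesses observed per block (assumed input)
    tier_capacities -- number of blocks each tier can hold, fastest first
    """
    # Rank blocks from most to least frequently accessed; ties are broken
    # by block id so that the result is deterministic.
    ranked = sorted(access_counts, key=lambda b: (-access_counts[b], b))

    placement: Dict[str, int] = {}
    tier = 0   # tier currently being filled
    used = 0   # blocks already placed on that tier
    for block in ranked:
        # Advance to the next slower tier once the current one is full.
        while tier < len(tier_capacities) and used >= tier_capacities[tier]:
            tier += 1
            used = 0
        if tier == len(tier_capacities):
            raise ValueError("total tier capacity is smaller than the data set")
        placement[block] = tier
        used += 1
    return placement


counts = {"a": 100, "b": 5, "c": 50, "d": 0}
placement = place_by_access_frequency(counts, [1, 2, 4])
```

In this example the most frequently accessed block is placed on the single-block fastest tier, the next two blocks share the middle tier, and the block that is never accessed falls to the slowest tier. A practical system would likely weigh additional factors, such as the recency of accesses and the cost of moving data.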
Migrating blocks of stored data to different storage areas over time can assist with placing the most used data on the highest performance storage components. Determining which data should be migrated to which storage areas, and at what time, can present a difficult optimization challenge. This challenge is further complicated by the fact that data access patterns are generally not static and may change over time.
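One way to account for access patterns that change over time, offered here purely as an illustrative assumption rather than as the approach of this disclosure, is to keep a per-block score that is decayed at each measurement interval so that older accesses count for progressively less. The function name update_scores and the decay factor of 0.5 are hypothetical.

```python
from typing import Dict


def update_scores(scores: Dict[str, float],
                  recent_counts: Dict[str, int],
                  decay: float = 0.5) -> Dict[str, float]:
    """Return new per-block scores after one measurement interval.

    Each prior score is multiplied by `decay` (an assumed tuning value
    between 0 and 1) and the accesses seen in the latest interval are added.
    """
    blocks = set(scores) | set(recent_counts)
    return {
        b: decay * scores.get(b, 0.0) + recent_counts.get(b, 0)
        for b in blocks
    }


# A block that was heavily used in the past but is now idle ("old") is
# overtaken by a block that has only recently become active ("new").
scores = {"old": 100.0, "new": 0.0}
for _ in range(2):
    scores = update_scores(scores, {"new": 40})
```

After two intervals the score of the recently active block exceeds that of the formerly active block, so a migration policy driven by these scores would favor promoting the former and demoting the latter. The choice of decay factor and interval length trades responsiveness against the cost of unnecessary migrations.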
It is with respect to these considerations and others that the disclosure made herein is presented.