Data storage utilization is continually increasing, causing the proliferation of storage systems in data centers. In particular, the size of applications and of the data they generate is increasing. Data centers typically comprise different tiers of storage systems. Some storage systems are high performing and are therefore more expensive. As used herein, high performing storage systems refer to those that can provide fast access to data. To minimize cost, data centers typically also include some low performing storage systems because they cost less per unit of storage (e.g., per gigabyte). As used herein, low performing storage systems refer to those that require more time to access data, as compared to high performing storage systems.
Not all data are the same. Some data, for example, are more relevant than others. As used herein, relevant data refers to data that are frequently accessed. Typically, relevant data are stored in high performing storage systems, and less relevant data are stored in low performing storage systems. In some instances, it is necessary to migrate data from a low performing storage system to a high performing storage system. For example, the relevance of data may change over time (e.g., data that were formerly less relevant may become more relevant). A conventional storage system determines which data to migrate based purely on data relevance, without regard to storage utilization efficiency. Thus, providing fast access to data can be quite costly.
FIG. 1 is a block diagram illustrating a conventional storage system. System 100 includes slow storage array 101 and fast storage array 102. Slow storage array 101 includes storage devices 110, 111, and all other storage devices 112. Fast storage array 102 includes storage device 120. As illustrated, storage device 110 contains data “A, B, C”, storage device 111 contains data “D, E, F”, and all other storage devices 112 contain data “G, H, I”. Storage device 120 contains data “A, B, C, X, Y, Z”. In this example, storage device 110 has a 40% input/output (I/O) usage profile, storage device 111 has a 55% I/O usage profile, and all other storage devices 112 have a 5% I/O usage profile. As used herein, “I/O usage profile” refers to information indicating how frequently the storage device I/O has been utilized (e.g., how frequently the data contained in the storage device has been accessed). In a conventional storage system, the selection of data to be migrated is based solely on I/O usage profiles. Thus, migrator 103 selects the storage device with the highest I/O usage profile (in this example, storage device 111) and migrates the data stored therein to fast storage array 102.
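By way of illustration only, the conventional selection described above can be sketched as follows. The dictionary keys and the function name select_for_migration are hypothetical labels chosen for this sketch and do not appear in FIG. 1; the percentages are those of the example, expressed as fractions.

```python
# I/O usage profiles from the FIG. 1 example, expressed as fractions.
# The key names are hypothetical labels for storage devices 110, 111, and 112.
io_usage = {
    "device_110": 0.40,
    "device_111": 0.55,
    "others_112": 0.05,
}


def select_for_migration(profiles):
    """Return the device with the highest I/O usage profile.

    This mirrors the conventional approach: only access frequency is
    considered, and the storage cost of the migration is ignored.
    """
    return max(profiles, key=profiles.get)


selected = select_for_migration(io_usage)
print(selected)
```

Under these assumptions the sketch selects the entry standing for storage device 111, consistent with the example.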
After data migration is completed, fast storage array 102 experiences a 50% increase in storage usage due to the additional data “D, E, F”. Note that storage device 110 contains data which are already present in fast storage array 102. Thus, if data from storage device 110 were selected for migration, fast storage array 102 would experience a 0% increase in storage usage after migration due to data deduplication.
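The storage figures in this example can be reproduced with the following minimal sketch. It assumes, for illustration only, that every data item (e.g., “A”) occupies the same amount of storage and that deduplication removes any item already present in the fast storage array; the function name storage_increase is hypothetical.

```python
def storage_increase(fast_data, candidate_data):
    """Return the fractional growth of the fast array after migration.

    Assumes equally sized data items and item-level deduplication:
    only items not already present in the fast array consume new space.
    """
    existing = set(fast_data)
    new_items = set(candidate_data) - existing
    return len(new_items) / len(existing)


# Contents of storage device 120 in fast storage array 102.
fast_array = ["A", "B", "C", "X", "Y", "Z"]

# Migrating storage device 111 adds three new items to the six already stored.
increase_111 = storage_increase(fast_array, ["D", "E", "F"])

# Migrating storage device 110 adds nothing new, since A, B, C deduplicate.
increase_110 = storage_increase(fast_array, ["A", "B", "C"])

print(f"{increase_111:.0%} {increase_110:.0%}")
```

Under the stated assumptions, migrating the data of storage device 111 yields a 50% increase, whereas migrating the data of storage device 110 yields a 0% increase, even though storage device 110 accounts for 40% of I/O usage.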