Modern storage systems are typically composed of several types of storage devices, each type having different device characteristics. This heterogeneity makes it possible to match the characteristics and requirements of a data workload to the most appropriate storage device. For instance, when the budget is limited, frequently accessed data can justify the higher cost of a low-latency storage device, whereas rarely accessed data can be placed on a lower-cost device with higher latency.
However, because the storage device types available on the market differ in price, storage capacity and data write and/or read performance, and because data workloads vary, it can be difficult to dimension or configure a tiered storage system and to place data within it. Further, although it might be simple to choose the optimal device for a single data unit, the assignment of all data units to the tiers of the tiered storage system and the dimensioning of the tiers are interdependent and need to take the budget or performance requirements into account. Optimal match-making is combinatorial in nature, and therefore it can be quite difficult to optimize the dimensioning and data placement of a tiered storage system. One approach could be to enumerate all possible assignments of data units to tiers and to evaluate, for each assignment, the system performance metric that is to be optimized, for example the system response time. In a large-scale storage system with millions or billions of data units, this enumeration and evaluation is not tractable. A straightforward mathematical approach could be to formulate a combinatorial optimization problem, but such a problem is computationally intractable under the vast majority of relevant performance metric objectives. Multi-tier storage systems are, for example, disclosed in US 2013/0198449 A1, US 2012/0303929 A1, US 2012/0203999 A1, US 2013/0151804 A1, US 2011/0010514 A1, and U.S. Pat. No. 8,566,553 B1.
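The enumeration approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the tier names and latency values, the per-tier capacity expressed as a number of data units, and the simplified metric (an access-weighted mean of fixed tier latencies, with no queuing effects). With T tiers and N data units the loop visits T^N assignments, which is why the approach stops being tractable long before N reaches millions.

```python
from itertools import product

# Hypothetical tiers: (name, fixed access latency in ms). Illustrative values only.
TIERS = [("ssd", 0.1), ("hdd", 8.0), ("tape", 50.0)]


def mean_latency(assignment, access_rates):
    """Access-weighted mean latency for one assignment (one tier index per data unit)."""
    total = sum(access_rates)
    weighted = sum(rate * TIERS[tier][1] for rate, tier in zip(access_rates, assignment))
    return weighted / total


def brute_force(access_rates, capacity):
    """Evaluate every assignment of data units to tiers and keep the best feasible one.

    capacity[t] is the (assumed) maximum number of data units that tier t can hold.
    Returns (best_assignment, best_metric, number_of_assignments_tried).
    """
    best, best_metric, tried = None, float("inf"), 0
    for assignment in product(range(len(TIERS)), repeat=len(access_rates)):
        tried += 1
        # Skip assignments that overfill any tier.
        if any(assignment.count(t) > capacity[t] for t in range(len(TIERS))):
            continue
        metric = mean_latency(assignment, access_rates)
        if metric < best_metric:
            best, best_metric = assignment, metric
    return best, best_metric, tried


if __name__ == "__main__":
    rates = [10, 1, 5, 2]      # accesses per second for four data units (made up)
    caps = [1, 2, 4]           # units that fit on ssd, hdd, tape (made up)
    best, metric, tried = brute_force(rates, caps)
    print(best, round(metric, 3), tried)
    # The search space grows as len(TIERS) ** N:
    for n in (10, 20, 40):
        print(n, len(TIERS) ** n)
```

Even in this toy setting, going from 4 to 40 data units raises the number of candidate assignments from 81 to roughly 1.2 x 10^19, and the tier capacities themselves are still treated as fixed rather than being dimensioned.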
In common systems, various heuristics are used in practice, and these can often result in significantly higher system costs or lower system performance. For the most relevant performance metrics, such as system average response time, weighted response time, or queuing time, a straightforward modelling of the above-described problem can result in a non-linear optimization problem that cannot be easily solved or does not scale feasibly with the amount of data. If the system load, which is another common performance metric, is used instead, certain approximations or heuristics can lead to a linear problem that can be solved, but the resulting solution can be far from optimal for many other metrics that better indicate system performance.
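The difference between a load metric and a delay metric can be made concrete with a small numerical sketch. The queuing model used here is an assumption and does not come from the text: a single device is treated as an M/M/1 queue, for which the mean response time is 1 / (service rate - arrival rate). Under that assumption the load grows linearly with the request rate placed on a device, while the response time does not.

```python
def utilization(arrival_rate, service_rate):
    """Load of a device: linear in the arrival rate."""
    return arrival_rate / service_rate


def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue (assumed model): 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 100.0  # requests per second the device can serve (made up)
    for arrival_rate in (50.0, 90.0, 99.0):
        print(
            arrival_rate,
            utilization(arrival_rate, service_rate),
            mm1_response_time(arrival_rate, service_rate),
        )
```

In this sketch, raising the load from 0.5 to 0.9 multiplies the response time by five, and raising it from 0.9 to 0.99 multiplies it by ten again. A placement that looks acceptable in terms of load can therefore be poor in terms of delay.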
Another approach is based on minimizing the cost subject to load constraints, which results in a linear problem, but this approach cannot be used to optimize important delay-related metrics.
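One possible way to write such a cost-minimization problem is sketched below. The notation is hypothetical and is not taken from any of the cited documents: $x_{ij} \in [0,1]$ is the fraction of data unit $i$ placed on tier $j$, $n_j \ge 0$ is the (relaxed, continuous) number of devices of tier $j$, $c_j$ is the cost per device, $\ell_{ij}$ is the load that data unit $i$ would impose on tier $j$, $s_i$ is the size of data unit $i$, $C_j$ is the capacity per device, and $\rho_j$ is the maximum load permitted per device.

```latex
\begin{aligned}
\min_{x,\,n} \quad & \sum_{j} c_j \, n_j \\
\text{subject to} \quad
  & \sum_{j} x_{ij} = 1 && \text{for every data unit } i, \\
  & \sum_{i} \ell_{ij} \, x_{ij} \le \rho_j \, n_j && \text{for every tier } j, \\
  & \sum_{i} s_i \, x_{ij} \le C_j \, n_j && \text{for every tier } j, \\
  & x_{ij} \ge 0, \quad n_j \ge 0 .
\end{aligned}
```

The objective and all constraints are linear in $x_{ij}$ and $n_j$, which is what makes the problem solvable at scale. No term in the formulation represents response time or queuing time, so delay can only be bounded indirectly through the choice of $\rho_j$ and cannot be optimized.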
Accordingly, it is an aspect of the present invention to improve the assignment of data to tiers and the dimensioning of tiers within a tiered storage system.