A data storage system may have multiple tiers of storage in which data may be stored. For example, a data storage system may have battery-backed random access memory (RAM), flash storage, disk storage, tape storage, cloud-based storage, and other storage. The different types of storage may have different properties. For example, different types of storage may have different capacities, different retrieval speeds, different costs, different energy requirements, or other properties. Conventionally, a data storage system may have organized the different types of storage available into tiers based on the properties. For example, a smaller, faster, yet more expensive cache may have been made from RAM, while a larger, slower, but less expensive long-term repository may have been made from tape storage. Flash memory and disk storage may have been organized into intermediate tiers. Conventionally, it may have been difficult, if possible at all, to optimize data storage in a multi-tier system.
A location from which data can be read most quickly may be employed as a cache. A “cache” may be thought of as storing a quickly available “working copy” of data that, when not in use, may be stored somewhere else (e.g., in a slower, higher capacity, lower cost tier). Conventionally, optimizing data storage in a multi-tier storage system may have sought to increase a cache hit rate or cache utilization while providing an overall acceptable quality of service with respect to retrieval time and cost. Conventionally, optimizing data storage in a multi-tier system may have sought to place certain types of content (e.g., most frequently used, most time sensitive) in a certain type of storage (e.g., cache, fastest) and may have sought to optimize movement of data between different types of storage (e.g., from largest/slowest to fastest). The effectiveness of optimization may have been determined by the timing and quality of decisions concerning when to copy data between tiers.
Optimizing data storage in a multi-tier storage system may involve making good decisions about whether and when to move data between tiers. For example, optimizing multi-tier storage may involve making decisions about when to flush data, when to evict data, when to recall data, or when to delete data. Flushing data may refer to copying data from one tier (e.g., smaller/faster/more expensive) to another tier (e.g., larger/slower/less expensive). Evicting data may refer to removing data from a tier (e.g., smaller/faster/more expensive). Recalling data may refer to copying data from one tier (e.g., larger/slower/less expensive) to another tier (e.g., smaller/faster/more expensive). Deleting data may refer to removing data from a tier (e.g., larger/slower/less expensive) or from the data storage system completely. When a data storage system includes a “cache” tier that is the smallest and fastest, flushing data may refer to copying data from the cache to another tier (e.g., larger/slower/less expensive), evicting data may refer to removing data from the cache, recalling data may refer to copying data to the cache, and deleting data may refer to removing data from a non-cache tier.
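The four operations described above may be illustrated with a minimal Python sketch of a two-tier system having a cache tier and a single larger backing tier. The class name `TwoTierStore`, its method names, and the use of in-memory dictionaries to stand in for the tiers are hypothetical and chosen only for illustration; they do not describe any particular data storage system.

```python
class TwoTierStore:
    """Toy model (hypothetical): a small, fast cache tier over a larger, slower backing tier."""

    def __init__(self):
        self.cache = {}    # stands in for the smaller/faster/more expensive tier
        self.backing = {}  # stands in for the larger/slower/less expensive tier

    def write(self, key, value):
        # New data lands in the cache first (an assumption of this sketch).
        self.cache[key] = value

    def flush(self, key):
        # Flush: copy data from the cache to the backing tier; the cache copy remains.
        self.backing[key] = self.cache[key]

    def evict(self, key):
        # Evict: remove data from the cache only.
        self.cache.pop(key, None)

    def recall(self, key):
        # Recall: copy data from the backing tier back into the cache.
        self.cache[key] = self.backing[key]

    def delete(self, key):
        # Delete: remove data from the non-cache tier.
        self.backing.pop(key, None)


store = TwoTierStore()
store.write("report", b"contents")
store.flush("report")   # now present in both tiers
store.evict("report")   # now present only in the backing tier
store.recall("report")  # working copy is back in the cache
```

In this sketch, flushing and recalling are copies while evicting and deleting are removals, so an item that is evicted before it has been flushed would be lost. Deciding when each operation may safely and usefully occur is the optimization problem described above.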
Conventionally, optimizing multi-tier storage may have included programming policy decisions or rules that controlled whether, when, and how to move data. This type of conventional multi-tier storage optimization may have been challenged by the fact that an optimization for one entity (e.g., user, application, organization) or collection of entities may not have been an optimization for another entity or collection of entities. Additionally, this type of conventional multi-tier storage optimization may have been challenged by the fact that a workload for an entity or collection of entities may change over time, and thus an optimization for one point in time may not be an optimization for a different point in time.
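The limitation of a fixed rule may be illustrated with a small Python sketch. The rule, the 60-second threshold, the function name `hit_rate`, and the two access patterns are all hypothetical values invented for illustration. The sketch assumes a single item that is recalled into the cache on every miss and evicted once it has been idle for longer than the threshold.

```python
def hit_rate(access_times, idle_limit):
    """Fraction of accesses served from the cache under a fixed idle-time eviction rule.

    access_times: ascending timestamps (seconds) at which one item is read.
    idle_limit:   the item is evicted once idle for more than this many seconds.
    """
    hits = 0
    last_access = None
    for t in access_times:
        # A hit requires a previous access recent enough that the item was not evicted.
        if last_access is not None and t - last_access <= idle_limit:
            hits += 1
        last_access = t  # on a miss the item is recalled, so it is cached again
    return hits / len(access_times)


IDLE_LIMIT = 60  # hypothetical hard-coded policy value

workload_a = [0, 30, 60, 90, 120]    # entity A re-reads every 30 seconds
workload_b = [0, 90, 180, 270, 360]  # entity B re-reads every 90 seconds

print(hit_rate(workload_a, IDLE_LIMIT))  # 0.8
print(hit_rate(workload_b, IDLE_LIMIT))  # 0.0
```

Under these assumptions, the same rule that serves the first workload mostly from the cache produces a miss on every access for the second workload. The same effect may occur if a single entity's access pattern shifts from the first pattern to the second over time.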