Information services and data processing industries in general have rapidly expanded as a result of the need for computer systems to manage and store large amounts of data. As an example, financial service companies such as banks, mutual fund companies and the like now, more than ever before, require access to many terabytes of data and files stored in high-capacity data storage systems. Other types of service companies have similar needs for data storage.
Data storage system developers have responded to the increased need for storage by integrating high capacity data storage systems, data communications devices (e.g., switches), and computer systems (e.g., host computers or servers) into storage networks or Storage Area Networks (“SAN”s).
A variety of storage systems, also referred to herein as “storage arrays” or simply “arrays,” are known in the art. One example of a storage system is a collection of storage devices (e.g., solid-state disk drives or flash memory drives) and associated communication, power, cooling, and management components. Such storage systems can also include one or more storage processors for handling both requests for allocation and input/output (IO) requests from a user. A storage processor can be the controller for and primary interface to the storage system.
Storage systems are typically used to provide storage space for one or more computer file systems, databases, applications, and the like. For this and other reasons, it is common for storage systems to be logically partitioned into chunks of storage space. This allows a unified storage system to appear as a collection of separate file systems, network drives, etc. Storage systems can be logically partitioned in a variety of ways. For example, the storage devices (e.g., disk drives) of a storage system can be logically organized into one or more redundant array of inexpensive disks (“RAID”) groups. A RAID group is a group of physical storage devices across which data can be distributed and/or replicated to achieve redundancy. This can avoid the loss or unavailability of data arising from a hardware failure such as a disk drive failure.
Alternatively, or in addition, the physical storage area of a storage system can be divided into discrete units called slices. A collection of slices along with a map can create a logical unit. A logical unit can then be described as a set of logical slices, with each logical slice mapping to one or more physical slices in the storage system.
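By way of illustration only, the relationship between a logical unit, its logical slices, and the underlying physical slices can be sketched as a simple map. The class names, field names, and device identifiers below are hypothetical and are not drawn from any particular storage system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalSlice:
    device_id: str  # hypothetical identifier of a physical storage device
    index: int      # hypothetical position of the slice on that device


@dataclass
class LogicalUnit:
    name: str
    # The map: logical slice index -> one or more physical slices.
    slice_map: Dict[int, List[PhysicalSlice]] = field(default_factory=dict)

    def map_slice(self, logical_index: int, physical: PhysicalSlice) -> None:
        """Associate one more physical slice with a logical slice."""
        self.slice_map.setdefault(logical_index, []).append(physical)

    def resolve(self, logical_index: int) -> List[PhysicalSlice]:
        """Return the physical slices backing a logical slice."""
        return list(self.slice_map[logical_index])


lun = LogicalUnit("lun0")
lun.map_slice(0, PhysicalSlice("disk-a", 3))
lun.map_slice(0, PhysicalSlice("disk-b", 3))  # a second copy of logical slice 0
lun.map_slice(1, PhysicalSlice("disk-a", 4))
```

In this sketch, logical slice 0 maps to two physical slices, consistent with a logical slice mapping to one or more physical slices.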
In a storage area network, a collection of storage systems can be networked together via a switching fabric to a number of host computer systems operating as servers. The host computers can access data stored in the storage systems (of a respective storage area network) on behalf of client computers that request data from the data storage systems. For example, according to conventional applications, upon receiving a storage access request, a respective host computer in the storage area network can access a large repository of storage through the switching fabric of the storage area network on behalf of the requesting client. Thus, via the host computer (e.g., server), the client has access to the shared storage system. In many applications, storage area networks support high-speed acquisitions of data so that the host servers are able to promptly retrieve data from and store data to the storage system.
A storage area network can also be logically divided into one or more “storage pools.” A storage pool can be a collection of one or more storage systems, a collection of subsets of one or more storage systems, or a subset of a single storage system. Thus, a storage pool can contain one or more logical units, and each logical unit can include one or more slices.
Performance of a storage system can be characterized by the system's total capacity, response time, throughput, and/or various other metrics. The capacity of a storage system is the maximum total amount of data that can be stored on the system. The response time of a storage system is the amount of time required to read data from or write data to the storage system. The throughput of a storage system is a measure of the amount of data that can be transferred into or out of (i.e., written to or read from) the storage system over a given period of time. Performance of a storage system can be measured and/or quantified at various levels, such as at the storage pool level, at the logical unit level, at the logical slice level, at the storage system level, at the RAID group level, at the disk level, etc.
The administrator of a storage array can desire to operate the storage system in a manner that maximizes throughput and minimizes response time. In general, performance of a storage system can be constrained by both physical and temporal constraints. Examples of physical constraints include bus occupancy and availability and uneven distribution of load across disks or across RAID groups. Examples of temporal constraints include bus bandwidth, bus speed, serial versus parallel access to multiple read/write heads, and the size of data transfer buffers.
Data has a lifecycle. As data progresses through its lifecycle, it experiences varying levels of activity. When data is created, it is typically used heavily. As it ages, it is typically used less frequently. In recognition of this, developers have worked to create systems and methods for ensuring that heavily used data are readily accessible, while less frequently used data are stored in a more remote location. Correlations between location of data storage and frequency of access to data are necessary because storage has an inherent cost. Generally speaking, the faster the storage medium is at providing access to data, the costlier the storage medium.
Two concepts bearing on the tradeoffs between data usage and storage costs are relevant for purposes of the embodiments disclosed herein. First, data storage systems involving tiered storage have emerged. Tiered data storage systems include multiple tiers of non-volatile storage with each tier providing a different quality of service. For example, a system may include a first tier (Tier 1), a second tier (Tier 2) and a third tier (Tier 3). By way of example, solid-state drives, cache, and cloud storage can be used within tiers with one unique type of technology being used within each tier. The fastest technology will be in Tier 1, the next fastest in Tier 2, and the slowest in Tier 3. As advances in data storage media and speeds are realized over time, the types of storage used and the tiers in which they are used will vary.
The type of drive interface can have a significant impact on overall disk array performance. For example, a multi-head drive that supports reads or writes on all heads in parallel will have a much greater throughput than a multi-head drive that allows only one head at a time to read or write data.
Tiered data systems manage placement of data on the different storage tiers to make the best use of disk drive speed and capacity. For example, frequently accessed data may be placed on Tier 1 storage. Less frequently accessed data may be placed on Tier 2 storage. And seldom accessed data may be placed on Tier 3 storage.
The second concept of importance is the notion of categorizing data so that it can be stored on the most appropriate tier. Temperature variance has been used as a framework for distinguishing frequently used, or “hot,” data from less frequently used, or “cold,” data.
A significant challenge in the categorization of data within tiered data storage systems is the effect time has on data categorization. Typically, data are hot for a limited amount of time. In addition, determining the data temperature consumes computing resources, which requires prudence in judging how frequently to assess the temperature of the vast amounts of data that can be stored in a database. Furthermore, moving data among the tiers also consumes substantial computing resources, which again necessitates tradeoffs in terms of overall resource allocation.
Some data storage systems perform automatic storage tiering. These systems monitor the activity of storage elements and move data between storage tiers to best utilize available resources and promote efficiency. For example, a Tier 2 data set may be moved to Tier 1 if the automated system determines the data have become hotter. Similarly, data may be demoted from Tier 1 to Tier 2 if the system determines the data have become colder.
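By way of illustration only, the promotion and demotion behavior described above can be sketched as follows. The function name is hypothetical, and the sketch assumes three tiers, movement of one tier at a time, and that any increase or decrease in temperature triggers a move; an actual system may apply different criteria.

```python
FASTEST_TIER = 1  # Tier 1 is the highest-performance tier
SLOWEST_TIER = 3  # assumption: a three-tier system


def next_tier(current_tier: int, previous_temp: float, current_temp: float) -> int:
    """Return the tier a data set should occupy after one monitoring pass."""
    if current_temp > previous_temp and current_tier > FASTEST_TIER:
        return current_tier - 1  # data became hotter: promote one tier
    if current_temp < previous_temp and current_tier < SLOWEST_TIER:
        return current_tier + 1  # data became colder: demote one tier
    return current_tier          # unchanged, or already at the end tier


promoted = next_tier(2, previous_temp=10, current_temp=50)
demoted = next_tier(1, previous_temp=50, current_temp=10)
```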
Automated storage tiering algorithms are typically run on a central processing unit, which is itself part of the data storage system. The system resources required to compute data temperatures for purposes of assessing whether data should be reallocated to a different tier are significant, especially for large enterprise databases. In addition, once the system determines which data should be moved from one tier to another, executing the various read/write/copy functions necessary to move the data from one tier to another is additionally resource intensive. Further compounding the data movement issue is the fact that the temperature of the data is in constant flux.
One problem with existing storage management methods and systems is that logical units and the slices thereof are allocated in a storage pool as “best-fit” at the initial allocation time. At this time, however, the IO load pattern of each logical unit and/or each slice of data is not known. In other words, a user's performance requirements with respect to a given logical unit or with respect to a given slice of data within that logical unit are generally either not known or only roughly approximated when that logical unit is created. Thus, the performance capability of a logical unit or slice can be too high or too low for the data to which it corresponds. For example, allocating a logical unit containing frequently accessed data to a lower performance tier in a storage pool will result in the storage pool appearing “slow” to a user. Likewise, allocating a logical unit containing rarely accessed data to a higher performance storage tier results in a waste of the storage system's performance capability.
Another problem with existing methods and systems is that they fail to compensate for the fact that users' performance requirements will often change over time, in part because data access patterns tend to change over time. Older data is generally accessed less frequently and therefore does not require storage with higher performance capabilities. In existing storage methods and systems, as once-frequently-accessed data ages, it can remain on a high-performance storage tier, taking up valuable space that could be used to store newer, more-frequently-accessed data. Thus, as data “ages” and becomes less-frequently accessed, it can be desirable to relocate it to a lower performance storage tier. In order to address these shortcomings, many data storage systems have adopted automated data tiering wherein frequently-accessed data are promoted to a higher-performance storage tier, while less-frequently-accessed data are demoted to a lower-performance storage tier.
Data relocation within an automated tiering system is a two-step process. First, data slices have to be evaluated to determine their relative temperature. The most active data slices, as defined by their read/write/pre-fetch statistics, are considered the “hottest,” while the least active data slices are the “coldest.” Once data have been analyzed and assigned a value indicative of temperature, the next step is identifying promotion/demotion candidates based on the current tier location of each slice and its current temperature. In this way, data slices are flagged for promotion or demotion.
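By way of illustration only, the two steps can be sketched as follows. The names are hypothetical. The sketch assumes that temperature is an unweighted sum of read, write, and pre-fetch counts, that fixed hot and cold thresholds are supplied by the caller, and that there are three tiers; none of these choices is dictated by the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SliceStats:
    slice_id: str
    tier: int        # current tier location (1 = fastest, 3 = slowest)
    reads: int
    writes: int
    prefetches: int


def temperature(s: SliceStats) -> int:
    # Step 1 (assumption): temperature is the plain sum of activity counters.
    return s.reads + s.writes + s.prefetches


def flag_candidates(slices: List[SliceStats], hot_threshold: int,
                    cold_threshold: int) -> Tuple[List[str], List[str]]:
    # Step 2: flag slices using both current tier and current temperature.
    promote, demote = [], []
    for s in slices:
        t = temperature(s)
        if t >= hot_threshold and s.tier > 1:
            promote.append(s.slice_id)  # hot but not yet on the fastest tier
        elif t <= cold_threshold and s.tier < 3:
            demote.append(s.slice_id)   # cold but not yet on the slowest tier
    return promote, demote


slices = [
    SliceStats("a", tier=2, reads=100, writes=50, prefetches=10),
    SliceStats("b", tier=1, reads=1, writes=0, prefetches=0),
    SliceStats("c", tier=1, reads=500, writes=0, prefetches=0),
    SliceStats("d", tier=3, reads=0, writes=0, prefetches=0),
]
promote, demote = flag_candidates(slices, hot_threshold=100, cold_threshold=5)
```

Here slice "a" is flagged for promotion and slice "b" for demotion, while slices "c" and "d" already reside on the tier matching their temperature and are not flagged.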
At any given time, there could be a large number of data slices requiring relocation. Ideally, the data storage system would ensure that data slices on the extreme ends of the temperature spectrum, i.e., the hottest data and the coldest data, would be relocated first. To achieve this, data slices are typically sorted in two distinct sort operations. In one sort operation, hot data are organized in descending order, hottest to less hot. In a second sort operation, cold data are organized in ascending order, coldest to less cold. Requiring two sort functions of the data consumes compute resources that could otherwise be devoted to improved system performance.
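By way of illustration only, the conventional two-sort approach can be sketched as follows. The function name and the representation of each slice as an (identifier, temperature) pair are assumptions made for the example.

```python
from typing import List, Tuple

Slice = Tuple[str, int]  # (slice identifier, temperature), hypothetical layout


def order_for_relocation(hot: List[Slice],
                         cold: List[Slice]) -> Tuple[List[Slice], List[Slice]]:
    # First sort: hot slices in descending order, hottest to less hot.
    hot_sorted = sorted(hot, key=lambda s: s[1], reverse=True)
    # Second sort: cold slices in ascending order, coldest to less cold.
    cold_sorted = sorted(cold, key=lambda s: s[1])
    return hot_sorted, cold_sorted


hot_sorted, cold_sorted = order_for_relocation(
    hot=[("a", 70), ("b", 95), ("c", 80)],
    cold=[("x", 12), ("y", 3), ("z", 7)],
)
```

Each call performs two independent sort passes, which is the overhead noted above.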
Moreover, typical automated tiering systems do not allow a user to adjust the percentages of hot and cold data he or she would like to be selected as candidates for relocation. These percentages are typically established in a service level agreement. There is therefore a need to allow users to make spontaneous adjustments to the percentages of hot and cold data that are flagged for relocation as a means of being able to optimize system performance in real time.
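By way of illustration only, one way that user-adjustable percentages could govern candidate selection is sketched below. The function name, its parameters, and the rounding-down of slice counts are assumptions made for the example and do not describe any particular system.

```python
from typing import Dict, List, Tuple


def select_candidates(temps: Dict[str, int], hot_pct: float,
                      cold_pct: float) -> Tuple[List[str], List[str]]:
    """Pick the hottest hot_pct% and coldest cold_pct% of slices."""
    if hot_pct < 0 or cold_pct < 0 or hot_pct + cold_pct > 100:
        raise ValueError("percentages must be non-negative and sum to <= 100")
    # Rank all slices from hottest to coldest.
    ranked = sorted(temps.items(), key=lambda kv: kv[1], reverse=True)
    n_hot = int(len(ranked) * hot_pct / 100)    # assumption: round down
    n_cold = int(len(ranked) * cold_pct / 100)  # assumption: round down
    hot = [sid for sid, _ in ranked[:n_hot]]
    # Take the tail of the ranking and reverse it so the coldest comes first.
    cold = [sid for sid, _ in ranked[len(ranked) - n_cold:]][::-1]
    return hot, cold


temps = {f"s{i}": i * 10 for i in range(10)}  # ten slices, temperatures 0..90
hot, cold = select_candidates(temps, hot_pct=20, cold_pct=30)
```

With ten slices, a 20 percent hot setting selects the two hottest slices and a 30 percent cold setting selects the three coldest; changing either argument changes the selection on the next call.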