Information services and data processing industries in general have rapidly expanded as a result of the need for computer systems to manage and store large amounts of data. As an example, financial service companies such as banks, mutual fund companies and the like now, more than ever before, require access to many terabytes of data and files stored in high capacity data storage systems. Other types of service companies have similar needs for data storage.
Data storage system developers have responded to the increased need for storage by integrating high capacity data storage systems, data communications devices (e.g., switches), and computer systems (e.g., host computers or servers) into so-called “storage networks” or “Storage Area Networks” (SANs).
A variety of storage systems (also referred to herein as “storage arrays” or simply “arrays”) are known in the art. One example of a storage system is a collection of storage devices (e.g., hard disk drives, solid-state disk drives, flash memory drives, and/or magnetic tapes) and associated communication, power, cooling, and management components. Such storage systems can also include one or more storage processors for handling both requests for allocation and input/output (IO) requests from a user. A storage processor can be the controller for and primary interface to the storage system.
Storage systems are typically used to provide storage space for one or more computer file systems, databases, applications, and the like. For this and other reasons, it is common for storage systems to be logically partitioned into chunks of storage space. This allows a unified storage system to appear as a collection of separate file systems, network drives, etc. Storage systems can be logically partitioned in a variety of ways. For example, the storage devices (e.g., disk drives) of a storage system can be logically organized into one or more RAID groups. A RAID group is a group of physical storage devices across which data can be distributed and/or replicated to achieve redundancy. This can avoid the loss or unavailability of data arising from a hardware failure such as a disk drive failure.
Alternatively, or in addition, the physical storage area of a storage system can be divided into discrete units called slices. A collection of slices along with a map can create a “logical unit” (LU). A logical unit can then be described as a set of logical slices, with each logical slice mapping to one or more physical slices in the storage system.
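As an illustration only, the relationship between slices, the map, and a logical unit can be sketched as a small data structure. The names `PhysicalSlice`, `LogicalUnit`, `map_slice`, and `resolve`, as well as the device identifiers, are hypothetical and do not correspond to any particular storage product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PhysicalSlice:
    """A discrete unit of physical storage (hypothetical representation)."""
    device_id: str  # assumed identifier of the storage device
    index: int      # assumed position of the slice on that device


@dataclass
class LogicalUnit:
    """A set of logical slices plus the map to their physical slices."""
    name: str
    slice_map: Dict[int, List[PhysicalSlice]] = field(default_factory=dict)

    def map_slice(self, logical_index: int, physical: PhysicalSlice) -> None:
        # One logical slice may map to one or more physical slices.
        self.slice_map.setdefault(logical_index, []).append(physical)

    def resolve(self, logical_index: int) -> List[PhysicalSlice]:
        # Return a copy so callers cannot mutate the map by accident.
        return list(self.slice_map[logical_index])


lu = LogicalUnit("lu0")
lu.map_slice(0, PhysicalSlice("disk-a", 7))
lu.map_slice(1, PhysicalSlice("disk-b", 2))
lu.map_slice(1, PhysicalSlice("disk-c", 2))  # second physical slice for logical slice 1
```

Here logical slice 0 resolves to a single physical slice, while logical slice 1 resolves to two, matching the "one or more" mapping described above.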
In a storage area network, a collection of storage systems can be networked together via a switching fabric to a number of host computer systems operating as servers. The host computers can access data stored in the storage systems (of a respective storage area network) on behalf of client computers that request data from the data storage systems. For example, according to conventional applications, upon receiving a storage access request, a respective host computer in the storage area network can access a large repository of storage through the switching fabric of the storage area network on behalf of the requesting client. Thus, via the host computer (e.g., server), the client has access to the shared storage system. In many applications, storage area networks support high-speed acquisitions of data so that the host servers are able to promptly retrieve data from and store data to the storage system.
A storage area network can also be logically divided into one or more “storage pools.” A storage pool can be a collection of one or more storage systems, a collection of subsets of one or more storage systems, or a subset of a single storage system. Thus, a storage pool can contain one or more logical units, and each logical unit can include one or more slices.
Performance of a storage system can be characterized by the system's total capacity, response time, throughput, and/or various other metrics. The capacity of a storage system is the maximum total amount of data that can be stored on the system. The response time of a storage system is the amount of time required to read data from or write data to the storage system. The throughput of a storage system is a measure of the amount of data that can be transferred into or out of (i.e., written to or read from) the storage system over a given period of time. Performance of a storage system can be measured and/or quantified at various levels, such as at the storage pool level, at the logical unit level, at the logical slice level, at the storage system level, at the RAID group level, at the disk level, etc.
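The throughput and response-time definitions above reduce to simple arithmetic. The following is a minimal sketch; the function names, the choice of megabytes per second and milliseconds as units, and the use of a plain average for response time are assumptions made for illustration.

```python
from typing import Sequence


def throughput_mb_per_s(bytes_transferred: int, seconds: float) -> float:
    """Amount of data moved in or out over a given period, in MB/s (1 MB = 10**6 bytes)."""
    if seconds <= 0:
        raise ValueError("measurement period must be positive")
    return bytes_transferred / seconds / 1_000_000


def mean_response_time_ms(latencies_ms: Sequence[float]) -> float:
    """Average time per read or write request, in milliseconds."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    return sum(latencies_ms) / len(latencies_ms)
```

The same two functions could be applied to samples collected at any of the levels listed above (pool, logical unit, slice, RAID group, or disk), provided the samples are gathered at that level.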
The administrator of a storage array may want to operate the storage system in a manner that maximizes throughput and minimizes response time. In general, performance of a storage system can be constrained by both physical and temporal constraints. Examples of physical constraints include bus occupancy and availability, excessive disk arm movement, and uneven distribution of load across disks or across RAID groups. Examples of temporal constraints include bus bandwidth, bus speed, spindle rotational speed, serial versus parallel access to multiple read/write heads, and the size of data transfer buffers.
One factor that can limit the performance of a storage system is the performance of each individual storage device. For example, the read access time of a storage system including hard disk drives is constrained by the access time of the disk drive from which the data is being read. Read access time can be affected by physical characteristics of the disk drive, such as the number of revolutions per minute of the spindle: the faster the spin, the less time it takes for the sector being read to come around to the read/write head. The placement of the data on the platter also affects access time, because it takes time for the arm to move to, detect, and properly orient itself over the proper track (or cylinder, for multi-head/multi-platter drives). Reducing the read/write arm swing reduces the access time. Finally, the type of drive interface can have a significant impact on overall disk array performance. For example, a multi-head drive that supports reads or writes on all heads in parallel will have a much greater throughput than a multi-head drive that allows only one head at a time to read or write data.
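The effect of spindle speed on access time can be made concrete with a common simplified model, which is an assumption here and not something the text above specifies: if the target sector is equally likely to be anywhere on the track, the head waits on average half a revolution, and access time is approximated as seek time plus that rotational latency. The function names and the example seek time are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to reach the head: half of one revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in one minute
    return 0.5 * ms_per_revolution


def approx_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Simplified model: arm movement (seek) plus average rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Under this model a 7,200 RPM drive waits about 4.17 ms on average,
# while a 15,000 RPM drive waits 2.0 ms.
slow = avg_rotational_latency_ms(7_200)
fast = avg_rotational_latency_ms(15_000)
```

The model ignores transfer time, controller overhead, and caching, so it is useful only for comparing the relative contribution of spindle speed and arm movement.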
Furthermore, even if a disk-based storage system uses the fastest disks available, the performance of the storage system can be unnecessarily limited if only one of those disks can be accessed at a time. In other words, performance of a storage system, whether it is an array of disks, tapes, flash drives, or other storage devices, can also be limited by system constraints, such as the number of data transfer buses available in the system and the density of traffic on each bus.
Within a given storage pool, there can thus exist multiple tiers (e.g., storage systems or subsets thereof) having differing performance characteristics. For example, one storage pool could have three different performance tiers. A first tier could include several RAID groups of a disk-based storage system that correspond to a collection of solid-state disk drives. A second, lower performance tier could include several RAID groups of the same disk-based storage system that correspond to small but relatively fast hard disk drives. A third, still lower performance tier could include large but relatively slow hard disk drives.
One problem with existing storage management methods and systems is that logical units and the slices thereof are allocated in a storage pool on a “best-fit” basis at the initial allocation time. At this time, however, the IO load pattern of each logical unit and/or each slice of data is not known. In other words, a user's performance requirements with respect to a given logical unit or with respect to a given slice of data within that logical unit are generally either not known or only roughly approximated when that logical unit is created. Thus, the performance capability of a logical unit or slice can be too high or too low for the data to which it corresponds. For example, allocating a logical unit containing frequently accessed data to a lower performance tier in a storage pool will result in the storage pool appearing “slow” to a user. Likewise, allocating a logical unit containing rarely accessed data to a higher performance storage tier results in a waste of the storage system's performance capability. There is thus a need for methods and systems that can relocate frequently-accessed data to a higher-performance storage tier while relocating less-frequently-accessed data to a lower-performance storage tier.
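To make the stated need concrete, the following is a minimal sketch of one possible relocation policy, not a method described in this document: rank slices by observed IO count and fill tiers from fastest to slowest. The function name `plan_placement`, the tier names, the per-tier slot counts, and the use of a raw IO count as the "temperature" of a slice are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def plan_placement(
    access_counts: Dict[str, int],
    tier_capacities: List[Tuple[str, int]],
) -> Dict[str, str]:
    """Assign the most frequently accessed slices to the fastest tiers.

    access_counts:   slice id -> observed IO count (hypothetical metric)
    tier_capacities: (tier name, free slice slots), ordered fastest first
    Slices that do not fit in any tier are left out of the result.
    """
    # Hottest first; ties are broken by slice id so the plan is deterministic.
    ranked = sorted(access_counts, key=lambda s: (-access_counts[s], s))
    remaining = iter(ranked)
    placement: Dict[str, str] = {}
    for tier, slots in tier_capacities:
        for _ in range(slots):
            slice_id = next(remaining, None)
            if slice_id is None:
                return placement  # every slice has been placed
            placement[slice_id] = tier
    return placement


counts = {"s1": 900, "s2": 15, "s3": 400, "s4": 2}
tiers = [("ssd", 1), ("fast_hdd", 2), ("slow_hdd", 4)]
plan = plan_placement(counts, tiers)
```

With these example inputs the single solid-state slot goes to `s1`, the two fast hard disk slots go to `s3` and `s2`, and `s4` lands on the slow tier. A practical policy would also need to weigh the cost of moving data against the expected benefit, which this sketch ignores.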
Another problem with existing methods and systems is that they fail to compensate for the fact that users' performance requirements will often change over time, in part because data access patterns tend to change over time. Older data is generally accessed less frequently and therefore does not require storage with higher performance capabilities. In existing storage methods and systems, as once-frequently-accessed data ages, it can remain on a high performance storage tier, taking up valuable space that could be used to store newer, more-frequently-accessed data. Thus, as data “ages” and becomes less-frequently accessed, it can be desirable to relocate it to a lower performance storage tier. Existing methods and systems lack an automated mechanism to perform such relocation. A need exists then for methods and systems that can relocate older data to lower-performance storage tiers and at the same time relocate newer data to higher-performance storage tiers.
Yet another drawback to existing storage management methods and systems is that, when new storage devices are added to a storage pool, the benefit of the added storage device may not be utilized immediately. For example, a user might add additional, higher-performance storage systems or devices to a storage pool in an effort to increase throughput and reduce response times. Until data is stored on this new hardware, however, little or no performance gain will be observed. Accordingly, there is a need for methods and systems that can automatically relocate frequently accessed data to higher-performance storage systems or devices as they are added to a storage pool.