1. Technical Field
This application relates to selecting physical storage in data storage systems.
2. Description of Related Art
A traditional storage array (herein also referred to as a “data storage system”, “disk storage array”, “disk array”, or simply “array”) is a collection of hard disk drives operating together logically as a unified storage device. Storage arrays are designed to store large quantities of data. Storage arrays typically include one or more storage array processors (SPs) for handling both requests for allocation and input/output (I/O) requests. An SP is the controller for and primary interface to the storage array.
Storage arrays are typically used to provide storage space for one or more computer file systems, databases, applications, and the like. For this and other reasons, it is common for storage arrays to be logically partitioned into chunks of storage space, called logical units, or LUs. This allows a unified storage array to appear as a collection of separate file systems, network drives, and/or logical units.
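As an illustration only, the partitioning of an array into LUs can be sketched as carving its block address space into named ranges and translating LU-relative block numbers into array block numbers. The `LogicalUnit` type, `carve_lus`, and `to_array_block` are hypothetical names, and the assumption that each LU occupies one contiguous range of array blocks is a simplification; real arrays typically map LUs onto RAID groups rather than a flat block range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalUnit:
    """Hypothetical LU descriptor: one contiguous range of array blocks."""
    name: str
    start_block: int  # first array block owned by this LU
    num_blocks: int   # size of the LU in blocks


def carve_lus(total_blocks: int, sizes: dict[str, int]) -> list[LogicalUnit]:
    """Partition an array's block space into consecutive, non-overlapping LUs."""
    lus, next_block = [], 0
    for name, size in sizes.items():
        if next_block + size > total_blocks:
            raise ValueError(f"not enough free blocks for LU {name!r}")
        lus.append(LogicalUnit(name, next_block, size))
        next_block += size
    return lus


def to_array_block(lu: LogicalUnit, lu_block: int) -> int:
    """Translate a block number relative to the LU into an array block number."""
    if not 0 <= lu_block < lu.num_blocks:
        raise IndexError("block lies outside the LU")
    return lu.start_block + lu_block
```

For example, `carve_lus(1000, {"fs0": 400, "db0": 300})` yields two LUs, and block 10 of `db0` maps to array block 410. Each host sees only its own LU's block range, which is what lets one array appear as several separate devices.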
A hard disk drive (also referred to as “disk”) is typically a device including a magnetic head (also referred to as “head”), a disk arm, a motor, and one or more platters that store information. The motor turns a platter underneath the magnetic head. The platter contains magnetically encoded data that is detected by the magnetic head as the head passes over the platter. The platter can be read from or written to and is generally used to store data that will be accessed by the storage array. Typically, data is arranged on the platter in concentric circles, called tracks, which are in turn divided into sectors, the minimum unit of storage. The magnetic head is moved along a radius of the platter, and the head's reader/writer accesses particular locations within the platter as the platter spins under the head. Therefore, the access time of a disk consists of a seek time (to move the head over a track), a rotational delay (to rotate the sector under the head), and a transfer time (to access the requested data).
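The three components of disk access time can be added up directly. The sketch below assumes an average rotational delay of half a revolution and treats 1 MB as 10^6 bytes; the function name and the parameter values in the usage note are illustrative, not taken from any particular drive.

```python
def average_access_time_ms(seek_ms: float, rpm: float,
                           request_bytes: int, transfer_mb_per_s: float) -> float:
    """Access time = seek time + rotational delay + transfer time, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    # On average the target sector is half a revolution away from the head.
    rotational_delay_ms = ms_per_revolution / 2
    # Time to move the requested bytes at the sustained media transfer rate.
    transfer_ms = request_bytes * 1000 / (transfer_mb_per_s * 1_000_000)
    return seek_ms + rotational_delay_ms + transfer_ms
```

With a 4 ms seek, a 15,000 rpm spindle (4 ms per revolution, so 2 ms average rotational delay), and a 100,000-byte request at 100 MB/s (1 ms transfer), the estimate is 7 ms. For small requests the mechanical terms dominate, which is why seek distance and spindle speed matter so much in the discussion that follows.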
A seek time of a disk is the time required by the disk to find the required data on the disk. A seek time may include a head seek time, which is the time required by the magnetic head of the disk to move and position the head over the destination track on the platter of the disk for reading data from the destination track. The head seek time of a disk increases as the distance traveled by the magnetic head of the disk increases. An arm swing of a disk is the physical movement of the disk arm required to locate requested data.
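The dependence of head seek time on travel distance can be illustrated with a toy model. The square-root shape and the `settle_ms` and `k_ms` constants below are assumptions chosen only to show a seek time that grows with distance; actual seek curves are drive-specific and are usually measured rather than derived.

```python
import math


def head_seek_time_ms(distance_tracks: int,
                      settle_ms: float = 1.0,
                      k_ms: float = 0.05) -> float:
    """Toy model: head seek time grows with the number of tracks traveled.

    settle_ms and k_ms are hypothetical constants, not real drive parameters.
    """
    if distance_tracks < 0:
        raise ValueError("distance must be non-negative")
    if distance_tracks == 0:
        return 0.0  # head is already over the destination track
    # Fixed settle cost plus a term that increases with travel distance.
    return settle_ms + k_ms * math.sqrt(distance_tracks)
```

Under these assumed constants a 100-track seek costs 1.5 ms, while a 10,000-track seek costs 6 ms, so requests that keep the arm swing short also keep seek time low.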
Those skilled in the art are familiar with the read and write operations of hard disk drives. In a typical practical implementation, a disk drive may consist of circuit board logic and a Head and Disk Assembly (HDA). The HDA portion of the disk drive includes the spindle, platters, disk arm, and motor that make up the mechanical portion of the disk drive.
Performance of a storage array may be characterized by the array's total capacity, response time, and throughput. The capacity of a storage array is the maximum total amount of data that can be stored on the array. The response time of an array is the amount of time that it takes to read data from or write data to the array. The throughput of an array is a measure of the amount of data that can be transferred into or out of (i.e., written to or read from) the array over a given period of time.
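Response time and throughput, as defined above, can be computed from a log of completed I/O requests. The `IORecord` layout and the `summarize` function are hypothetical; the sketch assumes every record carries a submit time, a completion time, and a byte count, and it measures throughput over the window from the first submission to the last completion. Capacity is a static property of the array and is not derived from the log.

```python
from dataclasses import dataclass


@dataclass
class IORecord:
    """Hypothetical per-request log entry."""
    submit_s: float    # time the request was issued, in seconds
    complete_s: float  # time the request finished, in seconds
    nbytes: int        # bytes read from or written to the array


def summarize(records: list[IORecord]) -> tuple[float, float]:
    """Return (mean response time in s, throughput in bytes/s)."""
    if not records:
        raise ValueError("no I/O records")
    # Response time: how long each individual request took to complete.
    mean_response = sum(r.complete_s - r.submit_s for r in records) / len(records)
    # Throughput: total data moved over the observed time window.
    window = max(r.complete_s for r in records) - min(r.submit_s for r in records)
    if window <= 0:
        raise ValueError("observation window must be positive")
    throughput = sum(r.nbytes for r in records) / window
    return mean_response, throughput
```

For two requests, `IORecord(0.0, 0.5, 1000)` and `IORecord(0.5, 1.0, 3000)`, the mean response time is 0.5 s and the throughput is 4000 bytes/s. The two metrics are independent: overlapping many requests can raise throughput without shortening any single request's response time.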
The administrator of a storage array may desire to operate the array in a manner that maximizes throughput and minimizes response time. In general, performance of a storage array may be constrained by both physical and temporal constraints. Examples of physical constraints include bus occupancy and availability, excessive disk arm movement, and uneven distribution of load across disks. Examples of temporal constraints include bus bandwidth, bus speed, spindle rotational speed, serial versus parallel access to multiple read/write heads, and the size of data transfer buffers.
One factor that may limit the performance of a storage array is the performance of each individual storage component. For example, the read access time of a disk storage array is constrained by the access time of the disk drive from which the data is being read. Read access time may be affected by physical characteristics of the disk drive, such as the number of revolutions per minute of the spindle: the faster the spin, the less time it takes for the sector being read to come around to the read/write head. The placement of the data on the platter also affects access time, because it takes time for the arm to move to, detect, and properly orient itself over the proper track (or cylinder, for multihead/multiplatter drives). Reducing the read/write arm swing reduces the access time. Finally, the type of drive interface may have a significant impact on overall disk array performance. For example, a multihead drive that supports reads or writes on all heads in parallel will have a much greater throughput than a multihead drive that allows only one head at a time to read or write data.
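One common way to reduce arm swing is to reorder pending requests by track position. The sketch below swaps in the well-known elevator (LOOK-style) ordering, which is offered here only as an example of the idea and is not presented as the method of this application; `total_arm_travel` and `elevator_order` are hypothetical names, and track numbers stand in for head positions.

```python
def total_arm_travel(start_track: int, requests: list[int]) -> int:
    """Sum of track distances the arm moves when serving requests in order."""
    travel, pos = 0, start_track
    for track in requests:
        travel += abs(track - pos)
        pos = track
    return travel


def elevator_order(start_track: int, requests: list[int]) -> list[int]:
    """Serve tracks at or above the head ascending, then the rest descending."""
    up = sorted(t for t in requests if t >= start_track)
    down = sorted((t for t in requests if t < start_track), reverse=True)
    return up + down
```

Starting at track 50 with requests `[10, 90, 20, 80]`, serving them in arrival order moves the arm 250 tracks, whereas `elevator_order` produces `[80, 90, 20, 10]` and moves it only 120 tracks. The trade-off is fairness: requests far from the current sweep direction may wait longer.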
Furthermore, even if a disk storage array uses the fastest disks available, the performance of the array may be unnecessarily limited if only one of those disks may be accessed at a time. In other words, performance of a storage array, whether it is an array of disks, tapes, flash drives, or other storage entities, may also be limited by system constraints, such as the number of data transfer buses available in the system and the density of traffic on each bus.
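A simple bound shows how buses can cap array performance regardless of disk speed. This sketch assumes the load is spread evenly across buses and ignores protocol overhead and contention, so the result is an upper limit rather than a prediction; the function name and the figures in the usage note are hypothetical.

```python
def effective_throughput_mb_s(disk_rates_mb_s: list[float],
                              num_buses: int,
                              bus_bandwidth_mb_s: float) -> float:
    """Upper bound on aggregate throughput: the slower of disks and buses wins.

    Assumes perfectly balanced traffic across buses (a simplification).
    """
    if num_buses < 1:
        raise ValueError("at least one bus is required")
    disk_limit = sum(disk_rates_mb_s)            # what the disks could deliver
    bus_limit = num_buses * bus_bandwidth_mb_s   # what the buses can carry
    return min(disk_limit, bus_limit)
```

Eight disks at 200 MB/s each could deliver 1600 MB/s, but behind a single 600 MB/s bus the array is capped at 600 MB/s; with four such buses the disks become the limit again at 1600 MB/s.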
Large storage arrays today manage many disks that are not identical. Storage arrays use different types of disks and group like kinds of disks into tiers based on the performance characteristics of the disks. A group of fast but small disks may be a fast tier (also referred to as “higher tier”). A group of slow but large disks may be a slow tier (also referred to as “lower tier”). It may also be possible to have tiers with different properties, or tiers constructed from a mix of different types of physical disks, to achieve a performance or price goal. Storing often referenced, or hot, data on the fast tier and less often referenced, or cold, data on the slow tier may create a more favorable customer cost profile than storing all data on a single kind of disk.
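The hot/cold placement described above can be sketched as ranking data by access frequency and filling the fast tier first. The sketch assumes equal-sized extents, a fast tier measured in extent slots, and access counts gathered elsewhere; `place_by_temperature` and the extent names are hypothetical, and production tiering policies typically also weigh recency, I/O size, and migration cost.

```python
def place_by_temperature(access_counts: dict[str, int],
                         fast_tier_slots: int) -> tuple[set[str], set[str]]:
    """Assign the most frequently accessed extents to the fast tier.

    Returns (extents for the fast tier, extents for the slow tier).
    Assumes every extent occupies exactly one slot.
    """
    # Hottest first; Python's sort is stable, so ties keep their input order.
    ranked = sorted(access_counts, key=lambda e: access_counts[e], reverse=True)
    hot = set(ranked[:fast_tier_slots])
    cold = set(ranked[fast_tier_slots:])
    return hot, cold
```

With access counts `{"a": 500, "b": 3, "c": 120, "d": 0}` and two fast-tier slots, extents `a` and `c` land on the fast tier while `b` and `d` stay on the slow tier.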
A storage tier may be made up of different types of disks, for example, disks with different RAID levels, performance, and cost characteristics. Several levels of RAID (Redundant Array of Independent or Inexpensive Disks) have been defined in the industry. RAID parity schemes may be utilized to provide error detection during the transfer and retrieval of data across a storage system.
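A minimal illustration of a RAID parity scheme is bytewise XOR parity, the approach used by parity-based levels such as RAID 5. The sketch below is not tied to any specific product; `xor_bytes`, `parity`, `parity_ok`, and `rebuild` are hypothetical helpers, and chunks are assumed to be equal-length byte strings.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(chunks: list[bytes]) -> bytes:
    """Parity chunk: XOR of all data chunks in the stripe."""
    return reduce(xor_bytes, chunks)


def parity_ok(chunks: list[bytes], stored_parity: bytes) -> bool:
    """Error detection: recomputed parity must match the stored parity."""
    return parity(chunks) == stored_parity


def rebuild(surviving: list[bytes], stored_parity: bytes) -> bytes:
    """Recover one missing chunk by XOR-ing the parity with the survivors."""
    return reduce(xor_bytes, surviving, stored_parity)
```

For the stripe `[b"\x0f\x00", b"\xf0\x01", b"\x33\x10"]` the parity is `b"\xcc\x11"`. If one chunk is silently altered, `parity_ok` returns `False`, although single parity alone cannot tell which chunk is wrong. If a disk is known to have failed, `rebuild` recovers its chunk from the others.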
A RAID system is an array of multiple disk drives that appears as a single drive to a data storage system. A goal of a RAID system is to spread, or stripe, a piece of data uniformly across disks (typically in units called chunks), so that a large request can be served by multiple disks in parallel. Although striping reduces the time to transfer data, a striped request is not complete until every disk involved has serviced its part, so the disk in the RAID system that incurs the longest seek and rotational delays can slow down the completion of the whole request.
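Chunk-based striping and its slowest-disk effect can be sketched in a few lines. The layout below is plain round-robin striping without parity (RAID 0 style), chosen only for clarity; `locate`, `striped_completion_ms`, the 64 KiB chunk size, and the latency figures are assumptions for illustration.

```python
def locate(offset: int, chunk_size: int, num_disks: int) -> tuple[int, int]:
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes round-robin striping with no parity chunks.
    """
    chunk = offset // chunk_size          # which logical chunk holds the byte
    disk = chunk % num_disks              # chunks rotate across the disks
    row = chunk // num_disks              # stripe row on that disk
    return disk, row * chunk_size + offset % chunk_size


def striped_completion_ms(per_disk_latency_ms: list[float]) -> float:
    """A striped request finishes only when its slowest disk finishes."""
    return max(per_disk_latency_ms)
```

With 64 KiB (65,536-byte) chunks on four disks, byte offset 5 × 65,536 + 100 falls in chunk 5, which lives on disk 1 at offset 65,636. If four disks service their parts of a request in 4.1, 3.9, 9.7, and 4.0 ms, the request completes at 9.7 ms, so the one slow disk sets the pace for the whole stripe.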