A traditional storage array (herein also referred to as a “disk storage array”, “disk array”, or simply “array”) is a collection of hard disk drives operating together logically as a unified storage device. Storage arrays are designed to store large quantities of data. Storage arrays typically include one or more storage array processors (SPs) for handling both requests for allocation and input/output (I/O) requests. An SP is the controller for and primary interface to the storage array.
Performance of a storage array may be characterized by the array's total capacity, response time, and throughput. The capacity of a storage array is the maximum total amount of data that can be stored on the array. The response time of an array is the amount of time that it takes to read data from or write data to the array. The throughput of an array is a measure of the amount of data that can be transferred into or out of (i.e., written to or read from) the array over a given period of time.
The administrator of a storage array may desire to operate the array in a manner that maximizes throughput and minimizes response time. In general, performance of a storage array may be constrained by both physical and temporal constraints. Examples of physical constraints include bus occupancy and availability, excessive disk arm movement, and uneven distribution of load across disks. Examples of temporal constraints include bus bandwidth, bus speed, spindle rotational speed, serial versus parallel access to multiple read/write heads, and the size of data transfer buffers.
One factor that may limit the performance of a storage array is the performance of each individual storage component. For example, the read access time of a disk storage array is constrained by the access time of the disk drive from which the data is being read. Read access time may be affected by physical characteristics of the disk drive, such as the number of revolutions per minute of the spindle: the faster the spin, the less time it takes for the sector being read to come around to the read/write head. The placement of the data on the platter also affects access time, because it takes time for the arm to move to, detect, and properly orient itself over the proper track (or cylinder, for multihead/multiplatter drives). Reducing the read/write arm swing reduces the access time. Finally, the type of drive interface may have a significant impact on overall disk array performance. For example, a multihead drive that supports reads or writes on all heads in parallel will have a much greater throughput than a multihead drive that allows only one head at a time to read or write data.
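The relationship between spindle speed and access time described above can be illustrated numerically. The following is a minimal sketch, assuming the standard approximation that, on average, the desired sector is half a revolution away from the read/write head. The function name is hypothetical, and seek time and transfer time are deliberately ignored.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head, in milliseconds.

    Assumes the sector is, on average, half a revolution away.
    """
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


# Common spindle speeds, used here only as illustrative inputs.
for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM: {average_rotational_latency_ms(rpm):.2f} ms")
```

Under this approximation, doubling the spindle speed halves the average rotational latency, which is the effect the paragraph above describes.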
Furthermore, even if a disk storage array uses the fastest disks available, the performance of the array may be unnecessarily limited if only one of those disks may be accessed at a time. In other words, performance of a storage array, whether it is an array of disks, tapes, flash drives, or other storage entities, may also be limited by system constraints, such as the number of data transfer buses available in the system and the density of traffic on each bus.
Thus, to maximize performance of a storage array, the operational load should be more or less evenly distributed across all physical resources, so that each physical resource may operate at its own maximum capacity. Using a disk storage array as an example, bandwidth, and thus performance, is maximized if “all spindles are being accessed at the same time.”
Performance of a storage array may also be characterized by the total power consumption of the array. The administrator of a storage array may prefer to operate the array in a manner that minimizes power consumption (“green” mode) rather than maximizes performance (“brown” mode). Operating a large storage array in green mode may not only reduce the power consumption of the array itself and its associated costs but may also have indirect benefits associated with the reduction of heat being generated by the array. For example, storage arrays typically are housed in an environmentally-controlled room or site; operating an array in green mode may reduce the heat that the air conditioning system must remove, thus lowering the cost to run the site HVAC system. Furthermore, semiconductor devices age faster in hot environments than in cold environments; a storage device, whether it is a hard disk drive, flash drive, or other, will age faster if it is mounted in a rack such that it is surrounded by other heat-generating storage devices than if it is in the same rack but surrounded by cool (e.g., idle) storage devices. Thus, operating a storage array in green mode may increase the mean time between failures for the devices in the array.
Separate from but intimately related to performance maximization is the problem of underuse of scarce physical resources. Storage arrays are typically used to provide storage space for one or more computer file systems, databases, applications, and the like. For this and other reasons, it is common for storage arrays to be logically partitioned into chunks of storage space, called logical units, or LUs. This allows a unified storage array to appear as a collection of separate file systems, network drives, and/or volumes.
Historically speaking, there is a trend toward larger operating systems, larger applications or programs, and larger file sizes. For example, the footprint of a popular operating system for personal computers has grown from megabytes of disk space to gigabytes of disk space for a basic installation. Programs and applications are getting larger as well. With the advent of the internet, it is popular to store multimedia files that are megabytes or gigabytes in size, up from the typical file size in kilobytes of a decade ago. Understanding this trend, a storage administrator is likely to provision a larger portion of storage area than is currently required for an operating system, for example, with the expectation that the space requirements will grow with upgrades, bug-fixes, and the inclusion of additional features.
The problem of underuse arises when, for example, an amount of storage space is allocated to, but not used by, an operating system, program, process, or user. In this scenario, the scarce (and probably expensive) resource—disk storage space, for example—is unused by the entity that requested its allocation and thus unavailable for use by any other entity. In many cases, the unused space cannot be simply given back. For example, a database installation may require many terabytes of storage over the long term even though only a small fraction of that space may be needed when the database is first placed into operation. In short, it is often the case that the large storage space will be eventually needed, but it is not known exactly when the entire space will be needed. In the meantime, the space lies unused and unavailable for any other use as well.
Recognizing that more storage space may be provisioned for operating systems, programs, and users than they may actually use at first, the concept of a sparsely populated or “thin” logical unit (TLU) was developed. Unlike the more traditional “fat” or fully allocated logical unit (FLU), which is created by provisioning and allocating a certain amount of storage area, a TLU is provisioned at creation but is not allocated any physical storage until the storage is actually needed. For example, physical storage space may be allocated to the TLU upon receipt of an I/O write request from a requesting entity, referred to herein as a “host”. Upon receipt of the write request from the host, the SP may then determine whether there is enough space already allocated to the TLU to store the data being written, and if not, allocate to the TLU additional storage space.
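The allocate-on-write behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular storage processor: the class name `ThinLU`, the constant `SLICE_BLOCKS`, the fixed-size allocation unit, and the list-based free pool are all hypothetical and chosen only to make the example self-contained.

```python
from dataclasses import dataclass, field

SLICE_BLOCKS = 4  # hypothetical allocation unit: blocks per slice (assumption)


@dataclass
class ThinLU:
    provisioned_blocks: int                        # size promised at creation
    slice_map: dict = field(default_factory=dict)  # logical slice -> physical slice

    def write(self, block: int, free_pool: list) -> bool:
        """Handle a host write; return True if new physical storage was allocated."""
        if not 0 <= block < self.provisioned_blocks:
            raise ValueError("write is outside the provisioned range")
        logical_slice = block // SLICE_BLOCKS
        if logical_slice in self.slice_map:
            return False  # enough space is already allocated for this write
        if not free_pool:
            raise RuntimeError("no free physical storage left in the pool")
        # Allocate additional physical storage only now, when it is needed.
        self.slice_map[logical_slice] = free_pool.pop()
        return True


pool = [100, 101, 102]               # hypothetical physical slice identifiers
tlu = ThinLU(provisioned_blocks=16)  # provisioned, but nothing allocated yet
print(tlu.write(0, pool))  # True: first write triggers an allocation
print(tlu.write(1, pool))  # False: same slice, already allocated
print(tlu.write(5, pool))  # True: a different slice is needed
print(len(tlu.slice_map), "of", 16 // SLICE_BLOCKS, "slices allocated")
```

In this sketch the TLU advertises 16 blocks from the moment it is created, yet consumes physical storage only for the regions that have actually been written.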
While thin logical units provide distinct advantages over fully allocated logical units (i.e., where the entire storage space requested is actually allocated and reserved for the exclusive use of the requesting entity), the manner in which portions of physical storage, or slices, are allocated across physical disks can have an enormous impact on the performance of the storage array. A naïve approach to allocation of storage for sparsely populated logical units, i.e., one that does not take into consideration the underlying physical and temporal constraints of the storage array in general and of the FLU pool in particular, may fail to meet the goals of the policy (green or brown, for example) chosen by the administrator of the storage array. For example, if the administrator desires to maximize performance (i.e., a brown policy), a storage processor using a naïve allocation method might allocate all of the slices from a single physical disk, in which case the performance of the entire array may be needlessly constrained by that single disk and thus fail to meet the performance goals of the brown policy.
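The contrast between a naïve allocation and one that spreads load across disks can be made concrete with a small sketch. The two functions below are hypothetical illustrations only: the round-robin strategy is one simple way to distribute slices evenly and is not presented as the approach of the source, and the per-disk free-slice counts are invented inputs.

```python
from collections import Counter


def allocate_naive(free_per_disk: dict, count: int) -> list:
    """Exhaust each disk in turn, ignoring how load ends up distributed."""
    if count > sum(free_per_disk.values()):
        raise RuntimeError("not enough free slices")
    chosen = []
    for disk, free in free_per_disk.items():
        take = min(free, count - len(chosen))
        chosen.extend([disk] * take)
    return chosen


def allocate_spread(free_per_disk: dict, count: int) -> list:
    """Round-robin across disks so that each spindle carries part of the load."""
    if count > sum(free_per_disk.values()):
        raise RuntimeError("not enough free slices")
    remaining = dict(free_per_disk)
    chosen = []
    while len(chosen) < count:
        for disk in remaining:
            if remaining[disk] > 0 and len(chosen) < count:
                remaining[disk] -= 1
                chosen.append(disk)
    return chosen


pool = {"disk0": 8, "disk1": 8, "disk2": 8, "disk3": 8}  # hypothetical pool
print(Counter(allocate_naive(pool, 4)))   # all four slices land on disk0
print(Counter(allocate_spread(pool, 4)))  # one slice on each of the four disks
```

With the naïve function, every I/O to the four slices is served by one disk. The spread function lets four disks work in parallel, which suits a performance-oriented (brown) policy. Under a green policy, by contrast, concentrating slices on fewer disks might be the preferred outcome, which is why the choice of strategy needs to be informed by the administrator's policy.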
Accordingly, to avoid the potential pitfalls inherent in a naïve approach to allocation of physical storage in a disk array, there exists a need for an informed approach to allocation that considers the underlying physical and temporal constraints of the disk array. Specifically, there exists a need for methods, systems, and computer readable media for allocating physical storage in a storage array.