Companies looking to reduce the high cost of storage often aggregate data onto shared virtualized storage systems, reducing infrastructure and management overhead. Although this technology has proven to be useful, it would be desirable to present additional improvements. Aggregating data onto shared virtualized storage systems can lead to unexpected interference between applications with potentially divergent performance requirements. For example, one user may be running a media player with deadlines when another user starts a storage-intensive file indexer. If the two users share a storage device, then the storage applications compete with each other for performance resources, which may result in the media player missing deadlines. On a larger scale, a transaction-processing application may experience performance degradation when a backup process begins. Such competition is not a rare occurrence, and likely becomes more frequent as a storage system grows and as more applications share the resources within the storage system.
One conventional approach to managing storage system resources dedicates a storage device or logical unit to an application. This approach isolates applications at the cost of complex manual configuration and inefficient resource utilization. Moreover, configurations are usually based on a snapshot of application behavior, and require new analysis as either the application requirements or the hardware infrastructure change.
A virtualized storage system is therefore required to provide assurances that the behavior of one application does not interfere with the performance of other applications. One conventional storage system manages the resources allocated to an application according to a specification of reserves and limits. A reserve specifies the amount of a resource whose availability the conventional storage system guarantees for the application. A limit restricts the additional amount of a resource that the conventional storage system provides to the application if unused resources exist. The limit can be used, for example, to ensure that housekeeping operations or backup do not use more than a certain amount of system performance, leaving the remainder of the resources for regular applications.
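The reserve/limit semantics described above can be sketched in a few lines. This is an illustrative model only; the function name and parameter values are hypothetical and not drawn from the conventional system:

```python
def grant(demand, reserve, limit, surplus_share):
    """Clamp an application's allocation between its reserve and limit.

    demand: throughput the application is requesting
    reserve: amount the system guarantees when the application is active
    limit: cap on what the application may receive even when resources sit idle
    surplus_share: extra capacity offered to this application beyond its reserve
    """
    guaranteed = min(demand, reserve)              # the reserve is always available
    extra = max(0, demand - guaranteed)            # demand not covered by the reserve
    granted_extra = min(extra, surplus_share)      # best-effort top-up from spare capacity
    return min(guaranteed + granted_extra, limit)  # never exceed the limit

# A backup capped at 100 IOPS cannot crowd out regular applications,
# no matter how much spare capacity exists:
grant(demand=500, reserve=50, limit=100, surplus_share=400)  # → 100
```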
FIG. 5 illustrates a conventional storage system 500 comprising a storage device 505. The storage system 500 provides virtualized storage in a distributed system in the form of sessions 510 and pools 515. Sessions 510 comprise session 1, 520, session 2, 525, session 3, 530, session 4, 535, through session N, 540. Pools 515 comprise pool 1, 545, pool 2, 550, through pool M, 555. An application forms one or more of the sessions 510 to utilize resources in the storage device 505. The storage device 505 enforces isolation locally between applications that share the storage device 505. Internally, the storage system 500 places data on the storage device 505 such that the storage system 500 delivers reasonable overall performance, and reorganizes data in response to changes in the application behavior or the infrastructure.
Each storage device 505 in the storage system 500 has the following goals for managing its performance resources:
Reserve enforcement—An active application receives at least its reserve amount or reservation resource on average from the storage device 505, regardless of the behavior of any other applications.
Limit enforcement—An application receives at most its limit amount or limit resource on average from the storage device 505.
Fair sharing of additional resources—Each active application receives a fair share of any unused resources on the storage device 505.
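The three goals above might be combined in a single allocation routine. The following is a hedged sketch with hypothetical names; it assumes admission control has already guaranteed that the device can cover the sum of all reserves:

```python
def allocate(total, apps):
    """Divide `total` capacity among active applications so that each
    receives its reserve, unused capacity is split equally (fair sharing),
    and no application exceeds its limit.

    apps: list of dicts with 'reserve' and 'limit' keys.
    """
    alloc = [a['reserve'] for a in apps]        # goal 1: reserves come first
    spare = total - sum(alloc)
    active = set(range(len(apps)))
    while spare > 1e-9 and active:
        share = spare / len(active)             # goal 3: equal share of the surplus
        spare = 0.0
        for i in list(active):
            room = apps[i]['limit'] - alloc[i]  # goal 2: never exceed the limit
            take = min(share, room)
            alloc[i] += take
            spare += share - take               # leftovers go back for redistribution
            if room <= share:
                active.discard(i)               # application has reached its limit
    return alloc

# App 0 is capped at 20; its unusable share flows to app 1:
allocate(100, [{'reserve': 10, 'limit': 20},
               {'reserve': 10, 'limit': 100}])  # → [20, 80]
```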
Each of the pools 515 represents a virtual entity that is generally associated with a single application or user of the storage device 505. Pools 515 encapsulate the reservation resources and limit resources of the storage device 505 that are used by an application. Although conventional performance resource management technology has proven to be useful, it would be desirable to present additional improvements.
Within each of the pools 515, an application may subdivide the resources of the corresponding pool assigned to the application into sessions 510. Each of the sessions 510 is associated with an instance of the application and subdivides the resource allocation of the associated pool.
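The pool/session subdivision might be modeled as follows. This is a minimal sketch under the assumption that a session's reserve must fit within its pool's remaining reserve and that a session's limit is clamped to the pool's limit; the class and method names are hypothetical:

```python
class Pool:
    """A pool holds a reserve and a limit; sessions subdivide the reserve."""
    def __init__(self, reserve, limit):
        self.reserve, self.limit = reserve, limit
        self.sessions = []  # list of (session_reserve, session_limit) tuples

    def open_session(self, reserve, limit):
        # A session may only reserve what its pool has not yet handed out.
        committed = sum(r for r, _ in self.sessions)
        if committed + reserve > self.reserve:
            raise ValueError("pool reserve exhausted")
        # A session can never receive more than its pool's limit.
        self.sessions.append((reserve, min(limit, self.limit)))
        return len(self.sessions) - 1  # session identifier

pool = Pool(reserve=100, limit=200)
sid = pool.open_session(reserve=60, limit=300)  # limit clamped to 200
```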
The problem of managing I/O performance resources can be divided into separable problems: how to specify allocations for pools and sessions, and how to deliver on those allocations. Delivering performance resources combines issues of soft real-time scheduling for fulfillment of reserves and of sharing extra resources fairly.
Traditional quality of service (QoS) resource allocation models support multiple levels of specification; for example, a reserve, a limit, and points in between. For each level, the specification sets the performance that the system is required to guarantee. Simple conventional models support only a single level and use metrics such as bandwidth to express requirements. More complex conventional models use benefit-value or utility functions to express requirements, and the system uses these functions to maximize the overall benefit or utility over all applications while ensuring that minimum levels are met. The user or application is required to specify the function, which is often difficult.
Several conventional hierarchical allocation models exist for resource management. Generalized models exist for the management of multiple resources. Models also exist for CPU scheduling and network sharing. Most of these examples support arbitrary hierarchy depths.
One conventional allocation model utilizes an I/O scheduling algorithm with an arbitrary hierarchy of token buckets to provide proportional resource guarantees to applications. This conventional approach allows applications to borrow performance from other applications that are not using their share of performance, but does not address fair sharing of best-effort performance. This conventional approach further requires a priori knowledge of the actual device throughput under the current workload.
Additional conventional approaches utilize disk schedulers that support a mix of multimedia and non-multimedia applications. One such conventional system gives priority to best-effort streams, delaying real-time I/Os as long as possible without violating their requirements. Other such conventional systems implement a two-level hierarchy of schedulers for multiple classes of traffic. However, these approaches require detailed information about the application workloads (such as their periodicities). Other conventional approaches often assume that no other applications access the storage, which allows for greater optimization in algorithm design but does not provide adequate fair sharing of resources.
Other conventional approaches control other storage system characteristics, most notably response time. One such approach uses an earliest-deadline-first (EDF) scheduler that bases the deadline of an I/O on the response time requirement of its stream, with adaptive mechanisms to adjust the response time target as the offered load of the stream changes. Another such approach provides per-stream I/O rate throttling so that all streams receive specified response latencies. This approach is adaptive: a central server monitors the performance each stream is receiving and changes the acceptable rates for other streams when one stream is receiving response times longer than its requirement.
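The EDF dispatch order described above can be sketched with a priority queue keyed by deadline. The class name, stream targets, and I/O labels below are hypothetical illustrations, not taken from any conventional system:

```python
import heapq

class EDFScheduler:
    """Dispatch I/Os in earliest-deadline-first order. An I/O's deadline is
    its arrival time plus its stream's response-time target."""
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so heapq never compares the payloads

    def submit(self, arrival, target, io):
        heapq.heappush(self._heap, (arrival + target, self._seq, io))
        self._seq += 1

    def dispatch(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

edf = EDFScheduler()
edf.submit(arrival=0.0, target=0.500, io="indexer read")  # deadline 0.500 s
edf.submit(arrival=0.0, target=0.050, io="media read")    # deadline 0.050 s
edf.dispatch()  # → "media read" (tighter deadline goes first)
```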
Several conventional alternatives exist for sharing performance resources from storage devices, many of which are related to methods for sharing CPU cycles and network bandwidth. One conventional system supports proportional sharing of resources among multiple users, and includes a hierarchical approach for defining the shares. Another conventional system gives each active I/O stream a share of resources in proportion to its weight relative to any other active streams. However, these approaches do not give each active stream its requested reserve of resources regardless of the demands of other streams.
What is needed is a performance resource management system that enforces fair sharing with reservation and limit enforcement. Conventional approaches to performance resource management perform reserve and limit enforcement. Some conventional approaches further perform fair sharing in which each application receives an equivalent amount of additional resources. What is needed is a performance resource management system that further performs fair sharing of additional resources proportionally, in accordance with a priority assigned to an application, such that a higher-priority application with deadlines, such as a media player, receives more resources than a lower-priority application, such as a file indexer or backup. Such a system is needed that further treats the storage device as a “black box”, without requiring detailed modeling of the storage devices.
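The priority-proportional surplus distribution called for above can be illustrated in one line of arithmetic. The weights and application names below are hypothetical examples, not values prescribed by any system:

```python
def share_surplus(surplus, weights):
    """Split surplus capacity in proportion to per-application priority weights,
    so higher-priority applications receive a larger share of unused resources."""
    total = sum(weights.values())
    return {app: surplus * w / total for app, w in weights.items()}

# A weight-4 media player receives four times the surplus of a weight-1 backup:
share_surplus(300, {"media player": 4, "file indexer": 1, "backup": 1})
# → {'media player': 200.0, 'file indexer': 50.0, 'backup': 50.0}
```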
Thus, there is a need for a system, a computer program product, and an associated method for managing storage system performance as a resource. The need for such a solution has heretofore remained unsatisfied.