Computer storage systems shared among multiple applications should balance conflicting demands for allocating resources such as disk arms, array controllers, and the storage area network. This problem is compounded when applications from multiple clients share the same system, as in the case of a Storage Services Provider (SSP). Each client wants predictable performance, capacity, and reliability for its applications, regardless of the other users and of the loads they impose on the system.
In an exemplary model, customers establish Service Level Agreements (SLAs) with SSPs. SLAs prescribe the quality of service for each stream, i.e., for each set of accesses whose performance should be guaranteed and isolated from the behaviors of other streams. A typical SLA specifies maximum bounds on the amount of load the client is expected to impose on the system, and minimum guarantees on the level of service that the client must receive. An exemplary way of quantifying load is bandwidth, the number of accesses per time unit that each client may request from the storage system. An exemplary way of quantifying level of service is latency (or, equivalently, service time). The latency of a given access is the time elapsed from when the client initiates the access until the moment when the client learns that the access has completed. A read access is completed by receiving the requested data. A write access is completed by receiving a notification of completion. Because storage system resources are finite, a client may consume too many resources and "starve" other clients, i.e., prevent other clients from obtaining the storage system resources for which they have contracted. Due to the high variability of realistic storage workloads and to technological limitations (in particular, the large variations in service times at storage devices and their strongly nonlinear behavior), guarantees are usually statistical; they describe the system's average responses over a period of time (e.g., several hours).
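To make the two SLA quantities concrete, the following is a minimal sketch, assuming an SLA with a bandwidth bound (accesses per second) and an average-latency guarantee evaluated over a single measurement window. The names `SLA`, `Access`, and `sla_status` are hypothetical, and so is the reading that the latency guarantee no longer applies once the client exceeds its own load bound; latency is measured from initiation to the client learning of completion, as defined above.

```python
from dataclasses import dataclass


@dataclass
class SLA:
    """Hypothetical SLA terms for one stream."""
    max_iops: float         # load bound: accesses per second the client may issue
    max_avg_latency: float  # service bound: average latency (s) the SSP guarantees


@dataclass
class Access:
    issued: float     # when the client initiated the access (s)
    completed: float  # when the client learned the access had completed (s)


def sla_status(sla, accesses, window_s):
    """Classify one stream's behavior over a measurement window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window must be positive")
    bandwidth = len(accesses) / window_s
    if bandwidth > sla.max_iops:
        # Assumption: the client broke its side of the agreement,
        # so the latency guarantee is not checked.
        return "over load bound"
    if not accesses:
        return "met"
    latencies = [a.completed - a.issued for a in accesses]
    # Statistical guarantee: compare the window average, not individual accesses.
    avg_latency = sum(latencies) / len(latencies)
    return "met" if avg_latency <= sla.max_avg_latency else "violated"


sla = SLA(max_iops=100, max_avg_latency=0.010)
print(sla_status(sla, [Access(0.0, 0.004), Access(0.5, 0.512)], window_s=1.0))
```

Averaging over the window is what makes a single slow access (12 ms here) acceptable under a 10 ms guarantee; a per-access check would report a violation instead.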
Most importantly, nothing prevents one application from consuming more than its share of resources, even in an over-provisioned system. Consequently, different customers are in general not performance-isolated from one another. Some mechanisms may be used to alleviate this problem. For example, some streams may be throttled, i.e., their accesses may be artificially delayed in the storage system instead of being considered for service as soon as they are issued by the client. By throttling a set of carefully chosen streams, the load on shared resources may go down, and the performance experienced by some non-throttled streams may improve.
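Throttling as described above means holding an access back before it is considered for service. The sketch below assumes one simple delay rule that the text does not prescribe: a throttled stream's accesses are released no closer together than `1 / max_rate` seconds, while non-throttled streams are released as soon as they are issued. The functions `release_times` and `schedule` are hypothetical illustrations, not the mechanism of any cited system.

```python
def release_times(issue_times, max_rate):
    """Delay accesses so that consecutive releases are at least 1/max_rate seconds apart."""
    if max_rate <= 0:
        raise ValueError("max_rate must be positive")
    gap = 1.0 / max_rate
    released = []
    next_free = float("-inf")  # earliest time the next access may be released
    for t in sorted(issue_times):
        r = max(t, next_free)  # never release before the client issued the access
        released.append(r)
        next_free = r + gap
    return released


def schedule(streams, throttled, max_rate):
    """streams maps a stream name to its access issue times; only names in `throttled` are delayed."""
    return {
        name: release_times(times, max_rate) if name in throttled else sorted(times)
        for name, times in streams.items()
    }


print(schedule({"a": [0, 0, 0, 1], "b": [0, 0]}, throttled={"a"}, max_rate=2.0))
```

With a cap of two accesses per second, the burst of three accesses from stream `a` at time 0 is spread over one second, and its later access waits until 1.5 s; stream `b` is untouched. Choosing *which* streams to put in `throttled`, and how low to set the cap, is exactly the open question the surrounding text raises.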
Several existing approaches address similar versions of this problem, in both the storage and networking domains. An extreme approach is separate allocation, where each client receives its own set of hosts and storage devices, physically unrelated to the ones allocated to other clients. This approach is wasteful because spare resources may be available in one part of the system while other parts starve. In addition, the separate allocation approach (shared-nothing system) is difficult and expensive to implement in practice because it requires totally disjoint storage area networks. The separate allocation approach leads to space allocation problems as well; hardware belonging to a given client may have to be physically contiguous, making future system growth very difficult.
An intermediate and widely followed approach is over-provisioning, where clients may share hardware, but the total amount of resources in the system is several times higher than the expected requirements of the workloads. Three-fold margins of error in system design are not uncommon; big corporations that can afford to hire experienced administrators can reduce this to a factor of two by careful planning and monitoring. The economic disadvantage is obvious for systems costing on the order of tens of millions of dollars. Over-provisioning still requires extensive monitoring to detect when resources are almost exhausted and to take appropriate action; humans are the highest expense for high-end storage systems. The over-provisioning approach is well suited only for coarse, infrequent allocation or reallocation decisions; humans cannot react to sudden spikes in the workload.
Even if humans attempt to respond to sudden spikes in the workload, they may not always make the right choices on the first try. Over-provisioning does little to solve the lack of performance isolation in a shared system; the problem may be ameliorated because less sharing takes place, but there is still no limit to the amount of interference one client can cause to others.
Several solutions have been proposed. However, in some of these solutions workloads may be unnecessarily throttled even in an underutilized system. Even if throttling were somehow known to be warranted, the proposed solutions do not describe how to identify streams that should be throttled so that other streams begin to experience improved performance.
Other solutions deal with managing the CPU resource. They rely on their ability to model the reactions of the resource to reallocation decisions by using a simple linear model of CPU cycles. These techniques do not solve the problem of balancing conflicting demands for allocating storage resources, as the storage subsystem is harder to model and has much more variable behavior than the CPU. Also, these solutions rely on resources that allow sampling for measurements of their performance during the recent past; this is not always a viable option.
One method provides guaranteed performance by varying the amount of cache allocated to each application. This approach assumes direct control over the shared resource (the cache), and controls the allocation of that resource only at the control point. In this approach, the control point can be totally separated from the resources being arbitrated. However, this approach relies on accurate measurements of the shared resources being available.
In yet another approach to balancing conflicting demands for allocating resources, only a single stream, the migration stream, may be throttled. This approach therefore does not address how to identify candidates for throttling. In addition, this approach must have perfect knowledge of, and total control over, the application generating the throttled stream.
Some proposed solutions describe a method for apportioning network bandwidth among different servers running on the same host. Again, the network is much easier to model and monitor than the storage subsystem. Moreover, those solutions do not monitor how well the system is doing, and thus provide no feedback. In addition, they require detailed descriptions from human administrators about when throttling is supposed to start and how severe it should be. The trigger for throttling is a static value set by system administrators, with no dynamic, adaptive capability.
Other proposed solutions in the networking domain perform throttling at the computer nodes originating the load. This scheme has no feedback, as each node shapes locally originated traffic following policies obtained from a central repository, regardless of the status of the other nodes and of the amount of resources available in the system at each point in time. In this approach, the system has no centralized control. No attempt is made to detect and react to scenarios of over- or under-provisioning.
Integrated services and differentiated services (IntServ, DiffServ) are two industry standards for networking. They allow some network packets to be queued or delayed according to the SLAs of their source clients. In IntServ, the first packet in a stream makes resource reservations along its way through the network; if the network is over-committed, packets may be dropped or delayed. In DiffServ, edge nodes assign each incoming packet a priority from a small set, and routers within the network follow those priorities to make dropping and queuing decisions. Dropping packets is not a viable option for storage access protocols (e.g., SCSI), as they tolerate it very poorly. The point of control is always co-located with the resource being controlled.
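The DiffServ division of labor (edge nodes mark, interior routers act on the mark) can be illustrated with a deliberately simplified model. This sketch assumes a hypothetical mapping from SLA class to one of three priorities, a router with strict-priority queuing, and a full buffer that evicts the least important queued packet; real DiffServ uses DSCP codepoints and configurable per-hop behaviors, which are not modeled here. `PRIORITY`, `mark`, and `PriorityRouter` are illustrative names only. The drop path is the behavior that storage protocols such as SCSI tolerate poorly.

```python
import heapq
from itertools import count

# Hypothetical mapping from a client's SLA class to a priority drawn from a
# small set (lower number = more important).
PRIORITY = {"gold": 0, "silver": 1, "bronze": 2}


def mark(sla_class):
    """Edge node: tag an incoming packet with its client's priority."""
    return PRIORITY[sla_class]


class PriorityRouter:
    """Interior router: strict-priority queuing with priority-aware dropping."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._queue = []     # heap of (priority, arrival_seq, packet)
        self._seq = count()  # preserves FIFO order within one priority level
        self.dropped = []

    def enqueue(self, packet, priority):
        entry = (priority, next(self._seq), packet)
        if len(self._queue) < self.capacity:
            heapq.heappush(self._queue, entry)
            return True
        # Buffer full: the victim candidate is the least important, most recent entry.
        worst = max(self._queue)
        if entry[:2] >= worst[:2]:
            self.dropped.append(packet)  # arriving packet is no more important: drop it
            return False
        self._queue.remove(worst)
        heapq.heapify(self._queue)
        self.dropped.append(worst[2])    # evict a lower-priority packet instead
        heapq.heappush(self._queue, entry)
        return True

    def dequeue(self):
        """Serve the highest-priority (then oldest) packet, or None if idle."""
        return heapq.heappop(self._queue)[2] if self._queue else None
```

For example, with `capacity=2`, enqueuing a bronze packet, a silver packet, and then a gold packet evicts the bronze one, and a later bronze arrival is dropped outright while the silver packet is still queued.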
What is needed is a system and associated method which, assuming that the storage system has sufficient resources to satisfy all of its clients' demands, ensures that those clients receive service that meets their SLAs. The system and method should initiate throttling whenever some stream is receiving insufficient resources, and should determine the severity of throttling in a dynamic, adaptive way, independent of accurate measurements of shared resources or of sampling for performance measurements. The need for such a system and associated method has heretofore remained unsatisfied.