1. Field of the Invention
The embodiments herein generally relate to computer storage systems, and, more particularly, to techniques for allocating resources in a computer network-based storage system.
2. Description of the Related Art
Enterprise applications typically depend on guaranteed performance from the storage subsystem, lest they fail. However, unregulated competition is generally unlikely to result in a fair, predictable apportioning of resources. Given that widespread access protocols and scheduling policies are largely best-effort, the problem of providing performance guarantees on a shared system is a very difficult one. Clients typically lack accurate information on the storage system's capabilities and on the access patterns of the workloads using it, thereby compounding the problem.
A typical consolidated storage system at the multi-petabyte level generally serves the needs of independent, paying customers (e.g., a storage service provider) or divisions within the same organization (e.g., a corporate data center). Consolidation has generally proven to be an effective remedy for the low utilizations that plague storage systems, for the expense of employing scarce system administrators, and for the dispersion of related data into unconnected islands of storage. However, the ensuing resource contention generally makes it more difficult to guarantee a portion of the shared resources to each client, regardless of whether other clients over- or under-utilize their allocations; such guarantees are typically required by the prevalent utility model.
The industry has identified the problem of allocating resources in a fully automated, cost-efficient way so that most clients experience predictable performance in their accesses to a shared, large-scale storage utility. Hardware costs play a dwindling role relative to management costs in most conventional enterprise systems. However, input/output (I/O) workloads are highly bursty: the load placed on the storage subsystem can change by two orders of magnitude in a matter of milliseconds. Therefore, it is not practical to provision for the worst case: since all applications accessing the system will seldom have their peak loads at the same point in time, worst-case designs will have excessive amounts of unused resources that can cost millions of dollars to purchase and administer. This problem is compounded by inadequate available knowledge about storage device capabilities. Also, there is a tradeoff between how resource-efficient a design is and how easily it can adapt to unforeseen (but unavoidable) circumstances in the future. In the state of the art, carefully hand-crafted static allocations generally do not contemplate hardware failures, load surges, and workload variations; system administrators must typically deal with those by hand, as part of a slow and error-prone observe-act-analyze loop. Prevalent access protocols (e.g., SCSI and Fibre Channel) and resource scheduling policies are largely best-effort. Accordingly, unregulated competition is generally unlikely to result in a fair, predictable resource allocation.
Previous work on this problem includes management policies encoded as sets of rules. Fully specifying corrective actions at design time is an error-prone solution to a highly complex problem, especially if they are to cover a useful fraction of the solution space and to accommodate priorities. It is also typically difficult to determine accurate threshold values that will lead to correct decisions in different scenarios, in the absence of any solid quantitative information about the system being built. Other approaches include heuristic-based scheduling of individual I/Os and decisions based purely on feedback loops and on the predictions of models for system components. The resulting solutions are generally either not adaptive at all (as in the case of rules), or dependent on models that are too costly to develop, or ignorant of the system's performance characteristics as observed during its lifetime. Accordingly, there remains a need for an improved technique that allocates resources in a computer network in a fully automated and cost-efficient manner.