One of the tasks undertaken by designers of computer systems, particularly data storage systems, is to design systems that are easy to use and that efficiently utilize resources. A simplified example of a data storage system may include a number of physical storage devices (e.g., hard drives) connected to a server and a number of workstations connected to the server. A workstation may send a request to the server to retrieve information stored in the storage system. Upon receiving the request, the server determines the location of the requested information and then retrieves the information from one of the hard drives. The requested information may then be sent back to the workstation.
The configuration and management of a large number of storage devices is a major undertaking and is important to the efficient operation of the storage system. The difficulties inherent in this undertaking are compounded by the sheer scale of many storage systems: tracking the thousands of physical and logical devices required to support a capacity of a few tens of terabytes (TB) of data.
Designers must first address the initial configuration of the storage devices to meet performance and availability goals. They must then keep the storage devices configured to meet performance guarantees despite a workload that is constantly changing and a storage pool that changes as new devices are added, older devices become obsolete, and defective devices are removed. These changes are further complicated by a client's desire to share the storage across multiple computer systems with nearly arbitrary interconnection topologies via storage fabrics such as Fibre Channel networks. Adding to the complexity of the design process is the introduction of network-attached storage devices (NASD).
The process of designing data storage systems has reached the point where the cost of managing a device is several times the purchase price of the device itself. Planning a medium-scale installation (e.g., a few terabytes) may take many months, representing a significant expense for a client. Indeed, the design and management of storage hardware has become a multibillion-dollar business and accounts for a significant share of information-technology budgets. Improving the way storage is used and managed would benefit clients, system managers and administrators, and computer-system vendors.
Most existing approaches to storage management operate at too low a level. These conventional approaches require designers to allocate and configure disks or disk arrays to particular pieces of work but provide little help in doing so. For example, logical volume managers (LVM) provide a number of low-level mechanisms that allow multiple disks to be grouped together in different ways, but provide no help in predicting performance effects or in determining the best data layout to use.
The performance of a storage system may be measured by a number of attributes, such as capacity and speed. It may also be specified in terms of quality-of-service guarantees. A quality-of-service guarantee may be expressed as a desired percentage of requests to be served within a desired period of time. A system designer must design the storage system to meet the particular quality-of-service guarantees a client requires.
For example, a client may want 95% of all requests from workstations to be served within one second. The time frame is measured from the time the server receives a request from a workstation to the time the server transmits the requested information back to the workstation. To satisfy such a quality-of-service guarantee, the designer must include a sufficient number of hardware devices, such as servers and storage devices, in the storage system and configure them so that 95% of requests are served within one second.
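Checking measured response times against a "95% within one second" target can be sketched in Python. This is a minimal illustration, assuming per-request response times (in seconds) are already available as a list; the function names `empirical_quantile` and `meets_qos` are hypothetical, and the quantile uses the nearest-rank definition. The guarantee holds exactly when the empirical 0.95-quantile does not exceed the deadline.

```python
import math


def empirical_quantile(samples, p):
    """Nearest-rank p-quantile: the smallest observed value x such that
    at least a fraction p of the samples are <= x."""
    if not samples or not 0 < p <= 1:
        raise ValueError("need non-empty samples and 0 < p <= 1")
    ordered = sorted(samples)
    # The ceil(p * n)-th smallest value (1-based), converted to a 0-based index.
    k = math.ceil(p * len(ordered)) - 1
    return ordered[k]


def meets_qos(response_times, p=0.95, t=1.0):
    """True if at least a fraction p of the requests finished within t seconds."""
    return empirical_quantile(response_times, p) <= t
```

With 95 requests at 0.5 s and 5 at 3.0 s, the 0.95-quantile is 0.5 s and the one-second target is met; with 94 fast and 6 slow requests, the quantile jumps to 3.0 s and the target is missed.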
One delay that must be taken into account when designing a system to meet a quality-of-service guarantee is known as queuing delay. Queuing delay arises when a server that is servicing a request from a first workstation receives a request from a second workstation during that time. The request from the second workstation is queued and will not be serviced until the server completes the first request. If more than one request arrives while the server is busy servicing a particular request, a queue forms. In the long term, the server will eventually service all of the requests in the queue, but in the short term some requests may suffer queuing delays. In other words, queuing delays may prevent a percentage of the workload (that is, the sum of all of the requests from all of the workstations) arriving at a device in a particular time interval from being serviced within the desired response time.
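The queue buildup described above can be reproduced with a small single-server, first-come-first-served sketch. It relies on Lindley's recursion, a standard queueing identity that this text does not itself specify: each request waits for whatever work is still ahead of it when it arrives. The function name `fcfs_response_times` and the example numbers are hypothetical.

```python
def fcfs_response_times(arrivals, services):
    """Response times (queuing delay + service time) at a single FCFS server.

    arrivals: non-decreasing arrival times in seconds.
    services: service time of each request in seconds.
    """
    if len(arrivals) != len(services):
        raise ValueError("arrivals and services must have equal length")
    responses = []
    wait = 0.0  # queuing delay of the current request
    for i, (arrival, service) in enumerate(zip(arrivals, services)):
        if i > 0:
            gap = arrival - arrivals[i - 1]
            # Lindley's recursion: leftover work from the previous request
            # (its wait plus its service) minus the time elapsed since it arrived.
            wait = max(0.0, wait + services[i - 1] - gap)
        responses.append(wait + service)
    return responses
```

Three requests arriving 0.2 s apart, each needing 0.5 s of service, see response times of about 0.5 s, 0.8 s, and 1.1 s: the third misses a one-second target purely because of queuing delay, while a request arriving after the server has gone idle sees only its own service time.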
A system designer may translate a quality-of-service guarantee such as "95% of requests must be served in 0.25 second" as "the 0.95-quantile (i.e., 95th percentile) of response time in the system must not exceed 0.25 second." More generally, the p-quantile (i.e., the 100p-th percentile) of the workload's response time must not exceed t seconds. This p-quantile is difficult to determine using conventional analytical methods. For systems that can be modeled relatively simply, for example a server with first-come-first-served (FCFS) queuing and Poisson arrivals, one way to find the p-quantile is to apply standard M/G/1 queuing analysis and numerically invert the resulting Laplace transform of the response-time distribution. For more complex arrival processes, matrix-geometric methods may be used. However, such methods are relatively expensive in computation time. When capacity planning is done with automated tools, the solution may involve checking the acceptability of thousands of alternative configurations, which requires a very fast test. Conventional methods require substantial computation time and are therefore too slow to meet the growing need for cost-efficient systems.
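The Laplace-transform route for an FCFS M/G/1 queue can be sketched in Python under several stated assumptions. The sketch builds the response-time transform from the Pollaczek-Khinchine waiting-time transform, inverts it numerically with the Gaver-Stehfest algorithm (one of several inversion methods; the text does not name one), and locates the p-quantile by bisection. All function names are hypothetical. Exponential service (M/M/1) serves as the check case, because its FCFS response time is known to be exponential with rate mu - lam, which gives an exact quantile to compare against.

```python
import functools
import math


@functools.lru_cache(maxsize=None)
def stehfest_coefficients(n):
    """Gaver-Stehfest weights V_1..V_n; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    half = n // 2
    weights = []
    for k in range(1, n + 1):
        total = 0.0
        for j in range((k + 1) // 2, min(k, half) + 1):
            num = j ** half * math.factorial(2 * j)
            den = (math.factorial(half - j) * math.factorial(j) * math.factorial(j - 1)
                   * math.factorial(k - j) * math.factorial(2 * j - k))
            total += num / den
        weights.append((-1) ** (k + half) * total)
    return tuple(weights)


def stehfest_invert(transform, t, n=14):
    """Approximate f(t), t > 0, from its Laplace transform F(s) = transform(s)."""
    a = math.log(2.0) / t
    return a * sum(v * transform(k * a)
                   for k, v in enumerate(stehfest_coefficients(n), start=1))


def mg1_response_cdf_transform(lam, service_lst, mean_service):
    """Laplace transform of the FCFS M/G/1 response-time CDF.

    Waiting time (Pollaczek-Khinchine): W*(s) = (1 - rho) s / (s - lam (1 - B*(s))).
    Response time = waiting + service, so T*(s) = W*(s) B*(s); dividing by s
    turns the transform of the density into the transform of the CDF."""
    rho = lam * mean_service
    if not 0 <= rho < 1:
        raise ValueError("need utilization 0 <= rho < 1")

    def transform(s):
        b = service_lst(s)
        waiting = (1.0 - rho) * s / (s - lam * (1.0 - b))
        return waiting * b / s
    return transform


def quantile_by_inversion(cdf_transform, p, rel_tol=1e-9):
    """Approximate smallest t with CDF(t) >= p, by bisection on the inverted CDF."""
    def cdf(t):
        return stehfest_invert(cdf_transform, t)

    lo, hi = 0.0, 1.0
    for _ in range(200):  # double hi until [lo, hi] brackets the quantile
        if cdf(hi) >= p:
            break
        lo, hi = hi, 2.0 * hi
    else:
        raise RuntimeError("could not bracket the quantile")
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi


def mm1_quantile_closed_form(lam, mu, p):
    """Exact p-quantile of the exponential(mu - lam) M/M/1 response time."""
    return -math.log(1.0 - p) / (mu - lam)
```

For lam = 8 and mu = 10 requests per second (exponential service transform `lambda s: 10.0 / (10.0 + s)`, mean 0.1 s) and p = 0.95, the exact quantile is ln(20)/2, about 1.498 seconds, and the inverted estimate should agree closely. Each bisection step costs a full set of transform evaluations, which hints at why repeating such a computation across thousands of candidate configurations becomes expensive.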
In view of the foregoing, there is still a need for a method of designing a data storage system in which the p-quantile of a workload's response time can be determined quickly and efficiently, thereby making the design of storage systems simpler and more efficient.