The present invention relates to data storage systems. More specifically, the present invention relates to capacity planning of data storage systems.
Configuration and management of a data storage system can be a major undertaking. It involves evaluating whether a set of devices, a set of applications, and a storage layout (i.e., an assignment of storage on devices to applications) can satisfy cost, performance (throughput and capacity), and reliability requirements.
A large enterprise system typically deals with many terabytes of data spread over a range of physical devices. The difficulties inherent in configuration and management are compounded by the sheer scale of such a system. Additionally, a high-end application (e.g., OLTP, decision support system) tends to exhibit fairly complex behavior. The question of how to distribute data over a range of storage devices while providing some sort of performance guarantees is not trivial.
The configuration and management difficulties are further compounded because the configuration of the data storage system is dynamic. After the system is initially configured, the configuration is likely to change. Applications and databases are added, new devices are added, older devices that become obsolete or defective are removed and replaced by devices having different characteristics, etc. Adding to the complexity of configuring the system is the use of network-attached storage devices along with a client's desire to share the stored data across multiple computer systems with nearly arbitrary interconnection topologies via fiber-channel networks and other storage fabrics.
The complexity of configuration and management can lead to poor provisioning of the resources ("capacity planning"). Poor capacity planning, in turn, can result in the use of more data storage devices than needed, which can needlessly add to the cost of the data storage system.
Additional problems can flow from poor capacity planning. Poor allocation of data among different devices can reduce throughput. For example, two data sets (e.g., two database tables) that are stored on the same device might be accessed at the same time. Those two data sets could compete for the same throughput resources and potentially cause a bottleneck and queuing delays.
Queuing delays arise when a storage device is in the process of servicing a request and receives additional requests. The additional requests are usually queued and will not be serviced until an outstanding request is completed by the device. Eventually, the storage device will catch up and service all of the requests that are queued. In the interim, however, response time will suffer.
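The queuing effect described above can be illustrated with a minimal first-come-first-served simulation. The sketch assumes a single device with a fixed service time per request and integer millisecond timestamps; the function name `fifo_response_times` and the arrival times are hypothetical. It compares one workload running alone with the same workload sharing the device with a second workload that issues requests at the same instants.

```python
def fifo_response_times(arrivals, service_time):
    """Response time of each request on a single FIFO device.

    arrivals: request arrival times (any order).
    service_time: fixed time the device spends on each request
    (a simplifying assumption for illustration).
    """
    responses = []
    device_free_at = 0
    for a in sorted(arrivals):
        # A request starts when it arrives or when the device finishes the
        # previous request, whichever is later; the gap is queuing delay.
        start = max(a, device_free_at)
        device_free_at = start + service_time
        responses.append(device_free_at - a)
    return responses


workload_a = [0, 1, 2, 3, 4]  # arrival times in ms (hypothetical)
workload_b = [0, 1, 2, 3, 4]  # same instants as workload_a

alone = fifo_response_times(workload_a, service_time=1)
shared = fifo_response_times(workload_a + workload_b, service_time=1)
print("alone: ", alone)   # [1, 1, 1, 1, 1]
print("shared:", shared)  # [1, 2, 2, 3, 3, 4, 4, 5, 5, 6]
```

Alone, every request completes in 1 ms. When the two workloads collide, requests queue behind one another and the last one waits 5 ms before service; the device catches up only after arrivals stop, finishing the backlog at t = 10 ms.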
It is extremely useful to be able to identify these problems. The problems can be identified by testing an initial or subsequent system configuration to determine whether its devices satisfy certain performance requirements. The performance of the storage system may be measured with a number of attributes such as capacity, speed and Quality-of-Service ("QoS") guarantees.
If a device cannot satisfy the performance requirement, it might be replaced by a more capable device, data thereon might be transferred to a different device, etc. The system can be reconfigured until all of the performance requirements are met.
The testing may be performed by mathematically modeling each device of the storage system and determining whether each model satisfies the performance requirements. The most common method for predicting the performance of a group of workloads on a device is to approximate the workloads as being independent. This simplifies the testing considerably. In many cases, the workloads are also treated as continuous (instead of bursty), which simplifies the testing even further. However, these approximations are often not accurate enough for practical use. Real workloads are often bursty: they have periods of high rates of data requests ("ON" periods) interspersed with periods of little or no activity ("OFF" periods). Moreover, the burst activities of different workloads are usually correlated. For example, some workloads always produce requests at the same time, while others never produce requests at the same time. A group of six workloads might therefore fit well on a single device if they are never ON at the same time, even if each workload requires the entire resources of the device while ON.
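The six-workload example can be made concrete with time-slotted ON/OFF traces, a simplifying representation assumed here for illustration. The hypothetical helpers below compare the average aggregate demand (what a continuous approximation sees) with the peak demand in any slot (what the device actually experiences), for a staggered layout and a synchronized one. Demand is expressed as a fraction of one device.

```python
def mean_demand(traces, demand_when_on):
    """Average aggregate demand: the 'continuous' approximation."""
    return sum(d * sum(t) / len(t) for t, d in zip(traces, demand_when_on))


def peak_demand(traces, demand_when_on):
    """Largest aggregate demand in any time slot, honoring actual overlap."""
    n_slots = len(traces[0])
    return max(
        sum(d * t[k] for t, d in zip(traces, demand_when_on))
        for k in range(n_slots)
    )


N = 6
demand = [1.0] * N  # each workload needs the whole device while ON

# Staggered: workload i is ON only in slot i (never ON together).
staggered = [[1 if k == i else 0 for k in range(N)] for i in range(N)]
# Synchronized: every workload is ON in slot 0 only.
synchronized = [[1 if k == 0 else 0 for k in range(N)] for _ in range(N)]

for name, traces in [("staggered", staggered), ("synchronized", synchronized)]:
    print(name, round(mean_demand(traces, demand), 6), peak_demand(traces, demand))
```

Both layouts have the same mean demand (one full device), so a continuous approximation cannot tell them apart. Only the staggered layout stays within one device's capacity in every slot; the synchronized layout needs six devices' worth of throughput in slot 0.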
At the other extreme, it is possible to treat workload correlation in full generality by combining groups of the workloads into a single workload having a request rate that depends upon the combined state (ON or OFF) of the individual workloads. These models include the Markov Modulated Poisson Process ("MMPP") and Markov Modulated Fluid Flow ("MMFF") models. The difficulty here is that, for a group of n workloads, each of which can be ON or OFF, the combined workload has 2^n states. In most cases, simplifying assumptions (e.g., that all processes are identical) are made to keep the testing of a single device computationally tractable. Even so, a single test might require an enormous number of computations. Moreover, it might be impractical to configure a system by performing thousands of repeated tests until an optimal configuration is found for a given set of devices.
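To make the 2^n growth concrete, the sketch below enumerates every combined ON/OFF state of n workloads together with the aggregate request rate in that state. It shows only the state space that an MMPP-style model would be modulated by, not an MMPP or MMFF solver; the function name and the ON-state rates are hypothetical. A transition-rate matrix over these states would have 2^n rows and 2^n columns.

```python
from itertools import product


def combined_states(on_rates):
    """Yield (state, aggregate_rate) for every ON/OFF combination.

    on_rates[i] is the request rate of workload i while ON; a state is a
    tuple of n bits, where 1 means the corresponding workload is ON.
    """
    for state in product((0, 1), repeat=len(on_rates)):
        rate = sum(r for r, on in zip(on_rates, state) if on)
        yield state, rate


states = list(combined_states([10.0, 20.0, 30.0]))
print(len(states))                # 8 == 2**3 combined states
print(max(r for _, r in states))  # 60.0, all three workloads ON

# The state space doubles with every added workload.
for n in (6, 10, 20, 30):
    print(n, 2 ** n)
```

Enumeration is trivial for three workloads but already exceeds a billion states at n = 30, which is why exact combined-state models are usually paired with strong simplifying assumptions.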
The complexity of the testing increases as the size of the storage system increases. Thus, for a high-end system, which typically deals with many terabytes of data spread over a range of physical devices and supports applications exhibiting complex behavior, testing even a single configuration can be extremely slow.
Significant advantages could be realized by a test that is performed quickly and efficiently. System designers are often faced with a wide choice of data storage devices (e.g., different manufacturers, sizes, access times) and options for storing data on different devices. There might be thousands of different configurations to test. A fast and efficient test would allow many different configurations to be tested and the optimal configuration to be identified.
Therefore, a need exists for such a fast and efficient test.
The present invention provides a quick and efficient test for determining whether a performance requirement is satisfied by a data storage device that is assigned a group of workloads. At least one performance requirement model for each workload in the group is assigned. Each model is an increasing function of request rate. The request rate of a given workload is approximated by a distribution process describing ON/OFF behavior of the given workload. A computer is used to evaluate at least one model in the group to determine whether the device satisfies the performance requirement.
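The benefit of requiring each model to be increasing in request rate can be sketched under assumptions that are not taken from the text above. Each workload is summarized by a hypothetical ON probability and ON-state request rate. The performance requirement model is a hypothetical utilization model, u(rate) = rate * service_time, which is increasing in request rate. The device must stay within a utilization limit except with probability at most epsilon. Because u is increasing, u(R) <= limit holds exactly when the aggregate rate R is at most limit / service_time, so only one tail probability of R is needed. The sketch bounds that tail with Cantelli's (one-sided Chebyshev) inequality, which needs only the mean and variance of R. It illustrates the role of monotonicity and is not the method of the invention.

```python
import math
from dataclasses import dataclass


@dataclass
class OnOffWorkload:
    p_on: float     # hypothetical: fraction of time the workload is ON
    on_rate: float  # hypothetical: request rate while ON (requests/s)


def overload_bound(workloads, max_rate, assume_independent=True):
    """Cantelli upper bound on P(aggregate request rate >= max_rate)."""
    mean = sum(w.p_on * w.on_rate for w in workloads)
    # Std of one ON/OFF workload: on_rate * sqrt(p * (1 - p)).
    stds = [w.on_rate * math.sqrt(w.p_on * (1.0 - w.p_on)) for w in workloads]
    if assume_independent:
        var = sum(s * s for s in stds)
    else:
        # Upper bound on the variance under any correlation between workloads.
        var = sum(stds) ** 2
    t = max_rate - mean
    if t <= 0:
        return 1.0  # mean demand already reaches the limit: no useful bound
    return var / (var + t * t)


def device_passes(workloads, service_time, max_util, epsilon,
                  assume_independent=True):
    """True if P(utilization > max_util) <= epsilon under the model's assumptions."""
    # u(rate) = rate * service_time is increasing, so the requirement
    # reduces to a single threshold on the aggregate request rate.
    max_rate = max_util / service_time
    return overload_bound(workloads, max_rate, assume_independent) <= epsilon


group = [OnOffWorkload(p_on=0.1, on_rate=50.0) for _ in range(10)]
print(device_passes(group, service_time=0.005, max_util=0.8, epsilon=0.2))
print(device_passes(group, service_time=0.005, max_util=0.8, epsilon=0.2,
                    assume_independent=False))
```

With these hypothetical numbers the group passes when the workloads are assumed independent (bound about 0.157) but fails once arbitrary correlation is allowed (bound about 0.650), echoing the point that correlation between ON periods can decide whether a group of workloads fits on a device.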