Computer systems are constantly improving in terms of speed, reliability, and processing capability. As is known in the art, computer systems which process and store large amounts of data typically include one or more processors in communication with a shared data storage system in which the data is stored. The data storage system may include one or more storage devices, e.g. disk drives, which are usually fairly robust and suited to both short-term and long-term storage. The one or more processors perform their respective operations using the storage system. Mass storage systems, particularly those of the disk array type, have centralized data as a hub of operations, which has driven down costs. But the performance demands placed on such mass storage have increased and continue to do so.
Design objectives for mass storage systems include cost, performance, and availability. Typical objectives are a low cost per megabyte, high I/O performance, and high data availability. Availability is measured by the ability to access data. Such data availability is often provided by redundancy, for example through well-known mirroring techniques.
One problem encountered in the implementation of disk array data storage systems concerns optimizing the storage capacity while maintaining the desired availability and reliability of the data through redundancy. For reasons of cost and necessity, it is important to allocate as closely as possible the right amount of storage capacity, without going significantly over or under, but this is a complex task. It has required a great deal of skill and knowledge about computers, software applications such as databases, and the very specialized field of data storage. Such requisite abilities have long been expensive and difficult to access. There remains, and probably will continue to be, an increasing demand for such skilled people and a corresponding scarcity of them.
Determining the size and number of disk arrays or other data storage systems needed by a customer requires information about space, traffic, and a desired quality of service. It is not sufficient to size a solution simply based on the perceived quantity of capacity desired, such as the number of terabytes believed to be adequate.
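The point that capacity alone is not a sufficient basis for sizing can be illustrated with a minimal sketch. The function name, the parameters, and the figures below are hypothetical and are not taken from any particular product. The sketch assumes that quality of service can be approximated by a ceiling on per-drive utilization, and it ignores redundancy overhead such as mirroring.

```python
import math


def drives_needed(capacity_gb, peak_iops, drive_capacity_gb, drive_iops, max_utilization):
    """Return (drives_for_space, drives_for_traffic, drives_required).

    All parameters are illustrative assumptions:
      capacity_gb        usable space requested by the customer
      peak_iops          peak I/O operations per second to be served
      drive_capacity_gb  capacity of one drive
      drive_iops         I/O operations per second one drive can sustain
      max_utilization    fraction of drive_iops usable while still meeting
                         the desired quality of service (0 < value <= 1)
    """
    for_space = math.ceil(capacity_gb / drive_capacity_gb)
    usable_iops_per_drive = drive_iops * max_utilization
    for_traffic = math.ceil(peak_iops / usable_iops_per_drive)
    # The configuration must satisfy both constraints, so take the larger count.
    return for_space, for_traffic, max(for_space, for_traffic)


# Hypothetical request: 10 TB of space, 6000 IOPS at peak,
# 500 GB drives rated at 150 IOPS, kept at or below 50% utilization.
example = drives_needed(10000, 6000, 500, 150, 0.5)
print(example)
```

In this hypothetical case the traffic requirement, not the space requirement, determines the drive count, which is why a solution sized on terabytes alone could be substantially undersized.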
There is a long-felt need for a computer-based tool that would provide a straightforward way to allocate proper storage capacity while balancing cost, growth plans, workload, and performance requirements. Such a tool would be an advancement in the computer arts with particular relevance in the field of data storage.
Another problem is the need for an automated tool capable of building a highly granular sketch or profile of IO workload data collected from work on a storage system. Although workload data may be collected by prior art systems such as the ECC Workload Analyzer available from EMC Corporation of Hopkinton, the ability to identify specifically the information related to variables of interest is not available in automated systems in the art. It would be an advantage if such highly resolved profile information could be used either separately or in combination with the computer-based tool for allocating capacity described above.
For example, in a data storage environment wherein several hundred storage devices, e.g. hard disk drives, operate in conjunction with a storage array such as the EMC Symmetrix or EMC Clariion, the IO workload generated is highly complex and difficult to analyze. It would be advantageous if the workload could be sorted into individual applications, but since the workload is distributed across many disks, no tool in the prior art is capable of making such a determination. Nevertheless, it would clearly be an advancement in the computer arts, and particularly the storage arts, if a tool were capable of identifying how many business applications are active and which devices are considered to be members of these applications. Identifying these sets of devices would make it possible to save signatory profiles that can later be used for graphically and visually modeling various scenarios when considering alternative configurations, such as alternative Storage Area Network (SAN) configurations. Further, if the tool could do this on a relatively automated basis, such that a high degree of computer expertise would not be needed to use it, this would also be a significant advancement in the computer arts.
Further, there is a need for a tool that could correlate data and create groupings of applications having similar workload characteristics, and that would allow the user to closely control the correlation or grouping. Doing so would be an advancement of the art because of the flexibility and control it gives to a user such as an administrator, planner, or designer. It would also be an advancement of the art if such a tool were relatively automated or computerized, so that a high degree of computer expertise would not be needed to use it.
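The kind of user-controlled grouping described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify a correlation measure or a grouping rule, so the sketch assumes Pearson correlation between per-device IO time series and places two devices in the same group whenever they are linked by a chain of pairs whose correlation meets a user-supplied threshold. The device names and sample values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_devices(series, threshold):
    """Group devices whose IO series correlate at or above the threshold.

    series    dict mapping a device name to a list of IO samples
    threshold user-chosen correlation cutoff (assumed control knob)
    Returns a sorted list of sorted device-name lists.
    """
    parent = {device: device for device in series}

    def find(device):
        # Follow parent links to the group representative, compressing the path.
        while parent[device] != device:
            parent[device] = parent[parent[device]]
            device = parent[device]
        return device

    for a, b in combinations(series, 2):
        if pearson(series[a], series[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for device in series:
        groups.setdefault(find(device), []).append(device)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical IO rates sampled at four intervals for four devices.
samples = {
    "dev_a": [10, 50, 20, 80],
    "dev_b": [12, 55, 18, 85],
    "dev_c": [90, 10, 70, 5],
    "dev_d": [85, 15, 75, 10],
}
groups = group_devices(samples, 0.9)
print(groups)
```

Raising the threshold yields more, tighter groups, and lowering it merges groups, which is one simple way a user such as an administrator or planner could control the grouping.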