A data storage system allows one or more client devices (“clients”) to access (i.e., read and/or write) data on the data storage system through a host device (“host”), such as a storage server, which is physically separate from the client. The clients typically communicate with the host over a network, such as a local area network, wide area network, virtual private network, or point-to-point connection. The host typically is connected to one or more storage devices directly or over a network, such as a storage area network (SAN). A storage device can be, for example, a disk, a tape, a solid-state memory storage device, or an array of disks or tapes. Some data storage systems include two or more hosts. Multiple hosts can be clustered such that two or more hosts are connected to each storage device for increased fault tolerance.
As shown in FIG. 1, an illustrative environment 100 includes clients 110(a)-110(c) that are connected to hosts 130(a)-130(b) through a network 120. The clients 110(a)-110(c) communicate requests for services to the hosts 130(a)-130(b) over the network 120 and receive results back from the hosts 130(a)-130(b) over the network 120. The clients 110(a)-110(c) typically are computers such as general-purpose computers or application servers. The number of clients can range from one to an arbitrarily large number. The hosts 130(a)-130(b) are capable of accessing data stored on storage devices 150(a)-150(b), which can be, for example, one or more Redundant Arrays of Independent (or Inexpensive) Disks (RAID). The hosts 130(a)-130(b) may also be capable of storing and managing shared files in a set of storage devices 150(a)-150(b) on behalf of one or more clients 110(a)-110(c). Examples of hosts 130(a)-130(b) are network storage servers that are available from Network Appliance, Inc., of Sunnyvale, Calif. The hosts 130(a)-130(b) are connected to the storage devices 150(a)-150(b). Any suitable connection technology can be used, e.g., Fibre Channel, SCSI, etc.
One configuration in which file servers can be used is a network attached storage (NAS) configuration. In a NAS configuration, a file server can be implemented in the form of an appliance that attaches to a network, such as a local area network (LAN) or a corporate intranet. An example of such an appliance is any of the Filer products made by Network Appliance, Inc. in Sunnyvale, Calif.
Another specialized type of network is a storage area network (SAN). A SAN is a dedicated network of interconnected, shared storage devices. Such devices are also made by Network Appliance, Inc. One difference between NAS and SAN is that in a SAN, the storage appliance provides a remote host (e.g., a storage server) with block-level access to stored data, whereas in a NAS configuration, the file server normally provides clients with file-level access to stored data.
The number of hosts can range from one to an arbitrarily large number, and the number of storage arrays likewise can range from one to an arbitrarily large number. The hosts 130(a)-130(b) typically are storage servers that can be clustered for increased fault tolerance. The storage arrays 150(a)-150(b) typically are arrays of magnetic disks, such as Fibre Channel or SCSI disks, contained in one or more shelves. The combination of the hosts 130(a)-130(b) and the storage arrays 150(a)-150(b) forms a data storage system. The data storage system can use a RAID design and RAID protocols for the storage arrays 150(a)-150(b), which helps protect against data loss in the event of a disk failure. In a RAID-4 system, for example, data is striped across multiple disks and protected by parity information stored on a dedicated parity disk. If a disk in the array fails, the lost data can be reconstructed from the parity information together with the data on the surviving disks.
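The parity protection described above can be illustrated with a short sketch. It assumes a RAID-4-style stripe in which the parity block is the byte-wise XOR of the data blocks; the function names (`xor_blocks`, `compute_parity`, `reconstruct`) are hypothetical and are not taken from any particular storage product.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: list[bytes]) -> bytes:
    # The parity block is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    # XOR of the surviving data blocks and the parity yields the missing block.
    return xor_blocks(surviving_blocks + [parity])


# One stripe spread across three (hypothetical) data disks.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = compute_parity(stripe)

# Simulate the failure of the disk holding stripe[1].
recovered = reconstruct([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])  # True
```

Because XOR is its own inverse, any single missing block in the stripe can be recovered this way, which is why a RAID-4 group tolerates one disk failure but not two.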
A system architect or administrator has many choices to make when designing, expanding, or reconfiguring a data storage system such as that shown in FIG. 1. The architect or administrator may redesign the data storage system when deploying new resources, moving to a new vendor of resources, or the like. The architect may expand the data storage system when the number of employees or customers grows, and with it the number of resources needed, when storage requirements increase because of new applications for the data storage system, or the like. The architect or administrator may likewise reconfigure the data storage system for similar reasons. Such an architect or administrator will generally use an application program such as a sizing tool or a capacity planning tool for this purpose.
Software sizing tools are computer program applications that help a system architect or administrator decide how much data storage a given computer system requires based on such variables as the number of users the computer system has and how that storage should be structured (e.g., how many hosts and what type of reliability options should be used). Software sizing tools may also help a system architect or administrator decide which physical resources are needed to meet performance requirements, for example, requirements in terms of throughput (e.g., megabytes read or written per second), response times of the resources, or the like. In addition, sizing tools help the system architect or administrator decide how many storage servers and storage devices are needed and of what types. Software capacity planning tools, which can be part of or separate from sizing tools, are computer program applications that allow the system architect to analyze the performance of various configurations of data storage equipment. A conventional software sizing or capacity planning tool typically must be replaced by a newer version of the tool when new hardware (e.g., a new type of disk array) becomes available. When multiple sizing or capacity planning tools are used in conjunction (e.g., a tool for sizing a storage system for a database application program and a tool for sizing the storage system for an e-mail application program), inconsistent results can occur, especially when the tools use different underlying models to calculate suggested configurations from the inputs. Using multiple sizing or capacity planning tools is also cumbersome because each tool typically uses its own input format for the system requirements and its own output format for the recommendations.
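As a rough illustration of the kind of calculation a sizing tool performs, the following sketch estimates a disk count that satisfies both a capacity requirement and a throughput requirement. The `DiskModel` type, the `disks_required` function, the per-disk figures, the assumption that throughput scales linearly with the number of data disks, and the assumption of one parity disk per RAID-4 group of eight data disks are all hypothetical simplifications; real sizing tools use considerably more detailed performance models.

```python
import math
from dataclasses import dataclass


@dataclass
class DiskModel:
    # Hypothetical per-disk figures; real values depend on the hardware.
    usable_capacity_gb: float
    throughput_mb_s: float


def disks_required(capacity_gb: float, throughput_mb_s: float,
                   disk: DiskModel, data_disks_per_group: int = 8) -> int:
    """Estimate total disks (data + parity) meeting capacity and throughput."""
    for_capacity = math.ceil(capacity_gb / disk.usable_capacity_gb)
    # Assumes aggregate throughput grows linearly with the number of data disks.
    for_throughput = math.ceil(throughput_mb_s / disk.throughput_mb_s)
    # The stricter requirement determines the number of data disks.
    data_disks = max(for_capacity, for_throughput)
    # Assume one dedicated parity disk per RAID-4 group.
    parity_disks = math.ceil(data_disks / data_disks_per_group)
    return data_disks + parity_disks


print(disks_required(10_000, 1_200, DiskModel(usable_capacity_gb=500, throughput_mb_s=50)))
# 27
```

In this example capacity alone would call for 20 data disks, but throughput calls for 24, so throughput dominates; three RAID-4 groups then add three parity disks. Taking the maximum reflects that a configuration must satisfy every requirement at once.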
Even though conventional sizing tools help system architects or administrators decide how much data storage a given computer system requires based on the variables described above, conventional software sizing tools do not help system architects or administrators decide how to lay out the data storage system. For example, current sizing tools determine the amount and type of physical resources (e.g., the number and/or type of storage servers and storage devices) that would support the provided workload groups, but they do not provide any recommendation or guidance on how to lay out the workload groups on those physical resources or where to place the workload groups to ensure even utilization of resources for a balanced deployment. A workload group is a set of one or more variables to be used by the sizing infrastructure module and the layout planning recommendation module in determining the layout configuration of the data storage system. The workload groups may be input by the user or, alternatively, provided by the calling application program. The workload groups may include, for example, as described herein, a capacity requirement, a performance requirement, a reliability requirement, general configuration requirements, and/or one or more workload entities, such as logs, databases, volumes, files, aggregates, or the like. The one or more workload entities may also be, for example, the number of users the data storage system has, how that storage should be structured (e.g., how many hosts and what type of reliability options should be used), a respective associated required throughput that is included in the performance requirement of the workload group, or the like.
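To make the notion of a workload group and a balanced layout more concrete, the sketch below models a workload group as a small record and spreads groups across storage servers with the longest-processing-time (LPT) greedy heuristic, which assigns each group, largest required throughput first, to the currently least-loaded server. The `WorkloadGroup` fields, the `place_balanced` function, and the choice of LPT are illustrative assumptions only; they are not the layout method described in this document.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class WorkloadGroup:
    # Hypothetical fields loosely mirroring the requirements described above.
    name: str
    capacity_gb: float
    throughput_mb_s: float  # required throughput (performance requirement)
    entities: list[str] = field(default_factory=list)  # e.g. logs, databases, volumes


def place_balanced(groups: list[WorkloadGroup], servers: list[str]) -> dict[str, list[str]]:
    """Assign workload groups to servers using the LPT greedy heuristic.

    Groups are taken in order of decreasing required throughput, and each one
    goes to the server with the lowest total throughput assigned so far.
    """
    # Min-heap of (assigned throughput, server name); ties break on server name.
    heap = [(0.0, s) for s in servers]
    heapq.heapify(heap)
    layout: dict[str, list[str]] = {s: [] for s in servers}
    for g in sorted(groups, key=lambda g: g.throughput_mb_s, reverse=True):
        load, server = heapq.heappop(heap)
        layout[server].append(g.name)
        heapq.heappush(heap, (load + g.throughput_mb_s, server))
    return layout


groups = [
    WorkloadGroup("db", 2000, 300, ["database", "logs"]),
    WorkloadGroup("mail", 1500, 200),
    WorkloadGroup("logs", 500, 150),
    WorkloadGroup("files", 3000, 100),
]
print(place_balanced(groups, ["A", "B"]))
# {'A': ['db', 'files'], 'B': ['mail', 'logs']}
```

Here server A ends up with 400 MB/s and server B with 350 MB/s of required throughput. The sketch balances on throughput alone and ignores capacity and reliability requirements; a fuller layout planner would also have to check those constraints before accepting a placement.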