A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units (LUs). A known type of file system is a write-anywhere file system that does not overwrite data on disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from NetApp, Inc., of Sunnyvale, Calif.
The storage system may be further configured to allow many servers to access data containers stored on the storage system. In this model, the server may execute an application, such as a database application, that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each server may request the data services of the storage system by issuing access requests (read/write requests) as file-based and block-based protocol messages (in the form of packets) to the system over the network.
A plurality of storage systems may be interconnected to provide a storage system architecture configured to service many servers. In some embodiments, the storage system architecture provides one or more aggregates, each aggregate comprising a set of one or more storage devices (e.g., disks). Each aggregate may store one or more storage objects, such as one or more volumes. The aggregates may be distributed across a plurality of storage systems interconnected as a cluster. The storage objects (e.g., volumes) may be configured to store content of data containers, such as files and logical units, served by the cluster in response to multi-protocol data access requests issued by servers.
Each storage system (node) of the cluster may include (i) a storage server (referred to as a “D-blade”) adapted to service a particular aggregate or volume and (ii) a multi-protocol engine (referred to as an “N-blade”) adapted to redirect the data access requests to any storage server of the cluster. In the illustrative embodiment, the storage server of each storage system is embodied as a disk element (D-blade) and the multi-protocol engine is embodied as a network element (N-blade). The N-blade receives a multi-protocol data access request from a client, converts that access request into a cluster fabric (CF) message and redirects the message to an appropriate D-blade of the cluster.
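By way of illustration only, the redirection performed by the N-blade may be sketched as follows. The sketch assumes a simple in-memory table mapping each volume to the D-blade that services it; the names AccessRequest, CFMessage, NBlade and volume_owner, the protocol strings, and the fields of the CF message are hypothetical and are not taken from any actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AccessRequest:
    """A client request as received by the N-blade (hypothetical shape)."""
    protocol: str   # a file-based or block-based protocol, e.g. "NFS" or "iSCSI"
    volume: str     # volume that holds the requested data container
    operation: str  # "read" or "write"


@dataclass(frozen=True)
class CFMessage:
    """Protocol-independent cluster fabric message (hypothetical fields)."""
    volume: str
    operation: str


class NBlade:
    def __init__(self, volume_owner: Dict[str, str]) -> None:
        # Assumed lookup table: volume name -> identifier of the owning D-blade.
        self._volume_owner = dict(volume_owner)

    def redirect(self, request: AccessRequest) -> Tuple[str, CFMessage]:
        """Convert the request to a CF message and pick the target D-blade."""
        try:
            owner = self._volume_owner[request.volume]
        except KeyError:
            raise LookupError(
                f"no D-blade serves volume {request.volume!r}") from None
        # The client protocol is dropped here: the CF message is the same
        # regardless of which protocol carried the original request.
        return owner, CFMessage(request.volume, request.operation)
```

In this sketch, requests arriving over different protocols for the same volume yield the same CF message and are directed to the same D-blade, which is the sense in which the N-blade is a multi-protocol engine.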
The storage systems of the cluster may be configured to communicate with one another to act collectively to increase performance or to offset any single storage system failure within the cluster. The cluster provides data service to servers by providing access to a shared storage (comprising a set of storage devices). Typically, servers will connect with a storage system of the cluster for data-access sessions with the storage system. During a data-access session with a storage system, a server may submit access requests (read/write requests) that are received and performed by the storage system.
Each server typically executes numerous applications requiring the data services of the cluster. As such, each application may be considered a workload that is serviced by the cluster. Each workload may have zero or more specified service-level objectives (SLOs). Each SLO of a workload comprises a target value of a target SLO metric, the target value to be achieved by the cluster when servicing the workload. A target SLO metric may relate to a storage system characteristic or attribute, such as a performance or protection metric. For example, a workload may have an SLO specifying a minimum value of X (the target value) for data throughput (the SLO metric) to be achieved by the cluster when servicing the workload.
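By way of illustration only, a workload and its SLOs may be represented as in the following sketch. The class names, the metric names used in the test values, and the "min"/"max" convention for interpreting a target value are assumptions made for the example; a metric for which no measurement is available is simply skipped, which is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SLO:
    metric: str        # target SLO metric, e.g. "throughput_mbps" (hypothetical name)
    target: float      # target value X to be achieved by the cluster
    kind: str = "min"  # "min": measured >= target; "max": measured <= target

    def is_met(self, measured: float) -> bool:
        if self.kind == "min":
            return measured >= self.target
        if self.kind == "max":
            return measured <= self.target
        raise ValueError(f"unknown SLO kind: {self.kind!r}")


@dataclass
class Workload:
    name: str
    slos: List[SLO]  # zero or more SLOs


def violated_slos(workload: Workload, measurements: Dict[str, float]) -> List[SLO]:
    """Return the SLOs of the workload that the measured values do not achieve."""
    return [
        slo
        for slo in workload.slos
        if slo.metric in measurements and not slo.is_met(measurements[slo.metric])
    ]
```

A workload with a minimum-throughput SLO of 100 and a maximum-latency SLO of 5, measured at a throughput of 120 and a latency of 7.5, would under these assumptions report only the latency SLO as violated.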
Typically, the cluster will simultaneously service numerous workloads of different types and with varying levels of service, as specified by the SLOs. In this situation, performance and protection problems may arise, since different types of workloads typically cause substantial interference with each other. This consolidation of storage services provided by the cluster for multiple types of workloads provides significant economies of scale. However, the cluster should provide such service without violating any SLOs of any of the workloads (i.e., should achieve all SLOs of all workloads being serviced). The increasing size and complexity of modern storage clusters have made storage capacity planning and storage administration, for ensuring that all SLOs of all workloads are achieved, very difficult.
To ensure all SLOs are achieved, a monitoring procedure referred to as “MAPE” has been developed to monitor the SLOs and help determine solutions if any SLOs are violated (i.e., not achieved). As known in the art, the MAPE procedure will constantly monitor (M) each SLO and workload to detect any SLO violations and, if any are detected, will analyze (A) and plan (P) multiple proposed solutions to help in selecting a particular solution, and then execute (E) the selected solution.
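By way of illustration only, a single pass of the MAPE procedure may be sketched as follows. The four stages are supplied as callables whose signatures are hypothetical, and the rule of selecting the proposed solution with the highest score is an assumption made for the example rather than a description of any particular planner engine.

```python
from typing import Callable, Dict, List, Optional, Tuple

Measurements = Dict[str, float]
Violation = str                          # hypothetical: name of a violated SLO metric
ScoredSolution = Tuple[str, float]       # hypothetical: (solution name, score)


def mape_iteration(
    monitor: Callable[[], Measurements],
    analyze: Callable[[Measurements], List[Violation]],
    plan: Callable[[List[Violation]], List[ScoredSolution]],
    execute: Callable[[str], None],
) -> Optional[str]:
    """Run one monitor/analyze/plan/execute pass; return the executed solution."""
    measurements = monitor()                 # M: observe SLO metrics
    violations = analyze(measurements)       # A: determine which SLOs are violated
    if not violations:
        return None                          # nothing to fix in this pass
    proposals = plan(violations)             # P: produce multiple proposed solutions
    if not proposals:
        return None                          # no candidate solution was produced
    # Assumed selection policy: take the proposal with the highest score.
    chosen, _score = max(proposals, key=lambda p: p[1])
    execute(chosen)                          # E: carry out the selected solution
    return chosen
```

In practice such a pass would be repeated continuously; the sketch shows one pass so that the hand-off between the four stages is explicit.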
A planner engine is typically used to produce and evaluate the multiple proposed solutions to help select a particular solution to execute. Current planner engines, however, cannot receive or consider new information for producing and evaluating the multiple proposed solutions without substantial reconfiguration of the planner engine. As such, current planner engines cannot dynamically receive or consider new information when producing and evaluating the proposed solutions.
The planner engine may use an evaluation engine for evaluating each proposed solution by using various evaluation functions. The evaluation functions produce values predicted to be produced by the proposed solution for various storage system metrics. These evaluation values may be used to evaluate the proposed solution, for example, by a system administrator. Typically, the evaluation engine produces evaluation values for each proposed solution that may be difficult to analyze for determining the desirability of each proposed solution. Also, the evaluation engine is typically configured to use particular evaluation functions to produce values for particular metrics, and the planner engine is configured to receive values for the particular metrics and process them accordingly. As such, the evaluation functions used by the evaluation engine are typically static and difficult to modify, as the evaluation engine and the planner engine would both need to be heavily modified to change the evaluation functions.
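By way of illustration only, the relationship between a proposed solution, the evaluation functions and the resulting evaluation values may be sketched as follows. The representation of a proposed solution as a dictionary of numeric parameters, the metric names, and the arithmetic inside the two example functions are placeholders assumed for the example and do not represent any actual prediction model.

```python
from typing import Callable, Dict

ProposedSolution = Dict[str, float]                  # hypothetical representation
EvaluationFunction = Callable[[ProposedSolution], float]


def predicted_throughput(solution: ProposedSolution) -> float:
    # Placeholder arithmetic: disks times per-disk throughput.
    return solution["disks"] * solution["per_disk_mbps"]


def predicted_cost(solution: ProposedSolution) -> float:
    # Placeholder arithmetic: disks times unit cost.
    return solution["disks"] * solution["cost_per_disk"]


# A fixed table of evaluation functions keyed by metric name. A consumer that
# expects exactly these keys must change whenever the table changes, which
# mirrors the coupling between evaluation engine and planner engine.
EVALUATION_FUNCTIONS: Dict[str, EvaluationFunction] = {
    "throughput_mbps": predicted_throughput,
    "cost": predicted_cost,
}


def evaluate(solution: ProposedSolution,
             functions: Dict[str, EvaluationFunction]) -> Dict[str, float]:
    """Apply every evaluation function and collect one value per metric."""
    return {metric: fn(solution) for metric, fn in functions.items()}
```

The output is one raw value per metric for each proposed solution, which illustrates why such values may be difficult to compare across proposed solutions without further processing.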
In turn, the evaluation engine may use a modeling engine for producing predicted values of system metrics that are specified in the evaluation functions. The modeling engine may predict these values based on modeling the proposed solution as hypothetically implemented in the cluster storage system. Due to the increasing complexity and number of factors involved in a cluster storage system, however, current modeling engines produce significant error in predicting these system metric values.
Intelligently considering proposed solutions and accurately predicting the results of a proposed solution prior to actual implementation (execution) of the proposed solution are of high importance due to the substantial amount of time and resources needed to reverse a proposed solution that does not achieve the intended results. As such, an effective method for considering proposed solutions and accurately predicting results of each proposed solution is needed.