Due to a lack of generic, accurate, dynamic, and comprehensive models for performance estimation, users often under-provision or over-provision storage systems. With multi-tenancy, virtualization, and scaled, unified storage becoming the norm in the industry, it is important to strike an optimum balance between utilization and performance. However, performance prediction for storage systems can be difficult because multiple hardware and software layers, cascaded in a complex way, affect the behavior of the system. Configuration factors such as CPU, cache size, RAM size, capacity, storage backend (HDD/Flash), and network cards have a significant effect on the number of Input/Output Operations per Second (IOPS) that can be pushed to the system.
There is a competitive advantage for storage providers to increase utilization of their resources while maintaining performance guarantees. As a general rule of thumb, a storage resource is operated optimally at the “knee of the curve” (e.g., at approximately 70% resource utilization), the point beyond which response time begins to rise steeply. However, identifying this point in a dynamic manner can be challenging. The situation becomes more complicated for a mix of different workloads, since response time behavior depends on the workload characteristics. Also, aggressive provisioning of storage resources can degrade performance and thereby hurt the resource provider's bottom line. To avoid such situations, storage providers often under-provision systems to be safe, which results in systems being under-utilized.
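The shape of the curve behind this rule of thumb can be illustrated with a minimal sketch. It assumes a single M/M/1 queue, for which mean response time is R = S / (1 - U); this model, the function names, and the numeric values are illustrative assumptions and are not taken from the text above.

```python
def mm1_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U).

    Assumption: a single-server queue with Poisson arrivals and
    exponential service times stands in for the storage resource.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)


def max_utilization_for_target(service_time_ms: float, target_ms: float) -> float:
    """Highest utilization whose mean response time stays within target_ms.

    Obtained by solving S / (1 - U) <= target for U.
    """
    if target_ms < service_time_ms:
        raise ValueError("target cannot be below the bare service time")
    return 1.0 - service_time_ms / target_ms


if __name__ == "__main__":
    # Hypothetical 1 ms service time; response time grows slowly at first
    # and then steeply as utilization approaches 100%.
    for u in (0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
        print(f"U={u:.2f}  R={mm1_response_time(1.0, u):.2f} ms")
```

Under this simplified model, response time at 70% utilization is roughly 3.3 times the bare service time, and at 90% it is 10 times. A real system with mixed workloads will not follow this curve exactly, which is why locating the knee dynamically is hard.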
Optimum resource utilization is also crucial in cloud computing environments. Usually, cloud providers thin-provision resources and need to be able to seamlessly provision containers, migrate virtual machines (VMs), and redistribute the resource pool among client applications. Therefore, it is important to estimate the actual maximum throughput that can be delivered for different application environments.
This throughput modeling is typically done via two approaches: white-box and black-box. In white-box models, each component, such as the CPU, disks, network, and memory, is individually modeled by applying queueing theory. For each component, the queueing delay for each IO request is computed, and these individual models are combined to obtain the overall response time for given IO request patterns. Black-box models treat the entire system as a single black box and use machine learning techniques to predict the relationship between IO patterns and response times. White-box models are usually static in nature but highly tunable in terms of system parameters. On the other hand, black-box models can be applied in dynamic environments but are more difficult to tune.
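The white-box approach can be sketched roughly as follows. This is a simplified illustration, assuming every IO request visits each component once and that each component behaves as an independent M/M/1 queue; the `Component` class, the function names, and the service times are hypothetical and do not describe any particular system.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    service_time_s: float  # mean service time per IO, in seconds (hypothetical)


def overall_response_time(components: list[Component], iops: float) -> float:
    """Sum the per-component M/M/1 response times at a given arrival rate."""
    total = 0.0
    for c in components:
        utilization = iops * c.service_time_s  # utilization law: U = X * S
        if utilization >= 1.0:
            raise ValueError(f"{c.name} is saturated at {iops} IOPS")
        total += c.service_time_s / (1.0 - utilization)
    return total


def max_throughput(components: list[Component]) -> float:
    """Upper bound on IOPS, set by the slowest (bottleneck) component."""
    return 1.0 / max(c.service_time_s for c in components)


if __name__ == "__main__":
    system = [
        Component("cpu", 0.0001),
        Component("network", 0.0002),
        Component("disk", 0.001),
    ]
    print(f"max throughput: {max_throughput(system):.0f} IOPS")
    for rate in (100, 500, 900):
        r_ms = overall_response_time(system, rate) * 1000.0
        print(f"{rate} IOPS -> {r_ms:.2f} ms")
```

The sketch also shows why such models are tunable but static: changing a component means editing one service time, yet the service times themselves are fixed inputs that do not adapt when the workload changes.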