The resource usage efficiencies offered by distributed computing and storage systems have resulted in continually increasing deployment of such systems. For example, certain components of a distributed computing system can coordinate among themselves to efficiently use a set of computing resources and/or data storage resources or facilities. A hyperconverged distributed system coordinates efficient use of compute resources, storage resources, networking resources, and/or other resources that are consumed by and between the components of the distributed system. Users or consumers of the resources in hyperconverged distributed systems are often embodied as virtualized entities (VEs). The VEs in hyperconverged distributed systems might be virtual machines (VMs) and/or containers, in full or hypervisor-assisted virtualization environments and/or operating system virtualization environments, respectively. Any of the foregoing VEs can be implemented in hyperconverged distributed systems to facilitate execution of one or more workloads. For example, one VM might be created to operate as an SQL server, while another VM might be created to support a virtual desktop infrastructure (VDI).
The configuration of the components comprising a hyperconverged distributed system and/or the workloads running on such systems can be highly dynamic. Configurations can vary among users of the systems, or within a particular system over time. For example, the node appliances underlying the virtualized entities can vary in number, CPU capacity (e.g., 16 cores, 24 cores, etc.), memory capacity (e.g., 128 GB, 256 GB, etc.), storage capacity (e.g., 480 GB SSD, 1 TB HDD, etc.), or network connectivity. Further, the combination and/or schedule of workloads running on the VEs can vary over time. Predictive models are often implemented to predict the performance of the VEs and/or workloads to facilitate efficient scheduling of resources across the hyperconverged distributed system. Such a resource performance predictive model might predict an aggregate of storage input and output (IO or I/O) performance characteristics (e.g., storage IO latency metrics, storage command response latency metrics, other storage IO parameters, etc.) corresponding to various workload scheduling scenarios to facilitate selecting an appropriate resource or workload allocation scenario (e.g., a scenario with the lowest storage IO latency) for deployment in the system.
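As an illustration of the scenario selection described above, the following Python sketch scores candidate workload placements with a predicted storage IO latency and picks the lowest. It is a minimal sketch under stated assumptions, not a description of any particular system: the names `Scenario`, `predict_latency_ms`, and `select_scenario`, the node capacities, the workload demands, and the simple utilization-based latency formula are all hypothetical stand-ins for a trained resource performance predictive model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Scenario:
    """A candidate workload scheduling scenario (hypothetical structure)."""
    name: str
    placement: Dict[str, str]  # workload name -> node name


# Assumed example inputs: per-node storage IOPS capacity and per-workload demand.
NODE_IOPS_CAPACITY = {"node-a": 10_000.0, "node-b": 10_000.0}
WORKLOAD_IOPS = {"sql-server": 6_000.0, "vdi": 3_000.0}


def predict_latency_ms(scenario: Scenario) -> float:
    """Toy stand-in for a trained model.

    Assumes latency on a node grows as 1 / (1 - utilization) from a 1 ms
    baseline, and aggregates by taking the worst node in the scenario.
    """
    load = {node: 0.0 for node in NODE_IOPS_CAPACITY}
    for workload, node in scenario.placement.items():
        load[node] += WORKLOAD_IOPS[workload]

    worst = 0.0
    for node, capacity in NODE_IOPS_CAPACITY.items():
        utilization = min(load[node] / capacity, 0.99)  # cap to avoid division by zero
        worst = max(worst, 1.0 / (1.0 - utilization))
    return worst


def select_scenario(
    scenarios: List[Scenario], predictor: Callable[[Scenario], float]
) -> Scenario:
    """Return the scenario with the lowest predicted storage IO latency."""
    return min(scenarios, key=predictor)


scenarios = [
    Scenario("packed", {"sql-server": "node-a", "vdi": "node-a"}),
    Scenario("spread", {"sql-server": "node-a", "vdi": "node-b"}),
]
best = select_scenario(scenarios, predict_latency_ms)
print(best.name)
```

In this sketch the quality of the selection depends entirely on the accuracy of the predictor: if the predicted latencies are wrong, the scenario chosen for deployment may not be the one with the lowest actual latency.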
Unfortunately, legacy techniques for implementing resource performance predictive models in hyperconverged distributed systems present limitations, at least as pertaining to the severity and frequency of prediction errors during an initial learning phase of the models. For example, some legacy techniques deploy untrained resource performance predictive models when installing a target system. With such techniques, the model will initially exhibit large errors in its predictions. The initial training period might be long, especially if the model was trained in a computing environment that is different from the target system environment, and/or if the model was trained using stimulus and response that differ from the stimulus and response seen in the environment of the target system.
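The initial learning phase described above can be illustrated with a small Python sketch. It assumes a hypothetical linear relationship between storage utilization and latency in the target system, and a model that starts untrained (all parameters zero) and learns online from observations using a simple least-mean-squares update. The function names, the ground-truth formula, and the learning rate are assumptions chosen for illustration only.

```python
from typing import List


def observed_latency_ms(utilization: float) -> float:
    """Hypothetical ground truth for the target system (assumed for illustration)."""
    return 5.0 * utilization + 1.0


def train_online(steps: int, learning_rate: float = 0.5) -> List[float]:
    """Train an initially untrained linear model online.

    Returns the absolute prediction error made at each step, measured
    before the model is updated with that step's observation.
    """
    weight, bias = 0.0, 0.0  # untrained model: predicts 0 ms for every input
    utilizations = [0.2, 0.4, 0.6, 0.8]  # repeating stimulus seen by the model
    errors: List[float] = []
    for step in range(steps):
        x = utilizations[step % len(utilizations)]
        predicted = weight * x + bias
        error = observed_latency_ms(x) - predicted
        errors.append(abs(error))
        # Least-mean-squares update toward the observed response.
        weight += learning_rate * error * x
        bias += learning_rate * error
    return errors


errors = train_online(steps=400)
print(f"first-step error: {errors[0]:.3f} ms")
print(f"last-step error:  {errors[-1]:.3f} ms")
```

In this toy setting the earliest predictions carry the largest errors, and the errors shrink only as observations from the target environment accumulate, which is the behavior the passage identifies as a limitation of deploying untrained models.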
What is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.