The deployment of distributed computing and storage systems continues to increase as users seek the resource usage efficiencies offered by such systems. Specifically, for example, certain components of a distributed computing system can coordinate to efficiently use a set of compute resources, while certain components of a distributed storage system can coordinate to efficiently use a set of data storage resources or facilities. A hyperconverged distributed system coordinates efficient use of compute and storage resources by and between the components of the distributed system. Distributed systems that support virtualized entities to facilitate efficient resource utilization can be referred to as distributed virtualization systems. For example, a distributed virtualization system might include virtual machines (VMs) to improve the utilization of computing resources. Such VMs can be characterized as software-based computing “machines” implemented in a hypervisor-assisted virtualization environment that emulates the underlying hardware resources (e.g., CPU resources, memory resources, networking resources, etc.).
For example, multiple VMs can operate on one physical machine (e.g., host computer) running a single host operating system, while the VMs might run multiple applications on various respective guest operating systems. Another form of virtualization in modern distributed systems is operating system virtualization or container virtualization. The containers implemented in container virtualization environments comprise groups of processes and/or resources (e.g., memory, CPU, disk, etc.) that are isolated from the host computer and other containers. Such containers directly interface with the kernel of the host operating system with, in most cases, no hypervisor layer. As an example, certain applications can be implemented as containerized applications (CAs).
The use of VMs, CAs, and other virtualized entities in distributed virtualization systems to improve the utilization of system resources continues to increase. For example, some clusters in a distributed virtualization system might scale to hundreds of nodes or more that support several thousand or more autonomous VMs and/or CAs. As such, the topology and/or resource usage activity of the distributed system can be highly dynamic. Users (e.g., administrators) of such large scale, highly dynamic, distributed systems desire capabilities (e.g., management tools) that facilitate analyzing and/or managing the distributed system resources so as to satisfy not only the current but also the forthcoming demand for resources. For example, the administrators might desire capabilities that facilitate cluster management (e.g., deployment, maintenance, scaling, etc.), virtualized entity management (e.g., creation, placement, sizing, protection, migration, etc.), storage management (e.g., allocation, policy compliance, location, etc.), and/or management of other aspects pertaining to the resources of the distributed system.
Unfortunately, legacy techniques for managing resources in distributed virtualization systems can present limitations at least as pertaining to accounting for the seasonal or periodically-recurring resource usage characteristics in highly dynamic systems. For example, some techniques implement a distributed scheduling (DS) capability to allocate resources in the distributed virtualization system based at least in part on observed resource usage. Specifically, such DS techniques might determine a peak usage value and a mean usage value for a given resource usage metric (e.g., CPU usage, memory usage, input and output or I/O usage, etc.) based on a set of historical observations in a fixed window of time (e.g., prior 24 hours). The determined peak and mean usage values can then be compared to the then-current observed resource usage to determine the resource allocation operation or operations (e.g., actions), if any, that can be performed to improve resource utilization.
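The fixed-window peak-and-mean heuristic described above can be sketched as follows. This is a minimal illustration only: the sample values, the 24-hour hourly window, and the needs_rebalance helper are hypothetical assumptions and do not reflect any particular DS implementation.

```python
from statistics import mean

# Hypothetical hourly CPU-usage samples (%) for one node over a fixed
# 24-hour observation window, as a legacy scheduler might record them.
window = [35, 32, 30, 28, 27, 25, 30, 45, 60, 72, 80, 85,
          83, 78, 70, 66, 62, 55, 50, 48, 44, 40, 38, 36]

peak_usage = max(window)   # peak usage value over the fixed window
mean_usage = mean(window)  # mean usage value over the fixed window

def needs_rebalance(current_usage, peak, threshold=1.0):
    """Hypothetical legacy check: flag a resource allocation action
    only when the then-current observation exceeds the historical peak."""
    return current_usage > peak * threshold

print(peak_usage, round(mean_usage, 1))
print(needs_rebalance(90, peak_usage))  # 90% exceeds the 85% peak
```

Note that the decision depends only on statistics of the fixed window; any usage pattern whose period is longer than the window is invisible to this check.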
As an example, such resource allocation operations might comprise increasing compute resource capacity at one node and/or increasing storage resource capacity at another node. However, merely using the peak and/or mean of the historical observations in the fixed time window may not accurately reflect the dynamic resource usage characteristics (e.g., periodicity, seasonality, scaling effects, etc.) of the system, resulting in misallocation of resources. For example, a resource allocation operation to move a first VM to a target node might be determined using the foregoing legacy techniques, while moving a second VM to some other target node might be a more appropriate resource allocation operation, given a set of then-current dynamic resource usage characteristics. As a more detailed example, a legacy resource allocation operation might determine to move VM123 to AccountingNode1023 based on the peaks and means that were measured in the preceding 8 hours (e.g., from March 29th); however, making that determination based on only the preceding 8 hours might be shortsighted in that it does not recognize that AccountingNode1023 is regularly (e.g., with a periodicity of about a quarter of a year) tasked to be used as a quarter-end report generator node. In this case, moving VM123 to AccountingNode1023 at that moment in time would add the load of VM123 to a node that is already, or predictably soon to be, heavily loaded.
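A minimal sketch, using assumed synthetic data, of how a periodically-recurring load (such as the quarter-end workload described above) can be recovered from a longer usage history via autocorrelation; the fixed-window peak/mean approach cannot capture such a cycle. The series, the 12-sample period, and the function names here are illustrative assumptions, not a prescribed method.

```python
import math

# Synthetic load series with a strong cycle every 12 samples, standing in
# for, e.g., a periodically-scheduled report-generation workload.
period = 12
series = [50 + 40 * max(0.0, math.sin(2 * math.pi * t / period))
          for t in range(96)]

def autocorr(xs, lag):
    """Plain (biased) sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mu = sum(xs) / n
    num = sum((xs[t] - mu) * (xs[t + lag] - mu) for t in range(n - lag))
    den = sum((x - mu) ** 2 for x in xs)
    return num / den

# Scan candidate lags; the dominant lag recovers the cycle length.
best_lag = max(range(2, 48), key=lambda lag: autocorr(series, lag))
print(best_lag)  # recovers the 12-sample period
```

A scheduler aware of such a recovered period could decline to place additional load on a node whose usage is predictably about to spike, even when the most recent fixed window shows low usage.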
What is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.