Modern distributed computing systems have evolved to natively coordinate distributed compute, storage, networking, and/or other distributed resources in such a way that incremental scaling can be accomplished in many dimensions. For example, some clusters in a distributed computing system might deploy hundreds of nodes or more that support several thousand or more autonomous virtualized entities (VEs) (e.g., virtual machines (VMs), containers) that are individually tasked to perform one or more of a broad range of computing workloads. In many cases, several thousand VEs might be launched (e.g., in a swarm) to perform some set of tasks, then finish and collate their results, and then self-terminate. As such, the working data, configuration (e.g., topology, resource distribution), and/or other characteristics of the distributed system can be highly dynamic.
Some system configuration changes in such large-scale, highly dynamic distributed computing systems are the result of resource scheduling operations executed by the system. For example, the system might migrate a certain VM from one node to another node to balance the resource usage in a cluster. Further, system administrators of such distributed computing systems will often interact with the system to specify their “intent” pertaining to the resource usage in the system. For example, a system administrator might specify an intent to instantiate 30 VMs for running a virtual desktop infrastructure (VDI) workload and 20 VMs for running an SQL server workload. The distributed system can then deploy that intent in such a way that the system resources are most efficiently utilized. In some cases, multiple system administrators (e.g., from various departments in an enterprise) might interact with the system to specify resource usage intents for various respective purposes. In these cases, the change rate of the distributed system is further increased.
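As an illustration only, the intent from the example above (30 VMs for a VDI workload and 20 VMs for an SQL server workload) can be sketched as declarative data that a scheduler turns into a placement. The `Intent` class, the `place` function, the node names, and the least-loaded placement policy are all hypothetical and are assumed here for illustration; they do not describe any particular system's scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical declarative intent: how many VMs a workload should have."""
    workload: str
    vm_count: int


def place(intents, nodes):
    """Assign each requested VM to the currently least-loaded node.

    This is an assumed, simplistic policy; ties are broken by node name
    so that the result is deterministic.
    """
    load = {node: 0 for node in nodes}
    placement = {}
    for intent in intents:
        for index in range(intent.vm_count):
            target = min(nodes, key=lambda n: (load[n], n))
            placement[f"{intent.workload}-{index}"] = target
            load[target] += 1
    return placement


# The administrator states what is wanted, not where each VM should run.
intents = [Intent("vdi", 30), Intent("sql", 20)]
nodes = ["node-a", "node-b", "node-c", "node-d"]  # assumed four-node cluster
placement = place(intents, nodes)
```

In this sketch the administrator never names a node; the mapping of VMs to nodes is derived by the system, which is one reason the resulting configuration can change without any explicit administrator step.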
As the state of the distributed computing system changes over time, the system administrator might desire access to information pertaining to the then-current state and/or previous states of the distributed system to facilitate certain actions, such as a reversion action (e.g., roll back action or roll forward action) to some selected state.
Unfortunately, legacy techniques for capturing restorable states of entire computing clusters are nonexistent or deficient. For example, some procedural tracking techniques merely record the steps and/or operations executed over time that affect the state of the system. However, such techniques fail to consider the interdependencies of steps and/or operations invoked by multiple sources. Further, these techniques fail to capture and process non-procedural aspects of the system.
Moreover, legacy techniques for controlling (e.g., restoring) the state of the resources in a computing system are nonexistent or deficient. Specifically, the aforementioned procedural techniques merely facilitate rerunning start-up scripts and/or replaying or reversing the captured steps and/or operations to reach a desired state (e.g., an earlier state). However, due in part to the earlier described limitations pertaining to interdependencies and/or non-procedural aspects of the state, a mere replay or reversal of steps and/or operations may result in a state that is different from the desired state. Additional manual work (e.g., working data recovery) by the system administrator may be needed to bring the state obtained by procedural techniques nearer to the desired state. What is needed is a technological solution for efficiently tracking and controlling the state of the resources in a computing system.
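The divergence described above can be illustrated with a minimal sketch. It assumes a hypothetical two-node cluster in which administrator operations are logged but a scheduler-initiated migration is not, and it compares a replay of the log against a copy of the actual state. The function names, the log format, and the placement policy are assumptions made for illustration only.

```python
import copy


def least_loaded(placement, nodes):
    """Return the node hosting the fewest VMs (ties broken by node name)."""
    counts = {node: 0 for node in nodes}
    for node in placement.values():
        counts[node] += 1
    return min(nodes, key=lambda n: (counts[n], n))


def apply_op(placement, op, nodes):
    """Apply one logged administrator operation to a placement map."""
    kind, vm = op
    if kind == "create":
        # The outcome depends on the state at the moment the step runs.
        placement[vm] = least_loaded(placement, nodes)
    elif kind == "delete":
        placement.pop(vm, None)


nodes = ["node-a", "node-b"]
admin_log = [("create", "vm1"), ("create", "vm2"), ("create", "vm3")]

# What actually happened: the scheduler migrated vm1 between logged steps.
live = {}
apply_op(live, admin_log[0], nodes)
apply_op(live, admin_log[1], nodes)
live["vm1"] = "node-b"  # scheduler migration from another source; NOT in admin_log
apply_op(live, admin_log[2], nodes)
snapshot = copy.deepcopy(live)  # a captured copy of the actual state

# Procedural restore: replay only the recorded steps from an empty cluster.
replayed = {}
for op in admin_log:
    apply_op(replayed, op, nodes)

print("snapshot:", snapshot)
print("replayed:", replayed)
print("replay reproduces the actual state:", replayed == snapshot)
```

Under these assumptions the replay leaves `vm1` on `node-a`, while the actual state has it on `node-b`, because the migration came from a source the log did not capture. The snapshot matches the actual state by construction, which is the gap between procedural replay and direct state capture that the passage describes.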
What is needed is a technique or techniques to improve over legacy techniques and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.