The use of virtual machines (VMs) in computing platforms continues to increase. The storage-related demands of such VMs have fostered development and deployment of distributed storage systems. Today's distributed storage systems have evolved to comprise autonomous nodes that facilitate scaling to virtually any speed or capacity. In some cases, the distributed storage systems can comprise numerous nodes supporting multiple user VMs running a broad variety of applications, tasks, and/or processes. For example, in clusters that may host hundreds or thousands (or more) autonomous VMs, the storage I/O (input/output or IO) activity in the distributed storage system can be highly dynamic. With such large scale, highly dynamic distributed storage systems, certain management tasks (e.g., background tasks) may be executed to maintain a uniform and/or consistent performance level as may be demanded by a service level agreement (SLA) and/or as is expected by the users. Such management tasks might include tasks related to data replication (e.g., for disaster recovery, data protection policies, etc.), data movement (e.g., for disk balancing, information lifecycle management (ILM), etc.), data compression, and/or other processes. Execution of management tasks often improves the performance level of the system. Even though users recognize that management tasks necessarily consume cluster resources (e.g., nodes, CPU time, I/O, etc.), and even though the users of the distributed storage system might recognize the benefits facilitated by the execution of management tasks, the users do not want to experience reduced system performance.
Unfortunately, legacy techniques for scheduling maintenance tasks (e.g., to run as background tasks) in a large scale, highly dynamic distributed storage system often do impact system performance as experienced by its users. For example, legacy techniques continuously run system scans that continuously execute sets of background tasks (e.g., ILM tasks, disk balancing tasks, etc.). In this case, processing might be concurrent with user interactions with the system (even during periods of user-directed mission-critical activities), resulting in an impact on performance (e.g., latency increase, sluggishness, etc.) that is observed by the user. Further, the specific set of tasks, and corresponding task schedule (e.g., launch sequence), associated with the scan might be predetermined in certain legacy approaches. Such legacy approaches can conflict with a particular user storage I/O characteristic occurring at the time the management tasks are executed. For example, a spike in user storage usage might be exacerbated by a concurrently scheduled data replication task or other storage-intensive management task. Further, a management task that is scheduled to use resources (e.g., nodes, paths, storage devices, etc.) used by one or more user VMs can impact the performance at those user VMs.
What is needed is a technique or techniques to improve over legacy and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.