Scheduling computing jobs (e.g., storage backup jobs) can be done when a system is "offline". Alternatively, scheduling of such jobs can be done when a system is online if there are sufficient resources available to perform the scheduling in an online fashion. In some computing environments, many jobs (e.g., backups, snapshots) might be scheduled within tight intervals (e.g., hourly, or many per hour). The scope of a backup operation, as well as the frequency of backup jobs and/or the performance level at which backup jobs are to complete on time, can be defined in a service level agreement (SLA), the terms of which can in turn be codified as measurable quantities that describe quantitative service levels. When backup jobs are run during online operation of a system being backed up, the mere action of running the backup jobs consumes system resources. In many cases such resources (e.g., CPU cycles, network input/output (I/O), storage space, etc.) can be consumed by backup jobs without negatively impacting the user jobs (or other existing load processing) that are performed during regular online operation of the computing platform. Given resource usage quotas or limits, backup jobs can be scheduled on top of existing load processing. In many situations the resource demands of existing load processing can be measured, and in many situations the resource demands of backup jobs can be measured. Predictions of future resource demands can be made by a predictor based on past performance, and new incoming backup jobs can be scheduled over existing loads such that the backup job(s) are provided with sufficient resources (e.g., CPU time, I/O bandwidth, wall clock time, etc.) to be able to finish by a deadline (e.g., as may be specified in or derived from a service level agreement).
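As an illustration only, scheduling a backup job over a predicted existing load under a resource limit can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular scheduler: the names `BackupJob`, `plan_backup`, `work_units`, `deadline_slot`, and `capacity_per_slot` are hypothetical, time is assumed to be divided into equal slots, and a single abstract resource measure stands in for CPU time, I/O bandwidth, and the like.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupJob:
    name: str
    work_units: float   # total resource units the job needs (hypothetical measure)
    deadline_slot: int  # the job must finish within the first `deadline_slot` slots


def plan_backup(
    job: BackupJob,
    predicted_load: List[float],
    capacity_per_slot: float,
) -> Optional[List[float]]:
    """Greedily grant the job the headroom left over by the predicted load.

    Returns the per-slot grants if the job can finish by its deadline,
    or None if the predicted headroom is insufficient.
    """
    allocation: List[float] = []
    remaining = job.work_units
    # Only slots before the deadline and covered by the prediction are usable.
    horizon = min(job.deadline_slot, len(predicted_load))
    for slot in range(horizon):
        # Headroom is whatever the resource limit leaves after existing load.
        headroom = max(0.0, capacity_per_slot - predicted_load[slot])
        grant = min(headroom, remaining)
        allocation.append(grant)
        remaining -= grant
        if remaining <= 0:
            return allocation
    return None


# Example: the limit is 10 units per slot; the predictor's per-slot load is assumed.
load = [8.0, 6.0, 9.0, 3.0]
print(plan_backup(BackupJob("hourly", 6.0, 3), load, 10.0))   # -> [2.0, 4.0]
print(plan_backup(BackupJob("nightly", 10.0, 3), load, 10.0))  # -> None
```

In this sketch the first job fits into the headroom of the first two slots, while the second job cannot finish by its deadline because only 7 units of headroom are predicted before it. The plan is only as good as the predicted load it is computed from.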
Unfortunately, legacy forms of scheduling (e.g., just-in-time scheduling, earliest start scheduling, token-based scheduling) very frequently produce schedules that turn out to be wrong. For example, the predictions on which a schedule is based might be wrong, at least inasmuch as a prediction is just a prediction and many events might occur between the time a prediction is made and the timeframe to which the prediction applies. The situation is exacerbated when there are many backup jobs that need to be scheduled over many activities occurring in the foreground processing. What is needed is a way to determine an initial schedule of backup jobs, to allocate resources to those jobs, and then to reschedule and reallocate based on events that occur after the initial scheduling.
What is needed is a technique or techniques to improve over legacy approaches and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.