Enterprise-scale backup operations for data protection purposes often involve many backup jobs or workloads from a multitude of client machines. In current data protection scenarios, a backup server generally accepts all backup sessions from the data center clients that are configured on the backup server. Server parallelism refers to the number of savesets that a backup server will simultaneously allow to be active for backups; once the server parallelism is exhausted, additional backup jobs are usually kept in a backup queue for processing. A server process, such as JOBDB (job database), keeps track of all the sessions that are active and queued. As current backup sessions are completed, the queued sessions are activated as running sessions and are completed in turn. In a scenario where large numbers of backup clients are configured and more data sets are to be backed up, the overall time that a backup session spends in the queue can be very high. In fact, there is often a fair chance that some queued sessions would have to wait so long that they time out with an error. For example, sessions may get dropped due to network latency. If any timeout variable is set on the backup server or backup client agent, then all queued sessions will be aborted and the backups of those clients will fail. There are also situations where the backup server is exhausted due to overload from an excessively large number of queued sessions. The inability to handle these multiple processes leads to backup failure and vulnerability of client data.
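The queuing behavior described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the behavior of any particular backup product: all sessions are assumed to arrive at time zero, a queued session is assumed to abort if it has not started by `queue_timeout`, and the function name `simulate_backup_queue` and its parameters are hypothetical.

```python
import heapq


def simulate_backup_queue(durations, parallelism, queue_timeout):
    """Return (completed, aborted) session indices.

    Assumptions (illustrative only): every session arrives at t=0, sessions
    are served in order, and a session aborts if its start time would exceed
    queue_timeout.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")

    running = []  # min-heap of finish times of the active sessions
    completed, aborted = [], []

    for idx, duration in enumerate(durations):
        if len(running) < parallelism:
            start = 0.0  # a slot is free, so the session starts immediately
        else:
            # Server parallelism is exhausted: the session waits in the
            # queue until the earliest-finishing active session ends.
            start = heapq.heappop(running)

        if start > queue_timeout:
            aborted.append(idx)
            # The slot freed at `start` stays available for later sessions.
            heapq.heappush(running, start)
            continue

        heapq.heappush(running, start + duration)
        completed.append(idx)

    return completed, aborted


# Five 10-minute sessions, parallelism of 2, 15-minute queue timeout.
done, failed = simulate_backup_queue([10, 10, 10, 10, 10], 2, 15)
print(done, failed)  # [0, 1, 2, 3] [4]
```

In this example the fifth session could only start at minute 20, which exceeds the assumed 15-minute timeout, so it is aborted even though the server would eventually have had capacity for it.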
In current backup systems, there is no intelligence between the backup server and the backup clients to regulate (or “throttle”) the incoming save processes based on the current backup server load. For example, some enterprise backup applications specify a maximum server parallelism limit (e.g., 1024 parallel backup sessions). If the server receives more than this maximum number of sessions, it places the excess sessions in the queue, and if the current backup sessions take longer than the timeout value to complete, the queued backup sessions are aborted. In a scenario where features like parallel savestreams are enabled, a single save set may itself spawn multiple save sessions, which would eventually exhaust the server parallelism if too many backup clients are so configured. Thus, in large-scale backup scenarios, current backup servers may be overwhelmed, causing backup jobs to be suspended or aborted.
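The effect of parallel savestreams on server parallelism can be made concrete with simple arithmetic. The sketch below is illustrative only: the function name `session_demand` and the client, save set, and stream counts are hypothetical, while the limit of 1024 parallel sessions is the example figure given above.

```python
def session_demand(clients, savesets_per_client, streams_per_saveset,
                   server_parallelism):
    """Return (total_sessions, queued_sessions) for one backup window.

    Assumes, for illustration, that every client starts all of its save
    sets at the same time and that each save set spawns the same number
    of save sessions.
    """
    total = clients * savesets_per_client * streams_per_saveset
    queued = max(0, total - server_parallelism)
    return total, queued


# 200 clients, 2 save sets each, 4 parallel savestreams per save set,
# against a server parallelism limit of 1024.
print(session_demand(200, 2, 4, 1024))  # (1600, 576)

# The same clients without parallel savestreams stay under the limit.
print(session_demand(200, 2, 1, 1024))  # (400, 0)
```

Under these assumed numbers, enabling four savestreams per save set raises the demand from 400 to 1600 sessions, leaving 576 sessions queued and exposed to the timeout.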
What is needed, therefore, is a system and method that improves the performance of large-scale backup operations by regulating backup sessions to prevent backup jobs from being aborted due to excessive scheduling.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation.