Distributed data may be located on a plurality of hosts and/or servers across one or more networks. Distributed data may also be located on computing clusters, which may periodically add or drop nodes. Backup storage locations may be network-accessible locations remote from one or more nodes or other sources of distributed data targeted for backup. Backup jobs may be scheduled for nodes or sources of distributed data in a manner that requires transmission of backup data across a Wide Area Network (WAN), which may increase network congestion and latency. Backup jobs may also be scheduled for heavily utilized or poorly performing nodes, servers, and/or backup locations. Such scheduling may impair performance for one or more users and/or for one or more backup jobs.
Furthermore, backup efforts may be scheduled without considering whether a distributed data source is an active node or a passive node. Such backup efforts may unnecessarily or undesirably impact the performance of an active node and/or its users. Backup efforts may also be scheduled for a passive node without regard to the quality or currency of the data stored on that passive node.
Additionally, distributed data may include data which is related to other data, such as databases which are part of the same implementation (e.g., databases associated with one Microsoft Exchange Server). Other examples may include distributed data which is to be part of a backup set in an incremental backup strategy. Such data may be backed up to backup storage locations which do not contain the related data.
In view of the foregoing, it may be understood that there are significant problems and shortcomings associated with current technologies utilized for backing up distributed data.