Distributed cluster environments often include compute nodes that execute applications and data nodes that replicate data from the compute nodes. For example, an OpenFlame Compute cluster may include various compute nodes that execute applications independently of one another. In this example, the compute nodes may replicate data to a data node to facilitate local storage reclamation, performance increases, and/or data restoration. These compute nodes may write and/or replicate data to the data node at different speeds. As a result, the data node may end up dedicating more resources and/or time to fulfilling the needs of some compute nodes and fewer resources and/or less time to fulfilling the needs of others.
Unfortunately, the needs of some compute nodes may be more urgent and/or have higher priority than the needs of others. In traditional distributed cluster environments, the compute nodes may be responsible for scheduling, initiating, and/or performing their own data replication processes with the data node. Despite some compute nodes having more urgent replication needs than others, these compute nodes may be unable to communicate their needs and/or priorities to one another in traditional distributed cluster environments. As a result, these compute nodes may be unable to coordinate any type of priority-based scheduling of data replication processes with one another.
As an example, an OpenFlame Compute cluster may include a compute node that executes a first virtual machine. In this example, data used by the first virtual machine may be currently consuming approximately 80% of the storage space on that compute node's Solid-State Drive (SSD). This OpenFlame Compute cluster may also include another compute node that executes a second virtual machine. Data used by the second virtual machine may be currently consuming approximately 10% of the storage space on the other compute node's SSD.
Unfortunately, in the event that the data used by one of these virtual machines fills up the corresponding SSD to capacity, that SSD may experience a storage overflow, thereby potentially causing data corruption, Service-Level Agreement (SLA) violations, and/or application downtime. In view of this possibility, the first virtual machine whose data is consuming approximately 80% of its SSD may have a greater need to perform data replication with the data node than the second virtual machine whose data is consuming only approximately 10% of its SSD. Nevertheless, traditional distributed cluster environments may fail to provide any means for coordinating and/or performing such priority-based scheduling of data replication processes.
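By way of illustration only, the relative urgency in the foregoing example could be expressed by ordering compute nodes according to SSD utilization, so that the node at approximately 80% is served before the node at approximately 10%. The following is a minimal sketch under stated assumptions rather than a description of any particular implementation: the names NodeStatus, replication_order, and the node labels are hypothetical, and SSD utilization is assumed to be the only factor in the priority.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical record of one compute node's storage state."""
    name: str
    ssd_used_fraction: float  # fraction of SSD capacity in use, 0.0 to 1.0


def replication_order(nodes):
    """Return nodes sorted so the fullest SSD is replicated first.

    Assumption: priority depends only on SSD utilization.
    """
    return sorted(nodes, key=lambda n: n.ssd_used_fraction, reverse=True)


# The two compute nodes from the example above (labels are hypothetical).
nodes = [
    NodeStatus("compute-node-2", 0.10),  # second virtual machine, ~10% used
    NodeStatus("compute-node-1", 0.80),  # first virtual machine, ~80% used
]

for node in replication_order(nodes):
    print(node.name, node.ssd_used_fraction)
```

Under these assumptions, the node hosting the first virtual machine would be placed ahead of the node hosting the second virtual machine.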
The instant disclosure, therefore, identifies and addresses a need for improved systems and methods for performing data replication in distributed cluster environments.