The present disclosure relates generally to storage systems and more specifically to a methodology for ensuring that sufficient space is available for data transfers on destination storage nodes in a distributed storage environment.
In a large-scale distributed storage system, individual storage nodes will commonly fail or become unavailable from time to time. Therefore, storage systems typically implement some type of recovery scheme for recovering data that has been lost, degraded, or otherwise compromised due to node failure or other causes. One such scheme is known as erasure coding. Erasure coding generally involves creating codes that introduce redundant data (also called “parity data”), which is stored along with the original data (also referred to as “systematic data”), thereby encoding the data in a prescribed manner. If any systematic data or parity data becomes compromised, such data can be recovered through a series of mathematical calculations.
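By way of illustration only, the recovery calculation can be sketched with the simplest possible erasure code: a single parity chunk formed as the bytewise XOR of the systematic chunks. This is a minimal sketch and not the scheme of the present disclosure; the function names and sample data below are hypothetical, and practical systems generally use stronger codes that tolerate more than one lost chunk.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: list[bytes]) -> bytes:
    """Parity chunk: XOR of all systematic chunks."""
    return reduce(xor_bytes, chunks)


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical example: three systematic chunks, one parity chunk.
systematic = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(systematic)

# Suppose the node holding the second chunk fails.
recovered = recover_missing([systematic[0], systematic[2]], parity)
assert recovered == systematic[1]
```

Because XOR is its own inverse, combining the parity chunk with every surviving systematic chunk cancels their contributions and leaves only the missing chunk.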
Erasure coding for a storage system involves algorithmically splitting a data file of size M into X chunks (also referred to as “fragments”), each of the same size M/X. An erasure code is applied to each of the X chunks to form A encoded chunks, which again each have the size M/X. The effective size of the data is A*M/X, which means the stored data exceeds the original file size M by (A−X)*(M/X), with the condition that A≥X. Any X chunks of the available A encoded chunks can then be used to recreate the original data file M. The erasure code applied to the data is denoted as (n, k), where n represents the total number of nodes across which all encoded chunks will be stored and k represents the number of systematic nodes (i.e., nodes that store only systematic data) employed. The number of parity nodes (i.e., nodes that store parity data) is thus n−k=r. Erasure codes following this construction are referred to as maximum distance separable (MDS), though other types of erasure codes exist.
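The size relationships above can be checked with a short calculation. The following is a minimal sketch using the symbols M, X, and A as defined in the preceding paragraph; the function name and the sample values (a 1200-byte file, X=4, A=6) are assumptions chosen only for illustration.

```python
from fractions import Fraction


def erasure_coding_sizes(m: int, x: int, a: int):
    """Return (chunk size, effective stored size, added size).

    m: size of the original data file (M)
    x: number of chunks the file is split into (X)
    a: number of encoded chunks produced (A), with A >= X
    """
    if x <= 0 or a < x:
        raise ValueError("requires X > 0 and A >= X")
    chunk = Fraction(m, x)      # each chunk has size M/X
    stored = a * chunk          # effective size A*M/X
    added = (a - x) * chunk     # expansion (A-X)*(M/X)
    return chunk, stored, added


# Hypothetical values: 1200-byte file, 4 chunks, 6 encoded chunks.
chunk, stored, added = erasure_coding_sizes(1200, 4, 6)
assert chunk == 300             # M/X
assert stored == 1800           # A*M/X
assert added == 600             # (A-X)*(M/X)
assert stored == 1200 + added   # stored size = M + expansion
```

Exact fractions are used so that file sizes not evenly divisible by X do not introduce rounding error into the comparison; a real system would typically pad the file so that each chunk holds a whole number of bytes.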
Erasure-coded content and other content stored in a distributed data storage environment can span many volumes on many storage nodes. Operations involving content stored in such an environment can involve large data transfers among storage nodes. For example, successfully repairing erasure-coded content stored on some or all of a storage node or volume may involve transferring one or more large data sets from one or more volumes on source nodes to one or more volumes on destination nodes.
In some cases (e.g., a repair operation or other operation involving the transfer of a large data set), a destination node may lack sufficient space to receive an entire transferred data set. A storage node may run out of space in a storage system that is busy or has little free space. In one example, if a given data set is being transferred as part of a repair operation, the destination node may lack sufficient space to receive the data set because data from other data sources is being transferred to the destination node by other processes executed concurrently with the repair operation. Due to the lack of coordination in a decentralized system, these concurrent data transfers can deplete or otherwise reduce the available storage space on the destination node before all of the data set involved in the repair operation has been transferred to the destination node. In another example, the destination node may lack sufficient space for receiving the entire data set involved in a repair operation even without concurrent data transfers depleting the available storage space at the destination node. In any of these examples, if a determination that the destination node has insufficient storage space for a data set is made after at least some of the data set has been transferred via the network, the incomplete transfer of the data set can result in wasted storage space on the destination node (e.g., space occupied by portions of the incomplete data set that could have been used for other operations), wasted network bandwidth used for communicating portions of the data set to the destination node, wasted computational resources used in generating the data set, etc.