Distributed systems allow multiple clients in a network to access a pool of shared resources. For example, a distributed storage system allows a cluster of host computers to aggregate local disks (e.g., Solid State Drive (SSD), Peripheral Component Interconnect (PCI) based flash storage, Serial Advanced Technology Attachment (SATA), or Serial Attached Small Computer System Interface (SAS) magnetic disks) located in or attached to each host computer to create a single, shared pool of storage. This pool of storage (sometimes referred to herein as an “object store”, “datastore” or “store”) is accessible by all host computers in the cluster and may be presented as a single namespace of storage entities (such as a hierarchical file system namespace in the case of files, a flat namespace of unique identifiers in the case of objects, etc.). Storage clients such as virtual machines spawned on the host computers may use the aggregate object store, for example, to store virtual disks that are accessed by the virtual machines during their operation. Because the shared local disks that make up the object store may have different performance characteristics (e.g., capacity, input/output operations per second (IOPS) capabilities, etc.), usage of such shared local disks to store virtual disks or portions thereof may be distributed among the virtual machines based on the needs of each given virtual machine.
Providers of distributed storage systems must balance the heavy demands of availability, performance, reliability, and cost. Distributed replication and erasure coding are used to provide for the recovery of data in the event of storage device failure or other system failures. Erasure coding is a method of data protection in which data is broken into fragments or portions, expanded and encoded with redundant data pieces and stored across a set of different locations, e.g., storage devices in different geographic locations. Erasure coding creates a mathematical function (e.g., polynomial interpolation or oversampling) to describe a set of numbers representing a portion of data so they can be checked for accuracy and recovered if one of the numbers is lost. Erasure coding can be represented in simple form by the following equation: K=N+M. The variable “N” is the original number of portions of data. The variable “M” stands for extra or redundant portions of data that are added to provide protection from failures. The variable “K” is the total number of portions of data created after the erasure coding process. For example, in a 10 of 16 configuration, 6 extra portions of data (M) are added to the 10 base portions (N). The 16 data portions (K) are distributed across 16 storage devices. The 6 extra portions of data created after the erasure coding process may be referred to as code blocks, while the 10 base portions of data may be referred to as data blocks. In the event of data loss or a lost connection to one or more storage devices, the original data can be reconstructed using any 10 of the 16 blocks.
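The K=N+M relationship and the "any N of the K blocks" property described above can be illustrated with a minimal sketch based on polynomial interpolation. The sketch is an assumption made for illustration only, not the scheme of any particular storage system: each block is reduced to a single integer symbol smaller than 257, arithmetic is done modulo the small prime 257, and the names `P`, `lagrange_eval`, `encode`, and `reconstruct` are hypothetical.

```python
P = 257  # small prime modulus (assumption for illustration; real systems typically use GF(2^8))


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, m):
    """Return K = N + M symbols: the N data symbols followed by M code symbols."""
    n = len(data)
    points = list(enumerate(data))  # data symbol i is the polynomial's value at x = i
    code = [lagrange_eval(points, x) for x in range(n, n + m)]
    return list(data) + code


def reconstruct(surviving, n):
    """Recover the N data symbols from any N surviving symbols ({index: value})."""
    points = list(surviving.items())[:n]
    return [lagrange_eval(points, x) for x in range(n)]


# 10 of 16 configuration: N = 10 data symbols, M = 6 code symbols, K = 16 in total
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
blocks = encode(data, 6)
lost = {0, 2, 5, 7, 11, 14}  # any 6 of the 16 symbols may be lost
surviving = {i: b for i, b in enumerate(blocks) if i not in lost}
recovered = reconstruct(surviving, 10)
```

In this sketch the N data symbols fix a polynomial of degree less than N, and the M code symbols are further evaluations of that polynomial, so any N surviving symbols determine it again and `recovered` equals `data`.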
When a change is made to one of the data blocks (e.g., one of the N base portions in the example above), one or more of the code blocks (e.g., the M extra portions in the example above) may need to be recalculated to reflect the change. This ensures that the latest version of the data can still be reconstructed using any N of the K blocks. Recalculating a code block requires I/O operations, because it involves reading data from existing blocks and applying a mathematical function to that data. In some cases, such as when a storage location containing a data block is unavailable, it may be necessary to first reconstruct that data block from any N of the blocks that remain available before the code block can be recalculated. In such cases, the number of I/O operations required to recalculate the code block may increase, and the overall performance of the system may be negatively impacted. As such, there exists a need for efficient methods of handling data updates within an erasure coding system.
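The I/O cost described above can be made concrete with a simplified sketch. It assumes the simplest possible code, a single XOR code block (M=1), and recalculates that code block from scratch after one data block is updated, counting the block reads involved. The function names `xor_blocks` and `recalc_parity` and the `unavailable` parameter are hypothetical, and reads are simulated by counting accesses to in-memory byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recalc_parity(data_blocks, old_parity, updated_index, new_block, unavailable=None):
    """Recalculate the XOR code block after one data block changes.

    Returns (new_parity, reads), where `reads` counts simulated block reads.
    `unavailable` is the index of a data block whose storage location cannot be read.
    """
    reads = 0
    current = {}
    # Read every data block that is readable and is not being overwritten.
    for i, block in enumerate(data_blocks):
        if i == updated_index or i == unavailable:
            continue
        current[i] = block
        reads += 1
    if unavailable is not None and unavailable != updated_index:
        # Degraded case: rebuild the unreadable block from the old code block
        # and the old contents of all other data blocks, which costs two
        # extra reads (old version of the updated block, old code block).
        old_updated = data_blocks[updated_index]
        reads += 2
        current[unavailable] = xor_blocks([old_parity, old_updated] + list(current.values()))
    current[updated_index] = new_block
    new_parity = xor_blocks([current[i] for i in range(len(data_blocks))])
    return new_parity, reads


# Four data blocks (N = 4) and one XOR code block (M = 1)
blocks = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(blocks)
healthy = recalc_parity(blocks, parity, 1, b"\xff\x00")
degraded = recalc_parity(blocks, parity, 1, b"\xff\x00", unavailable=3)
```

Under these assumptions, both calls produce the same new code block, but the degraded call performs one more read than the healthy call (four instead of three for N = 4), because the unreadable data block must be reconstructed before the code block can be recalculated.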