Data may be stored as unstructured data, for example, in files and directories in a file system. A distributed file system may store multiple copies of a file and/or directory on more than one storage server machine. Replicating the data across multiple storage server machines can help ensure that the data remains accessible in case of a hardware failure and/or system failure. If a storage server machine experiences a failure, that storage server machine may be unavailable, but changes can still be made to the copies of the data on the available storage server machines. The data on the storage server machine that is down may then be stale, that is, no longer a current version of the data. When the failed storage server machine is powered back up, the changes that were made to the other copies of the data should be propagated to it. The process of updating the stale data on the storage server machine may be known as “self-healing.”
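The replication and self-healing behavior described above can be illustrated with a minimal sketch. It is not taken from any particular distributed file system: the `Replica` and `ReplicatedFile` classes, the per-replica `version` counter used to detect stale copies, and the `online` flag are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a file on a hypothetical storage server machine."""
    name: str
    data: bytes = b""
    version: int = 0     # assumed scheme: bumped on every applied write
    online: bool = True  # False while the server machine is down


@dataclass
class ReplicatedFile:
    """A file stored as several replicas (illustrative only)."""
    replicas: list[Replica] = field(default_factory=list)

    def _online(self) -> list[Replica]:
        return [r for r in self.replicas if r.online]

    def write(self, data: bytes) -> None:
        """Apply a change to every replica that is currently available."""
        live = self._online()
        if not live:
            raise RuntimeError("no replica available")
        new_version = max(r.version for r in live) + 1
        for r in live:
            r.data, r.version = data, new_version

    def stale_replicas(self) -> list[Replica]:
        """Return replicas whose version lags behind the newest copy."""
        newest = max(r.version for r in self.replicas)
        return [r for r in self.replicas if r.version < newest]

    def self_heal(self) -> list[str]:
        """Propagate the newest online copy to every online stale replica."""
        live = self._online()
        if not live:
            return []
        source = max(live, key=lambda r: r.version)
        healed = []
        for r in live:
            if r.version < source.version:
                r.data, r.version = source.data, source.version
                healed.append(r.name)
        return healed
```

In this sketch, a write made while one replica is marked offline leaves that replica at an older version. Once the replica is marked online again, `self_heal()` copies the newest data onto it and returns the names of the replicas it updated.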
Traditional self-healing solutions lock an entire file for the duration of the self-healing process. While the entire file is locked, client devices may not access the file. If the file is large, the self-healing process may take a long time, and client devices may have to wait that long before the file can be accessed. Particularly in a cloud environment, where a file may be a virtual machine image for instantiating a virtual machine instance in a cloud, traditional self-healing solutions may cause timeouts and may cause virtual machine instances to hang.