Distributed file systems may provide an application programming interface (API) to users (e.g., applications executing on server computers) that presents a unified view. In other words, from the viewpoint of a user, a distributed file system may appear to be a single computer system or a single memory device. Behind the API, however, the distributed file system may comprise many separate server computers, each mediating access to one or more disk drives or other memory devices. A master server may maintain a map or index that associates an abstract reference to stored data with the actual physical location of the data in one or more of the disk drives connected to the server computers.
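The map or index kept by the master server can be illustrated with a minimal sketch. The names `MasterIndex`, `BlockLocation`, `register`, and `locate`, as well as the example path and server names, are hypothetical and are introduced here only for illustration; an actual master server may organize its index quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BlockLocation:
    """Physical location of stored data (hypothetical layout)."""
    server: str   # server computer mediating access to the disk
    disk: str     # disk drive or other memory device on that server
    offset: int   # byte offset of the data on the device


@dataclass
class MasterIndex:
    """Associates an abstract reference (here, a path) with physical locations."""
    _locations: Dict[str, List[BlockLocation]] = field(default_factory=dict)

    def register(self, path: str, location: BlockLocation) -> None:
        # Record one more physical copy of the data named by `path`.
        self._locations.setdefault(path, []).append(location)

    def locate(self, path: str) -> List[BlockLocation]:
        # Return a copy of the list so callers cannot mutate the index.
        return list(self._locations.get(path, []))


# The user sees a single path; the index resolves it to separate servers.
index = MasterIndex()
index.register("/logs/app.log", BlockLocation("server-a", "disk-0", 4096))
index.register("/logs/app.log", BlockLocation("server-b", "disk-3", 0))
servers = [loc.server for loc in index.locate("/logs/app.log")]
```

In this sketch the user-facing reference is a single path, while the physical copies live on two different server computers, which is the unified view the API is described as presenting.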
A distributed file system may maintain multiple copies of the same files to promote the goals of reliability (reducing the risk of data loss) and fast access to data. In some cases, compute operations may be performed on the files by compute resources (e.g., logical processors) located close to the data. Maintaining multiple copies of a file supports the option of executing a compute operation on the copy that is connected to the compute resource with the greatest amount of currently available processor bandwidth. Over time, as the files of a distributed file system grow and/or are deleted and as new memory devices are added to the distributed file system, the data in the distributed file system may become unbalanced. Thus, it may be desirable to perform a balancing procedure (or a rebalancing procedure) occasionally to place data in a more balanced and processing-efficient configuration. Regrettably, such balancing procedures in very large distributed file systems can take a long time, as long as several days, and can slow access to the distributed file system because of the load they place on the processing power of the master server.
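Two ideas from the preceding paragraph can be sketched in a few lines: choosing the copy of a file whose compute resource has the most available processor bandwidth, and quantifying how unbalanced the stored data has become. The names `Replica`, `pick_replica`, and `imbalance` are hypothetical, and the imbalance measure used here (the most heavily used device divided by the mean usage) is an assumption made for illustration, since the text does not specify how balance is measured.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    """One copy of a file and the compute resource next to it (hypothetical)."""
    server: str
    available_cpu: float  # fraction of processor bandwidth currently free, 0.0 to 1.0


def pick_replica(replicas: List[Replica]) -> Replica:
    # Choose the copy whose compute resource has the most free bandwidth.
    if not replicas:
        raise ValueError("no replicas available")
    return max(replicas, key=lambda r: r.available_cpu)


def imbalance(used_bytes: Dict[str, int]) -> float:
    # Assumed measure: usage of the fullest device divided by mean usage.
    # A value of 1.0 means perfectly even; larger values mean more skew.
    if not used_bytes:
        raise ValueError("no devices given")
    mean = sum(used_bytes.values()) / len(used_bytes)
    if mean == 0:
        return 1.0  # all devices are empty, so nothing is skewed
    return max(used_bytes.values()) / mean


replicas = [
    Replica("server-a", available_cpu=0.10),
    Replica("server-b", available_cpu=0.65),
    Replica("server-c", available_cpu=0.30),
]
chosen = pick_replica(replicas)

# A newly added, nearly empty device skews the distribution.
skew = imbalance({"disk-0": 900, "disk-1": 500, "disk-2": 100})
```

A balancing procedure could, under these assumptions, be triggered when `imbalance` exceeds some threshold. The sketch deliberately leaves out the data movement itself, which is the expensive part described above.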