Data centers store large amounts of data across many different machines. Some machines store copies of data that is also stored at other machines.
In the Apache Hadoop open-source software framework, data is distributed across several data nodes (e.g., physical machines or virtual machines) in a Hadoop Distributed File System (HDFS). HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework. In HDFS, various portions and copies of the data may be stored at the several data nodes. FIG. 1 depicts an environment 100 comprising an HDFS 102. The HDFS 102 has a data file that is stored in a seeder data node 104. The seeder data node 104 may distribute the data file, in whole or in part, to one or more leech data nodes such as data node A 106, data node B 108, data node C 110, additional data nodes (not shown), and/or data node N 112. The data file may be distributed using a protocol such as the BitTorrent protocol or the hypertext transfer protocol (HTTP).
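By way of illustration only, the distribution of a data file from the seeder data node 104 to the leech data nodes may be sketched as a small in-memory simulation. The sketch below is not the HDFS API and does not implement the BitTorrent protocol or HTTP; the `DataNode` class, the `split_into_pieces` and `distribute` functions, the piece size of 8 bytes, and the round-robin placement are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Hypothetical in-memory stand-in for a data node (not an HDFS class)."""
    name: str
    pieces: dict[int, bytes] = field(default_factory=dict)


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Split a data file into fixed-size pieces; the last piece may be shorter."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def distribute(seeder: DataNode, leeches: list[DataNode], copies: int = 1) -> None:
    """Copy each piece held by the seeder to `copies` distinct leech nodes.

    Placement is simple round-robin, an assumption made for illustration.
    """
    if not 1 <= copies <= len(leeches):
        raise ValueError("copies must be between 1 and the number of leech nodes")
    for index, piece in sorted(seeder.pieces.items()):
        for offset in range(copies):
            target = leeches[(index + offset) % len(leeches)]
            target.pieces[index] = piece


# Example data file held, in whole, by the seeder data node.
data_file = b"example file contents held by the seeder"
seeder = DataNode("seeder data node 104")
seeder.pieces = dict(enumerate(split_into_pieces(data_file, 8)))

leeches = [
    DataNode("data node A 106"),
    DataNode("data node B 108"),
    DataNode("data node C 110"),
]
distribute(seeder, leeches, copies=2)

for node in leeches:
    print(node.name, "holds pieces", sorted(node.pieces))
```

In this sketch each leech data node ends up holding only part of the data file, while every piece is held by two leech data nodes, mirroring the statement that various portions and copies of the data may be stored at the several data nodes.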