1. Technical Field
This invention relates in general to managing a distributed file system and, more particularly, to dynamically adjusting the number of replicas of a file within a distributed file system according to the probability that the file will be accessed within the distributed file system.
2. Description of the Related Art
As the use of the Internet has grown, the amount of data being accessed via the Internet has also increased. With the increase in the amount of data being accessed, data storage systems have been modified to accommodate larger sets of data and to provide fast and reliable access to the data.
One type of data storage system that is implemented for accommodating larger sets of data and providing fast and reliable access to the data is a distributed file system that stores redundant copies of a file among multiple storage units, which are then accessed as a single file system. By storing redundant copies of a file among multiple storage units, the single file system may respond quickly and reliably to multiple access requests to the files stored within the distributed file system. One example of a distributed file system is a Hadoop Distributed File System (HDFS). Hadoop refers to an open-source software framework developed by the Apache Software Foundation for storage and large-scale processing of data sets on clusters of commodity hardware, where an HDFS achieves reliability, including high availability and fault tolerance, by replicating the files across multiple hosts. Commodity hardware includes computing components that are already available and, when implemented within an HDFS, provides low-performance, low-cost hardware working in parallel to support fast and reliable access to data stored on the hardware.
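By way of illustration only, the following minimal Python sketch shows how storing redundant copies of a file on distinct hosts may keep the file readable when some hosts fail. The function names `place_replicas` and `is_available`, the host names, and the offset-based placement rule are hypothetical and chosen for brevity; they do not represent the placement policy of an HDFS or of any particular distributed file system.

```python
def place_replicas(file_name, hosts, replication_factor):
    """Return the set of distinct hosts that hold a copy of the file.

    Hypothetical placement rule for this sketch: start at an offset
    derived from the file name and take consecutive hosts, wrapping
    around the end of the host list.
    """
    if not 1 <= replication_factor <= len(hosts):
        raise ValueError("replication factor must be between 1 and the number of hosts")
    start = sum(file_name.encode("utf-8")) % len(hosts)
    return {hosts[(start + i) % len(hosts)] for i in range(replication_factor)}


def is_available(replica_hosts, failed_hosts):
    """A file stays readable while at least one replica host has not failed."""
    return bool(set(replica_hosts) - set(failed_hosts))


if __name__ == "__main__":
    hosts = ["host-a", "host-b", "host-c", "host-d"]  # hypothetical cluster
    replicas = place_replicas("report.csv", hosts, replication_factor=3)
    print("replica hosts:", sorted(replicas))

    # Losing two of the three replica hosts leaves one readable copy.
    two_failed = sorted(replicas)[:2]
    print("available after two failures:", is_available(replicas, two_failed))

    # Losing every replica host makes the file unavailable.
    print("available after all replicas fail:", is_available(replicas, replicas))
```

In this sketch, a file with three replicas remains available after any two of its replica hosts fail, which is the sense in which replication across multiple hosts may provide fault tolerance on low-cost hardware.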