1. Field
Embodiments relate to the placement of data fragments generated by an erasure code in distributed computational devices based on a duplication factor.
2. Background
In a distributed file system, one or more central servers may store files that may be accessed, with proper authorization rights, by any number of remote clients in the network. Just as an operating system organizes files in a hierarchical file management system, the distributed system may employ a uniform naming convention and a mapping scheme to keep track of where the files are stored. When a client device retrieves a file from the server, the file appears as a regular file on the client machine, and the user can use the file as if it were stored locally. When the user finishes using the file, the updated file is returned over the network to the server, which stores it for later retrieval. Distributed file systems may be advantageous because they make it easier to distribute documents to multiple clients and they provide centralized storage, so client machines do not need to spend their own resources storing files.
Data deduplication is a storage technique in which redundant data is eliminated to significantly reduce storage requirements and improve bandwidth efficiency. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored. This single copy is called the master copy, and each deleted copy (secondary copy) is replaced by a reference pointer to the master copy.
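The master-copy and reference-pointer arrangement described above can be illustrated with a minimal sketch. It assumes that duplicates are detected by hashing whole data blocks with SHA-256 and that the hash itself serves as the reference pointer. The class `DedupStore` and its methods are hypothetical names used only for illustration; this is not the deduplication scheme of any particular embodiment.

```python
import hashlib


class DedupStore:
    """Hypothetical deduplicating store: one master copy per unique block,
    with each write returning a reference pointer (the content hash)."""

    def __init__(self):
        self.masters = {}   # digest -> bytes of the single stored master copy
        self.refcount = {}  # digest -> number of references to that master

    def put(self, data: bytes) -> str:
        # Assumption: duplicates are identified by a SHA-256 content hash.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.masters:
            self.masters[digest] = data  # first occurrence becomes the master copy
            self.refcount[digest] = 0
        self.refcount[digest] += 1      # later duplicates only add a reference
        return digest                   # reference pointer to the master copy

    def get(self, ref: str) -> bytes:
        # Follow the reference pointer back to the master copy.
        return self.masters[ref]

    def stored_bytes(self) -> int:
        # Physical bytes actually kept (master copies only).
        return sum(len(d) for d in self.masters.values())


store = DedupStore()
files = {
    "a.txt": b"quarterly report",
    "b.txt": b"quarterly report",  # duplicate of a.txt
    "c.txt": b"meeting notes",
}
refs = {name: store.put(content) for name, content in files.items()}
```

In this example, `a.txt` and `b.txt` receive the same reference and share one master copy, so the store keeps 29 bytes instead of the 45 bytes written. The reference count is one simple way to decide when a master copy may be safely deleted.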
“Big Data” is a term that refers to data sets so large and complex that they may have to be processed by specially designed hardware and software tools. The data sets are typically on the order of terabytes to exabytes in size. These data sets are created from a diverse range of sources, such as sensors that gather climate information and publicly available information such as magazines, newspapers, articles, etc. Other examples where big data is generated include purchase transaction records, web logs, medical records, military surveillance, video and image archives, and e-commerce. There is heightened interest in Big Data as enormous amounts of digital data are being created from interactions among individuals, businesses, and government agencies. There are significant benefits in effectively identifying, accessing, filtering, analyzing, and selecting parts of this data. Processing such massive amounts of Big Data makes advanced storage infrastructures a necessity.