The present disclosure relates to distributed computing environments, and more specifically, to data migration in a distributed computing environment.
A Distributed File System (DFS) is a file system that allows access to files from multiple hosts via a network. A DFS makes it possible for multiple users to share the files and data from multiple machines. DFSs are often used in big data analytics, which handle the exponential growth and availability of data. Data locality aware scheduling has been widely used in big data analytics to facilitate optimal utilization of cluster resources. However, in some situations, the data locality aware scheduling may be very limited or cannot be carried out.