1. Technical Field
This disclosure relates generally to the processing of large data files in a data processing system and, more specifically, to parallel processing of large data files on Distributed File Systems (DFS) using dynamic workload balancing in the data processing system.
2. Description of the Related Art
Increasing interest in the data processing fields of Big Data and business analytics has created a need for efficient methods of reading and processing large data files stored on Distributed File Systems. Improving the efficiency of reading and processing large data files is an important focus of recent developments in cloud computing and Big Data applications. A simple current explanation of big data, as given at wikipedia.org, is: “Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications.” Business analytics, in comparison with business intelligence, is described at wikipedia.org as follows: “Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.”
Cloud computing is a model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.