In recent years, the production of digital data has increased exponentially as more and more information is digitized; for example, electronic devices capture data from an increasing number of sources such as smartphones, electronic sensors, and social media. “Big data” is a term that refers to large and complex volumes of data and to the various analytics and methods used to process such data.
Information Technology (IT) and data storage environments that generate and receive growing volumes of big data seek efficient ways to manage, process, and analyze the data as it becomes available. Conventional systems and networks deploy various parallel processing tools to address big data analysis. Such systems typically require centralized job schedulers and/or resource allocators to manage and maintain the data. Stated differently, such systems often centralize authority in a few nodes within the network by using a master/slave architecture to coordinate allocation of resources. However, the master nodes within a master/slave architecture are responsible for multiple roles, which results in performance bottlenecks and increased reliability risk. Failure of one or more of the master nodes increases load on the remaining nodes, potentially causing degradation or interruption of service in data storage and access, new task delegation, and progress reporting. Conventional centralized systems are also susceptible to service disruptions in the event of lost slave machines that store metadata describing the locations of other data. Further, a drawback of master/slave (and, more broadly, hierarchical) architectures is that well-distributed data depends on resources that become less distributed at each higher hierarchical level.
The aforementioned conventional systems, with highly centralized control imposed by the master/slave architecture, present a variety of additional issues. Scalability is limited because conventional systems cannot expand beyond the capacity of the master node. Further, robustness is limited. Master nodes present single points of failure that must be backed up by similarly capable machines. Master nodes also incur higher hardware and administrative costs, so redundancy is required in the more expensive parts of the architecture. Further still, the master/slave architecture results in additional administration expense. Such hierarchical architectures require significant configuration and monitoring to ensure that the hierarchy is preserved and that unexpected load is accommodated. Expansion or reduction of the network requires additional configuration, which creates trade-offs between network dynamism and administrative cost.
It is with these concepts in mind, among others, that various aspects of the present disclosure were conceived.