The management of data processing resources with respect to data-processing clusters, e.g., of the type provided by data centers and cloud-based services, has expanded and is being used by a growing number of organizations. Many frameworks for managing resources in data-processing clusters exist, and some of the most popular have been developed as open-source software projects, such as Apache™ Hadoop® by the Apache Software Foundation, which was inspired by Google®'s MapReduce software. The latter is a framework in which an application is broken down into numerous smaller parts. Any of these parts, which are also called fragments or blocks, can be run on any node in a data-processing cluster. Hadoop® is now perhaps the most commonly employed framework that supports the processing and storage of extremely large data sets in a distributed computing environment.
A number of cloud computing subsystems have been designed for Hadoop®, such as Apache™ Yet Another Resource Negotiator (YARN), Apache™ Pig, Apache™ Hbase, and Apache™ Phoenix to name just a few. Apache™ Phoenix is an open source, massively parallel processing, relational database engine that is based on Apache HBase. Apache™ Pig is a high-level platform for creating programs that run on Hadoop®. Apache™ Spark is another system that supports a fast engine for big data processing capable of streaming and supporting Structured Query Language (SQL), machine learning, and graph processing. And, Apache™ YARN is a major part of the second version of Hadoop®.
YARN is a general-purpose, cluster-management technology, and more specifically, a large-scale, distributed operating system for big data applications. YARN combines a central resource manager that reconciles the way applications use resources with node manager agents that monitor the processing operations of individual cluster nodes. Running on hardware clusters, Hadoop® has attracted particular interest as a staging area and data store for large volumes of structured and unstructured data intended for use in analytics applications. Separating resource management functionality of Hadoop® as YARN makes the Hadoop® environment more suitable for operational applications generally. YARN further enables Hadoop® clusters to run Apache™ Spark and other types of distributed applications.