The distributed system framework Hadoop is an extensible open-source project developed by Apache for distributed storage and computing. In the Hadoop YARN framework, resource management and job scheduling/monitoring are separated into two components: a resource manager (RM) encapsulates resources into containers that can be scheduled for different applications and tasks, and per-application application masters (AM) enable different applications to run in the framework. This greatly improves resource utilization.
FIG. 1 is a system architecture diagram of Hadoop YARN in the prior art. As shown in FIG. 1, the network elements included in the system architecture and their functions are as follows. A resource manager (RM) is configured for resource management and application management, allocating resources to running applications according to capacity, queue, or other constraints. A node manager (NM) serves as a framework agent on each node and is responsible for starting the containers required by an application, monitoring resource statuses, and reporting these statuses to the RM. An application master (AM) is configured to manage the application program corresponding to the AM, acquire suitable containers from the RM for task execution, and track the statuses and progress of these containers, where a container refers to an encapsulated machine resource (CPU, memory, or the like) that is used for task execution. A client is configured to submit a job and query its execution status. First, the client submits a job; the resource manager allocates, according to the resources registered by the node managers, an encapsulated resource, namely a container, to the job and starts an AM for the job; then the AM requests further encapsulated resources from the RM and starts containers to execute the tasks of the job.
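The submission flow described above can be sketched as a minimal single-process simulation. This is an illustrative assumption, not the real Hadoop API: the class and method names (`ResourceManager`, `ApplicationMaster`, `submit_job`, and so on) are hypothetical stand-ins for the RM allocating a container for the AM, and the AM then requesting further containers to run tasks.

```python
# Hypothetical, simplified simulation of the YARN submission flow;
# names are illustrative and do not correspond to the real Hadoop API.

class Container:
    """An encapsulated machine resource (CPU, memory, or the like)."""
    def run(self, task):
        return f"done:{task}"

class ResourceManager:
    """Allocates containers from a fixed pool of registered resources."""
    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, n):
        granted = min(n, self.free)
        self.free -= granted
        return [Container() for _ in range(granted)]

class ApplicationMaster:
    """Per-job master: requests containers from the RM and runs tasks."""
    def __init__(self, rm):
        self.rm = rm

    def execute(self, tasks):
        containers = self.rm.allocate(len(tasks))
        return [c.run(t) for c, t in zip(containers, tasks)]

def submit_job(rm, tasks):
    # Step 1: the RM allocates one container in which the job's AM starts.
    am_container = rm.allocate(1)
    am = ApplicationMaster(rm)
    # Step 2: the AM requests further containers and executes the tasks.
    return am.execute(tasks)

rm = ResourceManager(total_containers=4)
results = submit_job(rm, ["map-0", "map-1"])
print(results)   # → ['done:map-0', 'done:map-1']
print(rm.free)   # → 1 (one container used by the AM, two by the tasks)
```

Note that even in this toy model, every job submission spends one allocation round just starting its own AM before any task can run, which is the overhead discussed next.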
A problem in the prior art is that each time a client submits a job, the client needs to dynamically request a resource from the RM and start an independent AM. This increases the latency of starting the job, which has a significant impact on some jobs (such as small jobs and real-time jobs).