Hadoop's MapReduce relies on a single node, the JobTracker, that is responsible for running all jobs. Both memory capacity and processor availability can become bottlenecks for this node: memory fills up with the statistics kept for every task of every job, while the processor is kept busy scheduling tasks and updating the statistics of those currently running. Furthermore, because the JobTracker's design uses a single lock, its parallelism is limited.
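To make the single-lock limitation concrete, here is a minimal sketch of a tracker whose scheduling and status updates share one lock. It is a toy model, not JobTracker code: the class `ToyJobTracker`, its methods, and the worker counts are all hypothetical names and values chosen for illustration.

```python
import threading


class ToyJobTracker:
    """Hypothetical stand-in for a tracker whose state sits behind one lock."""

    def __init__(self):
        self._lock = threading.Lock()   # the single, coarse-grained lock
        self.task_stats = {}            # task id -> latest progress value
        self.updates = 0                # total status updates applied
        self._inside = 0                # threads currently in a critical section
        self.max_inside = 0             # highest value _inside ever reached

    def _enter(self):
        self._inside += 1
        self.max_inside = max(self.max_inside, self._inside)

    def _leave(self):
        self._inside -= 1

    def update_status(self, task_id, progress):
        # Status updates take the global lock...
        with self._lock:
            self._enter()
            self.task_stats[task_id] = progress
            self.updates += 1
            self._leave()

    def schedule(self):
        # ...and so does scheduling, so the two can never overlap.
        with self._lock:
            self._enter()
            pending = [t for t, p in self.task_stats.items() if p < 1.0]
            self._leave()
            return len(pending)


def run_workers(tracker, workers=8, updates_per_worker=100):
    """Drive the tracker from several threads, as task nodes would."""
    def work(worker_id):
        for i in range(updates_per_worker):
            tracker.update_status((worker_id, i), 0.5)
            tracker.schedule()

    threads = [threading.Thread(target=work, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tracker


tracker = run_workers(ToyJobTracker())
print(tracker.updates, tracker.max_inside)
```

However many worker threads are started, `max_inside` never exceeds 1: every update and every scheduling pass waits its turn on the same lock, so adding threads adds contention rather than throughput.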
Additionally, if fair scheduling is used so that many clients can share a cluster, each running jobs in parallel on a small subset of the cluster's resources, thousands of jobs and millions of tasks are held in memory, effectively filling up the JobTracker heap. Moreover, the fair scheduler's scheduling cycles are heavy, which compounds the processing problem.
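A back-of-the-envelope calculation shows how quickly per-task statistics can add up to a full heap. The workload figures below (2,000 jobs, 1,000 tasks per job, 2 KiB of statistics per task) are assumptions picked purely for illustration, not measurements of any real cluster.

```python
def estimated_heap_bytes(jobs, tasks_per_job, bytes_per_task):
    """Rough heap needed to keep per-task statistics for every stored task."""
    return jobs * tasks_per_job * bytes_per_task


# Hypothetical workload: 2,000 jobs of 1,000 tasks each,
# with 2 KiB of statistics retained per task.
total_tasks = 2_000 * 1_000
heap = estimated_heap_bytes(2_000, 1_000, 2 * 1024)

print(f"{total_tasks:,} tasks -> {heap / 2**30:.2f} GiB of heap")
# prints: 2,000,000 tasks -> 3.81 GiB of heap
```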
Because of these memory and processor constraints, the JobTracker can hold only a limited number of tasks in memory. Beyond this limit, the fair scheduling iterations become too expensive and tasks finish more slowly than they enter the system, which slows the JobTracker down even further.
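The feedback loop described above can be sketched with a toy simulation. It assumes a fixed processing budget per scheduling cycle, part of which is spent scanning every task in memory, with the remainder available for completing tasks. The function `simulate` and all of its parameters are hypothetical; the model is deliberately crude and is not how the fair scheduler is implemented.

```python
def simulate(initial_backlog, ticks=200, arrivals=100, budget=1000,
             tasks_per_scan_unit=10):
    """Return the number of tasks left in memory after `ticks` scheduling cycles."""
    in_memory = initial_backlog
    for _ in range(ticks):
        in_memory += arrivals
        # Scanning every in-memory task consumes part of the fixed budget.
        scan_cost = in_memory // tasks_per_scan_unit
        # Whatever budget is left completes tasks, one unit per task.
        capacity = max(0, budget - scan_cost)
        in_memory -= min(in_memory, capacity)
    return in_memory


for start in (8_000, 10_000):
    print(start, "->", simulate(start))
# prints:
# 8000 -> 0
# 10000 -> 30000
```

With these made-up parameters the tipping point lies at around 8,900 tasks in memory. Starting below it, the backlog drains completely; starting above it, scanning consumes the whole budget, no task completes, and the backlog grows by the full arrival rate on every cycle.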