The Old MapReduce
The Hadoop 0.x MapReduce system was composed of a JobTracker and TaskTrackers.
The JobTracker is responsible for resource management, tracking resource usage, and job life-cycle management, e.g. scheduling job tasks, tracking their progress, and providing fault tolerance for tasks.
The TaskTracker is the per-node slave of the JobTracker: it takes orders from the JobTracker to launch or tear down tasks, and periodically reports task status back to the JobTracker.
For years we benefited from the MapReduce framework; it has been the most successful programming model in the big data world.
But MapReduce is not everything: we also need graph processing and real-time stream processing, and since Hadoop is essentially batch oriented, we had to look to other systems for that work.
So the Hadoop community made a huge change.
The Hadoop YARN
The fundamental idea of YARN is to split the two major responsibilities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM).
The ResourceManager is responsible for allocating resources to the running applications.
The NodeManager is the per-machine slave: it launches the application's containers, monitors their resource usage, and reports it to the ResourceManager.
The ApplicationMaster is a per-application framework that runs as a normal container; it is responsible for negotiating appropriate resource containers from the ResourceManager, tracking their status, and monitoring progress.
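The division of labor above can be illustrated with a tiny toy model in plain Java. This is not the real YARN API, and all class and method names here are made up for illustration: the global ResourceManager grants containers from a fixed cluster pool, while each per-application master negotiates only its own share.

```java
// Toy model of the YARN responsibility split: a global resource manager
// hands out containers, and each per-application master negotiates its
// own share. None of these classes are the real YARN API.
public class YarnToyModel {
    static class ResourceManager {
        private int freeContainers;
        ResourceManager(int clusterCapacity) { this.freeContainers = clusterCapacity; }
        // Grant up to `requested` containers, bounded by what is free.
        synchronized int allocate(int requested) {
            int granted = Math.min(requested, freeContainers);
            freeContainers -= granted;
            return granted;
        }
        synchronized void release(int n) { freeContainers += n; }
        synchronized int free() { return freeContainers; }
    }

    static class ApplicationMaster {
        final String appId;
        int held = 0;
        ApplicationMaster(String appId) { this.appId = appId; }
        // Negotiate containers from the RM and track how many we hold.
        void negotiate(ResourceManager rm, int wanted) { held += rm.allocate(wanted); }
        // On completion, return everything to the global pool.
        void finish(ResourceManager rm) { rm.release(held); held = 0; }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager(10);
        ApplicationMaster am1 = new ApplicationMaster("app-1");
        ApplicationMaster am2 = new ApplicationMaster("app-2");
        am1.negotiate(rm, 6);   // app-1 gets 6 of the 10 containers
        am2.negotiate(rm, 6);   // only 4 remain, so app-2 gets 4
        System.out.println("app-1 holds " + am1.held + ", app-2 holds " + am2.held);
        am1.finish(rm);         // releasing returns capacity to the pool
        System.out.println("free after app-1 finishes: " + rm.free());
    }
}
```

The point of the model is that scheduling state lives in one global place while per-job bookkeeping lives with each application, which is exactly the split that removed the JobTracker bottleneck.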
See the official Hadoop YARN documentation for the details.
The Hadoop Distribution
My intention is to read the Hadoop source code, so I prefer to build a Hadoop distribution from source.
First, clone the Hadoop source repository.
Second, check out the branch-2.2.0 branch, which is quite stable.
Third, apply the required patch.
Fourth, build the distribution:
mvn package -Pdist -DskipTests -Dtar
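If the build succeeds, the packaged distribution should land under hadoop-dist/target (the exact path is my assumption based on how the -Pdist -Dtar profile behaves; check your own build output):

```shell
# After a successful `mvn package -Pdist -DskipTests -Dtar`, the tarball
# should be here (version/path assumed; verify against your tree):
ls hadoop-dist/target/hadoop-2.2.0.tar.gz

# Unpack it to wherever you want to run Hadoop from.
tar -xzf hadoop-dist/target/hadoop-2.2.0.tar.gz -C /opt
```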
The Installation Guide
This is a great guide to install Hadoop 2.2.0.
I did a single-node installation; after configuring everything, HDFS can be set up and the daemons started with a few scripts.
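The exact scripts from my setup are not reproduced here, but a typical single-node sequence for Hadoop 2.2.0 looks like this (run from the unpacked distribution directory; this is a sketch of the standard scripts, not necessarily the exact commands I used):

```shell
# Format the HDFS namenode (first run only -- this wipes HDFS metadata).
bin/hdfs namenode -format

# Start the HDFS daemons: NameNode, DataNode, SecondaryNameNode.
sbin/start-dfs.sh

# Start the YARN daemons: ResourceManager and NodeManager.
sbin/start-yarn.sh

# Verify that the daemons are up.
jps
```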
To try it out, I submitted a job that ran date -u commands on two containers.
The logs showed the two containers being launched and completing, and each container printed the current UTC time.
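Running a shell command like date -u across a set of containers matches what the bundled DistributedShell example does, so the submission probably looked something like this (the jar path and version are assumptions; adjust them to your distribution):

```shell
# Submit the YARN DistributedShell example: run `date -u` in two containers.
# The same jar serves as both the client and the ApplicationMaster jar.
bin/hadoop jar \
  share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
  org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
  -shell_command "date -u" \
  -num_containers 2
```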
That’s my first YARN job running!