The Hadoop 2.x - Running a YARN Job (1)
In the last blog post, a Hadoop distribution was built to run a YARN job.
$ bin/hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
The script above runs the date -u command in the Hadoop cluster. From this we might conclude that there is a dispatcher named Client in "hadoop-yarn-applications-distributedshell-2.2.0.jar" that is responsible for deploying a jar to the cluster with parameters, such as the shell command and its args, and for notifying the cluster to execute the shell command.
To see what's in the rabbit hole, let's step into the Client source code.
This post is full of code snippets. To avoid confusion, all comments added by me begin with //** instead of // or /*. The code can be cloned from the Apache Git repository; the commit id is 2e01e27e5ba4ece19650484f646fac42596250ce.
The Process Logic
Since the Client is started as a process, we had better look into the main method first.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L164
There are three steps: main constructs a Client instance, initializes it, and then invokes its run method.
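To make the three steps concrete, here is a minimal sketch of the same control flow. It is not the Hadoop source: SketchClient, MainFlowSketch and launch are hypothetical names, and init and run only print or store values, so that the snippet runs without a cluster.

```java
// Hypothetical stand-in for the real Client; only the control flow matters here.
class SketchClient {
    private String shellCommand;

    // Returns false when there is nothing to run (here: no arguments at all).
    boolean init(String[] args) {
        if (args.length == 0) {
            return false;
        }
        shellCommand = String.join(" ", args);
        return true;
    }

    // Pretends to submit the job and reports success.
    boolean run() {
        System.out.println("Would ask the cluster to run: " + shellCommand);
        return true;
    }
}

class MainFlowSketch {
    // Mirrors the three steps: construct, initialize, run.
    static boolean launch(String[] args) {
        SketchClient client = new SketchClient();   // 1. construct
        if (!client.init(args)) {                   // 2. initialize
            return false;
        }
        return client.run();                        // 3. run
    }

    public static void main(String[] args) {
        System.out.println("Succeeded: " + launch(args));
    }
}
```

Keeping init separate from the constructor lets the caller decide not to run at all, for example when the arguments are missing.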
The Client Instance
There are three constructors in Client; the main method calls the default one, which takes no parameters.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L227
The default constructor creates a YarnConfiguration instance and passes it to another constructor.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L194
That constructor then sets "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster" as appMasterClass.
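The constructor chain can be sketched as follows. This is an illustration under assumptions, not the Hadoop code: ChainedClient is a hypothetical class, a plain Map stands in for YarnConfiguration, and the exact split of work between the three constructors is my reading of the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class that illustrates constructor chaining only.
class ChainedClient {
    // The class name is the one quoted in the post; the constant name is made up.
    static final String DEFAULT_APP_MASTER =
        "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster";

    final String appMasterClass;
    final Map<String, String> conf;   // stand-in for YarnConfiguration

    // 1. No-arg constructor: create a default configuration and delegate.
    ChainedClient() {
        this(new HashMap<String, String>());
    }

    // 2. Configuration-only constructor: supply the default ApplicationMaster class.
    ChainedClient(Map<String, String> conf) {
        this(DEFAULT_APP_MASTER, conf);
    }

    // 3. The constructor that actually assigns the fields.
    ChainedClient(String appMasterClass, Map<String, String> conf) {
        this.appMasterClass = appMasterClass;
        this.conf = conf;
    }
}
```

With this layout, new ChainedClient() ends up with the default ApplicationMaster class name and an empty configuration, while a test can still call the third constructor with its own values.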
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L200
Next, a YarnClientImpl is created and initialized. YarnClientImpl extends YarnClient, which extends AbstractService, which implements Service; the main job of this hierarchy is to control the service life-cycle.
Since the init method is defined in neither YarnClientImpl nor YarnClient, the actual init happens in AbstractService.
//** org.apache.hadoop.service.AbstractService.java L151
It first checks, via isInState, whether the service is already in STATE.INITED. If not, it calls enterState(STATE.INITED) and then the serviceInit method. Once the state has been successfully changed to STATE.INITED, notifyListeners() is called.
//** org.apache.hadoop.service.AbstractService.java L415
notifyListeners notifies all of the service's own listeners and the global listeners of the state change, so that they can react accordingly.
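The init flow described above is a template method: the base class owns the state handling and the subclass only fills in serviceInit. Here is a minimal sketch of that pattern. All names (SketchService, SketchYarnClient, Listener) are hypothetical, there are only two states, and the configuration handling, locking and failure handling of a real service are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class; it only illustrates the shape of init().
abstract class SketchService {
    enum State { NOTINITED, INITED }

    interface Listener {
        void stateChanged(SketchService service);
    }

    private State state = State.NOTINITED;
    private final List<Listener> listeners = new ArrayList<Listener>();

    void register(Listener listener) {
        listeners.add(listener);
    }

    boolean isInState(State expected) {
        return state == expected;
    }

    // Switches the state and returns the previous one.
    private State enterState(State next) {
        State old = state;
        state = next;
        return old;
    }

    // Final on purpose: subclasses customise serviceInit(), not init().
    final void init() {
        if (isInState(State.INITED)) {
            return;                       // already initialised, nothing to do
        }
        if (enterState(State.INITED) != State.INITED) {
            serviceInit();                // subclass hook
            if (isInState(State.INITED)) {
                notifyListeners();        // tell observers about the change
            }
        }
    }

    protected abstract void serviceInit();

    private void notifyListeners() {
        for (Listener listener : listeners) {
            listener.stateChanged(this);
        }
    }
}

// Hypothetical subclass that only provides the hook.
class SketchYarnClient extends SketchService {
    boolean initialised = false;

    @Override
    protected void serviceInit() {
        initialised = true;
    }
}
```

Calling init() twice on a SketchYarnClient runs serviceInit and notifies the listeners only once, because the second call returns at the first isInState check.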
But what is the STATE?
The State Model
Let's go back to the YarnClient.createYarnClient() method.
//** org.apache.hadoop.yarn.client.api.YarnClient.java L55
The YarnClientImpl instance is created.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L86
//** org.apache.hadoop.yarn.client.api.YarnClient.java L60
//** org.apache.hadoop.service.AbstractService.java L111
Bingo, that's the state model: ServiceStateModel.
//** org.apache.hadoop.service.ServiceStateModel.java L66
The state model is simply a name and state pair; the name is the class name of the service implementation.
The isInState method checks the state value.
//** org.apache.hadoop.service.ServiceStateModel.java L84
The enterState method changes the state value after checkStateTransition has accepted the transition.
//** org.apache.hadoop.service.ServiceStateModel.java L110
The state transition is checked by looking up the statemap with the current state and the proposed state, which yields a boolean indicating whether the transition is valid. If it is invalid, checkStateTransition throws ServiceStateException.
//** org.apache.hadoop.service.ServiceStateModel.java L125
Then what's in the statemap?
//** org.apache.hadoop.service.ServiceStateModel.java L35
That's the state model we are looking for. The current state is the row index, the proposed state is the column index, and the value says whether the current state can be changed to the proposed state.
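To see how a boolean matrix can drive such a state model, here is a small self-contained sketch. SketchStateModel is a hypothetical class, the table values are an illustrative assumption about a four-state life-cycle rather than a copy of the Hadoop table, and IllegalStateException stands in for ServiceStateException.

```java
// Hypothetical state model driven by a transition table.
class SketchStateModel {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Row: current state. Column: proposed state. Values are illustrative assumptions.
    private static final boolean[][] STATE_MAP = {
        //                 NOTINITED INITED STARTED STOPPED
        /* NOTINITED */  { false,    true,  false,  true },
        /* INITED    */  { false,    true,  true,   true },
        /* STARTED   */  { false,    false, true,   true },
        /* STOPPED   */  { false,    false, false,  true },
    };

    private final String name;                 // name half of the name/state pair
    private State state = State.NOTINITED;     // state half of the pair

    SketchStateModel(String name) {
        this.name = name;
    }

    boolean isInState(State proposed) {
        return state == proposed;
    }

    // Looks up the table: is current -> proposed allowed?
    static boolean isValidStateTransition(State current, State proposed) {
        return STATE_MAP[current.ordinal()][proposed.ordinal()];
    }

    // Validates the transition, switches the state and returns the previous state.
    State enterState(State proposed) {
        if (!isValidStateTransition(state, proposed)) {
            throw new IllegalStateException(
                name + " cannot enter state " + proposed + " from state " + state);
        }
        State old = state;
        state = proposed;
        return old;
    }
}
```

With this table, a model that has reached STARTED rejects a request to go back to INITED, while the move to STOPPED is allowed from every state.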
The Command Initialization
Let's go back again to the point where the YarnClientImpl is about to be initialized: the init method of its superclass AbstractService calls the serviceInit method, and the state is changed to STATE.INITED.
YarnClientImpl provides its own implementation of the serviceInit method.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L96
The address of the Resource Manager, rmAddress, is assigned from the configuration instance.
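As a rough illustration of reading an address from configuration, the sketch below parses a host:port value into an InetSocketAddress. It does not use the Hadoop Configuration API: RmAddressSketch and resolveRmAddress are hypothetical, a plain Map stands in for the configuration, and the key and default value are what I recall from the YARN defaults, so treat them as assumptions.

```java
import java.net.InetSocketAddress;
import java.util.Map;

// Hypothetical helper; a Map stands in for the real configuration object.
class RmAddressSketch {
    // Assumed key and default value; check them against your Hadoop version.
    static final String RM_ADDRESS_KEY = "yarn.resourcemanager.address";
    static final String DEFAULT_RM_ADDRESS = "0.0.0.0:8032";

    // Reads "host:port" from the configuration, falling back to the default.
    static InetSocketAddress resolveRmAddress(Map<String, String> conf) {
        String value = conf.containsKey(RM_ADDRESS_KEY)
            ? conf.get(RM_ADDRESS_KEY)
            : DEFAULT_RM_ADDRESS;
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port but got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // Unresolved on purpose: no DNS lookup happens while reading configuration.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

Resolving the address once during initialization means later calls can reuse the stored value instead of re-reading the configuration.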
Now that the Client instance has been created successfully, main invokes the init method to initialize the instance.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L244
The command line options are parsed and assigned to instance variables for later use.
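The idea of parsing options into instance variables can be sketched without any library. The hand-rolled loop below is a stand-in for whatever option parser the real Client uses; OptionSketch is a hypothetical class, and the option names (shell_command, shell_args, num_containers, help) are used for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client that keeps parsed options in instance variables.
class OptionSketch {
    String shellCommand;
    String shellArgs = "";
    int numContainers = 1;

    // Returns false when only help was requested, true when the client may run.
    boolean init(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("No args specified");
        }
        Map<String, String> opts = new HashMap<String, String>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (!arg.startsWith("-")) {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
            String name = arg.substring(1);
            if (name.equals("help")) {
                return false;
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for option: " + arg);
            }
            opts.put(name, args[++i]);    // the next token is always the value
        }
        if (!opts.containsKey("shell_command")) {
            throw new IllegalArgumentException("No shell command specified");
        }
        shellCommand = opts.get("shell_command");
        if (opts.containsKey("shell_args")) {
            shellArgs = opts.get("shell_args");
        }
        if (opts.containsKey("num_containers")) {
            numContainers = Integer.parseInt(opts.get("num_containers"));
        }
        if (numContainers < 1) {
            throw new IllegalArgumentException("num_containers must be at least 1");
        }
        return true;
    }
}
```

For example, passing -shell_command date -shell_args -u stores date and -u in the two fields, so a later run step can use them without looking at the raw arguments again.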
The Client is ready to run.