In the last blog post, a Hadoop distribution was built to run a YARN job.
$ bin/hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
The script above executes the date -u command in the Hadoop cluster.
We might conclude that there is a dispatcher named
"hadoop-yarn-applications-distributedshell-2.2.0.jar" that is responsible for
deploying a jar to the cluster with parameters, such as the shell command and
its args, and for notifying the cluster to execute the shell command.
To see what's in the rabbit hole, let's step into the
Client source code.
This post is full of code snippets. To avoid confusion, all
comments added by me begin with
//** instead of
/*. The code can be cloned from the Apache Git
Repository, commit id is
The Process Logic
Client is started as a process, so we had better
look into the
main method first.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L164
There are three steps: first it constructs an
instance, then initializes it, and finally invokes the
The Client Instance
There are three constructors in Client. The
main method calls the default one with no parameters.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L227
The default constructor creates a configuration
instance and passes it to another constructor.
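To make the delegation concrete, here is a minimal sketch of three chained constructors. It is not the Hadoop source: ChainedClientSketch, the appMasterName field, and the use of java.util.Properties as a stand-in for the configuration object are all my assumptions for illustration.

```java
import java.util.Properties;

//** Hypothetical stand-in for Client; not copied from Hadoop.
class ChainedClientSketch {
    private final Properties conf;       //** stand-in for the configuration instance
    private final String appMasterName;  //** hypothetical field for illustration

    //** Default constructor: creates a configuration and hands it on.
    ChainedClientSketch() {
        this(new Properties());
    }

    //** Second constructor: supplies a default name and delegates again.
    ChainedClientSketch(Properties conf) {
        this("sketch.ApplicationMaster", conf);
    }

    //** Third constructor: the only one that actually assigns fields.
    ChainedClientSketch(String appMasterName, Properties conf) {
        this.appMasterName = appMasterName;
        this.conf = conf;
    }

    Properties getConf() { return conf; }

    String getAppMasterName() { return appMasterName; }

    public static void main(String[] args) {
        ChainedClientSketch client = new ChainedClientSketch();
        System.out.println("app master: " + client.getAppMasterName());
    }
}
```

Keeping the field assignments in a single constructor means every way of building the object goes through the same code path.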
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L194
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L200
YarnClientImpl extends YarnClient,
which extends
AbstractService, which implements
Service. The main job of Service is to control the service lifecycle.
YarnClientImpl is created and initialized.
The init method can't be found in
YarnClientImpl or in its parent YarnClient;
init happens in AbstractService.
//** org.apache.hadoop.service.AbstractService.java L151
It first checks whether the state is already
STATE.INITED. If not, it calls
enterState(STATE.INITED) and then the
serviceInit method. When the state has been changed successfully, the listeners are notified.
//** org.apache.hadoop.service.AbstractService.java L415
notifyListeners notifies all its listeners and global
listeners to change their states correspondingly.
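The init flow described above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the AbstractService code: LifecycleSketch, its Listener interface, and the serviceInitCalls counter are hypothetical names, and configuration handling and error handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

//** Hypothetical, simplified service; not the Hadoop implementation.
class LifecycleSketch {
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    interface Listener { void stateChanged(LifecycleSketch service); }

    private final Object stateChangeLock = new Object();
    private final List<Listener> listeners = new ArrayList<>();
    private volatile State state = State.NOTINITED;
    int serviceInitCalls = 0;  //** only here so the sketch can be inspected

    void registerListener(Listener l) { listeners.add(l); }

    State getState() { return state; }

    boolean isInState(State expected) { return state == expected; }

    //** Switches to the new state and returns the previous one.
    private State enterState(State newState) {
        State old = state;
        state = newState;
        return old;
    }

    //** Hook that a subclass would override with its own setup.
    protected void serviceInit() { serviceInitCalls++; }

    public void init() {
        if (isInState(State.INITED)) {
            return;  //** already initialized, nothing to do
        }
        synchronized (stateChangeLock) {
            if (enterState(State.INITED) != State.INITED) {
                serviceInit();
                if (isInState(State.INITED)) {
                    notifyListeners();
                }
            }
        }
    }

    private void notifyListeners() {
        for (Listener l : listeners) {
            l.stateChanged(this);
        }
    }

    public static void main(String[] args) {
        LifecycleSketch service = new LifecycleSketch();
        service.registerListener(s -> System.out.println("now in state " + s.getState()));
        service.init();
        service.init();  //** second call returns early, so the listener fires only once
    }
}
```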
But, what is the STATE?
The State Model
Let's go back to the
//** org.apache.hadoop.yarn.client.api.YarnClient.java L55
A YarnClientImpl instance is created.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L86
//** org.apache.hadoop.yarn.client.api.YarnClient.java L60
//** org.apache.hadoop.service.AbstractService.java L111
Bingo, that's the state model:
//** org.apache.hadoop.service.ServiceStateModel.java L66
The state model is simply a name-state pair; the name is the class name of the service implementation.
isInState checks the state value.
//** org.apache.hadoop.service.ServiceStateModel.java L84
enterState changes the state value after
//** org.apache.hadoop.service.ServiceStateModel.java L110
The state transition is checked by looking up the statemap with the
current state, which returns a boolean indicating whether the
transition is valid. If it's invalid, a ServiceStateException is thrown.
//** org.apache.hadoop.service.ServiceStateModel.java L125
Then what's in the statemap?
//** org.apache.hadoop.service.ServiceStateModel.java L35
That's the state model we are looking for. The current state is the row index, the proposed state is the column index, and the value tells whether the current state can transition to the proposed state.
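The row/column lookup can be sketched as follows. Treat it as an illustration under my assumptions: StateModelSketch is a hypothetical class, the true/false values in the matrix are what I would expect for a four-state lifecycle rather than a copy of the Hadoop table, and IllegalStateException stands in for the exception the real model throws.

```java
//** Hypothetical name-state pair with a transition matrix; not the Hadoop class.
class StateModelSketch {
    enum State {
        NOTINITED(0), INITED(1), STARTED(2), STOPPED(3);

        private final int value;
        State(int value) { this.value = value; }
        int getValue() { return value; }
    }

    //** Row = current state, column = proposed state. Values are assumed.
    private static final boolean[][] STATEMAP = {
        //                 notinited inited started stopped
        /* notinited */ { false,    true,  false,  true },
        /* inited    */ { false,    true,  true,   true },
        /* started   */ { false,    false, true,   true },
        /* stopped   */ { false,    false, false,  true },
    };

    private final String name;
    private State state = State.NOTINITED;

    StateModelSketch(String name) { this.name = name; }

    static boolean isValidStateTransition(State current, State proposed) {
        return STATEMAP[current.getValue()][proposed.getValue()];
    }

    synchronized boolean isInState(State proposed) { return state == proposed; }

    //** Validates the transition, switches state, and returns the old state.
    synchronized State enterState(State proposed) {
        if (!isValidStateTransition(state, proposed)) {
            throw new IllegalStateException(
                name + " cannot enter state " + proposed + " from state " + state);
        }
        State old = state;
        state = proposed;
        return old;
    }

    public static void main(String[] args) {
        StateModelSketch model = new StateModelSketch("SketchService");
        System.out.println("previous state: " + model.enterState(State.INITED));
        System.out.println("stopped -> started allowed? "
            + isValidStateTransition(State.STOPPED, State.STARTED));
    }
}
```

A matrix keeps all the rules in one visible place, so checking a transition is a single array lookup instead of a chain of if statements.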
The Command Initialization
Let's go back again, to the point where
YarnClientImpl is about to be initialized. The
init method of its superclass
AbstractService calls the serviceInit method,
and the state is changed to STATE.INITED.
YarnClientImpl has implemented the serviceInit method.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L96
The address of the Resource Manager,
rmAddress, is assigned
from the configuration instance.
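Here is a rough sketch of reading a host:port address from a configuration. It uses java.util.Properties instead of the Hadoop configuration class, and RmAddressSketch, the key name "yarn.resourcemanager.address", and the default "0.0.0.0:8032" are my assumptions about what such a lookup would use, so check them against your own cluster settings.

```java
import java.net.InetSocketAddress;
import java.util.Properties;

//** Hypothetical helper; uses plain Properties, not the Hadoop configuration.
class RmAddressSketch {
    //** Key and default are assumptions for illustration.
    static final String RM_ADDRESS_KEY = "yarn.resourcemanager.address";
    static final String DEFAULT_RM_ADDRESS = "0.0.0.0:8032";

    static InetSocketAddress getRmAddress(Properties conf) {
        String hostPort = conf.getProperty(RM_ADDRESS_KEY, DEFAULT_RM_ADDRESS).trim();
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port but got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        //** Unresolved on purpose: the sketch should not trigger a DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RM_ADDRESS_KEY, "rm.example.com:8032");
        InetSocketAddress rmAddress = getRmAddress(conf);
        System.out.println(rmAddress.getHostString() + ":" + rmAddress.getPort());
    }
}
```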
Now that the Client instance has been created successfully, its
init method is invoked by
main to initialize it.
//** org.apache.hadoop.yarn.applications.distributedshell.java L244
The command line options are parsed and assigned to instance variables for later use.
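Parsing options into instance variables can be sketched with a small hand-rolled parser. This is not how the real Client does it: OptionParsingSketch is hypothetical, and the option names shell_command, shell_args and num_containers are assumed here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

//** Hypothetical parser; option names are assumptions for illustration.
class OptionParsingSketch {
    private String shellCommand = "";
    private String shellArgs = "";
    private int numContainers = 1;

    //** Expects "-name value" pairs and stores them in instance variables.
    boolean init(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String name = args[i];
            if (!name.startsWith("-")) {
                throw new IllegalArgumentException("Unexpected argument: " + name);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            //** The value is consumed as-is, so a value like "-u" is fine.
            opts.put(name.substring(1), args[++i]);
        }
        if (!opts.containsKey("shell_command")) {
            throw new IllegalArgumentException("No shell command specified");
        }
        shellCommand = opts.get("shell_command");
        shellArgs = opts.getOrDefault("shell_args", "");
        numContainers = Integer.parseInt(opts.getOrDefault("num_containers", "1"));
        return true;
    }

    String getShellCommand() { return shellCommand; }

    String getShellArgs() { return shellArgs; }

    int getNumContainers() { return numContainers; }

    public static void main(String[] args) {
        OptionParsingSketch client = new OptionParsingSketch();
        client.init(new String[] {"-shell_command", "date", "-shell_args", "-u"});
        System.out.println(client.getShellCommand() + " " + client.getShellArgs());
    }
}
```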
Client is ready to