The Hadoop 2.x - Running a YARN Job (1)
In the last blog post, we built a Hadoop distribution to run a YARN job.
$ bin/hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
The script above executes the date -u command in the Hadoop cluster.
From this we might conclude that there is a dispatcher named
Client in
"hadoop-yarn-applications-distributedshell-2.2.0.jar" that is responsible
for deploying a jar to the cluster with parameters, such as the shell
command and its args, and for notifying the cluster to execute the shell command.
To see what's in the rabbit hole, let's step into the
Client source code.
This post is full of code snippets. To avoid confusion, all
comments added by me begin with //** instead of
// or /*. The code can be cloned from the Apache Git
repository; the commit id is
2e01e27e5ba4ece19650484f646fac42596250ce.
The Process Logic
Since the Client is started as a process, we'd better
look into the main method first.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L164
There are three steps: first it constructs a Client
instance, then it initializes the instance, and finally it invokes the
instance's run method.
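To make the three steps concrete, here is a small self-contained sketch. SketchClient and its method bodies are hypothetical stand-ins that I made up for illustration, not the real Client code; only the construct, init, run sequence comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class MainFlowSketch {

    // Hypothetical stand-in for the distributed shell Client.
    static class SketchClient {
        final List<String> steps = new ArrayList<>();

        SketchClient() {
            steps.add("constructed");
        }

        // Pretends to parse the arguments; returns false when there is nothing to run.
        boolean init(String[] args) {
            steps.add("initialized");
            return args.length > 0;
        }

        // The real client would submit an application; this one only records the call.
        boolean run() {
            steps.add("ran");
            return true;
        }
    }

    // Mirrors the three steps: construct, init, run.
    static List<String> launch(String[] args) {
        SketchClient client = new SketchClient();
        if (client.init(args)) {
            client.run();
        }
        return client.steps;
    }

    public static void main(String[] args) {
        System.out.println(launch(new String[] {"-shell_command", "date"}));
    }
}
```

In this sketch run is skipped when init reports that there is nothing to do, which is one plausible way to wire the steps together.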
The Client Instance
There are three constructors in Client; the
main method calls the default one with no parameters.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L227
The default constructor creates a YarnConfiguration
instance and passes it to another constructor.
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L194
That constructor then sets
"org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster"
as appMasterClass.
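The constructor chain can be sketched roughly as follows. SketchClient and SketchConfiguration are hypothetical stand-ins, and which constructor stores which field is my assumption; the point is only to show how a default constructor can delegate with this(...).

```java
public class ConstructorChainSketch {

    // Hypothetical stand-in for YarnConfiguration.
    static class SketchConfiguration { }

    static class SketchClient {
        final String appMasterClass;
        final SketchConfiguration conf;

        // Default constructor: creates a configuration and delegates.
        SketchClient() {
            this(new SketchConfiguration());
        }

        // Supplies the ApplicationMaster class name and delegates again.
        SketchClient(SketchConfiguration conf) {
            this("org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster", conf);
        }

        // The constructor that finally stores the fields.
        SketchClient(String appMasterClass, SketchConfiguration conf) {
            this.appMasterClass = appMasterClass;
            this.conf = conf;
        }
    }

    public static void main(String[] args) {
        System.out.println(new SketchClient().appMasterClass);
    }
}
```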
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L200
YarnClientImpl extends YarnClient,
which extends AbstractService, which implements
Service, whose main job is to control the service
life-cycle.
The YarnClientImpl is created and initialized.
Since the init method is defined in neither
YarnClientImpl nor YarnClient, the
actual init happens in AbstractService.
//** org.apache.hadoop.service.AbstractService.java L151
It first uses isInState to check whether the service is already in
STATE.INITED. If not, it calls
enterState(STATE.INITED) and then the
serviceInit method. Once the state has successfully
changed to STATE.INITED, notifyListeners()
is called.
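The init flow just described is a template method: the base class fixes the order of the steps and subclasses only fill in serviceInit. Below is a simplified sketch of that idea. SketchService, SketchYarnClient and the State enum are hypothetical names, and the sketch leaves out locking, configuration handling and failure handling.

```java
import java.util.ArrayList;
import java.util.List;

public class InitSketch {

    enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Hypothetical, simplified stand-in for AbstractService.
    static abstract class SketchService {
        private State state = State.NOTINITED;
        final List<String> events = new ArrayList<>();

        boolean isInState(State expected) {
            return state == expected;
        }

        // Switches to the new state and returns the previous one.
        State enterState(State next) {
            State old = state;
            state = next;
            return old;
        }

        // Template method: subclasses only override serviceInit.
        final void init() {
            if (isInState(State.INITED)) {
                return; // already initialized, nothing to do
            }
            enterState(State.INITED);
            serviceInit();
            if (isInState(State.INITED)) {
                notifyListeners();
            }
        }

        abstract void serviceInit();

        void notifyListeners() {
            events.add("notified:" + state);
        }
    }

    // Hypothetical subclass playing the role of YarnClientImpl.
    static class SketchYarnClient extends SketchService {
        @Override
        void serviceInit() {
            events.add("serviceInit");
        }
    }

    public static void main(String[] args) {
        SketchYarnClient client = new SketchYarnClient();
        client.init();
        client.init(); // the second call returns early
        System.out.println(client.events);
    }
}
```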
//** org.apache.hadoop.service.AbstractService.java L415
notifyListeners notifies the service's own listeners and the global
listeners of the state change, so that they can react accordingly.
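As a rough picture of what notifying local and global listeners can look like, here is a minimal observer sketch. The StateListener interface, the field names and the use of plain strings for states are all assumptions of mine, not the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

public class ListenerSketch {

    // Hypothetical callback, loosely modelled on a state-change listener.
    interface StateListener {
        void stateChanged(String serviceName, String newState);
    }

    static class SketchService {
        // Listeners shared by every service instance.
        static final List<StateListener> globalListeners = new ArrayList<>();
        // Listeners registered on this instance only.
        final List<StateListener> listeners = new ArrayList<>();
        final String name;
        String state = "NOTINITED";

        SketchService(String name) {
            this.name = name;
        }

        // Instance listeners are told first, then the global ones.
        void notifyListeners() {
            for (StateListener l : listeners) {
                l.stateChanged(name, state);
            }
            for (StateListener l : globalListeners) {
                l.stateChanged(name, state);
            }
        }
    }

    public static void main(String[] args) {
        SketchService service = new SketchService("demo");
        service.listeners.add((n, s) -> System.out.println("local: " + n + " is " + s));
        SketchService.globalListeners.add((n, s) -> System.out.println("global: " + n + " is " + s));
        service.state = "INITED";
        service.notifyListeners();
    }
}
```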
But, what is the STATE?
The State Model
Let's go back to the YarnClient.createYarnClient()
method.
//** org.apache.hadoop.yarn.client.api.YarnClient.java L55
The YarnClientImpl instance is created.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L86
//** org.apache.hadoop.yarn.client.api.YarnClient.java L60
//** org.apache.hadoop.service.AbstractService.java L111
Bingo, that's the state model: ServiceStateModel.
//** org.apache.hadoop.service.ServiceStateModel.java L66
The state model is simply a name/state pair; the name is the service implementation class name.
The isInState checks the state value.
//** org.apache.hadoop.service.ServiceStateModel.java L84
The enterState changes the state value after
checkStateTransition.
//** org.apache.hadoop.service.ServiceStateModel.java L110
The state transition is checked by looking up the statemap with the
current and proposed states, which returns a boolean indicating whether
the transition is valid. If it is invalid,
checkStateTransition throws a
ServiceStateException.
//** org.apache.hadoop.service.ServiceStateModel.java L125
Then what's in the statemap?
//** org.apache.hadoop.service.ServiceStateModel.java L35
That's the state model we are looking for. The current state is the row index, the proposed state is the column index, and the value says whether the current state can transition to the proposed state.
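A table-driven state model like this fits in a few lines. In the sketch below, SketchStateModel is a hypothetical stand-in, IllegalStateException stands in for ServiceStateException, and the values in STATE_MAP reflect my understanding of the table rather than a copy of it, so please check them against ServiceStateModel.java at the commit mentioned earlier.

```java
public class StateModelSketch {

    enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Row = current state, column = proposed state, both in ordinal order.
    // The values are an assumption; verify them against the real statemap.
    static final boolean[][] STATE_MAP = {
        //               NOTINITED INITED STARTED STOPPED
        /* NOTINITED */ {false,    true,  false,  true},
        /* INITED    */ {false,    true,  true,   true},
        /* STARTED   */ {false,    false, true,   true},
        /* STOPPED   */ {false,    false, false,  true},
    };

    // Hypothetical stand-in for ServiceStateModel: a name plus a state.
    static class SketchStateModel {
        private final String name;
        private State state = State.NOTINITED;

        SketchStateModel(String name) {
            this.name = name;
        }

        boolean isInState(State proposed) {
            return state == proposed;
        }

        static boolean isValidStateTransition(State current, State proposed) {
            return STATE_MAP[current.ordinal()][proposed.ordinal()];
        }

        // Throws when the table forbids the transition.
        void checkStateTransition(State proposed) {
            if (!isValidStateTransition(state, proposed)) {
                throw new IllegalStateException(
                    name + " cannot enter state " + proposed + " from state " + state);
            }
        }

        // Validates the transition, switches state, and returns the old state.
        State enterState(State proposed) {
            checkStateTransition(proposed);
            State old = state;
            state = proposed;
            return old;
        }
    }

    public static void main(String[] args) {
        SketchStateModel model = new SketchStateModel("YarnClientImpl");
        System.out.println(model.enterState(State.INITED));
        System.out.println(model.isInState(State.INITED));
    }
}
```

Keeping the rules in a two-dimensional array means a new state only needs one more row and column, instead of another branch in every transition check.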
The Command Initialization
Let's go back again to the point where YarnClientImpl is about to be
initialized. The init method of its superclass
AbstractService calls the serviceInit method,
and the state changes to STATE.INITED.
YarnClientImpl overrides the
serviceInit method.
//** org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.java L96
The address of the Resource Manager, rmAddress, is assigned
from the configuration instance.
Now that the Client instance has been created successfully,
main invokes its init method to initialize
the instance.
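Reading an address from a configuration with a fallback default might look like the sketch below. It uses java.util.Properties instead of the Hadoop Configuration class, and the key and default value are what I believe YARN uses; treat both as assumptions and check YarnConfiguration for the real constants.

```java
import java.net.InetSocketAddress;
import java.util.Properties;

public class RmAddressSketch {

    // Assumed key and default; the real constants live in YarnConfiguration.
    static final String RM_ADDRESS_KEY = "yarn.resourcemanager.address";
    static final String DEFAULT_RM_ADDRESS = "0.0.0.0:8032";

    // Reads a "host:port" value from the configuration, falling back to the default.
    // The value must contain a colon; malformed input is not handled in this sketch.
    static InetSocketAddress rmAddress(Properties conf) {
        String value = conf.getProperty(RM_ADDRESS_KEY, DEFAULT_RM_ADDRESS);
        int colon = value.lastIndexOf(':');
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        // Unresolved on purpose, so the sketch never performs a DNS lookup.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(RM_ADDRESS_KEY, "rm.example.com:9000");
        System.out.println(rmAddress(conf));
    }
}
```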
//** org.apache.hadoop.yarn.applications.distributedshell.Client.java L244
The command line options are parsed and assigned to instance variables for later use.
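To illustrate the idea of parsing options into instance variables, here is a hand-rolled sketch. SketchClient is hypothetical, the option names are chosen to match the shell command and args mentioned earlier, and the parser is a deliberately naive "-name value" loop rather than whatever the real Client uses.

```java
import java.util.HashMap;
import java.util.Map;

public class OptionSketch {

    // Hypothetical stand-in for Client; only two illustrative options are handled.
    static class SketchClient {
        String shellCommand = "";
        String shellArgs = "";

        // Parses "-name value" pairs and copies the known options into fields.
        boolean init(String[] args) {
            Map<String, String> opts = new HashMap<>();
            for (int i = 0; i + 1 < args.length; i += 2) {
                if (!args[i].startsWith("-")) {
                    throw new IllegalArgumentException("Expected an option but found: " + args[i]);
                }
                opts.put(args[i].substring(1), args[i + 1]);
            }
            if (!opts.containsKey("shell_command")) {
                return false; // nothing to execute
            }
            shellCommand = opts.get("shell_command");
            shellArgs = opts.getOrDefault("shell_args", "");
            return true;
        }
    }

    public static void main(String[] args) {
        SketchClient client = new SketchClient();
        boolean ready = client.init(new String[] {"-shell_command", "date", "-shell_args", "-u"});
        System.out.println(ready + " " + client.shellCommand + " " + client.shellArgs);
    }
}
```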
The Client is ready to run.