Hadoop RPC – Client (1)
In the last post, I mentioned that all RPC client calls are dispatched to the Invoker.invoke() method by the dynamic proxy. Most of the work of invoke is done by client.call.
//** org.apache.hadoop.ipc.RPC.java L218
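To make the dispatch concrete, here is a minimal sketch of how a JDK dynamic proxy routes every interface call into a single invoke method. It is not the Hadoop source: EchoProtocol and ProxySketch are hypothetical names, and this Invoker only describes the call where the real one would hand it to client.call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

class ProxySketch {
    // Hypothetical protocol interface, standing in for a real Hadoop protocol.
    interface EchoProtocol {
        String echo(String message);
    }

    // Plays the role of Invoker: every call made on the proxy lands in invoke().
    static class Invoker implements InvocationHandler {
        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // A real RPC client would pass the method name and arguments on
            // to client.call here. This sketch just describes the call.
            return method.getName() + Arrays.toString(args);
        }
    }

    // Builds a proxy object that implements EchoProtocol through Invoker.
    static EchoProtocol createProxy() {
        return (EchoProtocol) Proxy.newProxyInstance(
                EchoProtocol.class.getClassLoader(),
                new Class<?>[] { EchoProtocol.class },
                new Invoker());
    }

    public static void main(String[] args) {
        EchoProtocol proxy = createProxy();
        System.out.println(proxy.echo("hi"));
    }
}
```

The caller only ever sees the EchoProtocol interface; the method name and arguments are available inside invoke, which is exactly the information an RPC client needs to ship to the server.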
The client.call method takes two parameters. The first is an instance of Invocation; the second is remoteId, which, as the name suggests, identifies the connection between client and server.
Before stepping into client.call to see what it does, it helps to know what Invocation is. Invocation implements the Writable interface, so let's look at Writable first.
//** org.apache.hadoop.io.Writable.java L61
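As a rough illustration of the Writable contract, the sketch below declares a local stand-in interface with the same two methods, write(DataOutput) and readFields(DataInput), and round-trips a small value through a byte array. Point, WritableSketch, serialize and deserialize are hypothetical names invented for this example, not Hadoop classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WritableSketch {
    // Local stand-in for org.apache.hadoop.io.Writable, so the example is self-contained.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type: it decides exactly which bytes go on the wire.
    static class Point implements Writable {
        int x;
        int y;

        Point() { }

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            x = in.readInt();
            y = in.readInt();
        }
    }

    static byte[] serialize(Writable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void deserialize(Writable w, byte[] data) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Point(3, 4));
        Point copy = new Point();
        deserialize(copy, data);
        System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

Two ints cost exactly eight bytes here, with no class descriptors or stream headers. That kind of precise control over the wire format is the point of the design.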
But Java has its own serialization mechanism, so why not use it? Doug Cutting said in response to that question:
Why didn’t I use Serialization when we first started Hadoop? Because it looked big and hairy and I thought we needed something lean and mean, where we had precise control over exactly how objects are written and read, since that is central to Hadoop. With Serialization you can get some control, but you have to fight for it.
The logic for not using RMI was similar. Effective, high-performance inter-process communications are critical to Hadoop. I felt like we’d need to precisely control how things like connections, timeouts and buffers are handled, and RMI gives you little control over those.
Invocation must implement two methods, readFields and write.
//** org.apache.hadoop.ipc.RPC.java L75
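The idea can be sketched with a much simplified Invocation that carries a method name and a list of parameters. This is an illustration under assumptions, not the Hadoop class: parameters are restricted to strings, DataOutput.writeUTF stands in for Hadoop's UTF8.writeString, and InvocationSketch and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class InvocationSketch {
    // Simplified stand-in for Invocation: a method name plus String-only parameters.
    static class Invocation {
        String methodName;
        String[] parameters;

        Invocation() { }

        Invocation(String methodName, String... parameters) {
            this.methodName = methodName;
            this.parameters = parameters;
        }

        // Client side: method name first, then the parameter count, then each parameter.
        void write(DataOutput out) throws IOException {
            out.writeUTF(methodName);
            out.writeInt(parameters.length);
            for (String parameter : parameters) {
                out.writeUTF(parameter);
            }
        }

        // Server side: read everything back in the same order.
        void readFields(DataInput in) throws IOException {
            methodName = in.readUTF();
            parameters = new String[in.readInt()];
            for (int i = 0; i < parameters.length; i++) {
                parameters[i] = in.readUTF();
            }
        }
    }

    // Serializes an invocation to bytes and rebuilds it, as the network hop would.
    static Invocation roundTrip(Invocation sent) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            sent.write(new DataOutputStream(bytes));
            Invocation received = new Invocation();
            received.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Invocation received = roundTrip(new Invocation("getFileInfo", "/tmp"));
        System.out.println(received.methodName + " " + received.parameters[0]);
    }
}
```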
ObjectWritable is a polymorphic Writable that writes an instance together with its class name, and handles arrays, strings and primitive types without a Writable wrapper.
The static method ObjectWritable.writeObject first calls UTF8.writeString(out, parameterClass.getName()) to write the class name of the parameter. It then checks the parameter's type: if it is an array, the method calls itself recursively for each element; if it is a string, UTF8.writeString(out, (String)instance) writes the instance; if it is a primitive type, out.writeChar, out.writeInt, etc. write the instance.
The static method ObjectWritable.readObject is the inverse of ObjectWritable.writeObject: it reads from in the parameter class type and the parameter instance that were written by ObjectWritable.write.
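The write and read steps described above can be sketched as a pair of mirror-image methods. This is a simplified illustration, not the Hadoop implementation: it covers only int, char, String and arrays of those, uses writeUTF and readUTF in place of Hadoop's UTF8 helpers, and TaggedObjectSketch, resolve and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Array;

class TaggedObjectSketch {
    // Writes the declared class name first, then the value in a type-specific way.
    static void writeObject(DataOutput out, Object instance, Class<?> declaredClass)
            throws IOException {
        out.writeUTF(declaredClass.getName());
        if (declaredClass.isArray()) {
            int length = Array.getLength(instance);
            out.writeInt(length);
            for (int i = 0; i < length; i++) {
                // Recursive call: each element carries its own class name.
                writeObject(out, Array.get(instance, i), declaredClass.getComponentType());
            }
        } else if (declaredClass == String.class) {
            out.writeUTF((String) instance);
        } else if (declaredClass == Integer.TYPE) {
            out.writeInt((Integer) instance);
        } else if (declaredClass == Character.TYPE) {
            out.writeChar((Character) instance);
        } else {
            throw new IllegalArgumentException("unsupported type: " + declaredClass);
        }
    }

    // Maps a class name back to a Class; primitives need special handling
    // because Class.forName does not accept names such as "int".
    static Class<?> resolve(String name) {
        switch (name) {
            case "int": return Integer.TYPE;
            case "char": return Character.TYPE;
            default:
                try {
                    return Class.forName(name);
                } catch (ClassNotFoundException e) {
                    throw new IllegalArgumentException("unknown class: " + name, e);
                }
        }
    }

    // Mirror image of writeObject: read the class name, then the value.
    static Object readObject(DataInput in) throws IOException {
        Class<?> declaredClass = resolve(in.readUTF());
        if (declaredClass.isArray()) {
            int length = in.readInt();
            Object array = Array.newInstance(declaredClass.getComponentType(), length);
            for (int i = 0; i < length; i++) {
                Array.set(array, i, readObject(in));
            }
            return array;
        } else if (declaredClass == String.class) {
            return in.readUTF();
        } else if (declaredClass == Integer.TYPE) {
            return in.readInt();
        } else if (declaredClass == Character.TYPE) {
            return in.readChar();
        }
        throw new IllegalArgumentException("unsupported type: " + declaredClass);
    }

    static Object roundTrip(Object instance, Class<?> declaredClass) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeObject(new DataOutputStream(bytes), instance, declaredClass);
            return readObject(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] copy = (int[]) roundTrip(new int[] { 1, 2, 3 }, int[].class);
        System.out.println(copy.length + " elements, last = " + copy[2]);
    }
}
```

Because the class name travels with every value, the reader needs no prior knowledge of what is coming. That is what makes the scheme polymorphic, at the cost of a few extra bytes per value.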
The method ObjectWritable.getDeclaredClass returns the parameter class type that was read by ObjectWritable.readObject.
This is how the method name and parameters are passed from client to server: they are sent over the network as a "Writable" object of type Invocation, which is serialized on the client and deserialized on the server.
Next, let’s step into the client.call method that I mentioned at the beginning of this post.
//** org.apache.hadoop.ipc.Client.java L1075
We can see that four main classes make up the client side of Hadoop RPC: Client, Call (which has one subclass, ParallelCall), Connection, and ConnectionId. All of them except Client itself, including ParallelCall, are inner classes of Client.
We'll discuss them separately in the next post.