We have a legacy system: a web service that receives HTTP POSTs from clients, parses the data, and stores it in a file.
The system's function is simple, and it has already passed functional and performance testing, so it is stable. Over time, the system was copied and pasted into several projects, with only the data-parsing logic changed.
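The post doesn't show the service's code, but its core shape (parse a POST body, append a record to a file) can be sketched as follows. This is a minimal illustration, assuming a form-encoded body and a tab-separated line per record; the class and method names are hypothetical, and the parsing method is the piece that would be swapped out in each copied project:

```java
import java.io.IOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the legacy service's core logic.
public class LegacyStore {

    // Parse a form-encoded body like "k1=v1&k2=v2" into an ordered map.
    // In the copied projects, only this method's logic changes.
    public static Map<String, String> parseBody(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;
            String[] kv = pair.split("=", 2);
            String key = URLDecoder.decode(kv[0], StandardCharsets.UTF_8);
            String value = kv.length > 1
                    ? URLDecoder.decode(kv[1], StandardCharsets.UTF_8)
                    : "";
            fields.put(key, value);
        }
        return fields;
    }

    // Append one parsed record to the data file as a tab-separated line.
    public static void appendRecord(Path file, Map<String, String> record)
            throws IOException {
        String line = String.join("\t", record.values()) + "\n";
        Files.writeString(file, line,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

An HTTP front end (for example, a servlet or `com.sun.net.httpserver` handler) would read each POST body, call `parseBody`, and hand the result to `appendRecord`.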
-- Taken on April 12, 2014 (flickr)
In the last blog post, the job Client was created and initialized. This post discusses what the Client does to deploy the job to the Hadoop cluster.
In the last blog post, a Hadoop distribution was built to run a YARN job:
$ bin/hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
The Old MapReduce
The Hadoop 0.x MapReduce system is composed of a JobTracker and TaskTrackers.
We programmers build apps for people to use, and sometimes we can benefit from our users, too.
Last week a requirement came to me: we needed to know how often users performed a certain kind of action in one of our products.
Since it was a temporary need, I didn't want to write a MapReduce program; instead, I worked out a Hive SQL query that is a little complicated (it joins 3 tables and has a long WHERE clause).
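The original query isn't shown in the post, but a query of that shape might look like the sketch below. All table and column names (`user_actions`, `users`, `products`, and so on) are hypothetical, chosen only to illustrate a 3-table join with a long WHERE clause counting user actions:

```sql
-- Hypothetical sketch: the real tables and filters are not shown in the post.
SELECT u.user_id,
       a.action_type,
       COUNT(*) AS action_count
FROM user_actions a
JOIN users u    ON a.user_id = u.user_id
JOIN products p ON a.product_id = p.product_id
WHERE p.product_name = 'our_product'
  AND a.action_type = 'some_action'
  AND a.action_date >= '2014-01-01'
  AND a.action_date <  '2014-02-01'
  AND u.is_test_account = false
GROUP BY u.user_id, a.action_type;
```

For a one-off question like this, a single Hive query is usually much cheaper to write and throw away than a hand-rolled MapReduce program.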