
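I've recently been reading Stonebraker's "Readings in Database Systems" and found that it opened up a lot of new lines of thinking for me.

I have worked on big data for many years, but apart from papers on data mining algorithms and distributed systems design, it never occurred to me to read database papers. Thinking about it now, big data and databases actually share many requirements and scenarios, and academia may well have had solutions years ago to the problems we are trying to solve.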

Read more »

We have a legacy system: a web service that receives HTTP POST requests from clients, parses the data, and stores it in a file.

The system's function is simple, and it has already been through functional and performance testing, so it is stable. As time went by, the system was copied and pasted into several projects, with only the data parsing logic changed.
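
To make the shape of such a service concrete, here is a minimal sketch in Python using only the standard library. It is not the legacy system's actual code: the names `make_handler` and `parse_json_line`, the JSON payload format, and the `events.log` path are all assumptions for illustration. It shows the parsing step as the single pluggable piece, which is the part that changed from project to project.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable


def parse_json_line(body: bytes) -> str:
    """Hypothetical parser: validate a JSON body and normalize it to one line."""
    record = json.loads(body.decode("utf-8"))
    return json.dumps(record, sort_keys=True)


def make_handler(parse: Callable[[bytes], str], out_path: str):
    """Build a request handler class around a project-specific parse function."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read exactly the number of bytes the client declared.
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            try:
                line = parse(body)
            except ValueError:
                # Malformed input (bad JSON or bad encoding): reject it.
                self.send_response(400)
                self.end_headers()
                return
            # Append one parsed record per line to the output file.
            with open(out_path, "a", encoding="utf-8") as f:
                f.write(line + "\n")
            self.send_response(204)
            self.end_headers()

        def log_message(self, fmt, *args):
            # Silence the default per-request logging to stderr.
            pass

    return Handler


if __name__ == "__main__":
    # Assumed address and output path, for illustration only.
    server = HTTPServer(("127.0.0.1", 8080), make_handler(parse_json_line, "events.log"))
    server.serve_forever()
```

With this layout, a new project would pass its own parse function to `make_handler` rather than copying the whole service.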

Read more »

In the last blog post, a Hadoop distribution was built to run a YARN job.

$ bin/hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
org.apache.hadoop.yarn.applications.distributedshell.Client -jar \
share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.2.0.jar \
-shell_command 'date' -shell_args "-u" -num_containers 2
Read more »

Last week a requirement came to me: we needed to know how often users performed certain kinds of actions in one of our products.

It was a temporary need, so I didn't want to write a MapReduce program. Instead, I worked out a Hive SQL query that is a little complicated (it joins 3 tables and has a long where clause).

Read more »