# An Arithmetic Issue in Hive SQL

Last week, a requirement came to me, that we needed to know how often some kind of actions users did in one of our products.

It's temporal, I didn't want to write a mapreduce program, instead, I figured out an hive SQL which is a little complicated (joins 3 tables and a long where clause).

It worked. I got the results, we knew how many times people did one kind of actions on their 1st day, 1st week, 2nd week and 3rd week.

But in the 4th week, I got 0!

It's weird.

Then I counted the 5th week, a dramatic smaller number compared with the 3rd week.

I could get an conclusion, that nearly all people comes to our product, at their 4th week, they left.

It's unbelievable.

I asked a colleague of mine do those statistics by python again, the results of the first 3 weeks were the same, but the 4th and 5th week, his were much bigger than mine.

There might be something wrong with my hive SQL.

Am I using too many `join`

s?

Am I mistakenly put `where`

clause out of
`join on`

condition?

...

Those questions plagued me for days, but none of them is true.

Below `where`

clause is what I used to extract the data in
the 4th week.

1 | where action.occurtime < user.persistedtime + 24*60*60*1000*21 and action.occurtime < user.persistedtime + 24*60*60*1000*28 |

I noticed that my `persistedtime`

column type is string,
maybe string to int auto conversion went wrong during comparison. But if
it went wrong, why only affected the 4th week. It's unreasonable.

I made a test, created a test table, and changed the column type to bigint, same result.

What changed in the whole SQL is the arithmetic operation,

`24*60*60*1000*7/14/21/28`

Int overflow?

No! hive is not that stupid.

The biggest int32 is `2147483647`

, and
`60*60*24*1000*28`

equals to `2419200000`

, if int
overflows, the `where`

clause will be "less than a smaller
number and greater than a bigger number", which is ZERO for any
data!

I put `2419200000`

to `where`

clause to replace
`60*60*24*1000*28`

, and, I got the right result.

**The multiplication operation converts its result to int32,
the addition operation is not affected.**

We aren't gonna stop, are we?

Dig deeper.

1 | // hive-0.10.0/src/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPMultiply.java |

From above code, we can see that hive multiplication operator (*) simply converts its result to the same type of its operands. In our case, (int, int) -> (int).

How to tell hive the number we typed in is a long type, I made a
guess, `l`

, failed, then tried `L`

, bingo!

So if you want to use multiplication operator (*) in hive SQL, better
to add `L`

to one of the operands if the result might be
bigger than the maximum value of int32.

Why the addition operator does not have that problem?

Because in our case the left operand is already bigint.

At last, should I blame Hive?

Absolutely not! Because I didn't RTFM :(

A * B

all number types

Gives the result of multiplying A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. Note that if the multiplication causing overflow, you will have to cast one of the operators to a type higher in the type hierarchy.