Wednesday, May 13, 2015

Hive Performance tuning

1) Compare time with an integer in the WHERE clause
When the time field is specified in the WHERE clause, the query parser automatically detects which partitions need to be processed, so only those partitions are read.

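Please note that this works only when time is compared with an integer value, not a float (for example, comparing with 1349393020.00 does not trigger the detection).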
Syntax
[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020
[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020 + 3600
[GOOD]: SELECT field1, field2, field3 FROM tbl WHERE time > 1349393020 - 3600
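To show what "detecting the partitions to process" means, here is a minimal Python simulation, not Hive code. It assumes a hypothetical layout with one partition per hour, keyed by the integer Unix timestamp at which that hour starts. The helper names `partition_start` and `prune_gt` are made up for illustration. Given a filter `time > bound`, only partitions whose time range can hold a matching row are kept.

```python
# Hypothetical layout (assumption): one partition per hour, keyed by the
# integer Unix timestamp at which the hour starts.
HOUR = 3600


def partition_start(ts: int) -> int:
    """Return the start of the hourly partition that holds timestamp ts."""
    return ts - ts % HOUR


def prune_gt(partitions, lower_bound: int):
    """Keep only partitions that can hold rows with time > lower_bound.

    A partition covers [start, start + HOUR), so it can only contain a
    matching row if its last second (start + HOUR - 1) is > lower_bound.
    """
    return [p for p in partitions if p + HOUR - 1 > lower_bound]


# Four consecutive hourly partitions starting at an hour boundary.
partitions = [partition_start(1349388000) + i * HOUR for i in range(4)]

# The three filters from the examples above.
print(prune_gt(partitions, 1349393020))         # 3 of 4 partitions kept
print(prune_gt(partitions, 1349393020 + 3600))  # 2 of 4 partitions kept
print(prune_gt(partitions, 1349393020 - 3600))  # all 4 partitions kept
```

The larger the integer lower bound, the fewer partitions survive, and the skipped ones are never read.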

2) Do not use DISTINCT inside COUNT
Hive processes a query of this kind with only one reducer, which becomes a bottleneck on large tables.
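Syntax
[BAD]: SELECT COUNT(DISTINCT Field1) FROM tablename;
[GOOD]: SELECT COUNT(1) FROM (SELECT DISTINCT Field1 FROM tablename) t;

In the second form the DISTINCT runs as its own stage, which can be spread across many reducers, and the outer query only has to count the rows that are already de-duplicated.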
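The reason the rewrite still returns the right answer can be sketched with a small Python simulation, not Hive code. The function names and the reducer count of 4 are hypothetical. Values are hash-partitioned across reducers, so equal values always land on the same reducer. Each reducer's de-duplicated set is therefore disjoint from the others, and summing their sizes gives the same number as a single global COUNT(DISTINCT).

```python
from collections import defaultdict


def count_distinct_single_reducer(values):
    """COUNT(DISTINCT Field1): every value funnels through one reducer."""
    seen = set()
    for v in values:
        seen.add(v)
    return len(seen)


def count_distinct_two_stage(values, num_reducers=4):
    """Simulates SELECT COUNT(1) FROM (SELECT DISTINCT Field1 ...) t.

    Stage 1: hash-partition values so equal values go to the same
    reducer; each reducer de-duplicates only its own share.
    Stage 2: the per-reducer sets are disjoint, so counting the rows
    they emit gives the global distinct count.
    """
    buckets = defaultdict(set)
    for v in values:
        buckets[hash(v) % num_reducers].add(v)
    # Stage 2: count the rows emitted by the DISTINCT stage.
    return sum(len(b) for b in buckets.values())


values = ["a", "b", "a", "c", "b", "d", "a"]
print(count_distinct_single_reducer(values))  # 4
print(count_distinct_two_stage(values))       # 4
```

Both functions return the same count; the difference is that in the two-stage version the de-duplication work is split across `num_reducers` workers instead of one.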
