CLUSTER BY
==========
1. The CLUSTER BY clause is a shorthand way of expressing DISTRIBUTE BY combined with SORT BY on the same columns.
Please find below an example where a DISTRIBUTE BY with SORT BY query is rewritten using CLUSTER BY (the equivalent long form is shown after the output).
---------------------------------------------------------------------
hive> select empid , ename , esal from disttab CLUSTER BY ename;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201304160610_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201304160610_0009
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201304160610_0009
2013-04-16 07:07:57,918 Stage-1 map = 0%, reduce = 0%
2013-04-16 07:08:05,362 Stage-1 map = 100%, reduce = 0%
2013-04-16 07:08:14,526 Stage-1 map = 100%, reduce = 33%
2013-04-16 07:08:15,535 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304160610_0009
OK
NULL NULL NULL
108 Gopal 22000
109 Gopal 22500
110 Gopal 23500
111 Gopal 24340
105 Kanth 56000
113 Karan 45600
112 Karan 45000
104 Madan 45555
103 Murali 23000
102 Prakash 34000
101 Ravi 46000
106 Varma 33333
Time taken: 32.426 seconds
hive>
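For comparison, the same query in the explicit long form. This is a minimal sketch, assuming the same disttab table as above: DISTRIBUTE BY ename routes all rows with the same ename to the same reducer, and SORT BY ename then sorts the rows within each reducer.
---------------------------------------------------------------------
hive> select empid , ename , esal from disttab DISTRIBUTE BY ename SORT BY ename;

Note that CLUSTER BY (and the long form above) only guarantees sort order within each reducer. Because the job above ran with a single reducer, the output happens to be totally ordered. If more reducers are requested, for example with the setting suggested in the job log:

hive> set mapred.reduce.tasks=2;

then rows with the same ename still land on the same reducer, but the combined output is no longer globally sorted; use ORDER BY when a total order across all rows is required.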