DISTRIBUTE BY
=============
1. DISTRIBUTE BY controls how map output is divided among the reducers.
2. In the example below, DISTRIBUTE BY ensures that all records with the same ename go to the same reducer, and SORT BY then
orders each reducer's data the way we want (ascending ename, then ascending esal).
3. DISTRIBUTE BY is similar to GROUP BY in the sense that it controls which reducer receives which rows for processing, but it performs no aggregation.
NOTE: Hive requires that the DISTRIBUTE BY clause come BEFORE SORT BY when both are used in a query.
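TIP: When you distribute and sort on the same column(s) in ascending order, CLUSTER BY is a shorthand for the DISTRIBUTE BY + SORT BY pair. For example, this hypothetical query is equivalent to DISTRIBUTE BY ename SORT BY ename (note it does not cover the extra esal ordering used in the main example below):
hive> select empid , ename , esal from disttab CLUSTER BY ename;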
-----------------------------------
hive> select * from disttab;
OK
108 Sandeep 22000
109 Sandeep 22500
110 Sandeep 23500
111 Sandeep 24340
112 Karan 45000
113 Karan 45600
101 Ravi 46000
102 Prakash 34000
103 Murali 23000
104 Madan 45555
105 Kanth 56000
106 Varma 33333
Time taken: 0.409 seconds
hive>
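For reference, disttab is a simple three-column table. A hypothetical DDL sketch (only the column names empid, ename and esal are confirmed by the queries here; the column types and the tab delimiter are assumptions):
hive> create table disttab (empid int, ename string, esal int)
    > row format delimited fields terminated by '\t';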
-------------------------------------
hive> select empid , ename , esal from disttab DISTRIBUTE BY ename SORT BY ename asc , esal asc;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201304160610_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201304160610_0008
Kill Command = /usr/lib/hadoop/bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201304160610_0008
2013-04-16 07:05:36,346 Stage-1 map = 0%, reduce = 0%
2013-04-16 07:05:40,480 Stage-1 map = 100%, reduce = 0%
2013-04-16 07:05:49,639 Stage-1 map = 100%, reduce = 33%
2013-04-16 07:05:50,765 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201304160610_0008
OK
108 Sandeep 22000
109 Sandeep 22500
110 Sandeep 23500
111 Sandeep 24340
105 Kanth 56000
112 Karan 45000
113 Karan 45600
104 Madan 45555
103 Murali 23000
102 Prakash 34000
101 Ravi 46000
106 Varma 33333
Time taken: 24.721 seconds
hive>
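Observe that the result above is sorted within each reducer's output but not globally: the Sandeep rows (ordered by esal) come first, and the remaining names follow in alphabetical order. This is exactly the SORT BY contract: each reducer sorts only the rows it receives. To see the distribution explicitly, you can pin the reducer count using the setting the job log above mentions (a sketch; 2 is an arbitrary choice):
-------------------------------------
hive> set mapred.reduce.tasks=2;
hive> select empid , ename , esal from disttab DISTRIBUTE BY ename SORT BY ename asc , esal asc;
-------------------------------------
With 2 reducers, the output is the concatenation of two independently sorted blocks, and rows sharing an ename never split across blocks.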