Saturday, May 2, 2015

Introduction to Sqoop

Apache Sqoop is the one of the component built on top of HDFS and is ment for interactive with target RDBMS i.e Importing the data from RDBMS to HDFS or  Exporting the data from HDFS to RDBMS.



Below are the key observation w.r.to Sqoop : -
  1. Either Sqoop import/export with respect to HDFS there is no intervantion of LFS .
  2. To interact From Hadoop to RDBMS using Sqoop , the target RDBMS should be Java Compatible (Support JDBC Driver).
  3. IF you are same database from hadoop the corresponding RDBMS the connector should be a part of Sqoop install lib directory.

    Example:-if the database is mysql then user/lib/sqoop/lib/mysql/Java-connector .java

    NOTE : - We need to install on hadoop boxes it is not bundle in default installation with hadoop.
Sqoop Scope :



To login to mysql 

mysql -u root -p
Enter Password :

SQOOP - WEEK8-FILE
===================
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1;

IMPORTING THE DATA IN A SPECIFIED DIR
-------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1 --target-dir /ImportDir8;

IMPORTING THE DATA IN A  SPECIFIED FORMAT
---------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1 --target-dir /ImportDir8FOR 
--fields-terminated-by '|';

IMPORTING THE DATA FROM SELECTED COLUMNS
----------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1 --target-dir /ImportDir8Col 
--columns 'empid , ename';

IMPORTING THE DATA BASED ON CRITERIA
--------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1 --target-dir /ImportDir8Con --columns
 'empid , ename' --where 'esal>16000' 
--fields-terminated-by '\t';

LIST ALL THE DATABASES
----------------------
sqoop list-databases --connect jdbc:mysql://localhost;

LIST ALL TABLES OF A INTERNAL DB
----------------------------------
sqoop list-tables --connect jdbc:mysql://localhost/week8;


TO IMPORT ALL TABLES DATA TO HDFS
------------------------------

sqoop import-all-tables --connect jdbc:mysql://localhost/
Gopal_Lab -m 1;

EVAL  with "--query"
-------------------

sqoop eval --connect jdbc:mysql://localhost/week8 
--query "select * from emp limit 4";

sqoop eval --connect jdbc:mysql://localhost/week8
 --query "desc emp";

sqoop eval --connect jdbc:mysql://localhost/week8 
--query "create table evalTest(evId int primary key,
evName varchar(40),evAmt int)";


IMPORT with "--query"
---------------------

sqoop import --connect jdbc:mysql://localhost/week8
 --query "select * from emp WHERE \$CONDITIONS" -m 1 
--target-dir /Query8;

sqoop import --connect jdbc:mysql://localhost/week8
 --query "select * from emp WHERE esal>18000 and 
 \$CONDITIONS" -m 1 --target-dir /Query8New;

mysql> create table dept8(deptid int primary key,
    -> deptname varchar(50),
    -> deptloc varchar(60),
    -> empid int,
    -> FOREIGN KEY (empid) REFERENCES emp(empid));

INNTER JOIN USING SQOOP
------------------------
sqoop import --connect jdbc:mysql://localhost/week8
 --query "select e.empid,ename,esal,deptid,deptname,deptloc 
from emp e JOIN dept8 d ON (e.empid = d.empid) and
  \$CONDITIONS" -m 1 --target-dir /InnerJOIN8;

LEFT OUTER JOIN USING SQOOP
---------------------------
sqoop import --connect jdbc:mysql://localhost/week8
 --query "select e.empid,ename,esal,deptid,deptname,deptloc 
from emp e LEFT OUTER JOIN dept8 d ON (e.empid = d.empid) and
  \$CONDITIONS" -m 1 --target-dir /LEFTJOIN8;


RIGHT OUTER JOIN USING SQOOP
-----------------------------

sqoop import --connect jdbc:mysql://localhost/week8
 --query "select e.empid,ename,esal,deptid,deptname,deptloc 
from emp e RIGHT OUTER JOIN dept8 d ON (e.empid = d.empid)
and  \$CONDITIONS" -m 1 --target-dir /RIGHTJOIN8;

FULL OUTER JOIN (DIRECT CMD WONT WORK HERE..WE NEED
 TO USE UNION)
-------------------------------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8
 --query "select e.empid,ename,esal,deptid,deptname,deptloc 
from emp e LEFT OUTER JOIN dept8 d ON (e.empid = d.empid) 
UNION select e.empid,ename,esal,deptid,deptname,deptloc from 
emp e RIGHT OUTER JOIN dept8 d ON (e.empid = d.empid) and 
 \$CONDITIONS" -m 1 --target-dir /FULL_JOIN8;

IMPORTING THE DATA FROM RDBMS IN A SEQUENCEFILE FORMAT
------------------------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
 --table dept8 --as-sequencefile -m 1 --target-dir
 /SEQFILE8;

IMPORTING THE DATA FROM RDBMS IN A AVRO DATA FILE FORMAT
------------------------------------------------------

sqoop import --connect jdbc:mysql://localhost/week8 
 --table dept8 --as-avrodatafile -m 1 --target-dir 
/AVROFILE8;

TO ACHIEVE THE INCREMENTAL LOAD THRU SQOOP
------------------------------------------
sqoop import --connect jdbc:mysql://localhost/week8 
--table emp -m 1 --where 'empid>300' --target-dir
 /BaseDir8 --append;


SQOOP JOBS
----------

sqoop job --create GopalJOB8 -- import --connect
 jdbc:mysql://localhost/week8 --table emp -m 1 
--target-dir /BaseDir8 --append;

sqoop job --list

sqoop job --show <<jobId>>

sqoop job --exec <<jobId>>

sqoop job --delete <<jobId>>

CODEGEN---> To see the internal generated MR code
--------------------------------------------------
sqoop codegen --connect jdbc:mysql://localhost/week8 
--table dept8;

SQOOP - EXPORT
---------------
 sqoop export --connect jdbc:mysql://localhost/week8
 --table exporttab --export-dir /BaseDir8/part-r-00000
 --fields-terminated-by '|';

sqoop export --connect jdbc:mysql://localhost/week8
 --table exporttab --export-dir /BaseDir8/part-r-00000;


No comments:

Post a Comment