Saturday, June 20, 2015

HIVE 0.14 Update , delete, insert is possible

Attempt to do update or delete using transaction manager that does not support these operations in hive-0.14.0


CRUD operations are supported in Hive from 0.14 onwards.
See Wiki 

Hive supports data warehouse software facility,which facilitates querying and managing large datasets residing in distributed storage. In data warehouse there are situation where we need to update, delete etc transactions.In hive later versions UPDATE was not supported,but there were workarounds to do update a transaction

1. Update Statement In Hive For Small Tables 
2. Update Statement In Hive For Large Tables using INSERT


Lets see how to do INSERT,UPDATE,DELETE in newer version of hive. 

Create a table "test"
CREATE EXTERNAL TABLE 
    test (EmployeeID Int,FirstName String,Designation  
        String,Salary Int,Department String) 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  "," 
    LOCATION '/user/hdfs/Hive';
We will try to update the salary of employee id 19 from 45,000 to 50,000.
 hive> UPDATE test 
           SET salary = 50000 
           WHERE employeeid = 19;

 FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction m anager that does not support these operations.

While applying above query it shows a semantic Exception.In order to allow update and delete we need to add additional settings in hive-site.xml and create table with ACID output format support.

To achieve the same follow below steps:

1. New Configuration Parameters for Transactions
 hive.support.concurrency – true
 hive.enforce.bucketing – true
 hive.exec.dynamic.partition.mode – nonstrict
 hive.txn.manager –org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.compactor.initiator.on – true
 hive.compactor.worker.threads – 1
You can set these configuration in hive-site.xml (after setting restart Hive ) for ever or via terminal.
Dont Forget to restart Hive once the above settings are applied, else you will get the same error again.
2. Below query creates HiveTest table with ACID support
(To do Update,delete or Insert we need to create a table that support ACID properties)
 create table HiveTest 
   (EmployeeID Int,FirstName String,Designation String,
     Salary Int,Department String) 
   clustered by (department) into 3 buckets 
   stored as orc TBLPROPERTIES ('transactional'='true') ;
3. Load data into HiveTest from a staging table,which contains the original data.
 from stagingtbl 
   insert into table HiveTest 
   select employeeid,firstname,designation,salary,department;

4. UPDATE,DELETE and INSERT operations


1.UPDATE
 update HiveTest 
    set salary = 50000 
    where employeeid = 19; 

SYNOPSIS

  1. The referenced column must be a column of the table being updated.
  2. The value assigned must be an expression that Hive supports in the select clause.  Thus arithmetic operators, UDFs, casts, literals, etc. are supported.  Subqueries are not supported.
  3. Only rows that match the WHERE clause will be updated.
  4. Partitioning columns cannot be updated.
  5. Bucketing columns cannot be updated.
  6. In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.


2. INSERT
 insert into table HiveTest 
     values(21,'Hive','Hive',0,'B');

SYNOPSIS

  1. Each row listed in the VALUES clause is inserted into table tablename.
  2. Values must be provided for every column in the table.  The standard SQL syntax that allows the user to insert values into only some columns is not yet supported.  To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
  3. Dynamic partitioning is supported in the same way as for INSERT...SELECT.
  4. If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.


3. DELETE
 delete from HiveTest
     where employeeid=19;

SYNOPSIS
  1. Only rows that match the WHERE clause will be deleted.
  2. In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.