Monday, July 13, 2015

Hadoop SDK, HIVE, Mapreduce and .NET Developers

Hadoop SDK and Tutorials for Microsoft .NET Developers



How to Communicate to Hadoop via Hive using .NET/C#






How to Create MapReduce Jobs for Hadoop Using C#


BIG DATA for .NET Devs: HDInsight, Writing Hadoop Map Reduce Jobs In C# And Querying Results Back Using LINQ

http://www.amazedsaint.com/2013/03/taming-big-data-with-c-using-hadoop-on.html

Thursday, June 25, 2015

What type of MongoDB deployment would you like to build?

Three ways we can deploy mongodb

Standalone Instance 

 A standalone MongoDB provides a single copy of your data and is appropriate for        projects where data loss is not a concern.

Replica Set

replica set is a group of MongoDB instances that host the same data set. A replica set provides both redundancy and high availability.

Sharded Cluster

Sharding is a method for storing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.








Saturday, June 20, 2015

HIVE 0.14 Update , delete, insert is possible

Attempt to do update or delete using transaction manager that does not support these operations in hive-0.14.0


CRUD operations are supported in Hive from 0.14 onwards.
See Wiki 

Hive supports data warehouse software facility,which facilitates querying and managing large datasets residing in distributed storage. In data warehouse there are situation where we need to update, delete etc transactions.In hive later versions UPDATE was not supported,but there were workarounds to do update a transaction

1. Update Statement In Hive For Small Tables 
2. Update Statement In Hive For Large Tables using INSERT


Lets see how to do INSERT,UPDATE,DELETE in newer version of hive. 

Create a table "test"
CREATE EXTERNAL TABLE 
    test (EmployeeID Int,FirstName String,Designation  
        String,Salary Int,Department String) 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY  "," 
    LOCATION '/user/hdfs/Hive';
We will try to update the salary of employee id 19 from 45,000 to 50,000.
 hive> UPDATE test 
           SET salary = 50000 
           WHERE employeeid = 19;

 FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction m anager that does not support these operations.

While applying above query it shows a semantic Exception.In order to allow update and delete we need to add additional settings in hive-site.xml and create table with ACID output format support.

To achieve the same follow below steps:

1. New Configuration Parameters for Transactions
 hive.support.concurrency – true
 hive.enforce.bucketing – true
 hive.exec.dynamic.partition.mode – nonstrict
 hive.txn.manager –org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
 hive.compactor.initiator.on – true
 hive.compactor.worker.threads – 1
You can set these configuration in hive-site.xml (after setting restart Hive ) for ever or via terminal.
Dont Forget to restart Hive once the above settings are applied, else you will get the same error again.
2. Below query creates HiveTest table with ACID support
(To do Update,delete or Insert we need to create a table that support ACID properties)
 create table HiveTest 
   (EmployeeID Int,FirstName String,Designation String,
     Salary Int,Department String) 
   clustered by (department) into 3 buckets 
   stored as orc TBLPROPERTIES ('transactional'='true') ;
3. Load data into HiveTest from a staging table,which contains the original data.
 from stagingtbl 
   insert into table HiveTest 
   select employeeid,firstname,designation,salary,department;

4. UPDATE,DELETE and INSERT operations


1.UPDATE
 update HiveTest 
    set salary = 50000 
    where employeeid = 19; 

SYNOPSIS

  1. The referenced column must be a column of the table being updated.
  2. The value assigned must be an expression that Hive supports in the select clause.  Thus arithmetic operators, UDFs, casts, literals, etc. are supported.  Subqueries are not supported.
  3. Only rows that match the WHERE clause will be updated.
  4. Partitioning columns cannot be updated.
  5. Bucketing columns cannot be updated.
  6. In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.


2. INSERT
 insert into table HiveTest 
     values(21,'Hive','Hive',0,'B');

SYNOPSIS

  1. Each row listed in the VALUES clause is inserted into table tablename.
  2. Values must be provided for every column in the table.  The standard SQL syntax that allows the user to insert values into only some columns is not yet supported.  To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
  3. Dynamic partitioning is supported in the same way as for INSERT...SELECT.
  4. If the table being inserted into supports ACID and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion.


3. DELETE
 delete from HiveTest
     where employeeid=19;

SYNOPSIS
  1. Only rows that match the WHERE clause will be deleted.
  2. In Hive 0.14, upon successful completion of this operation the changes will be auto-committed.

Tuesday, June 9, 2015

MySQL: Multiple Inserts for a single column

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

once the bracket close it will take the values and look for another value in next bracket.

Incorrect table definition; there can be only one auto colum

Create table Tablename (id int not null AUTO_incremment ..........
.....................
.............................
);

IF you have not defined the primary key in above syntax u will face this kind of error.

Monday, June 8, 2015

Data introduction

Data

New York Stock exchange - 4-5 terabyte data per day
Facebook - 240 billion photos, 7 petabyte data per month
Internet archive stores 18.5 data.
Ancestory.com stores 10 petabyte data.

Good news is that big data is here and problem


  1. How to store
  2. how to anylyze this much of huge data.