Pig, Hive, Sqoop
Writing mappers and reducers by hand takes a long time. Pig introduces Pig Latin, a scripting language that lets you use SQL-like syntax to define your map and reduce steps
Highly extensible with user-defined functions (UDFs)
Pig is a declarative, high-level data-flow scripting language
Execution Tools & Modes: Grunt shell or CLI; Local or MapReduce mode; Interactive or Batch
LOAD -> Transform -> Result: LOAD, FILTER, JOIN, GROUP, FOREACH ... GENERATE, SUM, ORDER, DUMP
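The flow above can be sketched as a short Pig Latin script; the input file, field names, and schema here are hypothetical, not from the notes:

```pig
-- Hypothetical comma-delimited ratings file: userID, movieID, rating
ratings = LOAD 'ratings.csv' USING PigStorage(',')
          AS (userID:int, movieID:int, rating:int);

-- Keep only high ratings, group them by movie
highRatings = FILTER ratings BY rating >= 4;
byMovie     = GROUP highRatings BY movieID;

-- For each movie, sum up its high ratings and sort by that total
totals = FOREACH byMovie GENERATE group AS movieID,
                                  SUM(highRatings.rating) AS total;
sorted = ORDER totals BY total DESC;

DUMP sorted;
```

You could run this interactively line by line in the Grunt shell, or in batch with `pig -x local script.pig` (local mode) versus `pig -x mapreduce script.pig` (on the cluster).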
Hive: distributes SQL queries with Hadoop; translates SQL queries into MapReduce or Tez jobs on your cluster
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
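As a sketch of "projecting structure onto data already in storage": the table name, schema, and HDFS path below are hypothetical, but the pattern of an external table plus an ordinary query (which Hive compiles into MapReduce or Tez jobs) is the standard one:

```sql
-- Project a schema onto tab-delimited files already sitting in HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
    userID INT, movieID INT, rating INT, rating_time BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/ratings';

-- A plain HiveQL query; Hive turns this into cluster jobs behind the scenes
SELECT movieID, COUNT(*) AS cnt
FROM ratings
GROUP BY movieID
ORDER BY cnt DESC
LIMIT 10;
```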
Pros: uses familiar SQL syntax (HiveQL); interactive; scalable (works with "big data" on a cluster); easy OLAP queries; highly optimized; highly extensible
Cons: high latency, so not appropriate for OLTP; stores data de-normalized; HiveQL is limited compared with Pig and Spark; no transactions; no record-level updates, inserts, or deletes
Sqoop: handles big data by kicking off MapReduce jobs to import or export it
Sqoop imports data from a relational database, such as MySQL or Oracle, into HDFS
Sqoop imports data from a relational database, such as MySQL or Oracle, directly into Hive
Sqoop incremental imports keep your relational database and Hadoop in sync: --check-column and --last-value
Sqoop exports data from Hive to a relational database such as MySQL or Oracle
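The four operations above might look like this on the command line; the connection string, credentials, table names, and paths are all hypothetical placeholders:

```shell
# Import a MySQL table into HDFS (-P prompts for the password; -m 1 = one mapper)
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings -m 1

# Import the same table directly into Hive instead
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings --hive-import

# Incremental import: only rows whose id is past the last imported value
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings \
  --incremental append --check-column id --last-value 100000

# Export a Hive table's warehouse files back to an existing MySQL table
sqoop export --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table exported_ratings \
  --export-dir /user/hive/warehouse/ratings \
  --input-fields-terminated-by '\0001'
```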