Pig, Hive, Sqoop
Writing mappers and reducers by hand takes a long time. Pig introduces Pig Latin, a scripting language that lets you use SQL-like syntax to define your map and reduce steps
Highly extensible with user-defined functions (UDFs)
Pig is a declarative, high-level data-flow scripting language
Execution Tools & Modes: Grunt shell or CLI; Local or MapReduce mode; Interactive or Batch
LOAD -> Transform -> Result: LOAD, FILTER, JOIN, GROUP, FOREACH ... GENERATE, SUM, ORDER, DUMP
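The flow above can be sketched as a short Pig Latin script; the input file, field names, and schema here are hypothetical, not from the notes:

```pig
-- Hypothetical comma-delimited ratings file: userID, movieID, rating
ratings = LOAD 'ratings.csv' USING PigStorage(',')
          AS (userID:int, movieID:int, rating:int);

-- Keep only high ratings, group them by movie
highRatings = FILTER ratings BY rating >= 4;
byMovie     = GROUP highRatings BY movieID;

-- For each movie, sum up its high ratings and sort by that total
totals = FOREACH byMovie GENERATE group AS movieID,
                                  SUM(highRatings.rating) AS total;
sorted = ORDER totals BY total DESC;

DUMP sorted;
```

You could run this interactively line by line in the Grunt shell, or in batch with `pig -x local script.pig` (local mode) versus `pig -x mapreduce script.pig` (on the cluster).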
Hive: distributes SQL queries with Hadoop; translates SQL queries into MapReduce or Tez jobs on your cluster
The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
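As a sketch of "projecting structure onto data already in storage": the table name, schema, and HDFS path below are hypothetical, but the pattern of an external table plus an ordinary query (which Hive compiles into MapReduce or Tez jobs) is the standard one:

```sql
-- Project a schema onto tab-delimited files already sitting in HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS ratings (
    userID INT, movieID INT, rating INT, rating_time BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/ratings';

-- A plain HiveQL query; Hive turns this into cluster jobs behind the scenes
SELECT movieID, COUNT(*) AS cnt
FROM ratings
GROUP BY movieID
ORDER BY cnt DESC
LIMIT 10;
```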
Pros: uses familiar SQL syntax (HiveQL); interactive; scalable (works with "big data" on a cluster); easy OLAP queries; highly optimized; highly extensible
Cons: high latency, so not appropriate for OLTP; stores data de-normalized; HiveQL is limited compared with Pig and Spark; no transactions; no record-level updates, inserts, or deletes
Sqoop: handles big data by kicking off MapReduce jobs to import or export it
Sqoop imports data from a relational database, such as MySQL or Oracle, into HDFS
Sqoop imports data from a relational database, such as MySQL or Oracle, directly into Hive
Sqoop incremental imports keep your relational database and Hadoop in sync: --check-column and --last-value
Sqoop exports data from Hive to a relational database such as MySQL or Oracle
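The four operations above might look like this on the command line; the connection string, credentials, table names, and paths are all hypothetical placeholders:

```shell
# Import a MySQL table into HDFS (-P prompts for the password; -m 1 = one mapper)
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings -m 1

# Import the same table directly into Hive instead
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings --hive-import

# Incremental import: only rows whose id is past the last imported value
sqoop import --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table ratings \
  --incremental append --check-column id --last-value 100000

# Export a Hive table's warehouse files back to an existing MySQL table
sqoop export --connect jdbc:mysql://localhost/movielens \
  --username sqoopuser -P --table exported_ratings \
  --export-dir /user/hive/warehouse/ratings \
  --input-fields-terminated-by '\0001'
```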