Big Data: Hadoop Ecosystem

Non-Relational Data Stores with Hadoop (NoSQL)

Apache HBase, MongoDB, Apache Cassandra

CAP Theorem : HBase favors Consistency and Partition Tolerance (CP)

Non-relational, scalable database built on HDFS; Based on Google's BigTable

CRUD -> Create, Read, Update, Delete; There is no query language, only CRUD APIs
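
To make "only CRUD APIs" concrete, here is a minimal in-memory sketch of the HBase data model: a row key maps to "family:qualifier" cells, and callers can only put, get, or delete by key. `ToyTable` and its methods are hypothetical illustrations, not the HBase client API (the real Java client exposes comparable operations through classes such as `Put`, `Get`, and `Delete`), and the sketch ignores cell timestamps/versions and byte encoding.

```python
from collections import defaultdict

class ToyTable:
    """Hypothetical in-memory stand-in for an HBase table (not the real client API).

    Layout: row key -> "family:qualifier" -> value. There is no query
    language: callers can only put, get, and delete by row key.
    """

    def __init__(self, families):
        self.families = set(families)  # column families are fixed when the table is created
        self.rows = defaultdict(dict)

    def _check(self, column):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")

    def put(self, row_key, columns):
        """Create or Update: write (or overwrite) cells in one row."""
        for column in columns:
            self._check(column)
        self.rows[row_key].update(columns)

    def get(self, row_key):
        """Read: fetch every cell stored under a row key."""
        return dict(self.rows.get(row_key, {}))

    def delete(self, row_key, column=None):
        """Delete: drop one cell, or the whole row if no column is given."""
        if column is None:
            self.rows.pop(row_key, None)
        else:
            self.rows.get(row_key, {}).pop(column, None)

users = ToyTable(families=["info"])
users.put("user1", {"info:name": "Ann", "info:city": "Oslo"})
users.put("user1", {"info:city": "Bergen"})  # an update is just another put
print(users.get("user1"))  # {'info:name': 'Ann', 'info:city': 'Bergen'}
users.delete("user1", "info:city")
print(users.get("user1"))  # {'info:name': 'Ann'}
```

Update and create are the same operation here, which mirrors how HBase treats a put to an existing cell.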

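HBase Architecture -> Auto-sharding of tables into regions (contiguous row-key ranges); HMaster assigns regions, ZooKeeper coordinates the cluster, RegionServers serve the regions, HDFS stores the data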
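
The auto-sharding idea can be sketched as a sorted list of region start keys: a row belongs to the last region whose start key is at or below it, and a region that grows too large splits at its middle key. This is a simplified, hypothetical model: `ToyRegionMap`, the row-count threshold, and the round-robin `assign` are assumptions for illustration (real HBase splits on region size, and the HMaster's balancer decides placement, with ZooKeeper coordination not modeled here).

```python
import bisect

class ToyRegionMap:
    """Hypothetical sketch of HBase-style auto-sharding (not HBase code).

    A table is split into regions, each covering a sorted, contiguous range of
    row keys. When a region grows past max_rows it splits at its middle key.
    """

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.start_keys = [""]  # region i covers [start_keys[i], start_keys[i + 1])
        self.regions = [[]]     # sorted row keys held by each region

    def _region_index(self, row_key):
        # The last region whose start key is <= row_key owns the key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self._region_index(row_key)
        region = self.regions[i]
        pos = bisect.bisect_left(region, row_key)
        if pos < len(region) and region[pos] == row_key:
            return  # key already present
        region.insert(pos, row_key)
        if len(region) > self.max_rows:
            mid = len(region) // 2
            # Split: the upper half becomes a new region starting at its first key.
            self.regions[i:i + 1] = [region[:mid], region[mid:]]
            self.start_keys.insert(i + 1, region[mid])

    def assign(self, servers):
        """Spread regions over region servers round-robin (a simplification)."""
        return {self.start_keys[i]: servers[i % len(servers)]
                for i in range(len(self.regions))}

m = ToyRegionMap(max_rows=4)
for key in ["a", "b", "c", "d", "e", "f"]:
    m.put(key)
print(m.start_keys)              # ['', 'c']
print(m.assign(["rs1", "rs2"]))  # {'': 'rs1', 'c': 'rs2'}
```

Because regions are key ranges, locating a row is a binary search over start keys, which is roughly why clients can cache region locations and go straight to the right RegionServer.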

External Web Site / Technical Documents / Installation Steps:

CAP Theorem : MongoDB favors Consistency and Partition Tolerance (CP)

Managing HuMONGOus data; Document-based data model; No real schema is enforced; No single "key" as in other databases

Terminology : Databases, Collections, Documents
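
The terminology maps naturally onto nested Python objects: a database holds named collections, and a collection holds documents that need not share fields. This is a plain-Python sketch, not the pymongo API; `find` is a hypothetical helper (with pymongo the analogous call would look like `client["shop"]["users"].find({"city": "Oslo"})`, where the database and collection names are made up).

```python
# Hypothetical sketch of MongoDB terminology with plain Python objects (not pymongo).
database = {                                            # a database holds named collections
    "users": [                                          # a collection holds documents
        {"name": "Ann", "city": "Oslo"},                # a document
        {"name": "Bob", "email": "bob@example.com"},    # different fields: no schema enforced
        {"name": "Cid", "city": "Oslo", "tags": ["admin"]},
    ]
}

def find(collection, **criteria):
    """Return documents whose fields equal every criterion; any field can be queried."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

print([d["name"] for d in find(database["users"], city="Oslo")])  # ['Ann', 'Cid']
```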

Replica Sets : Single primary (master); Secondaries maintain backup copies of the database instance
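
A single-primary replica set can be sketched as one writable copy plus an operation log that secondaries replay to stay up to date. `ToyReplicaSet` is hypothetical and much simpler than MongoDB: there are no elections, no network, and syncing is a manual call rather than continuous asynchronous replication.

```python
class ToyReplicaSet:
    """Hypothetical sketch of a single-primary replica set (not MongoDB code).

    Only the primary accepts writes; each write is appended to an operation
    log (oplog) that the secondaries replay to keep backup copies.
    """

    def __init__(self, secondaries=2):
        self.primary = {}
        self.oplog = []
        self.secondaries = [{"data": {}, "applied": 0} for _ in range(secondaries)]

    def write(self, member, key, value):
        if member != "primary":
            raise PermissionError("writes must go to the primary")
        self.primary[key] = value
        self.oplog.append((key, value))

    def sync(self):
        """Each secondary replays the oplog entries it has not applied yet."""
        for sec in self.secondaries:
            for key, value in self.oplog[sec["applied"]:]:
                sec["data"][key] = value
            sec["applied"] = len(self.oplog)

rs = ToyReplicaSet(secondaries=2)
rs.write("primary", "x", 1)
rs.sync()
print(rs.secondaries[0]["data"])  # {'x': 1}
```

Until `sync` runs, secondaries lag behind the primary, which is why reads from secondaries may return older data.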

Sharding : Ranges of an indexed field you specify (the shard key) are assigned to different replica sets
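
Range-based routing can be sketched with a sorted list of chunk lower bounds and a binary search. The boundaries and replica set names below are hypothetical; in a real cluster the `mongos` router looks up chunk ranges from config server metadata, and the first and last chunks extend to MinKey/MaxKey rather than starting at 0.

```python
import bisect

# Hypothetical shard map: chunk i covers shard-key values in
# [boundaries[i], boundaries[i + 1]); the last chunk is open-ended.
boundaries = [0, 1000, 5000]          # lower bound of each chunk
replica_sets = ["rs0", "rs1", "rs2"]  # replica set that owns each chunk

def route(shard_key_value):
    """Return the replica set whose range contains the shard key value."""
    if shard_key_value < boundaries[0]:
        raise ValueError("value below the first chunk boundary")
    return replica_sets[bisect.bisect_right(boundaries, shard_key_value) - 1]

print(route(42), route(1000), route(99999))  # rs0 rs1 rs2
```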

External Web Site / Technical Documents / Installation Steps:

CAP Theorem : Cassandra favors Availability and Partition Tolerance (AP)

A distributed database with no single point of failure

There is no master node at all - every node runs exactly the same software and performs the same functions; Data model is similar to BigTable/HBase
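
Cassandra avoids a master by placing data on a token ring: the partition key is hashed to a token, the first node at or after that token owns it, and the following nodes hold replicas. The sketch below is hypothetical (`ToyRing` is not Cassandra code); it hashes with MD5 and derives node positions from node names, whereas Cassandra's default partitioner uses Murmur3 and nodes get configured tokens.

```python
import bisect
import hashlib

def token(key):
    """Hash a key onto the ring (MD5 is an assumption; Cassandra's default uses Murmur3)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class ToyRing:
    """Hypothetical masterless token ring (not Cassandra code).

    Every node can build this same map, so any node can act as the entry
    point for a request; no master is consulted.
    """

    def __init__(self, nodes, replication_factor=2):
        self.rf = min(replication_factor, len(nodes))
        # One ring position (token) per node, derived here from the node name.
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Owner = first node whose token is >= the key's token (wrapping around);
        the next rf - 1 nodes clockwise hold extra copies, similar to SimpleStrategy."""
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))  # three distinct nodes that sit next to each other on the ring
```

Since placement is a pure function of the key and the shared ring layout, losing any single node only removes one replica rather than a coordinator.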

Non-relational, but offers a limited SQL-like query language (CQL) as its interface; Fast access to rows of information
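
Fast row access comes from how tables are keyed: rows are grouped by partition key and sorted by clustering columns, so a typical CQL query names one partition and reads one contiguous slice. The sketch below is a hypothetical in-memory model (`ToyPartitionedTable` and the `readings` schema in its docstring are made up), not the Cassandra driver.

```python
import bisect
from collections import defaultdict

class ToyPartitionedTable:
    """Hypothetical sketch of a Cassandra-style table (not driver code).

    Rows are grouped by partition key and kept sorted by a clustering column,
    loosely mirroring CQL such as:
        CREATE TABLE readings (sensor text, ts int, value double,
                               PRIMARY KEY (sensor, ts));
    """

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> sorted [(ts, value)]

    def insert(self, sensor, ts, value):
        rows = self.partitions[sensor]
        i = bisect.bisect_left(rows, (ts,))
        if i < len(rows) and rows[i][0] == ts:
            rows[i] = (ts, value)  # same primary key: a CQL INSERT overwrites (upsert)
        else:
            rows.insert(i, (ts, value))

    def select(self, sensor, ts_from, ts_to):
        """Like SELECT ... WHERE sensor = ? AND ts >= ? AND ts < ?: one partition, one slice."""
        rows = self.partitions.get(sensor, [])
        lo = bisect.bisect_left(rows, (ts_from,))
        hi = bisect.bisect_left(rows, (ts_to,))
        return rows[lo:hi]

table = ToyPartitionedTable()
for ts, value in [(3, 0.3), (1, 0.1), (2, 0.2)]:
    table.insert("s1", ts, value)
print(table.select("s1", 1, 3))  # [(1, 0.1), (2, 0.2)]
```

Queries that do not name the partition key would have to scan every partition, which is one reason CQL is deliberately more limited than SQL.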

Common pattern : Replicate Cassandra to a second ring dedicated to analytics and Spark integration

Cassandra and Spark : DataStax offers a Spark-Cassandra connector; It lets Spark read and write Cassandra tables as DataFrames

External Web Site / Technical Documents / Installation Steps: