Non-Relational Data Store (NoSQL)

Big Data: Hadoop Ecosystem

Non-Relational Data Store with Hadoop (NoSQL)

Apache HBase, MongoDB, Apache Cassandra

Apache HBase

CAP Theorem: HBase favors Consistency and Partition Tolerance (CP)

Non-relational, scalable database built on top of HDFS; based on Google's BigTable

CRUD (Create, Read, Update, Delete): there is no query language, only CRUD APIs
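
The CRUD-only model can be illustrated with a small in-memory table that mimics the BigTable/HBase layout: rows sorted by row key, cells addressed as `family:qualifier`, and each put adding a new timestamped version of a cell. This is a hypothetical sketch, assuming a logical counter in place of real cell timestamps; `ToyTable` and its methods are illustrative names, not the HBase client API (whose Java classes for these operations are `Put`, `Get`, `Scan` and `Delete`).

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase client API).

    row key -> {"family:qualifier": [(timestamp, value), ...]}, newest version first.
    """

    def __init__(self) -> None:
        self._rows: Dict[bytes, Dict[str, List[Tuple[int, bytes]]]] = {}
        self._sorted_keys: List[bytes] = []  # row keys in lexicographic order
        self._clock = 0  # logical clock standing in for real cell timestamps

    def put(self, row: bytes, column: str, value: bytes) -> None:
        """Create or update: every put adds a new version of the cell."""
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._sorted_keys, row)
        self._clock += 1
        self._rows[row].setdefault(column, []).insert(0, (self._clock, value))

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Read the newest version of a single cell, or None if it is absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Return the row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._sorted_keys, start)
        hi = bisect.bisect_left(self._sorted_keys, stop)
        return self._sorted_keys[lo:hi]

    def delete(self, row: bytes) -> None:
        """Delete a whole row if it exists."""
        if self._rows.pop(row, None) is not None:
            self._sorted_keys.remove(row)


table = ToyTable()
table.put(b"user1", "info:name", b"Ann")
table.put(b"user1", "info:name", b"Anne")  # update = newer version of the same cell
table.put(b"user2", "info:name", b"Bob")
print(table.get(b"user1", "info:name"))  # b'Anne'
print(table.scan(b"user1", b"user3"))    # [b'user1', b'user2']
table.delete(b"user1")
print(table.get(b"user1", "info:name"))  # None
```

Because row keys are kept sorted, a scan over a key range is a contiguous slice, which is why row-key design matters so much in HBase.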

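HBase architecture: tables are auto-sharded into regions (contiguous row-key ranges); the HMaster assigns regions, ZooKeeper coordinates the cluster, RegionServers serve the regions, and HDFS stores the data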
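
Auto-sharding means the sorted row-key space of a table is cut into regions, and a region is split in two when it grows too large. The toy model below illustrates that idea under loudly simplified assumptions: the row-count threshold `MAX_ROWS_PER_REGION` and the round-robin placement are hypothetical (real HBase splits by data size, and the HMaster decides which RegionServer hosts each region), and `ToyRegionedTable` is an illustrative name.

```python
import bisect
from typing import List

MAX_ROWS_PER_REGION = 4  # hypothetical threshold; real HBase splits regions by data size


class ToyRegionedTable:
    """Hypothetical model of auto-sharding: the sorted row-key space is cut into regions."""

    def __init__(self, region_servers: List[str]) -> None:
        self.region_servers = region_servers
        self.start_keys: List[str] = [""]     # region i covers [start_keys[i], start_keys[i + 1])
        self.regions: List[List[str]] = [[]]  # sorted row keys held by each region

    def region_index(self, row: str) -> int:
        # The owning region is the last one whose start key is <= row.
        return bisect.bisect_right(self.start_keys, row) - 1

    def put(self, row: str) -> None:
        i = self.region_index(row)
        region = self.regions[i]
        j = bisect.bisect_left(region, row)
        if j == len(region) or region[j] != row:
            region.insert(j, row)
        if len(region) > MAX_ROWS_PER_REGION:
            self._split(i)

    def _split(self, i: int) -> None:
        # Split the region at its middle row key into two daughter regions.
        region = self.regions[i]
        mid = len(region) // 2
        self.regions[i:i + 1] = [region[:mid], region[mid:]]
        self.start_keys.insert(i + 1, region[mid])

    def server_for(self, row: str) -> str:
        # Stand-in for the HMaster's assignment policy: round-robin over RegionServers.
        return self.region_servers[self.region_index(row) % len(self.region_servers)]


regioned = ToyRegionedTable(["rs1", "rs2"])
for key in ["a", "b", "c", "d", "e"]:
    regioned.put(key)
print(regioned.start_keys)       # ['', 'c']
print(regioned.server_for("b"))  # rs1
print(regioned.server_for("d"))  # rs2
```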

External Web Site / Technical Documents / Installation Steps:
  • Apache HBase
  • Technical Documents/Presentation
  • Installation Steps

MongoDB

CAP Theorem: MongoDB favors Consistency and Partition Tolerance (CP)

Manages "HuMONGOus" data; document-based data model; no schema is enforced; unlike key-value stores, access is not limited to a single "key": any field can be indexed and queried

Terminology: a database contains collections, and a collection contains documents
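
The document model and the database/collection/document hierarchy can be pictured with plain Python dictionaries. This is a hedged sketch, not the MongoDB driver API: `databases`, `users` and `find` are hypothetical names, and `find` only performs exact-match filtering on top-level fields. Real MongoDB also adds an `_id` field to every document.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# Hypothetical in-memory layout mirroring the terminology:
# database name -> collection name -> list of documents.
databases: Dict[str, Dict[str, List[Document]]] = {"shop": {"users": []}}
users = databases["shop"]["users"]

# No schema is enforced: documents in one collection may have different fields.
users.append({"name": "Ann", "age": 31})
users.append({"name": "Bob", "email": "bob@example.com", "tags": ["admin"]})


def find(collection: List[Document], query: Document) -> List[Document]:
    """Return documents whose fields equal every key/value in `query` (equality only)."""
    return [doc for doc in collection
            if all(key in doc and doc[key] == value for key, value in query.items())]


print(find(users, {"name": "Bob"}))  # filter on an ordinary field, not a single primary key
print(find(users, {"age": 31}))      # filter on a field that only some documents have
```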

Replica Sets: a single primary (master) accepts writes, while secondaries maintain backup copies of the database instance
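
In MongoDB, the primary records its writes in an operations log (the oplog), secondaries replicate by applying those entries, and if the primary becomes unavailable the remaining members elect a new primary. The model below is a simplified, hypothetical sketch of that flow: `ToyReplicaSet`, `Member` and the `("set", key, value)` entry format are illustrative, and the "election" just promotes the healthy member with the most applied oplog entries, whereas real elections also require a majority of voting members.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # hypothetical oplog entry: ("set", key, value)


class Member:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.applied = 0  # number of oplog entries this member has applied
        self.up = True


class ToyReplicaSet:
    """Hypothetical single-primary replica set with oplog-based replication."""

    def __init__(self, names: List[str]) -> None:
        self.members = [Member(n) for n in names]
        self.primary = self.members[0]
        self.oplog: List[Op] = []

    def write(self, key: str, value: str) -> None:
        # Only the primary accepts writes.
        if not self.primary.up:
            raise RuntimeError("no primary available; an election is needed")
        self.oplog.append(("set", key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def replicate(self) -> None:
        """Healthy secondaries apply the oplog entries they have not seen yet."""
        for m in self.members:
            if m is not self.primary and m.up:
                for _, key, value in self.oplog[m.applied:]:
                    m.data[key] = value
                m.applied = len(self.oplog)

    def elect(self) -> Member:
        """Promote the healthy member with the most applied entries (simplified election)."""
        candidates = [m for m in self.members if m.up]
        self.primary = max(candidates, key=lambda m: m.applied)
        # Drop entries the new primary never received, loosely mirroring a rollback.
        del self.oplog[self.primary.applied:]
        return self.primary


rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write("x", "1")
rs.replicate()
rs.members[0].up = False   # the primary fails
print(rs.elect().name)     # node-b (both secondaries are equally up to date; the first wins)
rs.write("y", "2")
rs.replicate()
print(rs.members[2].data)  # {'x': '1', 'y': '2'}
```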

Sharding: ranges of an indexed field you specify (the shard key) are assigned to different replica sets
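
Range-based sharding can be sketched as a sorted list of split points: each range of shard-key values (a chunk) belongs to one replica set, and a router (the role `mongos` plays in MongoDB) uses the split points to send a query only to the replica sets whose ranges it touches. The split points, the `customer_id` shard key and the replica set names below are hypothetical.

```python
import bisect
from typing import List

# Hypothetical chunks on an integer shard key such as customer_id:
#   (-inf, 100) -> rs0,  [100, 200) -> rs1,  [200, +inf) -> rs2
SPLIT_POINTS = [100, 200]
OWNERS = ["rs0", "rs1", "rs2"]


def shard_for(key: int) -> str:
    """Replica set holding the chunk that contains `key`."""
    return OWNERS[bisect.bisect_right(SPLIT_POINTS, key)]


def shards_for_range(lo: int, hi: int) -> List[str]:
    """Replica sets a query on lo <= key < hi must visit (assumes lo < hi)."""
    first = bisect.bisect_right(SPLIT_POINTS, lo)
    last = bisect.bisect_right(SPLIT_POINTS, hi - 1)
    return OWNERS[first:last + 1]


print(shard_for(42))               # rs0
print(shard_for(150))              # rs1
print(shards_for_range(150, 180))  # ['rs1']  (targeted: a single replica set)
print(shards_for_range(50, 250))   # ['rs0', 'rs1', 'rs2']
```

Because chunks follow key order, range queries on the shard key stay targeted; the flip side is that a monotonically increasing shard key sends every new insert to the last chunk.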

External Web Site / Technical Documents / Installation Steps:
  • MongoDB
  • Technical Documents/Presentation
  • Installation Steps

Apache Cassandra

CAP Theorem: Cassandra favors Availability and Partition Tolerance (AP)

A distributed database with no single point of failure

There is no master node at all: every node runs exactly the same software and performs the same functions; the data model is similar to BigTable/HBase
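
Without a master, any node can coordinate a request, because every node can compute from the shared token ring which nodes own a given partition key. The sketch below assumes a toy token space of 100, hand-assigned node tokens, MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3), and SimpleStrategy-style placement, in which the first node clockwise plus the next `rf - 1` nodes hold the replicas; `ToyRing` is a hypothetical name.

```python
import bisect
import hashlib
from typing import Dict, List


class ToyRing:
    """Hypothetical masterless token ring; every node could run this same code."""

    RING_SIZE = 100  # toy token space [0, 100)

    def __init__(self, node_tokens: Dict[str, int], rf: int = 2) -> None:
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.rf = rf  # replication factor

    @classmethod
    def token_of(cls, partition_key: str) -> int:
        # Stand-in hash: MD5 folded into the toy token space.
        return int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % cls.RING_SIZE

    def replicas_for_token(self, token: int) -> List[str]:
        # The owner is the first node whose token is >= the key's token (wrapping around);
        # the next rf - 1 nodes clockwise hold the other replicas.
        start = bisect.bisect_left(self.tokens, token) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def replicas(self, partition_key: str) -> List[str]:
        return self.replicas_for_token(self.token_of(partition_key))


ring = ToyRing({"node1": 25, "node2": 50, "node3": 75, "node4": 99}, rf=2)
print(ring.replicas_for_token(30))  # ['node2', 'node3']
print(ring.replicas_for_token(99))  # ['node4', 'node1']
print(ring.replicas("user:42"))     # same answer whichever node computes it
```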

Non-relational, but offers a limited query language, CQL (Cassandra Query Language), as its interface; fast access to rows of information
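
One reason row access is fast is Cassandra's primary-key design: the partition key decides which nodes store a row, and clustering columns keep rows sorted inside a partition, so a CQL query that fixes the partition key and bounds a clustering column reads one contiguous slice. The sketch below models a hypothetical `readings` table (its CQL definition appears in a comment) with a plain Python class; `ToyPartitionedTable` is illustrative and not part of any Cassandra driver.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Models a hypothetical CQL table:
#   CREATE TABLE readings (sensor_id text, ts int, value double,
#                          PRIMARY KEY (sensor_id, ts));
# sensor_id is the partition key (it decides which nodes hold the rows);
# ts is a clustering column (rows inside a partition stay sorted by it).


class ToyPartitionedTable:
    def __init__(self) -> None:
        self.partitions: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def insert(self, sensor_id: str, ts: int, value: float) -> None:
        rows = self.partitions[sensor_id]
        i = bisect.bisect_left(rows, (ts,))
        if i < len(rows) and rows[i][0] == ts:
            rows[i] = (ts, value)  # same primary key: overwrite (upsert)
        else:
            rows.insert(i, (ts, value))

    def select_range(self, sensor_id: str, ts_from: int, ts_to: int) -> List[Tuple[int, float]]:
        """Like SELECT ... WHERE sensor_id = ? AND ts >= ? AND ts < ?: one contiguous slice."""
        rows = self.partitions.get(sensor_id, [])
        lo = bisect.bisect_left(rows, (ts_from,))
        hi = bisect.bisect_left(rows, (ts_to,))
        return rows[lo:hi]


readings = ToyPartitionedTable()
readings.insert("s1", 10, 1.5)
readings.insert("s1", 5, 0.5)
readings.insert("s1", 20, 2.5)
readings.insert("s1", 10, 9.9)  # upsert replaces the row with ts = 10
readings.insert("s2", 7, 3.0)
print(readings.select_range("s1", 5, 20))  # [(5, 0.5), (10, 9.9)]
```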

Cassandra can be replicated to another ring that is used for analytics and Spark integration, so analytical workloads do not load the transactional ring

Cassandra and Spark: DataStax offers a Spark Cassandra Connector, which lets you read and write Cassandra tables as DataFrames

External Web Site / Technical Documents / Installation Steps:
  • Apache Cassandra
  • Technical Documents/Presentation
  • Installation Steps