Friday, May 25, 2012

NoSQL Database in Big Data Era


For quite some time, the on-premises relational database has been the undisputed model for database management in most organizations. Now, however, options such as cloud-based and NoSQL databases are gaining popularity as alternative models. Enterprises are looking at NoSQL for several reasons.

First, the volume of data being stored has increased massively, and managing it with a single relational database management system is becoming practically impossible. This has led enterprises to look at database scaling options, both vertical and horizontal.

Second, the two approaches differ sharply in cost. Vertical scaling (scaling up) forces the enterprise to invest heavily in more powerful hardware, whereas horizontal scaling (scaling out) uses inexpensive commodity hardware. Traditional relational databases do not scale out easily on commodity clusters, while NoSQL databases are usually designed with low-cost commodity hardware in mind. To realize these economic benefits, enterprises are moving towards horizontal scaling based on NoSQL database systems to meet the demand for powerful data processing.

Finally, change management is becoming a big challenge in large production relational database systems. NoSQL databases tend to impose far fewer data model restrictions, and many use a key-value store that lets an application store virtually any structure it wants in a data element. Together, these factors are compelling enterprises to look at alternatives to the relational database.
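To make the key-value and scale-out ideas concrete, here is a minimal sketch in Python. It is not the API of any product discussed in this post: the class name `ShardedKeyValueStore`, its methods, and the sample keys are all made up for illustration, and plain dictionaries stand in for commodity servers.

```python
import hashlib
import json


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'."""

    def __init__(self, node_count):
        # Each dict stands in for one commodity server (an assumption of this sketch).
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key picks the node, so no central index is needed.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        # The value is kept as an opaque JSON blob; the store imposes no schema.
        self._node_for(key)[key] = json.dumps(value)

    def get(self, key):
        raw = self._node_for(key).get(key)
        return None if raw is None else json.loads(raw)


store = ShardedKeyValueStore(node_count=3)
# Two values with completely different shapes live side by side.
store.put("user:1", {"name": "Alice", "tags": ["admin", "ops"]})
store.put("order:77", {"items": [{"sku": "A1", "qty": 2}], "total": 19.5})
```

Adding capacity here means adding another dictionary, which is the horizontal scaling idea in miniature. Simple modulo hashing would move most keys whenever the node count changes, so real systems typically use more careful partitioning schemes.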

Even though NoSQL databases seem promising, factors such as product maturity, the availability of expertise in the market, and the enterprise support offered by vendors are hindering adoption in many mainstream enterprises. It is worthwhile to look at a few of the products in this segment.


FlockDB
An open source, distributed, fault-tolerant graph database (a data store made up of a network of nodes that stores the relationships between those nodes). Promoted by Twitter, which uses it to store its users and manage their relationships to one another. It is simpler than more powerful graph databases such as Neo4j, but has not yet gained major acceptance in the community.
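The "nodes plus relationships" model can be sketched in a few lines of Python. This is only an illustration of the data model, not FlockDB's interface: the class `FollowGraph`, its methods, and the user names are hypothetical.

```python
from collections import defaultdict


class FollowGraph:
    """Toy social graph: users are nodes, 'follows' relationships are edges."""

    def __init__(self):
        # Each edge is recorded in both directions so either side can be queried.
        self.following = defaultdict(set)
        self.followers = defaultdict(set)

    def follow(self, source, target):
        self.following[source].add(target)
        self.followers[target].add(source)

    def mutual(self, user):
        # Users who follow `user` and are followed back.
        return self.following.get(user, set()) & self.followers.get(user, set())


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("bob", "alice")
graph.follow("carol", "alice")
```

Questions such as "who follows alice?" or "who are alice's mutual follows?" become set lookups and intersections, which is the kind of query a relationship-oriented store is built around.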
Cassandra
An open source distributed database system based on a key-value store. Promoted by Facebook as a good tool for tracking large amounts of data, such as Facebook status updates, across a network of computers. It is built on the principle of "eventual consistency" rather than "perfect consistency". It is well accepted in the community, and several companies, including Acunu and DataStax, offer commercial support for Cassandra.
CouchDB
CouchDB stores documents, each made up of a set of pairs that link a key with a value. Documents are searched with two functions that map and reduce the data: the map function decides what to include from each document and formats it as key-value pairs, and the reduce function combines the mapped values into a result. It provides interfaces based on JavaScript and JSON, and enjoys rapidly increasing community support.
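The map and reduce steps can be sketched in plain Python. CouchDB views are normally written in JavaScript, so this is only an illustration of the idea and not CouchDB's API: the sample documents and the names `map_post`, `reduce_sum`, and `run_view` are invented for this example.

```python
from collections import defaultdict

# Hypothetical documents; each is just a set of key-value pairs.
documents = [
    {"_id": "1", "type": "post", "author": "alice", "words": 120},
    {"_id": "2", "type": "post", "author": "bob", "words": 80},
    {"_id": "3", "type": "comment", "author": "alice", "words": 15},
    {"_id": "4", "type": "post", "author": "alice", "words": 200},
]


def map_post(doc):
    # Map: decide whether the document is included and emit (key, value) pairs.
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_sum(values):
    # Reduce: combine all values emitted for one key.
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    # Group the mapped pairs by key, then reduce each group.
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


word_counts = run_view(documents, map_post, reduce_sum)
```

Here the comment document is filtered out by the map step, and the reduce step totals the word counts of the remaining posts per author.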
MongoDB
An open source database system. MongoDB stores structured data as JSON-like documents with dynamic schemas and allows direct data queries using JavaScript.
Riak
Riak is one of the most sophisticated data stores and comes in open source and enterprise variants. Built on the principle that "eventual consistency is no excuse for losing data", it offers most of the features found in the others and adds more control over duplication. Although the basic structure stores pairs of keys and values, the options for retrieving them and guaranteeing their consistency are quite rich.
SimpleDB
A highly available and flexible non-relational data store, promoted by Amazon. Built on the principle of "eventual consistency". Provides a simple web services interface to create and store multiple data sets and to query data.
HBase
An open source, distributed, column-oriented store, modeled after Google's BigTable. Written in Java and promoted by Apache, it runs on top of the Hadoop Distributed File System (HDFS) and provides BigTable-like capabilities for Hadoop. HBase is not an "eventually consistent" data store; it is designed to be strongly consistent, even if that consistency causes availability problems. It provides programmatic access through a Java API, Thrift, and REST.