For the past few months I have read a lot of articles on NoSQL vs. RDBMS. It’s almost like the religious war between Mac and PC users. For almost half a century, the RDBMS (relational database management system) has been the dominant model for database management. Today, however, non-relational “NoSQL” databases are gaining mindshare as an alternative model.
So the question is: what is NoSQL, and how is it different from, and possibly better than, the traditional RDBMS?
According to the definition from Wikipedia, “NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage.” To keep things simple, NoSQL stores data in a non-relational fashion.
An RDBMS typically relies on normalized data so that it can return consistent results and prevent duplicate and orphan records. Normalizing the data means creating more tables, which in turn means more table joins, and therefore more indexes and keys. The problem becomes more apparent with highly diverse datasets: perhaps a hundred or so tables, each with its own varying and frequently changing indexes. I/O becomes chaotic when the indexes of different tables are stored on different parts of an HDD or SSD and reads and writes happen concurrently. In the cloud, the storage presented to the user may actually be spread across different disks or different kinds of store, hidden behind an abstraction layer. As databases grow into the terabytes or even petabytes, performance can fall off significantly.
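To make the cost of normalization concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and orders tables, the index name, and the orders_for helper are all made up for illustration; the point is only that answering a simple question already takes two tables, a foreign key, an index, and a join.

```python
import sqlite3

# In-memory database; the schema below is purely illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    item        TEXT NOT NULL
);
-- The join column needs its own index to stay fast as the table grows.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "book"), (1, "lamp")],
)


def orders_for(conn, name):
    """Return the items ordered by a customer; requires a join across both tables."""
    rows = conn.execute(
        "SELECT o.item FROM customers c "
        "JOIN orders o ON o.customer_id = c.id "
        "WHERE c.name = ? ORDER BY o.id",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling orders_for(conn, "Alice") walks both tables through the join. With a hundred tables instead of two, every such query touches several indexes at once, which is where the scattered I/O described above comes from.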
Many NoSQL stores use multi-dimensional data structures and keep related data close together to reduce the I/O needed to return query results. They also distribute the work across multiple locations (often deployed on a grid) so that many threads work independently and simultaneously. Some use the concept of maps, which group multiple index values so that a single map can handle a dynamic set of queries based on many attributes. Some also allow versioning of records: by time-stamping changes, new records are added to the database without the overhead that updates and deletes have in an RDBMS.
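The grouping and versioning ideas can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular NoSQL product is implemented. The VersionedStore class and its method names are invented for illustration, and a simple counter stands in for a real timestamp so that the example is deterministic.

```python
import itertools


class VersionedStore:
    """Toy append-only store: every write adds a new stamped version of a document."""

    def __init__(self):
        self._versions = {}               # key -> list of (stamp, document)
        self._clock = itertools.count(1)  # logical clock standing in for a timestamp

    def put(self, key, document):
        """Append a new version; nothing is updated or deleted in place."""
        stamp = next(self._clock)
        self._versions.setdefault(key, []).append((stamp, dict(document)))
        return stamp

    def get(self, key, as_of=None):
        """Return the latest version, or the latest one written at or before `as_of`."""
        for stamp, document in reversed(self._versions.get(key, [])):
            if as_of is None or stamp <= as_of:
                return document
        return None


store = VersionedStore()
# The customer and their orders live together in one document, so no join is needed.
v1 = store.put("customer:1", {"name": "Alice", "orders": ["book"]})
v2 = store.put("customer:1", {"name": "Alice", "orders": ["book", "lamp"]})
```

Here store.get("customer:1") returns the newest document, while store.get("customer:1", as_of=v1) still returns the older one. The trade-off is that old versions pile up and have to be cleaned out at some point.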
It should be pointed out that the idea that an RDBMS is slower than NoSQL is not always true. Take the case of analytics and business intelligence. Businesses mine the information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key IT issue for companies of all sizes, from SMBs to large enterprises. NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL. Some respite is provided by the emergence of solutions such as Hive or Pig, which can provide easier access to data held in Hadoop clusters and perhaps, eventually, in other NoSQL databases.
For decades, database administrators have relied on scaling up (buying bigger servers as database load increases) rather than scaling out (distributing the database across multiple hosts as load increases). However, as transaction rates and availability requirements increase, and as databases move into the cloud or onto virtualized environments, the economic advantages of scaling out on commodity hardware become hard to ignore. NoSQL databases are designed to expand transparently to take advantage of new nodes, and they’re usually designed with low-cost commodity hardware in mind.
To sum up, NoSQL databases are becoming an important part of the database environment and, when used appropriately, can provide significant performance benefits.
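One common way to spread keys across nodes so that a cluster can grow without reshuffling everything is consistent hashing. The sketch below is only an illustration of that technique, under the assumption that it is representative; I am not claiming any specific NoSQL database works exactly this way. The HashRing class, the node names, and the replica count are all hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    """Stable hash (unlike Python's built-in hash(), which varies between runs)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual points per node, to even out the load
        self._ring = []            # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node's virtual points on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        if not self._ring:
            raise ValueError("no nodes in ring")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):  # wrap around to the start of the ring
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.node_for(key) for key in keys}

ring.add_node("node-c")  # scale out by one commodity box
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
```

After adding node-c, only the keys in moved change owner, and every one of them lands on the new node. The rest stay where they were, which is what lets a cluster grow without a full redistribution of data.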
References:
http://stu.mp/2010/03/nosql-vs-rdbms-let-the-flames-begin.html
http://www.techrepublic.com/blog/10things/10-things-you-should-know-about-nosql-databases/1772
http://news.ycombinator.com/item?id=1221598
http://arifn.web.id/blog/2010/05/05/nosql-the-end-of-rdbms.html
*If you find anything here misleading or incorrect, please point it out.