Not so hot no NoSQL
A few years ago a lot of software people got very excited about so-called NoSQL databases. It started, I think, with the papers introducing the Google File System (GFS) and BigTable, and their use of MapReduce (which, like PageRank, turns out to be based on decades-old principles, just applied in a new way). Since then many different distributed data stores have come along, including Hadoop, Cassandra, CouchDB, MongoDB, Voldemort, and many others. Typically the NoSQL dbs trade off ACID compliance, 100% availability, and/or a SQL interface in return for being distributed and offering much higher performance or lower cost. I bought into it and thought it was a great idea. Now I’ve sort of changed my mind.
NoSQL is not a bad idea, but it fills a pretty small niche. A much smaller niche than I thought before. If you are considering a NoSQL implementation, you should probably satisfy several of the following conditions:
- Willing to develop in-house expertise in NoSQL storage, monitoring, backups, analysis, and tuning
- Large dataset
- A lot of unstructured data
- No schema design
Willing to develop in-house expertise:
This is the big one in my mind. I don’t know many people who love SQL, but it’s a very mature, very well understood standard. You can find expertise at all levels, whether you need a tutorial to write your first SELECT, or you need a DBA and storage expert to figure out why your query on a multi-terabyte Oracle DB takes 5ms instead of 1ms. NoSQL systems have no comparable pool of expertise yet, so you should expect to build it yourself.
Large dataset:
MySQL can handle millions of rows easily and hundreds of gigabytes per table. Oracle can scale even better, and that is without using any sharding or partitioning. Of course Oracle is more expensive, but you get peace of mind and support. For NoSQL to make sense, you need to have a dataset of billions of rows, or multiple terabytes of data. NoSQL dbs are mostly distributed object dbs, carefully designed to partition data among nodes and handle node failures. So if you’re running on a single host, you’re almost certainly better off going with a traditional SQL db.
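To make "partition data among nodes and handle node failures" a little more concrete, here is a minimal Python sketch of one common approach, consistent hashing with replicas, as popularized by Dynamo-style stores. It is an illustration only, not how any particular product is implemented; the class name `HashRing`, the node names, and the `vnodes` and `replicas` parameters are all made up for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer (md5 is used only to spread keys)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring. All names and defaults here are hypothetical."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several points so keys spread more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        # Simulates a node failure: its points simply leave the ring.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def nodes_for(self, key, replicas=2):
        """Return the first `replicas` distinct nodes clockwise from the key."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (_hash(key), ""))
        found = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
owners = ring.nodes_for("user:42")      # primary plus one replica
ring.remove_node(owners[0])             # the primary fails
survivors = ring.nodes_for("user:42")   # the old replica is now first in line
```

None of this buys anything on a single host: with one node, every key maps to the same place and the ring is pure overhead.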
Unstructured data:
One common reason to use a NoSQL db is to avoid storing text or XML BLOBs in a SQL database. This is an excellent use for NoSQL. One of the great values of a SQL database is that you can get good average response times because your data is structured. A SQL db querying an indexed integer field is a lot more predictable than one trying to store and query both integers and 10MB text objects together. SQL databases are traditionally not good at handling BLOBs. Also, because people usually require the highest availability from SQL databases, they think that putting large unstructured data into a NoSQL db on cheap distributed hardware is a money saver. Of course even with NoSQL, you’ll still need a monitoring, administration, backup, and high-availability plan of some sort, but in theory it shouldn’t be as expensive as the “this SQL db must have 100% uptime” plan.
No schema design:
Many NoSQL databases are simple key-value or key-object stores. This is fine for many types of applications. But building extra layers on top of this to handle object relationships or mappings is time-consuming. Some NoSQL databases have columns or fields and definable indices, allowing fast lookups. They tout flexible schema as a great design feature. OK, but SQL dbs are even better at that.
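As a rough illustration of that comparison, the Python sketch below uses SQLite from the standard library as a stand-in for a SQL db and a plain dict as a stand-in for a key-value store. The tables, columns, keys, and sample rows are all invented for the example.

```python
import sqlite3

# SQL side: relationships, indices and schema changes are built in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "user_id INTEGER REFERENCES users(id), "
    "total INTEGER)"
)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(1, 30), (1, 12), (2, 7)])

# "Flexible schema": add a column and an index after the fact.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("UPDATE users SET email = ? WHERE id = ?",
             ("alice@example.com", 1))
conn.execute("CREATE INDEX idx_orders_user ON orders (user_id)")

# The relationship between users and orders is one JOIN away.
rows = conn.execute(
    "SELECT u.name, u.email, SUM(o.total) "
    "FROM users u JOIN orders o ON o.user_id = u.id "
    "GROUP BY u.id, u.name, u.email ORDER BY u.id"
).fetchall()

# Key-value side: the same question needs a hand-written scan and mapping.
kv = {
    "user:1": {"name": "alice"},
    "user:2": {"name": "bob"},
    "order:1": {"user_id": 1, "total": 30},
    "order:2": {"user_id": 1, "total": 12},
    "order:3": {"user_id": 2, "total": 7},
}
totals = {}
for key, value in kv.items():
    if key.startswith("order:"):
        user_id = value["user_id"]
        totals[user_id] = totals.get(user_id, 0) + value["total"]
```

The key-value loop is short here, but it is exactly the kind of extra layer that has to be written, indexed, and maintained by hand once the relationships multiply.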
I still think NoSQL dbs are cool and useful, just not for everything.

Arek Said,
June 16, 2010 @ 5:40 pm
I believe you missed one very important reason: data partitioning.
You may need it because of size or performance – and you will choose Cassandra, Voldemort or another implementation of Amazon’s Dynamo.
But you may also want to have all of your data on separate servers in different geographical locations, or you may want your data to be on servers and on mobiles (etc.) at the same time. When you need “master to master to master” replication, you will probably choose CouchDB.
The first use case is definitely a niche.
The second probably still is, but maybe it will grow.
As for no schema: I don’t think a NoSQL db will slow you down when your data has no fixed schema.
You may also want a different type of schema, as with a graph DB.
So maybe not best for all cases, but I don’t think they are only for “niche” use cases, and definitely there is more than one “niche”.
Hubert Chen Said,
June 16, 2010 @ 6:09 pm
That’s a good point. I allude to it when I mention data size, but I should’ve talked more about partitioning. Many NoSQL implementations seem to be driven by an existing SQL db running into some kind of I/O bottleneck. Then they move from SQL to NoSQL and use data partitioning to spread out the I/O load among several servers and get back to good query performance. Part of the problem is that for years, disk I/O performance has not really improved much compared to disk capacities, which have grown exponentially. Disk systems are having to satisfy an ever increasing amount of I/O with the same number of spindles because people are buying disks for capacity instead of for I/O. A different solution is to just increase the number of I/Os available by moving the DB to an SSD, which has much better I/O capacity, though it is a bit expensive per GB. Still, that might be simpler than a large application rewrite for NoSQL. Though that doesn’t solve the important geographic diversity issues that you mention.