A few years ago a lot of software people got very excited about so-called NoSQL databases. It started, I think, with the papers introducing the Google File System (GFS) and BigTable, and Google's use of MapReduce (which, like PageRank, turns out to be based on decades-old principles, just applied in a new way). Since then many distributed data stores have come along, including Hadoop, Cassandra, CouchDB, MongoDB, Voldemort, and many others. Typically the NoSQL dbs give up ACID compliance, 100% availability, and/or a SQL interface in return for being distributed and offering much higher performance or lower cost. I bought into it and thought it was a great idea. Now I’ve sort of changed my mind.
NoSQL is not a bad idea, but it fills a pretty small niche, a much smaller one than I used to think. If you are considering a NoSQL implementation, you should probably satisfy several of the following conditions:
- willing to develop in-house expertise in NoSQL storage, monitoring, backups, analysis, and tuning
- large dataset
- a lot of unstructured data
- no schema design
Willing to develop in-house expertise:
This is the big one in my mind. I don’t know many people who love SQL, but it’s a very mature, very well understood standard. You can find expertise at all levels, whether you need a tutorial to write your first SELECT, or you need a DBA and storage expert to figure out why your query on a multi-terabyte Oracle DB takes 5ms instead of 1ms. With a NoSQL db, you should expect to build much of that expertise yourself.
Large dataset:
MySQL can handle millions of rows easily and hundreds of gigabytes per table. Oracle can scale even further, and that is without using any sharding or partitioning. Of course Oracle is more expensive, but you get peace of mind and support. For NoSQL to make sense, you need a dataset of billions of rows or multiple terabytes of data. NoSQL dbs are mostly distributed object dbs, carefully designed to partition data among nodes and handle node failures. So if you’re running on a single host, you’re almost certainly better off going with a traditional SQL db.
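To make "partition data among nodes and handle node failures" concrete, here is a minimal sketch in Python. It uses plain hash-modulo placement with a replica on the next node, which is simpler than what real systems do (many use consistent hashing or range partitioning). The node names and the helper functions `nodes_for_key` and `read_targets` are made up for illustration.

```python
import hashlib

def nodes_for_key(key, nodes, replicas=2):
    """Pick `replicas` distinct nodes for a key by hashing the key.

    Assumption: simple hash-modulo placement, with extra copies on the
    nodes that follow the primary in the list.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

def read_targets(key, nodes, down, replicas=2):
    """Return the replicas for `key` that are still reachable."""
    return [n for n in nodes_for_key(key, nodes, replicas) if n not in down]

# Hypothetical four-node cluster.
NODES = ["node-a", "node-b", "node-c", "node-d"]

owners = nodes_for_key("user:42", NODES)
# If the primary dies, the read falls through to the surviving replica.
survivors = read_targets("user:42", NODES, down={owners[0]})
```

None of this machinery buys you anything on a single host, which is the point of the paragraph above.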
A lot of unstructured data:
One common reason to use a NoSQL db is to avoid storing text or XML BLOBs in a SQL database. This is an excellent use for NoSQL. One of the great values of a SQL database is that you can get good average response times because your data is structured. A SQL db querying an indexed integer field is a lot more predictable than one trying to store and query both integers and 10MB text objects together. SQL databases are traditionally not good at handling BLOBs. Also, because people usually require the highest availability from their SQL databases, they think that putting large unstructured data into a NoSQL db on cheap distributed hardware is a money saver. Of course even with NoSQL, you’ll still need a monitoring, administration, backup, and high-availability plan of some sort, but in theory it shouldn’t be as expensive as the “this SQL db must have 100% uptime” plan.
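One way to picture that split, as a rough sketch only: keep the small, structured, indexed fields in SQL and push the large bodies into a separate store, with just a key linking the two. Here an in-memory SQLite database and a Python dict stand in for the real SQL db and the NoSQL store. The table layout and the `save_document` / `load_document` helpers are assumptions for illustration, not anything from a particular product.

```python
import hashlib
import sqlite3

# Stand-in for a distributed key-value store (assumption: a plain dict).
blob_store = {}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    " id INTEGER PRIMARY KEY,"
    " owner_id INTEGER NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " blob_key TEXT NOT NULL)"
)
conn.execute("CREATE INDEX documents_owner ON documents(owner_id)")

def save_document(owner_id, body):
    """Store the large body outside SQL; keep only small metadata in SQL."""
    blob_key = hashlib.sha256(body).hexdigest()
    blob_store[blob_key] = body
    cur = conn.execute(
        "INSERT INTO documents (owner_id, size_bytes, blob_key) VALUES (?, ?, ?)",
        (owner_id, len(body), blob_key),
    )
    return cur.lastrowid

def load_document(doc_id):
    """Look up the key in SQL, then fetch the body from the blob store."""
    row = conn.execute(
        "SELECT blob_key FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        raise KeyError(doc_id)
    return blob_store[row[0]]

doc_id = save_document(7, b"<xml>" + b"x" * 1000 + b"</xml>")
```

The SQL side only ever touches small integer and text columns, so its queries stay predictable, while the bulky bodies live wherever storage is cheapest.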
No schema design:
Many NoSQL databases are simple key-value or key-object stores. This is fine for many types of applications. But building extra layers on top of this to handle object relationships or mappings is time-consuming. Some NoSQL databases have columns or fields and definable indices, allowing fast lookups. They tout flexible schema as a great design feature. OK, but SQL dbs are even better at that.
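Here is a small sketch of what that "extra layer" looks like in practice, next to the SQL equivalent. A Python dict stands in for a key-value store and an in-memory SQLite database stands in for the SQL db. The key naming scheme, the tables, and both helper functions are made up for the example.

```python
import sqlite3

# Key-value style: the store only knows get and put, so the application
# has to maintain the "orders for this user" mapping by hand.
kv = {}
kv["user:1"] = {"name": "alice"}
kv["order:10"] = {"user_id": 1, "total": 25}
kv["order:11"] = {"user_id": 1, "total": 40}
kv["user:1:orders"] = [10, 11]  # hand-maintained secondary index

def kv_order_total(user_id):
    order_ids = kv.get(f"user:{user_id}:orders", [])
    return sum(kv[f"order:{oid}"]["total"] for oid in order_ids)

# SQL style: the relationship and the index are declared once, and the
# database keeps them consistent.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    total INTEGER
);
CREATE INDEX orders_user ON orders(user_id);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO orders VALUES (10, 1, 25), (11, 1, 40);
""")

def sql_order_total(user_id):
    row = conn.execute(
        "SELECT COALESCE(SUM(o.total), 0) FROM users u "
        "JOIN orders o ON o.user_id = u.id WHERE u.id = ?",
        (user_id,),
    ).fetchone()
    return row[0]

# Changing the schema later is one statement; existing rows get NULL.
conn.execute("ALTER TABLE orders ADD COLUMN note TEXT")
```

Both versions give the same answer, but in the key-value version every new relationship means another hand-maintained key like `user:1:orders`, plus the code to keep it in sync.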
I still think NoSQL dbs are cool and useful, just not for everything.