11 4 / 2011

Recently I posted some benchmarks showing how you can tame MySQL to get key-value behavior. (You can do almost the same to get column-family behavior; it's all about your imagination *grinnn*.) Let's bullet down exactly what I meant when I used MySQL with HandlerSocket:

  • First and foremost, whatever you write or read is on disk (persistent)! It's not living in volatile memory (unless you deliberately make your table in-memory).
  • Concurrency: MySQL speaks for itself, and I don’t think we need any discussion on this.
  • MySQL was used because it is free, plus I found HandlerSocket, which saved me time.
  • I am not antiquated; I do use NOSQL technologies (I’ve used Cassandra, Redis, MongoDB, TokyoTyrant, CouchDB and HBase; nothing to brag about, just proof that I am not technology-allergic or afraid of something).
  • A relational database won’t be relational until you define the relationships yourself (and enforce them)! So don’t blame it for slowing down an insert just because you enforced a referential integrity constraint, or unintentionally did a join. I totally hate this part, because people usually take such pains to denormalize data for NOSQL databases, but when it comes to an SQL database they feel offended by a denormalized structure.
  • We get a stable storage engine! I am mentioning it again because I don’t believe in the stability of NOSQL storage systems; they are pretty young compared to these RDBMS reptiles.
  • There is no deterministic way to guarantee the number of disk seeks or reads unless you go down to the raw disk level, and believe it or not, some databases support that (MySQL in my case).
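
To make the bullets above a bit more concrete, here is a minimal sketch of the "key-value on top of a relational engine" idea: one denormalized table, a primary key, no foreign keys and no joins. It is not the HandlerSocket setup from the benchmarks. It assumes Python's built-in sqlite3 module as a stand-in so it runs without a MySQL server, and the table name `kv` and the helper names `open_store`, `put`, `get` and `delete` are made up for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store backed by one two-column table (hypothetical schema).

    ":memory:" keeps the demo self-contained; pass a file path instead
    to get an on-disk, persistent store.
    """
    conn = sqlite3.connect(path)
    # No foreign keys, no joins: the table is only as "relational" as we make it.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
    )
    return conn


def put(conn, key, value):
    """Insert the pair, or overwrite the value if the key already exists."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
        )


def get(conn, key, default=None):
    """Primary-key lookup; returns default when the key is missing."""
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row else default


def delete(conn, key):
    """Remove the key if present; deleting a missing key is a no-op."""
    with conn:
        conn.execute("DELETE FROM kv WHERE k = ?", (key,))
```

Usage is what you would expect from any key-value store: `conn = open_store()`, then `put(conn, "user:1", "alice")` and `get(conn, "user:1")`. The same table shape works in MySQL; only the connection and the upsert syntax would differ.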

Given all this, I did some benchmarks and, as expected, the NOSQL community was hurt. This is what one of them thinks:

  1. with a similar setup, a NoSQL solution like Redis might give you even more operations per second
  2. key-value only access might not be enough. Many NoSQL solutions are offering at least MapReduce support.

— Quoted from mynosql blog

And I think the answers to what they say are pretty simple. Redis is not something to be compared to MySQL. Why? Because it's not truly persistent. Your options are: an Append Only File with fsync, which can be timer-based (not reliable) or on every write (try this option and your system will be dead doing writes all the time!); snapshots saved at a time interval in seconds (not reliable, as stated by the Redis folks themselves); or the manual save command (a bomb in your head if you hit the save button frequently). In short, Redis is truly memcached on steroids. Comparing MySQL to Redis is like comparing two different genres, where choosing one is a matter of your requirements.

To answer the MapReduce part, let me be very clear: MapReduce was made by Google to meet its own needs, which are nowhere near general business needs. Plus, MapReduce is used as a substitute for SQL, and that's how powerful SQL is (yep, they have articles on it); it's like a runtime compiler plus a storage engine. So it's not fair to compare a stripped-down version of a (at times pre-compiled) routine to such a flexible thing as SQL.
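
As a toy illustration of that point, the sketch below computes the same per-key totals twice: once as hand-written map, shuffle and reduce steps, and once as a single SQL `GROUP BY`. It is only an assumption-laden demo: everything runs in one process (real MapReduce is distributed), sqlite3 stands in for MySQL, and the `PAGE_VIEWS` data and all function names are invented for the example.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

# Made-up sample data: (page, hits) records.
PAGE_VIEWS = [("home", 3), ("about", 1), ("home", 2), ("blog", 5), ("about", 4)]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for page, hits in records:
        yield page, hits


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's values into a single total."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def totals_mapreduce(records):
    """The hand-rolled pipeline: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(records)))


def totals_sql(records):
    """The same aggregation expressed as one declarative query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE views (page TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO views VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT page, SUM(hits) FROM views GROUP BY page"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both `totals_mapreduce(PAGE_VIEWS)` and `totals_sql(PAGE_VIEWS)` return the same dictionary of totals. In the first version you write the routine yourself; in the second the engine plans and runs it from the query text.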

Keeping things simple, here is what I want to say: “Keep NOSQL as Not Only SQL, rather than NoSQL due to some buzzword or marketing cliche”. I’ve not been paid by Sun to market MySQL, but I completely hate it when people make useless assumptions about speed, and put so much effort into designing heavily denormalized NOSQL structures that could have been achieved in an SQL database (without scattering data, and avoiding the additional headache of a different API, updates, bugs and maintainability issues).