16 2 / 2011

Every dogma has its day — Anthony Burgess

And yes, I’ve been living the NoSQL dogma! Technologies like HBase, Voldemort, CouchDB, and a huge list of others (each designed for a particular scenario) made me believe that my favorite RDBMS, MySQL, had grown old and couldn’t handle my data anymore. Whenever I encountered the APIs of these NoSQL tools, I realized they are super trivial (there are exceptions like MongoDB); they are in no way as “rich” as SQL itself. Secondly, the rich “structure” and the “value” are gone (document stores are not always usable in real-world scenarios; most of the time we are just doing key/value lookups).
Long story short: everybody just ignores the effort these beautiful creatures (RDBMSs) put into your daily problems; they take care of issues like parsing, queuing, and all the dirty stuff for you. So the next question should be: if I get rid of the SQL in MySQL (just a specific case; I am pretty sure we can do it on other engines as well), would it be (only “My”) any faster? Seems like yes. I will demonstrate the benchmarks in a while. But before proceeding, I would like to state clearly that these benchmark values have pretty much convinced me that an RDBMS, when stripped of the additional overhead, performs as well as any persistent key-value store (yep, it’s my personal opinion, and I know the NoSQL community is hurt by this statement). In my demo I will be replicating a persistent key/value store using MySQL and HandlerSocket.

Tools of trade
I spent a whole day compiling these tools on my machine, but by the end of the day it was satisfying (you should try it as well)! You will need:

  • MySQL 5.5.9 source code (latest stable to date)
  • HandlerSocket (latest commit pulled from GitHub)
  • Lovely Python 2.6.5 (comes with my Debian)
  • Newborn Python Socket Handler (latest version, see its homepage)
  • And lots of patience for compiling MySQL + HandlerSocket


Setting the stage
Once everything is compiled and the system is ready, the next step is creating a basic schema and a table for storing the values. Here is my basic schema named 'foo' and table named 'kv':

CREATE TABLE `foo`.`kv` (
`key` CHAR(255) NOT NULL,
`val` TEXT NOT NULL,
PRIMARY KEY (`key`)
) ENGINE = MyISAM;


Darn simple (I’ve used CHAR for the key for now; INT would definitely be faster). Realistically, 255 characters is more than enough for a key (until you prove yourself a Jimmy).
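
To make the "table as a key/value store" idea concrete, here is a minimal sketch of a get/put facade over a table of the same shape. Big assumption: it uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere, which means it still goes through SQL (exactly the overhead HandlerSocket is meant to skip). The class name TableKV and its methods are made up for illustration; this is not the HandlerSocket client API.

```python
import sqlite3


class TableKV(object):
    """Key/value facade over a two-column table.

    Assumption: sqlite3 is only a stand-in here so the sketch runs anywhere;
    it is not HandlerSocket and it still goes through SQL.
    """

    def __init__(self, conn):
        self.conn = conn
        # Same shape as the kv table: primary-key column plus a value column.
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS kv ('
            '"key" CHAR(255) NOT NULL PRIMARY KEY, '
            '"val" TEXT NOT NULL)'
        )

    def put(self, key, val):
        # Insert the pair, overwriting any existing value for the key.
        self.conn.execute(
            'INSERT OR REPLACE INTO kv ("key", "val") VALUES (?, ?)',
            (key, val),
        )

    def get(self, key, default=None):
        # A primary-key lookup is all a key/value read needs.
        row = self.conn.execute(
            'SELECT "val" FROM kv WHERE "key" = ?', (key,)
        ).fetchone()
        return default if row is None else row[0]


store = TableKV(sqlite3.connect(":memory:"))
store.put("key0", "x" * 1024)        # 1K value, like the benchmark payload
print(len(store.get("key0")))
print(store.get("no-such-key"))
```

The point of the sketch is only that a key/value read is a single primary-key lookup and a write is a single-row insert; nothing else from the relational machinery is needed.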
As for my Python code, here is the complete code that I used to benchmark my results.

Machine specs
My machine is just a normal laptop with the following specs:

Results
Yep, time for the moment of truth. I inserted a hundred thousand (100,000) entries with varying key sizes (see the code and you will get what I am saying; the pattern is key0 through key99999), but with a constant string value of 1K in size (if I were Twitter, it would have been much less). Here are the statistics (full console dump):

  • 100,000 writes took 29.6542 seconds ( ~ 3372 writes / second )
  • 100,000 reads in order took 18.4197 seconds ( ~ 5429 reads / second )
  • 100,000 reads in random order took 17.0343 seconds ( ~ 5870 reads / second ) (Shocking!)


I would like to add that I used the default MyISAM settings (the defaults you get with make install), and I repeated the experiment without truncating the table (nothing much changed; the numbers remained almost the same every time).
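
For anyone who wants to repeat this kind of measurement, the shape of the benchmark can be sketched roughly as below. This is not the original benchmark script: run_benchmark and ops_per_second are hypothetical helper names, and a plain in-memory dict stands in for the MySQL + HandlerSocket backend, so the timings it prints say nothing about MySQL. The last loop only recomputes the per-second rates from the timings reported above.

```python
import random
import time


def run_benchmark(put, get, n=100000, value_size=1024):
    """Time n writes, n in-order reads, and n random-order reads."""
    keys = ["key%d" % i for i in range(n)]   # key0 .. key99999 when n=100000
    value = "x" * value_size                 # constant 1K value

    timings = {}

    start = time.time()
    for k in keys:
        put(k, value)
    timings["writes"] = time.time() - start

    start = time.time()
    for k in keys:
        get(k)
    timings["reads_in_order"] = time.time() - start

    shuffled = list(keys)
    random.shuffle(shuffled)                 # same keys, random order
    start = time.time()
    for k in shuffled:
        get(k)
    timings["reads_random"] = time.time() - start

    return timings


def ops_per_second(count, seconds):
    """Throughput as operations divided by elapsed seconds."""
    return count / float(seconds)


# Stand-in backend (assumption): a dict instead of MySQL + HandlerSocket.
backing = {}
for label, secs in sorted(run_benchmark(backing.__setitem__,
                                        backing.__getitem__, n=1000).items()):
    print("%s: %.6f seconds" % (label, secs))

# Rates recomputed from the timings reported in the results list.
for label, secs in [("writes", 29.6542),
                    ("reads in order", 18.4197),
                    ("reads in random order", 17.0343)]:
    print("%s: %.1f / second" % (label, ops_per_second(100000, secs)))
```

Passing put and get in as plain callables keeps the harness independent of the backend, so the same loop could be pointed at any key/value client.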

What’s the point?

MySQL is a proven technology; its engines are stable, well known, and scalable (yep, modern forks for the cloud are already rolling; one example is Drizzle). So I can’t find any point in shifting to a newborn technology that adds more calories to my code, and then bragging about it just because it’s a buzzword! Unless there is a genuine reason to use NoSQL, it’s just a trendy statement! I am pretty sure that by tuning some more parameters I can take these reads and writes to a higher number. Running the same code on a server machine will of course make these numbers even better. Despite everything a normal machine lacks, it was an impressive performance. For comparison’s sake, I will make no excuses and post benchmarks for other NoSQL systems from my machine.

So it was my last day of the NoSQL dogma. What about you?