01 August 2011

I’ve been pretty excited about Google’s LevelDB, not to mention there are some really old tanks already on the battlefield like BDB, Tokyo Cabinet (with Kyoto Cabinet as its successor), HamsterDB, etc. Fortunately I’ve already worked with Kyoto Cabinet, and when I looked at the benchmarks I was totally blown away. I admit a strong bias toward the Google name, but the benchmark results were still way too good to believe. I also had an objection: the LevelDB benchmarks compared it only to Kyoto Cabinet’s TreeDB and not to HashDB (the fairer baseline if you ignore key ordering and treat LevelDB as a pure key-value store). Some of you may start yelling right away that comparing a hash structure to an SSTable is not fair. Well, by the end of this post I will be comparing TreeDB with LevelDB as well, and my results were different from the officially posted benchmarks.

So to start off, I decided to do a simple insertion and iteration comparison (ignoring order) of plain HashDB and LevelDB, without compression or any of the other sugar both stores come with; a simple (no forced writes) benchmark comparing how much time each takes. I have a pretty normal machine (aka a commodity machine): a normal HDD, an Intel dual-core CPU, and 3 GB of RAM. So without wasting any more time, let’s dive into the code. Here is the code for Kyoto Cabinet’s HashDB and for LevelDB. Pretty straightforward: I’ve carefully placed the benchmark points to measure only the “real time” consumed by both, excluding the key and value preparation parts. The idea is to insert and then iterate 100,000 key/value pairs. Running the benchmarks yields the following (the first line of each run is the time to insert 100,000 KV pairs, the second the time it took to iterate all those KV pairs; every run started with a freshly created DB, i.e. the previous copy was deleted. I took 3 runs of each):

Kyoto Cabinet 100,000 entries:

  • Time consumed was 0.328945
    Time consumed was 0.207315
  • Time consumed was 0.286653
    Time consumed was 0.210296
  • Time consumed was 0.284954
    Time consumed was 0.208595

LevelDB 100,000 entries:

  • Time consumed was 0.44742
    Time consumed was 0.171842
  • Time consumed was 0.449287
    Time consumed was 0.169108
  • Time consumed was 0.444432
    Time consumed was 0.169381

Iterated reads are pretty fast in LevelDB, but writes are almost twice as slow as Kyoto Cabinet’s. I didn’t stop here, because I hadn’t yet hit LevelDB’s weak spot (yes, larger values). So in my next test I fixed the length of the value buffer at 512 bytes instead of the short string, and the results took the expected form:

Kyoto Cabinet HashDB 512 bytes 100,000 entries:

  • Time consumed was 0.549354
    Time consumed was 0.249182
  • Time consumed was 0.451567
    Time consumed was 0.236295
  • Time consumed was 0.568072
    Time consumed was 0.240299

LevelDB 512 bytes 100,000 entries:

  • Time consumed was 5.09164
    Time consumed was 0.302542
  • Time consumed was 5.16017
    Time consumed was 0.301258
  • Time consumed was 5.32948
    Time consumed was 0.300175

At this point I became suspicious about whether LevelDB’s TreeDB benchmarks would hold true on my machine; and guess what, I was right! Here are the benchmarks for Kyoto Cabinet’s TreeDB:

TreeDB with fixed 512-byte value length:

  • Time consumed was 0.393798
    Time consumed was 0.445125
  • Time consumed was 0.389259
    Time consumed was 0.439923
  • Time consumed was 0.392562
    Time consumed was 0.438211

TreeDB with small values (same as the first HashDB benchmark above):

  • Time consumed was 0.39401
    Time consumed was 0.322735
  • Time consumed was 0.445422
    Time consumed was 0.349465
  • Time consumed was 0.287259
    Time consumed was 0.258459

Finally, I took the total entries to 1,000,000 and ran the benchmark again with TreeDB and LevelDB; the results are as follows:

LevelDB 1,000,000 entries (small values to give it an advantage):

  • Time consumed was 5.9836
    Time consumed was 1.91519
  • Time consumed was 6.03402
    Time consumed was 1.90402
  • Time consumed was 6.01079
    Time consumed was 1.88251

TreeDB 1,000,000 entries (small values):

  • Time consumed was 3.9333
    Time consumed was 2.56793
  • Time consumed was 3.87633
    Time consumed was 2.56836
  • Time consumed was 3.93033
    Time consumed was 2.57917

Surprised? Same here! Now the question is: am I missing something? I am planning to put up some more benchmarks with real-life scenarios and values (maybe tweet-sized stuff). Until then I would love to have feedback, and to know if I’ve made any mistakes while benchmarking.