Today I will be doing the long-awaited initial benchmarks of Oracle’s NoSQL. Before I jump in, I would like to mention that I have a previous post on Oracle NoSQL that examines the philosophy and design of the store. I’ve been playing around with the system for a few days now. I found it pretty simple thanks to its major and minor key system, some great multi-get and iterator constructs for reads, and a basic delete and put system with versions. I did find some shortcomings, and we will look at them in detail at the end of this post. So let’s get started.
For the sake of benchmarking I am going to keep things really simple. I will be benchmarking the insert and read speeds of a single node (one machine, no replication, no distribution) to see how a single node performs; based on that, we can make rough estimates of the performance gains from adding more machines.
The machine used for the benchmarks is a pretty normal laptop with a dual-core 2.1 GHz processor, 3 GB of RAM, and a commodity HDD. As an example I implemented a Twitter-like user stream of 100,000 tweets spread randomly across 100 users (that makes approximately 1,000 tweets per user), each tweet a little more than 100 bytes in size. So what we will be benchmarking is the time taken to insert 100,000 tweets (100+ bytes each) and then read them all back. This should be a pretty good experiment (close to a real-world scenario) to give us an idea of what Oracle NoSQL can do.
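To make the workload concrete, here is a minimal sketch of how such a data set could be generated. It is not the benchmark code itself: the class name, the `user<N>` id scheme, the padded tweet text, and the fixed seed are all assumptions made for illustration, and no store is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical workload generator; names and tweet text are illustrative only.
class TweetWorkload {
    static final int USERS = 100;
    static final int TWEETS = 100_000;

    // Builds an ASCII tweet body of 101 characters by padding a short prefix.
    static String makeTweet(int seq) {
        StringBuilder sb = new StringBuilder("tweet #" + seq + " ");
        while (sb.length() <= 100) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Assigns each tweet to a random user and returns the per-user counts.
    static Map<String, Integer> distribute(long seed) {
        Random random = new Random(seed);
        Map<String, Integer> perUser = new HashMap<>();
        for (int i = 0; i < TWEETS; i++) {
            String userId = "user" + random.nextInt(USERS);
            perUser.merge(userId, 1, Integer::sum);
        }
        return perUser;
    }

    public static void main(String[] args) {
        Map<String, Integer> perUser = distribute(42L);
        int bytes = makeTweet(0).getBytes(StandardCharsets.UTF_8).length;
        System.out.println("users=" + perUser.size() + " tweetBytes=" + bytes);
    }
}
```

Because the user for each tweet is drawn at random, the inserts arrive in no particular per-user order, which is what makes the write test a random-order insert.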
Here is the piece of code that inserts and reads the 100,000 entries and benchmarks the total time consumed. Compiling and running it, I get the following output:
Write Time consumed 143156
Iteration Total time taken 223
Please note that the times in the above benchmark are in milliseconds. It turns out that inserting 100,000 entries in random order (since the generated user ids are random) takes 143156 ms, and iterating over every entry of every user takes 223 ms. I am pretty satisfied with the read speed, and for writes I found an average time of about 1.4 ms per insert (slower than reads, but still almost 700 inserts per second). It’s important to note that each insert here is synchronized to disk (with a durability of Durability.SyncPolicy.SYNC on the master, and since we have only one node, we are doing the maximum amount of disk writing with no buffering). If I lower the durability (Durability.COMMIT_WRITE_NO_SYNC, which writes with buffering), the average time drops to 0.5 ms, which is nearly three times the throughput of the fully synced version (Durability.COMMIT_SYNC). It’s worth noting that you can also customize durability per transaction, which is great for letting programmers choose what they want.
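The per-insert figures come from simple division, sketched below. The class and method names are made up for illustration; the only inputs are the 143156 ms total reported in the output and the 0.5 ms average observed with the lower durability setting.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
class BenchmarkMath {
    // Average latency per operation in milliseconds.
    static double avgMillis(long totalMillis, int operations) {
        return (double) totalMillis / operations;
    }

    // Sequential throughput implied by an average latency.
    static double opsPerSecond(double avgMillis) {
        return 1000.0 / avgMillis;
    }

    public static void main(String[] args) {
        double synced = avgMillis(143156L, 100_000); // total from the benchmark output
        double buffered = 0.5;                       // average reported for COMMIT_WRITE_NO_SYNC
        System.out.println("synced avg ms:      " + synced);
        System.out.println("synced inserts/s:   " + opsPerSecond(synced));
        System.out.println("buffered inserts/s: " + opsPerSecond(buffered));
        System.out.println("latency ratio:      " + synced / buffered);
    }
}
```

These numbers assume a single client issuing inserts one after another, as in the benchmark; they say nothing about how the store behaves under concurrent writers.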
I didn’t stop here and went on to implement an example Twitter class, which shows how much such a simple structure can do. I think the core power lies in not being a monolithic (JSON-like) tree object, and in allowing us to pull data from partial paths as well as from a full major key path. The class is used in this example code to create a user, authenticate, tweet, and query the timeline of a user.
Somebody may wonder how it compares with other data stores (document, column-based, key-value). I can definitely achieve the same effect (or something close) with other key-value stores, but I must say this style was a breeze and easy to reason about (just look at the code for reading tweets and you will know what I am saying). Oracle should give the store type a new name instead of calling it a key-value store, because it’s pretty clear that it’s not a plain key-value store; it overlaps between the document and key-value styles. (Again, I can achieve this same effect in key-sorted KV stores like LevelDB or TokyoCabinet.)
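To illustrate the point about key-sorted stores, here is a rough sketch that uses a plain in-memory `TreeMap` as a stand-in for a sorted KV store. It does not use the Oracle NoSQL API at all; the key layout (`/user/<id>/-/tweet/<timestamp>`), the class, and the method names are assumptions chosen to mimic a major path followed by a minor path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a key-sorted KV store; not the Oracle NoSQL API.
class SortedKvTimeline {
    private final TreeMap<String, String> store = new TreeMap<>();

    // Flattens an assumed major path (user id) and minor path (timestamp) into one key.
    // Zero-padding keeps non-negative timestamps in numeric order under string comparison.
    static String key(String userId, long timestamp) {
        return "/user/" + userId + "/-/tweet/" + String.format(Locale.ROOT, "%013d", timestamp);
    }

    void tweet(String userId, long timestamp, String text) {
        store.put(key(userId, timestamp), text);
    }

    // Reads every tweet of one user with a single range scan over the shared key prefix.
    List<String> timeline(String userId) {
        String prefix = "/user/" + userId + "/-/tweet/";
        // '\uffff' sorts after every character used in the keys above.
        Map<String, String> range = store.subMap(prefix, true, prefix + '\uffff', false);
        return new ArrayList<>(range.values());
    }

    public static void main(String[] args) {
        SortedKvTimeline kv = new SortedKvTimeline();
        kv.tweet("alice", 2L, "second tweet");
        kv.tweet("alice", 1L, "first tweet");
        kv.tweet("bob", 1L, "hello from bob");
        System.out.println(kv.timeline("alice"));
    }
}
```

The prefix scan plays the role of reading by a partial key path: everything stored under one user comes back in key order without touching other users’ entries. The difference is that here the path encoding and the range bounds have to be managed by hand.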
In closing I must also point out some negatives I found, plus some extras. First, the number of partitions in Oracle NoSQL is currently fixed! What does this mean? It implies that storage can’t be scaled horizontally yet, since the store can’t rebalance; once partitions have been made they remain fixed. Oracle’s documentation does mention that the next version will have a rebalancing feature. Second (and maybe the most painful part for many people) is the dependence on the Java platform. This will hurt a lot of Ruby, Python, and other developers, but again, as I said in my previous post, with the help of a Java programmer you can write a REST API for yourself (I am planning a Protobuf server to cure this plague). Third, I found administration of the system to be a little tricky (it could not be understood without reading the complete administration docs), but this is no hurdle for a programmer since it provides a launch-and-go script to run the server. Right now I am planning to write a Scala wrapper (syntactic awesomeness) for Oracle NoSQL, and once I am done I will put it on GitHub. In the meantime, you folks can have a few more calories of this NoSQL deliciousness.
© Copyright 2010