CouchDB Insert Benchmarks
Posted on 25/6/09 by Felix Geisendörfer
Hey folks,
I am currently working on replacing Amazon S3 as the key value storage service for Debuggable's new startup. The main reason for that choice is that we want the ability to license the technology for in-house usage, which means the S3 dependency has to go.
Over the past year or so I have repeatedly heard good things about CouchDB (not to mention that Damien Katz became a personal hero of mine after seeing this video). But the main reason for preferring CouchDB over all the alternative key value stores is CouchDB's simplicity. Using HTTP + JSON as the protocol and embracing complete RESTfulness makes getting started with CouchDB ridiculously easy. It also makes it very awesome as you can build your architecture with other HTTP tools such as reverse proxies, HTTP load balancers, etc.
Anyway, having done absolutely nothing with CouchDB before, I feel I should avoid shooting myself in the foot by making poor performance / scaling assumptions that may or may not be true for our particular use cases. So I did what every new user of an open source project would
However, the whole point of these is not to find out how fast CouchDB really is - I don't care as long as it scales horizontally. No, I am mostly interested in finding out 2 things:
- a) How much slacking time do I have left before we seriously need to scale out to multiple CouchDB nodes
- b) When that day comes, what can we expect in terms of replication delay (eventual consistency) in a multi-master setup
So far I have been mostly studying question a). Since our application is mostly storage heavy, but does not necessarily have lots of read hits (see, this is the point where you should realize my tests might not help answer your questions at all - sorry), I was wondering about the disk space consumption relative to the number of documents stored in the database.
After reading a bit about the B+ tree CouchDB uses for storage, I assumed that the required disk space would grow linearly with the number of documents stored. However, my initial tests indicated that the disk space / document was growing with each document I was adding. Assuming a bad setup and after discussing this a bit in #couchdb, I decided to create a more serious setup on Amazon EC2 inserting anywhere from 0 - 1 million records.
I put all the code for setting up the environment and running the tests on GitHub, see my couchdb-benchmarks project.
Anyway you waited long enough. Time for my initial results:
In every run the doc count was the same before and after compaction, and the reported disk size was identical to the .couch size, so each is listed once.

| doc count | insert time (sec) | insert time / doc (ms) | compact time (sec) | compact time / doc (ms) | .couch size before compact (bytes) | .couch size after compact (bytes) | .couch size / doc before compact (bytes) | .couch size / doc after compact (bytes) |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | n/a | 1.0064 | n/a | 79 | 79 | n/a | n/a |
| 1 | 0.0015 | 1.46 | 1.0051 | 1005.14 | 315 | 4179 | 315 | 4179 |
| 2 | 0.0015 | 0.75 | 1.0057 | 502.87 | 503 | 8281 | 251.5 | 4140.5 |
| 3 | 0.0017 | 0.56 | 1.0053 | 335.1 | 693 | 8281 | 231 | 2760.33 |
| 4 | 0.0017 | 0.44 | 1.0053 | 251.33 | 883 | 8281 | 220.75 | 2070.25 |
| 5 | 0.0018 | 0.36 | 1.0054 | 201.08 | 1071 | 8281 | 214.2 | 1656.2 |
| 6 | 0.0019 | 0.32 | 1.0051 | 167.52 | 1261 | 8281 | 210.17 | 1380.17 |
| 7 | 0.0021 | 0.3 | 1.005 | 143.57 | 1459 | 8281 | 208.43 | 1183 |
| 8 | 0.0022 | 0.27 | 1.0049 | 125.61 | 1655 | 8281 | 206.88 | 1035.13 |
| 9 | 0.0023 | 0.25 | 1.0053 | 111.7 | 1849 | 8281 | 205.44 | 920.11 |
| 10 | 0.0025 | 0.25 | 1.0042 | 100.42 | 2043 | 8281 | 204.3 | 828.1 |
| 50 | 0.0072 | 0.14 | 1.0038 | 20.08 | 10319 | 16473 | 206.38 | 329.46 |
| 100 | 0.0136 | 0.14 | 1.0054 | 10.05 | 20430 | 24665 | 204.3 | 246.65 |
| 500 | 0.0687 | 0.14 | 1.0062 | 2.01 | 104616 | 110690 | 209.23 | 221.38 |
| 1000 | 0.1361 | 0.14 | 1.003 | 1 | 212260 | 217186 | 212.26 | 217.19 |
| 2500 | 0.4686 | 0.19 | 1.006 | 0.4 | 814957 | 819298 | 325.98 | 327.72 |
| 5000 | 0.9165 | 0.18 | 1.0065 | 0.2 | 2012394 | 2015330 | 402.48 | 403.07 |
| 7500 | 1.5116 | 0.2 | 2.0112 | 0.27 | 3778774 | 3797090 | 503.84 | 506.28 |
| 10000 | 2.3111 | 0.23 | 3.015 | 0.3 | 5653905 | 5652578 | 565.39 | 565.26 |
| 25000 | 6.8684 | 0.27 | 7.0746 | 0.28 | 20595235 | 20635746 | 823.81 | 825.43 |
| 50000 | 15.8227 | 0.32 | 14.1612 | 0.28 | 51808040 | 51724386 | 1036.16 | 1034.49 |
| 100000 | 35.3071 | 0.35 | 33.4723 | 0.33 | 125497442 | 125419618 | 1254.97 | 1254.2 |
| 250000 | 104.0009 | 0.42 | 97.3738 | 0.39 | 394489375 | 394457190 | 1577.96 | 1577.83 |
| 500000 | 230.6021 | 0.46 | 209.0139 | 0.42 | 900271866 | 900280422 | 1800.54 | 1800.56 |
| 750000 | 354.7959 | 0.47 | 380.9895 | 0.51 | 1446452532 | 1445376102 | 1928.6 | 1927.17 |
| 1000000 | 487.3284 | 0.49 | 570.2633 | 0.57 | 2023280441 | 2022334566 | 2023.28 | 2022.33 |
From this data, a few assumptions can be made:
- CouchDB inserts ~2-3k documents / second in a >100k documents database (for this particular hardware / benchmark setup)
- CouchDB inserts get slower on bigger databases
- CouchDB seems to use more bytes / document the larger the database gets (this is scary, but might explain the previous 2 observations)
- The time it takes for compacting a database with identical, unmodified documents seems to be almost equal to the time it took to insert the initial documents. Makes lots of sense assuming the writes are I/O bound.
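To make the first three observations easier to check, here is a small Python sketch that recomputes insert throughput and bytes per document from four rows of the results above. The choice of rows and the names `ROWS`, `throughput` and `bytes_per_doc` are made up for this illustration and are not part of the benchmark project.

```python
# (doc count, insert time in sec, .couch size before compact in bytes),
# copied from four rows of the results above.
ROWS = [
    (1000, 0.1361, 212260),
    (10000, 2.3111, 5653905),
    (100000, 35.3071, 125497442),
    (1000000, 487.3284, 2023280441),
]


def throughput(docs, seconds):
    """Documents inserted per second."""
    return docs / seconds


def bytes_per_doc(docs, size):
    """Average number of bytes the .couch file uses per document."""
    return size / docs


for docs, seconds, size in ROWS:
    print(f"{docs:>8} docs: {throughput(docs, seconds):8.0f} docs/sec, "
          f"{bytes_per_doc(docs, size):8.2f} bytes/doc")
```

The throughput falls and the bytes per document rise from each row to the next, which is what the second and third bullet points describe.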
Now since I am new to CouchDB there is a very large chance for problems with my setup and the logic behind my assumptions. However, I hope that by sharing them I can get feedback to make the benchmarks better and provide explanations for the observed characteristics.
I also hope that some of you might feel compelled to fork the project on GitHub and provide more benchmarks. Personally I am going to work on analyzing replication next. If I find time I'll also add some CSV exports and pretty rendering facilities with some google charts.
Comment if you have any thoughts, want to see more tests or share some religious propaganda related to your key-value storage system of choice : ).
-- Felix Geisendörfer aka the_undefined
PS: A few questions I can already see coming up:
Why does compact always take at least 1 second?
Because I use a while() loop with sleep(1) to determine when compact is done. I could check more frequently, but it's not really a variable I'm interested in.
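For readers who want to see what such a polling loop can look like, here is a rough Python sketch (not the benchmark's actual code). It assumes the database info document returned by `GET /{db}` carries a `compact_running` flag. The function names, the injected `get_info` callable and the example URL are hypothetical.

```python
import json
import time
import urllib.request


def fetch_db_info(db_url):
    """GET the database info document, e.g. http://localhost:5984/benchmark (hypothetical URL)."""
    with urllib.request.urlopen(db_url) as response:
        return json.load(response)


def wait_for_compaction(get_info, interval=1.0, sleep=time.sleep):
    """Poll until the database no longer reports a running compaction.

    Returns the number of polls that found compaction still running.
    """
    polls = 0
    while get_info().get("compact_running"):
        polls += 1
        sleep(interval)
    return polls
```

Against a live server this would be called as `wait_for_compaction(lambda: fetch_db_info("http://localhost:5984/benchmark"))`. Any time measured around such a loop comes out in steps of roughly `interval`, which is why the small databases above all report about 1 second.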
Why does compact increase the file size for < 50000 documents?
Good question, I have no idea. Anybody?
What insert method is used?
Have a look at the benchmark source. Basically it's bulk-inserts of 1000 items at a time with pre-generated UUIDs.
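As a rough illustration of that insert method (again a Python sketch, not the benchmark source), the snippet below builds request bodies for CouchDB's `POST /{db}/_bulk_docs` endpoint in batches of 1000 documents with client-generated UUIDs. The document body (`value`) and the helper names are placeholders.

```python
import json
import uuid


def make_docs(count):
    """Generate documents with pre-generated random UUIDs as _id (placeholder body)."""
    return [{"_id": uuid.uuid4().hex, "value": i} for i in range(count)]


def bulk_payloads(docs, batch_size=1000):
    """Yield JSON request bodies for POST /{db}/_bulk_docs, batch_size docs each."""
    for start in range(0, len(docs), batch_size):
        yield json.dumps({"docs": docs[start:start + batch_size]})


payloads = list(bulk_payloads(make_docs(2500)))
print(len(payloads), "request bodies")
```

Each payload would then be sent with a `Content-Type: application/json` header; the HTTP call itself is left out so the sketch runs without a server.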
I ran the benchmark and got some additional output
Yeah, I removed some debugging / activity indicators to make the results more readable for this article.
Great work Felix! We need more of this.
Because our UUIDs are completely random, it's actually the worst case scenario for document creation, as we have to update the by_id index in random spots. If you want to have the fastest possible insert times, you should give the _id's ascending values, so get a UUID and increment it by 1, that way it's always inserting in the same place in the index, and being cache friendly once you are dealing with files larger than RAM. For an easier way to do the same thing, just sequentially number the documents but make it fixed length with padding so that they sort correctly, "0000001" instead of "1" for example.
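A quick Python sketch of the padding idea, assuming the IDs are compared as plain strings. The helper name `padded_id` and the width of 7 are only for illustration.

```python
def padded_id(n, width=7):
    """Zero-pad a sequence number so that string order matches numeric order."""
    return f"{n:0{width}d}"


plain = [str(n) for n in range(1, 13)]
padded = [padded_id(n) for n in range(1, 13)]

print(sorted(plain)[:4])         # unpadded: "10" sorts before "2"
print(sorted(padded) == padded)  # padded IDs are already in sorted order
```

With unpadded numbers, consecutive inserts jump around in the sorted key space; with fixed-length padding, each new ID sorts after the previous one.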
The reason you see the amount the file grows vary, and generally increase, is the btree by_id and by_seq indexes, whose update cost per document is LOG(N). So the slow growth rate increase is expected.
The reason the compaction increases the file size might be because you are doing larger transactions than the compaction does, which works by reading some data and filling a buffer, and then writing it and its indexes out. If that buffer is smaller than your bulk transactions, it will compact larger than the original file.
Thanks for posting that Damien Katz video, Felix. That was pretty damn inspirational.
Hi! Thanks for the script! I tried out your benchmark with MongoDB, and put the results up at www.snailinaturtleneck.com/blog/?p=74, if anyone's interested. (The short summary: MongoDB was a lot faster.)
For people using other DBs: feel free to send me your results so I can add them to my comparison graph!
Felix, I read this article with some interest. I'm using Amazon's EC2 for an application we are currently developing where fast DB inserts are key. The obvious candidate is S3 but I have also been considering CouchDB. It would be interesting to see how CouchDB performance stacks up against S3.