debuggable

 
 

CouchDB Insert Benchmarks

Posted on 25/6/09 by Felix Geisendörfer

Hey folks,

I am currently working on replacing Amazon S3 as the key-value storage service for Debuggable's new startup. The main reason is that we want to be able to license the technology for in-house use, which means the S3 dependency has to go.

Over the past year or so I have repeatedly heard good things about CouchDB (not to mention that Damien Katz became a personal hero of mine after I saw this video). But the main reason for preferring CouchDB over the alternative key-value stores is its simplicity. Using HTTP + JSON as the protocol and embracing complete RESTfulness makes getting started with CouchDB ridiculously easy. It also means you can build your architecture with other HTTP tools such as reverse proxies, HTTP load balancers, etc.
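To make the "HTTP + JSON" point concrete, here is a minimal Python sketch using only the standard library. It assumes a CouchDB server at its default local address (`http://127.0.0.1:5984`); the database name `benchmark` and document id `hello` used below are hypothetical. The two helpers only build requests, and `send()` is the part that actually talks to the server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: CouchDB running locally on its default port.
COUCH = "http://127.0.0.1:5984"


def create_db_request(db: str) -> urllib.request.Request:
    """PUT /{db} creates a new database."""
    db_path = urllib.parse.quote(db, safe="")
    return urllib.request.Request(f"{COUCH}/{db_path}", method="PUT")


def put_doc_request(db: str, doc_id: str, doc: dict) -> urllib.request.Request:
    """PUT /{db}/{doc_id} with a JSON body stores the document under that _id."""
    db_path = urllib.parse.quote(db, safe="")
    id_path = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{COUCH}/{db_path}/{id_path}",
        data=json.dumps(doc).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode CouchDB's JSON reply."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `send(create_db_request("benchmark"))` followed by `send(put_doc_request("benchmark", "hello", {"msg": "world"}))` should create the database and store one document, and CouchDB answers each call with a small JSON object. Any other HTTP client (curl, a proxy, a load balancer) can speak the same protocol.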

Anyway, having done absolutely nothing with CouchDB before, I want to avoid shooting myself in the foot by making poor performance / scaling assumptions that may or may not hold for our particular use cases. So I did what every new user of an open source project should do and set up some benchmarks.

However, the whole point of these benchmarks is not to find out how fast CouchDB really is; I don't care about that as long as it scales horizontally. No, I am mostly interested in finding out 2 things:

  • a) How much slack do I have left before we seriously need to scale out to multiple CouchDB nodes?
  • b) When that day comes, what replication delay (eventual consistency) can we expect in a multi-master setup?

So far I have mostly been studying question a). Since our application is storage-heavy but does not necessarily get lots of read hits (see, this is the point where you should realize my tests might not help answer your questions at all - sorry), I was wondering how disk space consumption relates to the number of documents stored in the database.

After reading a bit about the B+ tree CouchDB uses for storage, I assumed that the required disk space would grow linearly with the number of documents stored. However, my initial tests indicated that the disk space per document grew with each document I added. Suspecting a bad setup, and after discussing this a bit in #couchdb, I decided to create a more serious setup on Amazon EC2, inserting anywhere from 0 to 1 million records.
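The metric behind this question is simply database size divided by document count. As a sketch, it can be computed from the info CouchDB returns for `GET /{db}`, which includes `doc_count` and `disk_size` fields. The server address is an assumption, and this is not necessarily how the couchdb-benchmarks scripts compute it. If growth were linear, the value would stay roughly constant as documents are added.

```python
import json
import urllib.request
from typing import Optional

# Assumption: CouchDB running locally on its default port.
COUCH = "http://127.0.0.1:5984"


def db_info(db: str) -> dict:
    """GET /{db} returns database metadata, including doc_count and disk_size."""
    with urllib.request.urlopen(f"{COUCH}/{db}") as resp:
        return json.loads(resp.read().decode("utf-8"))


def bytes_per_doc(info: dict) -> Optional[float]:
    """Disk size divided by document count; None for an empty database."""
    count = info["doc_count"]
    if count == 0:
        return None  # nothing to divide by, like the "n/a" rows in the results
    return info["disk_size"] / count
```

Calling `bytes_per_doc(db_info("benchmark"))` after each insert run (the database name is hypothetical) gives one data point per run; comparing those points is how a linear or non-linear trend shows up.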

I put all the code for setting up the environment and running the tests on GitHub; see my couchdb-benchmarks project.

Anyway, you've waited long enough. Time for my initial results:

doc count (before compact): 0
doc count (after compact): 0
insert time: 0 sec
insert time / doc: n/a ms
compact time: 1.0064 sec
compact time / doc: n/a ms
disk size (before compact): 79 bytes
disk size (after compact): 79 bytes
.couch size (before compact): 79 bytes
.couch size (after compact): 79 bytes
.couch size / doc (before compact): n/a bytes
.couch size / doc (after compact): n/a bytes

doc count (before compact): 1
doc count (after compact): 1
insert time: 0.0015 sec
insert time / doc: 1.46 ms
compact time: 1.0051 sec
compact time / doc: 1005.14 ms
disk size (before compact): 315 bytes
disk size (after compact): 4179 bytes
.couch size (before compact): 315 bytes
.couch size (after compact): 4179 bytes
.couch size / doc (before compact): 315 bytes
.couch size / doc (after compact): 4179 bytes

doc count (before compact): 2
doc count (after compact): 2
insert time: 0.0015 sec
insert time / doc: 0.75 ms
compact time: 1.0057 sec
compact time / doc: 502.87 ms
disk size (before compact): 503 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 503 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 251.5 bytes
.couch size / doc (after compact): 4140.5 bytes

doc count (before compact): 3
doc count (after compact): 3
insert time: 0.0017 sec
insert time / doc: 0.56 ms
compact time: 1.0053 sec
compact time / doc: 335.1 ms
disk size (before compact): 693 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 693 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 231 bytes
.couch size / doc (after compact): 2760.33 bytes

doc count (before compact): 4
doc count (after compact): 4
insert time: 0.0017 sec
insert time / doc: 0.44 ms
compact time: 1.0053 sec
compact time / doc: 251.33 ms
disk size (before compact): 883 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 883 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 220.75 bytes
.couch size / doc (after compact): 2070.25 bytes

doc count (before compact): 5
doc count (after compact): 5
insert time: 0.0018 sec
insert time / doc: 0.36 ms
compact time: 1.0054 sec
compact time / doc: 201.08 ms
disk size (before compact): 1071 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 1071 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 214.2 bytes
.couch size / doc (after compact): 1656.2 bytes

doc count (before compact): 6
doc count (after compact): 6
insert time: 0.0019 sec
insert time / doc: 0.32 ms
compact time: 1.0051 sec
compact time / doc: 167.52 ms
disk size (before compact): 1261 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 1261 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 210.17 bytes
.couch size / doc (after compact): 1380.17 bytes

doc count (before compact): 7
doc count (after compact): 7
insert time: 0.0021 sec
insert time / doc: 0.3 ms
compact time: 1.005 sec
compact time / doc: 143.57 ms
disk size (before compact): 1459 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 1459 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 208.43 bytes
.couch size / doc (after compact): 1183 bytes

doc count (before compact): 8
doc count (after compact): 8
insert time: 0.0022 sec
insert time / doc: 0.27 ms
compact time: 1.0049 sec
compact time / doc: 125.61 ms
disk size (before compact): 1655 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 1655 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 206.88 bytes
.couch size / doc (after compact): 1035.13 bytes

doc count (before compact): 9
doc count (after compact): 9
insert time: 0.0023 sec
insert time / doc: 0.25 ms
compact time: 1.0053 sec
compact time / doc: 111.7 ms
disk size (before compact): 1849 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 1849 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 205.44 bytes
.couch size / doc (after compact): 920.11 bytes

doc count (before compact): 10
doc count (after compact): 10
insert time: 0.0025 sec
insert time / doc: 0.25 ms
compact time: 1.0042 sec
compact time / doc: 100.42 ms
disk size (before compact): 2043 bytes
disk size (after compact): 8281 bytes
.couch size (before compact): 2043 bytes
.couch size (after compact): 8281 bytes
.couch size / doc (before compact): 204.3 bytes
.couch size / doc (after compact): 828.1 bytes

doc count (before compact): 50
doc count (after compact): 50
insert time: 0.0072 sec
insert time / doc: 0.14 ms
compact time: 1.0038 sec
compact time / doc: 20.08 ms
disk size (before compact): 10319 bytes
disk size (after compact): 16473 bytes
.couch size (before compact): 10319 bytes
.couch size (after compact): 16473 bytes
.couch size / doc (before compact): 206.38 bytes
.couch size / doc (after compact): 329.46 bytes

doc count (before compact): 100
doc count (after compact): 100
insert time: 0.0136 sec
insert time / doc: 0.14 ms
compact time: 1.0054 sec
compact time / doc: 10.05 ms
disk size (before compact): 20430 bytes
disk size (after compact): 24665 bytes
.couch size (before compact): 20430 bytes
.couch size (after compact): 24665 bytes
.couch size / doc (before compact): 204.3 bytes
.couch size / doc (after compact): 246.65 bytes

doc count (before compact): 500
doc count (after compact): 500
insert time: 0.0687 sec
insert time / doc: 0.14 ms
compact time: 1.0062 sec
compact time / doc: 2.01 ms
disk size (before compact): 104616 bytes
disk size (after compact): 110690 bytes
.couch size (before compact): 104616 bytes
.couch size (after compact): 110690 bytes
.couch size / doc (before compact): 209.23 bytes
.couch size / doc (after compact): 221.38 bytes

doc count (before compact): 1000
doc count (after compact): 1000
insert time: 0.1361 sec
insert time / doc: 0.14 ms
compact time: 1.003 sec
compact time / doc: 1 ms
disk size (before compact): 212260 bytes
disk size (after compact): 217186 bytes
.couch size (before compact): 212260 bytes
.couch size (after compact): 217186 bytes
.couch size / doc (before compact): 212.26 bytes
.couch size / doc (after compact): 217.19 bytes

doc count (before compact): 2500
doc count (after compact): 2500
insert time: 0.4686 sec
insert time / doc: 0.19 ms
compact time: 1.006 sec
compact time / doc: 0.4 ms
disk size (before compact): 814957 bytes
disk size (after compact): 819298 bytes
.couch size (before compact): 814957 bytes
.couch size (after compact): 819298 bytes
.couch size / doc (before compact): 325.98 bytes
.couch size / doc (after compact): 327.72 bytes

doc count (before compact): 5000
doc count (after compact): 5000
insert time: 0.9165 sec
insert time / doc: 0.18 ms
compact time: 1.0065 sec
compact time / doc: 0.2 ms
disk size (before compact): 2012394 bytes
disk size (after compact): 2015330 bytes
.couch size (before compact): 2012394 bytes
.couch size (after compact): 2015330 bytes
.couch size / doc (before compact): 402.48 bytes
.couch size / doc (after compact): 403.07 bytes

doc count (before compact): 7500
doc count (after compact): 7500
insert time: 1.5116 sec
insert time / doc: 0.2 ms
compact time: 2.0112 sec
compact time / doc: 0.27 ms
disk size (before compact): 3778774 bytes
disk size (after compact): 3797090 bytes
.couch size (before compact): 3778774 bytes
.couch size (after compact): 3797090 bytes
.couch size / doc (before compact): 503.84 bytes
.couch size / doc (after compact): 506.28 bytes

doc count (before compact): 10000
doc count (after compact): 10000
insert time: 2.3111 sec
insert time / doc: 0.23 ms
compact time: 3.015 sec
compact time / doc: 0.3 ms
disk size (before compact): 5653905 bytes
disk size (after compact): 5652578 bytes
.couch size (before compact): 5653905 bytes
.couch size (after compact): 5652578 bytes
.couch size / doc (before compact): 565.39 bytes
.couch size / doc (after compact): 565.26 bytes

doc count (before compact): 25000
doc count (after compact): 25000
insert time: 6.8684 sec
insert time / doc: 0.27 ms
compact time: 7.0746 sec
compact time / doc: 0.28 ms
disk size (before compact): 20595235 bytes
disk size (after compact): 20635746 bytes
.couch size (before compact): 20595235 bytes
.couch size (after compact): 20635746 bytes
.couch size / doc (before compact): 823.81 bytes
.couch size / doc (after compact): 825.43 bytes

doc count (before compact): 50000
doc count (after compact): 50000
insert time: 15.8227 sec
insert time / doc: 0.32 ms
compact time: 14.1612 sec
compact time / doc: 0.28 ms
disk size (before compact): 51808040 bytes
disk size (after compact): 51724386 bytes
.couch size (before compact): 51808040 bytes
.couch size (after compact): 51724386 bytes
.couch size / doc (before compact): 1036.16 bytes
.couch size / doc (after compact): 1034.49 bytes

doc count (before compact): 100000
doc count (after compact): 100000
insert time: 35.3071 sec
insert time / doc: 0.35 ms
compact time: 33.4723 sec
compact time / doc: 0.33 ms
disk size (before compact): 125497442 bytes
disk size (after compact): 125419618 bytes
.couch size (before compact): 125497442 bytes
.couch size (after compact): 125419618 bytes
.couch size / doc (before compact): 1254.97 bytes
.couch size / doc (after compact): 1254.2 bytes

doc count (before compact): 250000
doc count (after compact): 250000
insert time: 104.0009 sec
insert time / doc: 0.42 ms
compact time: 97.3738 sec
compact time / doc: 0.39 ms
disk size (before compact): 394489375 bytes
disk size (after compact): 394457190 bytes
.couch size (before compact): 394489375 bytes
.couch size (after compact): 394457190 bytes
.couch size / doc (before compact): 1577.96 bytes
.couch size / doc (after compact): 1577.83 bytes

doc count (before compact): 500000
doc count (after compact): 500000
insert time: 230.6021 sec
insert time / doc: 0.46 ms
compact time: 209.0139 sec
compact time / doc: 0.42 ms
disk size (before compact): 900271866 bytes
disk size (after compact): 900280422 bytes
.couch size (before compact): 900271866 bytes
.couch size (after compact): 900280422 bytes
.couch size / doc (before compact): 1800.54 bytes
.couch size / doc (after compact): 1800.56 bytes

doc count (before compact): 750000
doc count (after compact): 750000
insert time: 354.7959 sec
insert time / doc: 0.47 ms
compact time: 380.9895 sec
compact time / doc: 0.51 ms
disk size (before compact): 1446452532 bytes
disk size (after compact): 1445376102 bytes
.couch size (before compact): 1446452532 bytes
.couch size (after compact): 1445376102 bytes
.couch size / doc (before compact): 1928.6 bytes
.couch size / doc (after compact): 1927.17 bytes

doc count (before compact): 1000000
doc count (after compact): 1000000
insert time: 487.3284 sec
insert time / doc: 0.49 ms
compact time: 570.2633 sec
compact time / doc: 0.57 ms
disk size (before compact): 2023280441 bytes
disk size (after compact): 2022334566 bytes
.couch size (before compact): 2023280441 bytes
.couch size (after compact): 2022334566 bytes
.couch size / doc (before compact): 2023.28 bytes
.couch size / doc (after compact): 2022.33 bytes

From this data, a few assumptions can be made:

  • CouchDB inserts ~2-3k documents / second in a database of more than 100k documents (for this particular hardware / benchmark setup)
  • CouchDB inserts get slower on bigger databases
  • CouchDB seems to use more bytes / document the larger the database gets (this is scary, but might explain the previous 2 observations)
  • The time it takes to compact a database of identical, unmodified documents seems to be almost equal to the time it took to insert the initial documents. That makes a lot of sense assuming the writes are I/O bound.
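The first observation is plain arithmetic on the results above. This short sketch converts a total insert time into documents per second and milliseconds per document, using the figures from the 100k and 1M runs.

```python
def docs_per_second(doc_count: int, insert_seconds: float) -> float:
    """Average insert rate over the whole run."""
    return doc_count / insert_seconds


def ms_per_doc(doc_count: int, insert_seconds: float) -> float:
    """Average insert time per document, in milliseconds."""
    return insert_seconds * 1000 / doc_count


# Figures copied from the 100000 and 1000000 document runs above.
for count, seconds in [(100_000, 35.3071), (1_000_000, 487.3284)]:
    print(f"{count:>9} docs: {docs_per_second(count, seconds):7.0f} docs/s, "
          f"{ms_per_doc(count, seconds):.2f} ms/doc")
```

For those two runs this works out to roughly 2,832 and 2,052 documents per second (0.35 and 0.49 ms per document), which is where the ~2-3k figure comes from.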

Now, since I am new to CouchDB, there is a good chance that my setup or the logic behind my assumptions is flawed. However, I hope that by sharing them I can get feedback to make the benchmarks better, along with explanations for the observed characteristics.

I also hope that some of you might feel compelled to fork the project on GitHub and provide more benchmarks. Personally, I am going to work on analyzing replication next. If I find time, I'll also add some CSV exports and pretty rendering with Google Charts.

Comment if you have any thoughts, want to see more tests or share some religious propaganda related to your key-value storage system of choice : ).

-- Felix Geisendörfer aka the_undefined

PS: A few questions I can already see coming up:

Why does compact always take at least 1 second?
Because I use a while() loop with sleep(1) to determine when compaction is done. I could check more frequently, but it's not really a variable I'm interested in.
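A polling loop of this kind can be sketched in Python as follows. `POST /{db}/_compact` starts compaction, and the database info from `GET /{db}` reports `compact_running` while it is in progress. The server address and helper names are assumptions, not the benchmark's actual code. Because the loop sleeps before each check, the measured duration can never be shorter than one interval, which is where the 1-second floor comes from.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: CouchDB running locally on its default port.
COUCH = "http://127.0.0.1:5984"


def start_compaction(db: str) -> None:
    """POST /{db}/_compact asks CouchDB to compact the database in the background."""
    req = urllib.request.Request(
        f"{COUCH}/{db}/_compact",
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()


def fetch_info(db: str) -> dict:
    """GET /{db}; the reply includes a compact_running flag."""
    with urllib.request.urlopen(f"{COUCH}/{db}") as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_compaction(
    get_info: Callable[[], dict],
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep, then check compact_running, until compaction has finished.

    Returns polls * interval: always a whole multiple of `interval`
    and never less than one interval.
    """
    polls = 0
    while True:
        sleep(interval)
        polls += 1
        if not get_info().get("compact_running", False):
            return polls * interval
```

Usage would be `start_compaction("benchmark")` followed by `wait_for_compaction(lambda: fetch_info("benchmark"))`. The info and sleep functions are passed in so the loop can be exercised without a server; a smaller `interval` would give finer-grained compaction times.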

Why does compact increase the file size for < 50000 documents?
Good question, I have no idea. Anybody?

What insert method is used?
Have a look at the benchmark source. Basically it's bulk-inserts of 1000 items at a time with pre-generated UUIDs.
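For readers who don't want to dig through the source, here is a rough Python equivalent of that insert strategy: pre-generate documents with random UUIDs as `_id`, then send them in batches of 1000 to CouchDB's `_bulk_docs` endpoint. This is a sketch, not the benchmark's code: the UUIDs come from Python's `uuid4()` (the original may use UUIDs from CouchDB itself), and the server address and document payload are hypothetical.

```python
import json
import urllib.request
import uuid
from typing import Iterator, List

# Assumption: CouchDB running locally on its default port.
COUCH = "http://127.0.0.1:5984"
BATCH_SIZE = 1000  # batch size described in the post


def make_docs(n: int, payload: dict) -> List[dict]:
    """Pre-generate n documents, each with its own random UUID as _id."""
    return [dict(payload, _id=uuid.uuid4().hex) for _ in range(n)]


def batches(docs: List[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_request(db: str, batch: List[dict]) -> urllib.request.Request:
    """POST /{db}/_bulk_docs stores every document in the batch in one request."""
    return urllib.request.Request(
        f"{COUCH}/{db}/_bulk_docs",
        data=json.dumps({"docs": batch}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Against a running server, `for batch in batches(make_docs(2500, {"value": 1})): urllib.request.urlopen(bulk_request("benchmark", batch)).close()` would issue three requests of 1000, 1000 and 500 documents.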

I ran the benchmark and got some additional output
Yeah, I removed some debugging / activity indicators to make the results more readable for this article.

 


Mark Boas said on Jun 25, 2009:

Felix, I read this article with some interest. I'm using Amazon's EC2 for an application we are currently developing where fast DB inserts are key. The obvious candidate is S3, but I have also been considering CouchDB. It would be interesting to see how CouchDB performance stacks up against S3.

Damien Katz said on Jun 25, 2009:

Great work Felix! We need more of this.

Because our UUIDs are completely random, it's actually the worst-case scenario for document creation, as we have to update the by_id index in random spots. If you want the fastest possible insert times, you should give the _ids ascending values: get a UUID and increment it by 1 for each document. That way you are always inserting in the same place in the index, which stays cache friendly once you are dealing with files larger than RAM. For an easier way to do the same thing, just number the documents sequentially but make the numbers fixed length with padding so that they sort correctly, "0000001" instead of "1" for example.
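Both ID schemes from this comment are easy to generate. In this Python sketch (function names are hypothetical), `padded_ids` produces fixed-width sequential numbers, and `incrementing_uuids` takes one random UUID and adds 0, 1, 2, ... to it as a 128-bit integer, formatted as 32 hex digits so that string order matches numeric order.

```python
import uuid
from typing import List, Optional


def padded_ids(n: int, width: int = 7) -> List[str]:
    """Return "0000001", "0000002", ...; zero padding makes string order match numeric order."""
    return [str(i).zfill(width) for i in range(1, n + 1)]


def incrementing_uuids(n: int, start: Optional[uuid.UUID] = None) -> List[str]:
    """Take one UUID (random by default) and add 0, 1, 2, ... to it as a 128-bit integer."""
    base = (start if start is not None else uuid.uuid4()).int
    if base + n > 1 << 128:
        raise ValueError("start UUID too close to the maximum 128-bit value")
    # Fixed width of 32 lowercase hex digits keeps lexical order equal to numeric order.
    return [f"{base + i:032x}" for i in range(n)]
```

The padding matters because the by_id index orders ids as strings: unpadded, "10" sorts before "2", so consecutive numbers would no longer land next to each other in the index.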

Damien Katz said on Jun 25, 2009:

The reason you see the amount the file grows by vary, and generally increase, is because of the btree by_id and by_seq indexes, whose update cost per document is O(log N). So the slow increase in growth rate is expected.

The reason compaction increases the file size might be that you are doing larger transactions than the compaction does. Compaction works by reading some data and filling a buffer, and then writing it and its indexes out. If that buffer is smaller than your bulk transactions, it will compact larger than the original file.
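The O(log N) point can be illustrated with a toy model of an append-only B-tree: an update rewrites every node on the path from leaf to root at the end of the file, so the bytes appended per insert scale with tree depth, about log_fanout(N), once for each of the by_id and by_seq indexes. The fanout and node size below are hypothetical round numbers, not CouchDB's actual values, and bulk inserts share some path rewrites across a batch, so this only shows the shape of the growth, not the measured byte counts.

```python
def btree_depth(n_keys: int, fanout: int) -> int:
    """Smallest depth d with fanout**d >= n_keys, i.e. about log_fanout(n_keys)."""
    depth, capacity = 1, fanout
    while capacity < n_keys:  # integer loop avoids floating-point log rounding
        capacity *= fanout
        depth += 1
    return depth


def bytes_appended_per_insert(n_keys: int, fanout: int, node_bytes: int, indexes: int = 2) -> int:
    """Append-only update: every node on the root-to-leaf path is rewritten,
    once per index (by_id and by_seq in this model)."""
    return indexes * btree_depth(n_keys, fanout) * node_bytes


# Hypothetical parameters, chosen only to show the shape of the curve.
for n in (10, 10_000, 1_000_000):
    print(n, bytes_appended_per_insert(n, fanout=100, node_bytes=4096))
```

The per-insert cost stays flat until the tree gains a level and then steps up, so the average cost per document creeps upward as the database grows rather than staying constant.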

Steve Oliveira said on Jun 26, 2009:

Thanks for posting that Damien Katz video, Felix. That was pretty damn inspirational.

Kristina said on Jun 29, 2009:

Hi! Thanks for the script! I tried out your benchmark with MongoDB, and put the results up at www.snailinaturtleneck.com/blog/?p=74, if anyone's interested. (The short summary: MongoDB was a lot faster.)

For people using other DBs: feel free to send me your results so I can add them to my comparison graph!
