CouchDB Weekly News, September 11, 2014

Major Discussions

Is it possible to measure disk space of documents in a view? (see thread)

Question: is there a way to measure the amount of disk space used by the documents returned by a particular view? As an example, let’s say all documents in a user’s database are tagged with a userId and the user has a view that returns all documents by userId. Now the user would want to measure the amount of space used by userId=100. Is this possible?

Answer 1:

  • One possible approach is to find a reliable (or good-enough) way to calculate an arbitrary document’s disk size, emit that value in the map function (alongside whatever else is already being emitted), and then sum those values in the reduce function. It’s hard to say how attachments would fit in, though.
  • In any case, users would also have to account for the append-only nature of the storage, which incurs an overhead of up to whatever the compaction threshold is (the respondent’s databases and views usually grow to around twice their real size before they’re compacted).
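The append-only overhead mentioned above can be illustrated with a toy model. This is only a sketch: the class AppendOnlyStore and its methods are hypothetical names, and it keeps a plain in-memory list rather than CouchDB’s actual file format. It shows why a file can hold far more bytes than the live data until compaction runs.

```python
class AppendOnlyStore:
    """Toy append-only store (hypothetical, not CouchDB's real format)."""

    def __init__(self):
        self.log = []      # every write ever made: (doc_id, body_bytes)
        self.latest = {}   # doc_id -> index in self.log of the live revision

    def put(self, doc_id, body):
        # An update never overwrites; it appends and moves the pointer.
        self.latest[doc_id] = len(self.log)
        self.log.append((doc_id, body))

    def file_size(self):
        # Bytes occupied by all records, including superseded ones.
        return sum(len(body) for _, body in self.log)

    def live_size(self):
        # Bytes occupied by the current revision of each document only.
        return sum(len(self.log[i][1]) for i in self.latest.values())

    def compact(self):
        # Rewrite the log keeping only live records, in their original order.
        live = [self.log[i] for i in sorted(self.latest.values())]
        self.log = live
        self.latest = {doc_id: i for i, (doc_id, _) in enumerate(live)}
```

Writing the same document twice leaves file_size() at double live_size() until compact() is called, which is the kind of overhead the respondent describes.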

Answer 2:

  • It’s not hard to walk a JSON object recursively and compute how many bytes it would probably occupy as JSON. Then users would need to add in the encoded_length of each attachment.
  • But after this, users run into implementation details like: are document bodies stored using some kind of compression (like Snappy)? Are they even stored as JSON at all, vs. serialized Erlang terms? And what about conflicts — if a doc is in conflict, users really need to add up the size of each conflicting revision, but can’t access the other revisions from a map function.
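The estimate described in the two answers can be sketched as follows. This is a rough approximation under stated assumptions, not a measurement of real disk usage: it treats the document as compact UTF-8 JSON, reads attachment sizes from the encoded_length or length field of each attachment stub, and assumes documents carry a userId field as in the question. The function names estimate_doc_bytes and bytes_by_user are hypothetical, and compression, the actual storage format, and conflicting revisions are all ignored.

```python
import json
from collections import defaultdict


def estimate_doc_bytes(doc):
    """Approximate size: compact UTF-8 JSON body plus attachment lengths."""
    # Size the body without the attachment stubs; they are counted below.
    body = {k: v for k, v in doc.items() if k != "_attachments"}
    encoded = json.dumps(body, separators=(",", ":"), ensure_ascii=False)
    size = len(encoded.encode("utf-8"))
    # Assumption: each stub exposes "encoded_length" or "length" in bytes.
    for att in doc.get("_attachments", {}).values():
        size += att.get("encoded_length", att.get("length", 0))
    return size


def bytes_by_user(docs):
    """Sum the estimates per userId, like a map step followed by a sum reduce."""
    totals = defaultdict(int)
    for doc in docs:
        if "userId" in doc:
            totals[doc["userId"]] += estimate_doc_bytes(doc)
    return dict(totals)
```

In a real view the same logic would live in the map function, emitting the estimate keyed by userId, with a summing reduce on top. The number is best read as a lower-bound guess rather than the bytes actually used on disk.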

Manually deleting a _design directory (see thread)

Question: A user has a CouchDB instance which they are no longer using for data processing. That is, they need the data to be there, but the views are no longer needed, since they have moved the data processing to another server. Now the user would like to free the space used by the views (currently nearly 5 GB) and want to know:

  1. Can they simply delete the design directories? (rm -rf .*_design)
  2. Will this affect the documents themselves?
  3. Is it possible to do this in a running CouchDB instance?
  4. Will this really free up disk space, or does CouchDB keep view file handles open, so that a restart is needed?

The user is aware that doing so will still leave the _design documents in the databases, and that querying their views will recreate the index files. This is no problem at the moment; the user just wants to free up some disk space quickly.

Answer:

  • It is possible to delete these files, but it’s not recommended, since the CouchDB process may still hold open file descriptors on them. It’s better to delete the design document in CouchDB and then clean up the outdated views.
  • Deleting .*_design removes the view indexes. If a design document still exists in the database and CouchDB doesn’t hold a file descriptor on any of these files, it will rebuild the index from scratch on the next request to a view.
  • If this is done while CouchDB is running, users need to restart it so that it releases the file descriptors. Alternatively, they can find the descriptors in /proc/<couchdbpid>/fd and close them via gdb.
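The recommended route (delete the design document, then clean up outdated views) goes through CouchDB’s HTTP API. The sketch below only builds the two requests, DELETE on the design document and POST to _view_cleanup, without sending them. The helper names, the server URL, the database name, and the revision are placeholders for illustration, and a real server will most likely require admin credentials.

```python
import urllib.request
from urllib.parse import quote


def delete_design_doc_request(base_url, db, ddoc_name, rev):
    """Build a DELETE request for /{db}/_design/{ddoc_name}?rev={rev}."""
    url = "{}/{}/_design/{}?rev={}".format(
        base_url, quote(db, safe=""), quote(ddoc_name, safe=""), quote(rev, safe="")
    )
    return urllib.request.Request(url, method="DELETE")


def view_cleanup_request(base_url, db):
    """Build a POST request for /{db}/_view_cleanup."""
    url = "{}/{}/_view_cleanup".format(base_url, quote(db, safe=""))
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the endpoint takes no parameters
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request can be sent with urllib.request.urlopen(req). The current revision of the design document has to be fetched first, since CouchDB rejects a delete without a matching rev.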

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

Q&A: How does CouchDB store my items?

Question: How does CouchDB store all my items?

Answer: CouchDB’s storage engine is based on the venerable B-tree, and the on-disk format is appended to on each update. There are some excellent posts out there with a lot more detail:

For B-trees in general and some interesting variants, read up on:

More Q&A from this week

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you on our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you!

New Committer

  • Jenn Schiffer (IRC nick: jennmoneydollars; Twitter; Apache ID: jenn) has been elected as a CouchDB committer (see thread). Welcome to CouchDB, Jenn!

Events

Time to relax!

  • “There’s no stress that can’t be eased by the cuddles of an office dog.” – 5 reasons you should get an office dog
  • “We can take some credit for why the event was so successful, but I also believe that a lot of great things happened by chance and in my experience it’s the things that you didn’t plan for that teaches you the most. So now I’ll tell you what we did and what we learned.” – How to create a tech event where everyone feels welcome

… and also in the news
