CouchDB Weekly News, September 11, 2014

Major Discussions

Is it possible to measure disk space of documents in a view? (see thread)

Question: is there a way to measure the amount of disk space used by the documents returned by a particular view? As an example, let’s say all documents in a user’s database are tagged with a userId and the user has a view that returns all documents by userId. Now the user would want to measure the amount of space used by userId=100. Is this possible?

Answer 1:

  • One possible approach is to find a reliable (or good-enough) way to calculate an arbitrary document’s disk size, emit that value in the map function (alongside whatever else is already being emitted), and then sum those values in the reduce function. It’s hard to say how to account for attachments, though.
  • In any case, users would also have to account for CouchDB’s append-only storage, which incurs an overhead of up to whatever the compaction threshold is (the respondent’s databases and views usually grow to around twice their real size before they’re compacted).
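The map/reduce idea from Answer 1 could be sketched roughly as follows. This is an illustration only, not code from the thread: it assumes each document carries a `userId` field (as in the question) and approximates a document’s size by the length of its JSON serialization, which ignores attachments, the on-disk encoding, and compaction overhead. The `emit` stub, the `collected` array, and the `sum` helper are hypothetical and exist only so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's emit(), only so this sketch runs outside CouchDB.
var collected = [];
function emit(key, value) {
  collected.push({ key: key, value: value });
}

// Map function: emit an approximate size per document, keyed by userId.
// JSON.stringify(doc).length counts UTF-16 code units, not on-disk bytes,
// so this is a rough estimate only.
function map(doc) {
  if (doc.userId !== undefined) {
    emit(doc.userId, JSON.stringify(doc).length);
  }
}

// In a design document the reduce could simply be the built-in "_sum".
// This plain function mimics it for the local demonstration.
function sum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}
```

With a view like this, querying with `key=100` and the reduce enabled would return the summed estimate for that user.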

Answer 2:

  • It’s not hard to walk a JSON object recursively and compute how many bytes it would probably occupy as JSON. Then users would need to add in the encoded_length of each attachment.
  • But after this, users run into implementation details like: are document bodies stored using some kind of compression (like Snappy)? Are they even stored as JSON at all, vs. serialized Erlang terms? And what about conflicts — if a doc is in conflict, users really need to add up the size of each conflicting revision, but can’t access the other revisions from a map function.
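As a rough illustration of the recursive walk described in Answer 2, the sketch below counts the UTF-8 bytes a value would occupy as compact JSON and then adds attachment sizes. The function names are made up for this example. It assumes attachment stubs expose a numeric `length` (and possibly `encoded_length`), and it deliberately ignores the caveats listed above: compression, the actual on-disk format, and conflicting revisions.

```javascript
// Count the UTF-8 bytes of a string (JavaScript lengths are UTF-16 units).
function utf8Length(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // high surrogate: assume a valid pair, skip the low half
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Recursively estimate the size of a JSON-compatible value as compact JSON.
function jsonBytes(value) {
  if (value === null || typeof value !== "object") {
    return utf8Length(JSON.stringify(value)); // strings, numbers, booleans, null
  }
  var bytes = 2; // opening and closing bracket or brace
  var count = 0;
  if (Array.isArray(value)) {
    for (var i = 0; i < value.length; i++) {
      bytes += jsonBytes(value[i]);
      count++;
    }
  } else {
    for (var key in value) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        bytes += jsonBytes(key) + 1 + jsonBytes(value[key]); // key, colon, value
        count++;
      }
    }
  }
  return bytes + Math.max(count - 1, 0); // separating commas
}

// Estimate a document: JSON body plus attachment sizes from the stubs.
function estimateDocBytes(doc) {
  var body = {};
  for (var key in doc) {
    if (Object.prototype.hasOwnProperty.call(doc, key) && key !== "_attachments") {
      body[key] = doc[key];
    }
  }
  var total = jsonBytes(body);
  var attachments = doc._attachments || {};
  for (var name in attachments) {
    if (Object.prototype.hasOwnProperty.call(attachments, name)) {
      var att = attachments[name];
      // Assumption: prefer encoded_length when the stub provides it.
      var size = typeof att.encoded_length === "number" ? att.encoded_length : att.length;
      if (typeof size === "number") total += size;
    }
  }
  return total;
}
```

A map function could emit `estimateDocBytes(doc)` as its value; the result remains an estimate of the JSON size, not of the space CouchDB actually uses on disk.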

Manually deleting a _design directory (see thread)

Question: A user has a CouchDB instance which they are no longer using for data processing. That is, they need the data to be there, but the views are no longer needed, since they have moved the data processing to another server. Now the user would like to free the space used by the views (currently nearly 5 GB) and wants to know:

  1. Can they simply delete the design directories? (rm -rf .*_design)
  2. Will this affect the documents themselves?
  3. Is it possible to do this in a running CouchDB instance?
  4. Will this really free up disk space, or does CouchDB keep view file handles open, so that a restart is needed?

The user is aware that doing so will still leave the _design documents in the databases, and that triggering those documents will recreate the views. This is no problem at the moment; the user just wants to free up some disk space quickly.

Answer:

  • It is possible to delete these files, but it’s not recommended, since the CouchDB process may still hold open file descriptors on them. It’s better to delete the design documents in CouchDB and then clean up the outdated views.
  • Deleting .*_design removes the view indexes. If a design document still exists in the database and CouchDB doesn’t hold a file descriptor on any of these files, it will rebuild the index from scratch on the next request to a view.
  • If this is done while CouchDB is running, users need to restart it so that it releases the file descriptors. Alternatively, they can find the file descriptors in /proc/<couchdbpid>/fd and close them via gdb.
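The recommended route (delete the design document, then clean up outdated views) could look roughly like the two HTTP requests built below. This is a sketch under assumptions: the host, database name, design document name, and revision are placeholders, and the helper `cleanupRequests` is hypothetical. It only builds the request descriptions; sending them needs an HTTP client and admin credentials, which are left out.

```javascript
// Build the two requests for the cleanup described above.
// All arguments are placeholders supplied by the caller.
function cleanupRequests(baseUrl, db, ddocName, ddocRev) {
  var dbUrl = baseUrl + "/" + encodeURIComponent(db);
  return [
    {
      // Delete the design document itself (the current revision is required).
      method: "DELETE",
      url: dbUrl + "/_design/" + encodeURIComponent(ddocName) +
        "?rev=" + encodeURIComponent(ddocRev)
    },
    {
      // Ask CouchDB to remove index files no longer used by any design document.
      method: "POST",
      url: dbUrl + "/_view_cleanup",
      headers: { "Content-Type": "application/json" }
    }
  ];
}

// Example with made-up values.
var requests = cleanupRequests("http://localhost:5984", "mydb", "stats", "1-abc");
```

Because CouchDB removes the index files itself in this case, there is no need to deal with open file descriptors by hand.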

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

Q&A: How does CouchDB store my items?

Question: How does CouchDB store all my items?

Answer: CouchDB’s storage engine is based on the venerable B-tree, and the on-disk format is appended to on each update. There are some excellent posts out there with a lot more detail:

For B-trees in general and some interesting variants, read up on:

More Q&A from this week

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you!

New Committer

  • Jenn Schiffer (IRC nick: jennmoneydollars; Twitter; Apache ID: jenn) has been elected as a CouchDB committer (see thread). Welcome to CouchDB, Jenn!

Events

Time to relax!

  • “There’s no stress that can’t be eased by the cuddles of an office dog.” – 5 reasons you should get an office dog
  • “We can take some credit for why the event was so successful, but I also believe that a lot of great things happened by chance and in my experience it’s the things that you didn’t plan for that teaches you the most. So now I’ll tell you what we did and what we learned.” – How to create a tech event where everyone feels welcome

… and also in the news

CouchDB Weekly News, September 04, 2014

Releases

Major Discussions

Vote: Release of Apache CouchDB 1.6.1-rc.4 (will be released as Apache CouchDB 1.6.1) (see thread)

The vote has passed and Apache CouchDB 1.6.1 has been released.

Website Refresh: updated “How to Contribute” section (see thread)

The section about ways to contribute to CouchDB has been updated. The goal was to better emphasise

  1. CouchDB’s focus on building a welcoming, supportive, inclusive, and diverse community (see also our recently published Code of Conduct and Diversity statement for background information) and
  2. our valuing of non-coding contributions.

If you want to help us develop CouchDB or support our marketing efforts, we invite you to check out the updated page and get in touch with us!

Is it possible to measure disk space of documents in a view? (see thread)

Question: Is there a way to measure the amount of disk space used by the documents returned by a particular view? As an example, let’s say all documents in my database are tagged with a userId and I have a view that returns all documents by userId. Now I want to measure the amount of space used by userId=100. Is this possible?

Answer:

  • One approach could be to find a reliable (or sufficient) way to calculate an arbitrary document’s disk size, emit that value in the map function (alongside whatever else is already being emitted), and then sum those values in the reduce function. It’s not clear, though, whether this works for attachments as well.
  • In any case, users also have to account for CouchDB’s append-only storage, which incurs an overhead of up to whatever the compaction threshold is.

validate_doc_update design function (see thread)

Question: A user is writing a validate_doc_update function which they want to use for validating that document inserts comply with a strict schema. Since there is no way to pass parameters to the validate_doc_update function, they were thinking of fetching the schema (contained in a local JSON file) asynchronously, and found they can request the schema once and then store it. So there would be one initial performance hit in fetching the file, and from then on it would be saved. The question is whether there is a better way to do this.

Answer 1:

  • There is no way to act asynchronously in validate_doc_update functions, nor to reliably cache data for later runs. In this case it sounds like the user is relying on a glitch for their cache, which is not recommended, for two reasons:
    • this caching behavior is not intended at all, and
    • validate_doc_update responds (synchronously) before the schema has been fetched, so validation always succeeds on the first run.
  • If users don’t want to update the function itself, an easy alternative is to include the schema directly in the design document, as part of the design document JSON, where it can be required during validate_doc_update.

Answer 2:

  • It’s also possible to keep the schema in the design document, like it is done here. The design functions are executed in the scope of the design document (although there may be exceptions), so “this” is the design document. So couchapp-schema can use this.schema[newDoc.schema] to get the schema.
  • Users would then still have to update the design document, but they don’t have to touch the validation function.
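A minimal sketch of that idea, assuming the validation function is invoked with the design document as “this” (as described in Answer 2). The design document name, the schema format (a list of required fields), and the `validate` helper are all made up for this illustration. In a real design document the function would be stored as a string; it is a plain function here so the snippet can be run directly.

```javascript
// Hypothetical design document holding both the schema and the validator.
var designDoc = {
  _id: "_design/validation",
  // Schemas keyed by name; this format is invented for the example.
  schema: {
    person: { required: ["name", "age"] }
  },
  validate_doc_update: function (newDoc, oldDoc, userCtx, secObj) {
    if (newDoc._deleted) {
      return; // allow deletions without schema checks
    }
    // "this" is assumed to be the design document.
    var schema = this.schema[newDoc.schema];
    if (!schema) {
      throw { forbidden: "Unknown schema: " + newDoc.schema };
    }
    for (var i = 0; i < schema.required.length; i++) {
      var field = schema.required[i];
      if (!(field in newDoc)) {
        throw { forbidden: "Missing required field: " + field };
      }
    }
  }
};

// Local helper that calls the function with the design document as "this".
function validate(newDoc) {
  designDoc.validate_doc_update.call(designDoc, newDoc, null, {}, {});
}
```

Changing the rules then only means editing the `schema` member of the design document; the function body stays untouched.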

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

New Committer

  • Sebastian Rothbucher (IRC nick: sebastianrothbuc; Twitter; Apache ID: sebastianro) has been elected as a CouchDB committer. Welcome to CouchDB, Sebastian!

Events

Time to relax!

  • “I am not awesome. I am not a rockstar, superstar, ninja, or guru either. The ‘cult of awesome’ that too frequently crops up in the technology industry is a problem and it breeds a culture of egos and aggrandising that can be self defeating.” – Not everything is awesome
  • “I think I realised after a while that what I saw in other people who I thought had a plan was passion. Perhaps the passion I didn’t know how to direct earlier on. I don’t believe we need plans, we need something that drives us, something that gives us that feeling in our gut that we maybe can’t explain.” – Plans
  • 50 people were asked to walk into a room filled with balloons. This is what happened next
  • Copy Paste Soul: Music to … Emotivate (around 24:00 is a good spot)

… and also in the news