CouchDB Weekly News, October 02, 2014

Major Discussions

Are attachments duplicated for each revision as well? (see thread)

Question: Given that attachments are seemingly stored as key/value pairs within a document, does that mean that each revision of a document contains the attachments as well? Or are they stored independently?

Answers:

  • The attachment will be stored once and each revision will retain a reference to that attachment (including when it was added, called revpos, so replication should be efficient too). Compaction will copy the attachments over and should retain a single copy for each unique attachment.
  • Attachments are identified by name and can be replaced without mutating old references to documents with attachments of the same name. If users pass the _attachments section and leave out stubs for any existing attachments, that is interpreted as a delete.
  • Users therefore have a version of each attachment at a given revpos. They can replace or delete the attachment, but old revisions will keep referencing it until they are culled by compaction.
  • If users have conflicting revisions, each revision can keep a different attachment under the same name, so conflicts may mean more stored attachments. It’s possible to explore this further with the atts_since query parameter. Keep in mind that content is not currently deduplicated between different documents, so this is where the application can do some work to ensure that only one copy of anything is stored, by using a digest like SHA1 or similar as the document id. This artificially restricts users to one attachment per document, but things tend to work a bit better when users avoid managing huge numbers of attachments per document.
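
The digest-as-document-id idea from the last answer can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper name content_addressed_doc and the sample file names are made up for the example, and it only builds the JSON body (with an inline, base64-encoded attachment) without talking to a server.

```python
import base64
import hashlib


def content_addressed_doc(data: bytes, name: str, content_type: str) -> dict:
    """Build a document body whose _id is the SHA-1 hex digest of the attachment bytes.

    Hypothetical helper for illustration; one attachment per document by design.
    """
    digest = hashlib.sha1(data).hexdigest()
    return {
        "_id": digest,
        "_attachments": {
            name: {
                "content_type": content_type,
                # Inline attachments are sent as base64-encoded strings.
                "data": base64.b64encode(data).decode("ascii"),
            }
        },
    }


first = content_addressed_doc(b"hello couch", "greeting.txt", "text/plain")
second = content_addressed_doc(b"hello couch", "copy.txt", "text/plain")

# Identical bytes map to the same _id, so the application can detect that the
# content is already stored instead of creating a second document for it.
same_id = first["_id"] == second["_id"]
```

The trade-off is the one mentioned above: the document id now describes exactly one blob, so each document carries a single attachment.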

How to store the delta between doc revisions? (see thread)

Question: A user asked about the best way to store a history of changes for a document. Originally, they thought it made the most sense to use the update function of CouchDB, but they were not entirely sure whether that is possible. Is there some way to use the update function and modify/create a second document in the process? They now want to create a “history_log” document, where they can just store the delta between documents (as a patch, for example).

Answers:

  • Storing patches works well only as long as users are sure that no single patch will ever be deleted. Otherwise they could easily find their whole history broken.
  • Storing full document copies per revision is the more solid solution for such a case: users can skip or lose one or several revisions and still be fine, but it also consumes much more disk space.
  • It’s possible to create a validate_doc_update function which verifies that every newly stored item contains some specific data (like the previous document version, to which validate_doc_update also has access), but all this leads to storing history logs inside a single document. If users want to track history separately, the changes feed and update_notification_handler can be used, but then race conditions could happen (especially if compaction gets triggered), so there will always be a chance of missing some revision.
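
To make the trade-off between patches and full copies more concrete, here is a small sketch of a patch-based history log. It is only an illustration under simplifying assumptions: the patch format (a shallow "set"/"unset" delta) and the function names make_patch, apply_patch and rebuild are invented for this example, and nothing here touches CouchDB itself.

```python
import copy

_MISSING = object()  # sentinel to tell "key absent" apart from a stored None


def make_patch(old: dict, new: dict) -> dict:
    """Return a shallow delta that turns `old` into `new`."""
    changed = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    removed = [k for k in old if k not in new]
    return {"set": changed, "unset": removed}


def apply_patch(doc: dict, patch: dict) -> dict:
    """Apply one delta to a copy of `doc` and return the result."""
    result = copy.deepcopy(doc)
    result.update(copy.deepcopy(patch["set"]))
    for key in patch["unset"]:
        result.pop(key, None)
    return result


def rebuild(base: dict, patches: list) -> dict:
    """Replay patches in order on top of the base version."""
    doc = base
    for patch in patches:
        doc = apply_patch(doc, patch)
    return doc


v1 = {"title": "Draft", "tags": ["a"]}
v2 = {"title": "Final", "tags": ["a"], "author": "sam"}
v3 = {"title": "Final", "author": "sam"}

history_log = [make_patch(v1, v2), make_patch(v2, v3)]

latest = rebuild(v1, history_log)            # equals v3
broken = rebuild(v1, history_log[1:])        # first patch lost: no longer v3
```

The last line shows the risk named in the first answer: once a single patch is missing, every later version reconstructed from the log is wrong, whereas full copies per revision would each still be usable on their own.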

Releases in the CouchDB Universe

  • The CouchDB Replication Protocol is now renewed and ready for use. It was designed to describe every single protocol bit, so that everyone can create their own replicator implementation without needing to read the source code to understand how it works in each particular case. Ideally, you should open only this single page and be able to implement your CouchDB-compatible replication solution with zero experience with the CouchDB API. If you fail or find some bits missing, feel free to send a PR with fixes!

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you!

Events

Time to relax!

  • “Our early design was the problem. It was designed for right-handed users, but phones are usually rotated 180 degrees when held in left hands. Without realizing it, we’d created an app that worked best for our almost exclusively right-handed developer team.” – You don’t know what you don’t know: how our unconscious minds undermine the workplace
  • “I am not anti-criticism of the applications we use, but I am suggesting we be mindful of how this discussion takes place. People make these things we use every day. Show them the same love and empathy you would want on launch day.” – To end all rage tweets

… and also in the news

CouchDB Weekly News, September 25, 2014

Major Discussions

Is there a general rule to estimate the maximum disk space for a CouchDB database? (see thread)

Question: A user asked if there’s a way to roughly estimate the maximum disk space needed for a CouchDB database.

Answers:

  • That’s extremely hard to tell. If users update each document individually, they need more disk space than if they use bulk updates on multiple documents at once (until compaction runs).
  • Users also need to account for the views and their compaction (another user gave the example of a view that was 6.6 GB before and 187 MB after compaction).
  • It may also depend on how well the data compresses.
  • Example: a CouchDB database with 1000 documents of 1 kB each added at the beginning of each day, with every document updated hourly. For a single day there’d be 1 MB of initial data plus 23 MB from revisions. 24 MB per day results in 720 MB per average month and 8.7 GB per year. However, the 23 MB of daily overhead from previous revisions can be cleaned up during compaction. So if users run compaction at the end of each day, the growth rate will be 1 MB per day.
  • It’s also worth taking into account that while CouchDB cleans up old revisions on compaction, it doesn’t remove their ids from the document, in order to preserve its history. This adds a small size overhead on top, but not a significant one.
  • Additionally, it’s worth knowing that CouchDB grows the database file in 4 KiB chunks, even if the stored document is smaller than that, and that the file system may preallocate more space for a file than it actually contains.
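
The arithmetic from the example above can be replayed with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the same round numbers as the answer (1 MB = 1000 kB, 30-day month), the names are chosen for this illustration, and it ignores view indexes, compression, retained revision ids and the 4 KiB chunking mentioned in the other answers.

```python
DOC_SIZE_KB = 1          # size of one document revision
DOCS_PER_DAY = 1000      # new documents added at the start of each day
REVISIONS_PER_DAY = 24   # initial write plus 23 hourly updates


def growth_mb(days: int, compacted_daily: bool) -> float:
    """Rough database growth in MB over `days` days."""
    # After compaction only the latest revision body of each document remains.
    revisions_kept = 1 if compacted_daily else REVISIONS_PER_DAY
    return days * DOCS_PER_DAY * DOC_SIZE_KB * revisions_kept / 1000


per_day = growth_mb(1, compacted_daily=False)       # 24.0 MB
per_month = growth_mb(30, compacted_daily=False)    # 720.0 MB
per_year = growth_mb(365, compacted_daily=False)    # 8760.0 MB, roughly 8.7 GB
per_day_compacted = growth_mb(1, compacted_daily=True)  # 1.0 MB
```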

Setting up CouchDB 2.0 (see thread)

Since the community is getting close to feature completeness, the topic of setting up CouchDB 2.0 both solo and as a cluster was brought up and is now being discussed. Starting from what is being envisioned (see initial email), this needs to be squared against the current implementation, the required engineering and security considerations. You’ll find the complete discussion here.

Physical database movement without shutting down CouchDB (see thread)

Question: Is there any way to move database files physically without shutting down the whole CouchDB engine?

Answer: One approach could be to use mdadm with the “build” option to mirror the current device to the new device, remove the old device from the mirror and then perform a resize operation on the filesystem of the new device to gain access to the extra storage. All those steps may be doable online.

Learning a lot about and getting into CouchDB (see thread)

Question: Which ways are there for getting into CouchDB and learning more about it?

Answer: These are some of the main resources for learning more about CouchDB:

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

Events

Time to relax!

  • “We all have moments that change the way we think, the way that we look at the world, the things we want to do with our lives. On July 20, 1969 a whole generation of Americans had one of those transforming experiences: Two men landed on the Moon and nothing was ever the same again. Why did we go to the Moon? How did we get there? What was it like to witness it all? And what does any of this have to do with writing software 40 years later?” – Russ Olsen: Going to the Moon (talk video)
  • “We as members of Open Source Communities have to implement a culture where mental health issues are not stigmatized, … a culture in which people are heard and they know that there are people who care for them.” – This is bigger than us: Building a future for Open Source (presentation slides with notes)

… and also in the news