CouchDB Weekly News, July 24, 2014

Major Discussions

Vote (ongoing): Official CouchDB Bylaws (see thread)

The vote on the official CouchDB bylaws started on Monday, July 21 (see initial email). Following feedback, the bylaws were updated on July 22; the vote now applies to this revised, current version and is still in progress. Binding votes can be cast by CouchDB Project Management Committee (PMC) members. Still, as this is the first vote of its kind, and this document is foundational to our future decision-making process, the CouchDB PMC is asking all active committers to cast voluntary, non-binding votes as well. All votes must be received by 23:59:59 UTC on Sunday, July 27, 2014.

Progress in work on Fauxton (see GitHub)

Work on the Fauxton implementation continues and is making good progress in preparation for CouchDB 2.0. The new Fauxton sidebar redesign has just been merged.

If you want to take a closer look, you’ll find the discussion and screenshots here.

Optimizing views (see thread)

Question: a CouchDB user has a simple user document containing fields such as gender, age and dates. They want to request these documents by age, or gender, or dates or a mix of these criteria. Is it more efficient to create a single view containing several emits, or to create several views, each of them containing one emit statement?

Answers:

  • Creating several views may be better, since each one will be slightly faster to search.
  • Also, with one view, it can be easy to get mixed up between different types of keys.
  • Multiple views can also be split across design documents, which means they can be built independently and in parallel.
  • The specific use case described in the question is a multi-dimensional query, which CouchDB isn’t suited for. The usual solution is to use couchdb-lucene, Elasticsearch or something similar.
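
To make the two layouts from the question concrete, here is a minimal sketch that builds both design documents as Python dicts. The design document name (_design/users), the view names and the field names (age, gender) are assumptions for illustration; only the views/map structure and the emit() call are CouchDB's own conventions.

```python
import json

# Layout A: one view, several emits. Each key is tagged with its type so
# that an age and a gender can never be mistaken for one another.
one_view = {
    "_id": "_design/users",  # hypothetical design document name
    "views": {
        "by_criterion": {  # hypothetical view name
            "map": (
                "function (doc) {\n"
                "  if (doc.age !== undefined) emit(['age', doc.age], null);\n"
                "  if (doc.gender !== undefined) emit(['gender', doc.gender], null);\n"
                "}"
            )
        }
    },
}


def single_emit_view(field):
    """Build a view whose map function emits exactly one (assumed) field."""
    return {
        "map": "function (doc) { if (doc.%s !== undefined) emit(doc.%s, null); }"
        % (field, field)
    }


# Layout B: several views, one emit each, so every index holds one key type.
several_views = {
    "_id": "_design/users",
    "views": {"by_" + f: single_emit_view(f) for f in ("age", "gender")},
}

if __name__ == "__main__":
    print(json.dumps(several_views, indent=2))
```

With layout A, a query has to restrict itself to one key prefix (for example keys starting with 'age'); with layout B, each view can be queried directly by its plain key.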

How many views per design doc in CouchDB? (see thread)

Question: Is it better to define one view per design doc? If I have e.g. 8 views for a given person design doc, would they be better placed in one single doc or broken into smaller units?

Answers:

  • The answer heavily depends on the specific use case. In general, one can split views that are slower to process into separate design documents so that they can be built in parallel. If a query requires a (re)build (e.g. it is sent without stale=ok or stale=update_after), this also means that one only has to wait for that one view to finish.
  • Smaller units will save a lot of trouble, especially when updating the code of large views.
  • On the other hand, if the views are re-indexed independently, each indexing task reads the documents out of the database separately. If the views are in the same design document, the documents are only read once and then passed to each view’s map function. This may be faster, but it also depends on details of the view engine used.
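
The "smaller units" option can be sketched as a small helper that turns one design document into one design document per view. This is an illustration only: split_design_doc, the _design/person document and its view names are hypothetical, and the map functions are shortened placeholders.

```python
import copy


def split_design_doc(ddoc):
    """Return one design document per view found in `ddoc`.

    Fields other than _id, _rev and views (e.g. "language") are copied
    into every resulting document.
    """
    shared = {
        key: value
        for key, value in ddoc.items()
        if key not in ("_id", "_rev", "views")
    }
    parts = []
    for name, view in sorted(ddoc.get("views", {}).items()):
        part = copy.deepcopy(shared)
        part["_id"] = "%s_%s" % (ddoc["_id"], name)
        part["views"] = {name: copy.deepcopy(view)}
        parts.append(part)
    return parts


# Hypothetical example: a "person" design document with two views.
person = {
    "_id": "_design/person",
    "language": "javascript",
    "views": {
        "by_age": {"map": "function (doc) { emit(doc.age, null); }"},
        "by_name": {"map": "function (doc) { emit(doc.name, null); }"},
    },
}

parts = split_design_doc(person)
```

Each resulting document is indexed on its own, so changing one map function only invalidates that one index. The price, as noted above, is that every index build reads the documents from the database again.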

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

no public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you!

Events

Job opportunities for people with CouchDB skills

Time to relax!

… and also in the news

CouchDB Weekly News, July 17, 2014

Weekly CouchDB meeting – (summary)

  • BigCouch / DBCore Merge Status: Work on the merge continues. Master can now run clustered CouchDB, and wider testing is occurring (and further encouraged!). Everyone is therefore asked to test CouchDB Master (see this thread for details).
  • Breaking changes / backward compatibility in CouchDB 2.0: this topic has been brought up on the mailing list (see thread). You’re encouraged to post your ideas now so they can be given ample consideration.
  • Bylaws / Code of Conduct: the next iteration has been sent to the mailing list (see thread). Please read through the draft and send your feedback to the dev@ mailing list. The formal vote will start on Monday, July 21.

Major Discussions

Updated Bylaws – final Readthrough before Vote starting on Monday, July 21 (see thread)

Based on previous comments, the bylaws have been updated (see changes); you can find the current version here. At this point, the bylaws are mostly stable. A formal vote on these bylaws will be called on Monday, July 21. Please take the time to make a final read-through and raise any corrections before then. As this will be a non-technical vote, all active committers will be asked to cast their vote at that time.

Control view query performance (see thread)

Questions: when a doc needs to be calculated for a particular set of entries from the changes feed, are docs sent one at a time or in batches? And is there just one view server doing all of the computation? Is there a way to configure or control any of the settings that would dictate the above?

Answer: CouchDB manages a pool of couchjs processes, any one of which is capable of evaluating JavaScript map/reduce functions. A single view group (all views in the same ddoc) is indexed serially by a single couchjs process, so there is no parallelism within a group. In CouchDB 2.0, a database can, and typically does, consist of multiple shards, which introduces parallelism to view builds, among other effects.

Guaranteed reliability and durability in CouchDB (see thread)

Questions: What level of guaranteed reliability and durability does CouchDB provide? Are there any corner cases in which users can lose their data? Are there any cases where users receive an acknowledgement and still data is lost?

Answers:

  • As long as CouchDB users follow the official recommendation to set delayed_commits to false (see this link to configuration), CouchDB guarantees that success for a write operation is only indicated once it has received an ok return code from an fsync call at the end of that write operation.
  • Users should note that delayed_commits currently still defaults to true, so they need to change it to false in order to have that guarantee.
  • In CouchDB 2.0, delayed_commits will default to false, the safe setting.
  • With delayed_commits set to false, fsync is guaranteed to have been called before the HTTP response to the write is sent, but this is not the same as ‘every write will be immediately flushed to disk’. Concurrent writes will be fsynced together.
  • With delayed_commits set to true, fsync is called once per second, which is indeed an opportunity for data loss. The client will receive the acknowledgement before the fsync call is made.
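
As an illustration of changing the setting discussed above, the sketch below builds (but does not send) an HTTP request that sets delayed_commits to false. It assumes the CouchDB 1.x HTTP configuration API (PUT /_config/{section}/{key} with a JSON string as body), the [couchdb] configuration section, and a server at http://127.0.0.1:5984; the helper name delayed_commits_request is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5984"  # assumed address of a local CouchDB 1.x


def delayed_commits_request(enabled, base_url=BASE_URL):
    """Build a PUT request that sets [couchdb] delayed_commits.

    Assumption: the 1.x config API expects the new value as a JSON
    string, i.e. the body is "false" including the quotes.
    """
    value = "true" if enabled else "false"
    return urllib.request.Request(
        base_url + "/_config/couchdb/delayed_commits",
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = delayed_commits_request(False)

# Sending it needs a running server and admin rights, so it is left out:
# urllib.request.urlopen(req)
```

The same change can be made persistently by setting delayed_commits = false under [couchdb] in local.ini.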

Additional question: Does CouchDB not use any journalling/write-ahead logging like MongoDB does, so that journal files could be written more frequently than the actual data file?

Answer: A CouchDB database is a write-ahead journal by design. It is opened with O_APPEND, so all writes go strictly to the end of the file. Updates become visible after a new database header has been fully written to disk. Partial header writes can be detected; in that event, we seek backward for the previous header.
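
The append-only idea can be illustrated with a toy log. This is a sketch under stated assumptions, not CouchDB's on-disk format: the record layout (kind byte, length, payload, CRC32) is invented for the example, and recovery scans forward from the start of the file for simplicity, whereas CouchDB, as described above, seeks backward from the end.

```python
import os
import struct
import tempfile
import zlib

HEADER = b"H"  # toy marker for a "database header" record
DATA = b"D"    # toy marker for a document record


def append_record(path, kind, payload):
    """Append one record to the end of the file and fsync before returning."""
    record = (
        kind
        + struct.pack(">I", len(payload))
        + payload
        + struct.pack(">I", zlib.crc32(payload))
    )
    # Mode "ab" opens the file for appending, so writes never overwrite data.
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())


def last_valid_header(path):
    """Return the payload of the last completely written header, or None."""
    with open(path, "rb") as f:
        blob = f.read()
    latest = None
    pos = 0
    while pos + 5 <= len(blob):
        kind = blob[pos:pos + 1]
        (length,) = struct.unpack(">I", blob[pos + 1:pos + 5])
        end = pos + 5 + length + 4
        if end > len(blob):
            break  # record was cut off, e.g. by a crash mid-write
        payload = blob[pos + 5:pos + 5 + length]
        (crc,) = struct.unpack(">I", blob[end - 4:end])
        if crc != zlib.crc32(payload):
            break  # corrupted record: ignore it and everything after it
        if kind == HEADER:
            latest = payload
        pos = end
    return latest


def demo():
    """Write two headers, then simulate a partial write of the second one."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        append_record(path, DATA, b"doc-1")
        append_record(path, HEADER, b"v1")
        append_record(path, DATA, b"doc-2")
        append_record(path, HEADER, b"v2")
        before = last_valid_header(path)
        with open(path, "r+b") as f:
            f.truncate(os.path.getsize(path) - 3)  # chop the last header
        after = last_valid_header(path)
        return before, after
    finally:
        os.remove(path)
```

In demo(), the second header is damaged by the truncation, so the reader falls back to the previous header and the update that depended on the damaged header simply never becomes visible.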

In CouchDB 2.0 (and BigCouch, obviously), we keep three copies of each document, on separate nodes. A write will attempt to update all three copies in parallel and respond to the client when two of these acknowledge the write (as noted, this will be after the fsync call). In that event, a 201 code is returned. If, for whatever reason, only one write occurred, we return a 202 code as an indication that the write, while persisted to disk, is not stored redundantly yet. Every write to a copy of a document triggers an internal healing mechanism to ensure it reaches every other replica. Thus, even a 202 will graduate to a 201 internally when the nodes are available again. This is exactly the same mechanism as replication between two databases (that is, it reuses the MVCC power of CouchDB to ensure eventual consistency between replicas).
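
The 201/202 rule can be written down as a tiny function. This is a sketch of the rule as described above, not of the actual cluster code; the names write_status, REPLICAS and QUORUM are made up, and the zero-acknowledgement case is not covered by the text, so raising an error there is an assumption.

```python
REPLICAS = 3  # copies kept of each document, per the description above
QUORUM = 2    # acknowledgements needed before answering 201


def write_status(acks):
    """Map the number of replica acknowledgements to an HTTP status code."""
    if not 0 <= acks <= REPLICAS:
        raise ValueError("acks must be between 0 and %d" % REPLICAS)
    if acks >= QUORUM:
        return 201  # stored redundantly
    if acks == 1:
        return 202  # on disk once, not yet redundant
    # Assumption: the text does not say what happens with no acknowledgement.
    raise RuntimeError("no replica acknowledged the write")
```

For example, write_status(3) and write_status(2) both give 201, while write_status(1) gives 202.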

Are CouchDB restarts graceful? (see thread)

Question: are CouchDB restarts graceful? Or are all current requests immediately terminated?

Answers:

  • CouchDB deliberately has no graceful shutdown code; the server process is simply killed. This is planned for by ensuring that all data is durably on disk before responding to HTTP writes.
  • By only having one way to fail, we are always executing the ‘recovery’ path; it’s never an afterthought.
  • CouchDB’s disk structures do not require a validation on startup after regular or irregular server shutdown.
  • In 2.0, it will be possible to put each node of the cluster into maintenance mode, which is pretty close to “graceful restart” (notably, though, the nodes in maintenance mode will continue to perform internal housekeeping tasks, which includes disk writes).

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

no public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • Help test BigCouch file migration for .couch files with your personal databases – you’ll find all details on the testing procedure here
  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you!

Events

Job opportunities for people with CouchDB skills

Time to relax!

Editors’ note: This time we introduce a new, additional section in keeping with CouchDB’s slogan, giving you one or two links per week about relaxing, making your or other people’s lives easier, and more. – The CouchDB Marketing Team

  • “Personally, I know I need to step up my game with respect to providing positive feedback, and sometimes it takes just a smidgen of positivity directed toward you to hammer that truth home.” – High Fivery

… and also in the news