CouchDB 2.0 Architecture

This is the third in a series of blog posts introducing the Apache CouchDB 2.0 release. Read part one: The Road to CouchDB 2.0 and part two: Fauxton, the new CouchDB Dashboard.

CouchDB has always anticipated clustering as a core feature and, with 2.0, it has finally landed.

We’ve followed the Dynamo model made famous by Amazon where a database is divided into a number of equal, but separate, pieces, which we refer to as shards. Any given document belongs to one shard, and this is determined directly from its ID (and only its ID). This arrangement means that any node in the CouchDB cluster knows exactly where any document is hosted, allowing for scalable reading and writing. In addition, CouchDB 2.0 keeps multiple copies of each shard, so that the loss of any individual node is not fatal.
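To make the ID-to-shard mapping concrete, here is a minimal Python sketch. It assumes a CRC32 hash over a 32-bit key space that is cut into Q equal ranges; the function name `shard_range` and those hashing details are illustrative assumptions, not something this post specifies.

```python
import zlib
from typing import Tuple


def shard_range(doc_id: str, q: int = 8) -> Tuple[int, int]:
    """Return the (start, end) hash range of the shard that owns doc_id.

    Assumption: IDs are hashed with CRC32 and the 32-bit space is split
    into q equal ranges. q is assumed to be a power of two so that the
    ranges divide the space evenly.
    """
    width = 2**32 // q
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit value in Python 3
    index = min(h // width, q - 1)
    return index * width, (index + 1) * width - 1


if __name__ == "__main__":
    start, end = shard_range("my-document-id")
    print(f"{start:08x}-{end:08x}")
```

Because the result depends only on the ID and Q, any node can compute it locally without asking another node where the document lives.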

When creating a database, you can specify the number of shards (with ?q=) and the number of copies of those shards (with ?n=) or use the defaults. The default N is 3, which is almost always the right value: fewer is too risky, more is too expensive. The default Q is 8, which is suitable for most uses. You are well advised to raise this number if your database will be large, or if you plan to increase the size of your cluster significantly over time. As a rule of thumb, aim for no more than 10 million documents per shard.
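As a rough illustration of the paragraph above, the following Python sketch estimates a Q value from the 10-million-documents rule of thumb and builds the database-creation request with ?q= and ?n=. The helper names, the `http://localhost:5984` address, and the database name are assumptions made for the example; the request is only constructed here, not sent.

```python
import math
from urllib.parse import urlencode
from urllib.request import Request

DOCS_PER_SHARD = 10_000_000  # rule of thumb from the text above


def min_shards(expected_docs: int, default_q: int = 8) -> int:
    """Smallest Q that keeps each shard at or under the rule-of-thumb size.

    Never goes below the default Q (hypothetical policy for this sketch).
    """
    return max(default_q, math.ceil(expected_docs / DOCS_PER_SHARD))


def create_db_request(base_url: str, name: str, q: int = 8, n: int = 3) -> Request:
    """Build (but do not send) the PUT request that creates a database."""
    url = f"{base_url}/{name}?{urlencode({'q': q, 'n': n})}"
    return Request(url, method="PUT")


if __name__ == "__main__":
    q = min_shards(200_000_000)
    req = create_db_request("http://localhost:5984", "mydb", q=q)
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a real cluster will usually also require admin credentials.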

When a document is created, updated, deleted, or read, the node that processes the HTTP request (which can be any node in the cluster) spawns N processes that run in parallel to attempt the desired operation on every copy of the document. The coordinating node waits for N/2+1 responses (a majority; 2 of 3 with the default N) before merging those responses into the HTTP response. This overlap helps to present a consistent view of the database, though that consistency is not guaranteed: CouchDB 2.0 is an Available/Partition-Tolerant system by design, and we sacrifice Consistency for Availability.
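The quorum arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes N/2+1 uses integer division, and `first_quorum` is a hypothetical stand-in for the coordinating node collecting replies in arrival order.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def quorum(n: int) -> int:
    """N/2+1 with integer division: a strict majority of the N copies."""
    return n // 2 + 1


def first_quorum(responses: Iterable[T], n: int = 3) -> List[T]:
    """Collect responses in arrival order and stop once a quorum is reached."""
    collected: List[T] = []
    for response in responses:
        collected.append(response)
        if len(collected) == quorum(n):
            break
    return collected


if __name__ == "__main__":
    print(quorum(3))
    print(first_quorum(["copy-a ok", "copy-b ok", "copy-c ok"]))
```

Any two majorities of the same N copies share at least one copy, which is the overlap the paragraph above refers to.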

All the usual CouchDB features work as normal, with only minor changes in some cases. The most noteworthy is the changes feed. The update sequences that CouchDB 2.0 returns are now strings, not numbers, because each one encodes the numeric sequences of every shard of the database. Treat them as you should always have treated the numeric sequence: opaquely. Pass the update sequence value back to the since= parameter of _changes and all is well.
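A small Python sketch of treating the update sequence opaquely: the value from one `_changes` response is stored as-is and passed back through since= on the next call, without parsing or comparing it. The helper names, the host, and the sample sequence string are assumptions for illustration; no request is sent.

```python
from typing import Optional
from urllib.parse import urlencode


def changes_url(base_url: str, db: str, since: Optional[str] = None) -> str:
    """Build a _changes URL, passing the stored sequence back untouched."""
    url = f"{base_url}/{db}/_changes"
    if since is not None:
        url += "?" + urlencode({"since": since})
    return url


def next_since(changes_response: dict) -> str:
    """Return the sequence to store for the next call; never parse or compare it."""
    return changes_response["last_seq"]


if __name__ == "__main__":
    # Hypothetical response body; the sequence string is made up for the example.
    response = {"results": [], "last_seq": "42-g1AAAA"}
    print(changes_url("http://localhost:5984", "mydb", next_since(response)))
```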

A further note, though: it is possible for CouchDB 2.0’s changes feed to return updates from before your since parameter, so it is important to be idempotent when you process the rows you receive (the replication protocol is a good example of this). These so-called ‘rewinds’ happen when CouchDB cannot contact the specific copy of a shard included in the update sequence and must replace it with another copy. Since the updates to the multiple copies of a shard are not coordinated, and hence are not in the same order, CouchDB finds the most recent checkpoint between the two copies and rewinds you there. The rewind is typically small, but its size depends strongly on the update rate.
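One way to stay idempotent in the face of rewinds is to remember which (document ID, revision) pairs have already been handled and skip repeats. The Python sketch below assumes an in-memory set and a hypothetical `handle` callback; a real consumer would likely persist this state or make the handler itself safe to repeat.

```python
from typing import Callable, Iterable, Set, Tuple


def apply_once(
    rows: Iterable[dict],
    seen: Set[Tuple[str, str]],
    handle: Callable[[str, str], None],
) -> int:
    """Process _changes rows, skipping any (id, rev) pair already handled.

    Returns how many changes were actually applied.
    """
    applied = 0
    for row in rows:
        for change in row["changes"]:
            key = (row["id"], change["rev"])
            if key in seen:
                continue  # repeated row, for example after a rewind
            handle(row["id"], change["rev"])
            seen.add(key)
            applied += 1
    return applied


if __name__ == "__main__":
    seen: Set[Tuple[str, str]] = set()
    batch = [{"id": "doc-a", "changes": [{"rev": "1-abc"}]}]
    print(apply_once(batch, seen, lambda doc_id, rev: None))
    print(apply_once(batch, seen, lambda doc_id, rev: None))  # same rows again
```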

This has been a necessarily brief overview of the 2.0 architecture but we hope it covers enough ground to pique your interest. Oh, and relax.

 

Robert Newson is an engineer at Cloudant and an Apache CouchDB PMC member.


You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and the files with a git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.

CouchDB Weekly News, July 28, 2016

Major Discussions

Starting 2.0 Release Candidates (see thread)

We are now at a point where we can start the CouchDB 2.0 release candidate phase. See the thread for more information on how you can help with testing the tarball.

Releases in the CouchDB Universe

PouchDB

Opinions and other News in the CouchDB Universe

… and in the PouchDB Universe

CouchDB Use Cases, Questions and Answers

Stack Overflow:

no public answer yet:

PouchDB Use Cases, Questions and Answers

Stack Overflow:

no public answer yet:

For more new questions and answers about CouchDB, see these search results; for PouchDB, see these.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help us with the work on the new CouchDB website? Get in touch on our new website mailing list and join the website team! – www@couchdb.apache.org
  • The CouchDB advocate marketing programme is just getting started. Join us in CouchDB’s Advocate Hub!
  • CouchDB has a new wiki. Help us move content from the old to the new one!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you on our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list on l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to welcome you on board!

Events

Job opportunities for people with CouchDB skills

Time to relax! Something like a phenomena edition

  • “Summer is usually one of the most anticipated seasons of the year. After all, there is no better time to enjoy the outdoors and the sunshine. But summer also brings about a number of changes to our planet, from warming temperatures to dangerous thunderstorms to fireflies that flash in unison.” – These 10 natural phenomena happen every summer on our planet
  • “Over the weekend that just passed, US astronomers observed medium intensity solar eruptions on the sun, according to Descopera and the Dailymail. However, the phenomena investigated by the specialists from NASA’s Solar Dynamics Observatory are the most spectacular events of this type discovered this year. The events were classified as the most intense solar eruptions.” – Spectacular phenomena on the surface of the sun.
  • “It’s easy to get lost in cars and offices and grocery stores and forget that there’s a bigger, more beautiful world we don’t always get to see. But there’s stunning stuff happening every day, in some cases right outside your door. So let’s take a whirl through some of the most incredible, sometimes mind-boggling phenomena the Earth has to offer — along with a little of the science behind them.” – 25 of the coolest and most surreal natural phenomena on Earth
  • “Even some of the more mysterious phenomena have underlying explanations that are well-understood. We don’t know how there got to be more matter than antimatter in the Universe, but we know that the conditions we need for it — baryon number violation, out of equilibrium conditions and C and CP-violation — all exist. We don’t know what the nature of dark matter is, but its generic properties, where it’s located and how it clumps together is well-understood.” – Could Dark Energy Be Caused By A Reaction To What’s In The Universe?
  • “The phenomenon is known as the Aurora Borealis in the Northern Hemisphere and Aurora Australis in the Southern Hemisphere. The new timelapse footage also prompted fellow astronaut Scott Kelly — who spent a year in space aboard the ISS — to share his own memories overnight on Twitter of the aurora lights from space.” – Aurora lights captured in International Space Station timelapse

… and also in the news