Feature: Replication

This is the seventh in a series of blog posts introducing the Apache CouchDB 2.0 release. Read parts one, two, three, four, five, and six in the series.

Replication is one of the central features of CouchDB. In CouchDB 2.0, replication takes advantage of clustering to achieve scalability and high availability. Some configuration defaults have changed, some aspects work a bit differently, and there are many bug fixes, performance improvements, and, of course, a set of exciting new features.

Replicating in a Cluster

Just like in CouchDB 1.x, there are still two ways to start replications: one is to write a document to a “_replicator” database, which creates a persistent replication; the other is to send an HTTP request to the “_replicate” endpoint. The former is preferred, because the replication task persists if the cluster restarts, whereas a replication started via “_replicate” does not.
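As a rough sketch of the two approaches, the Python below builds (but does not send) one HTTP request for each. The cluster address, database names, document id, and helper names are hypothetical; only the “_replicator” database, the “_replicate” endpoint, and the “source” and “target” fields come from the text above.

```python
import json
import urllib.request

# Hypothetical cluster address; replace with your own.
BASE = "http://localhost:5984"


def persistent_replication(doc_id, source, target):
    """Build a PUT that stores a replication document in _replicator."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/_replicator/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def transient_replication(source, target):
    """Build a POST to the _replicate endpoint (not persisted)."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/_replicate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    for req in (
        persistent_replication("a-to-b", f"{BASE}/a", f"{BASE}/b"),
        transient_replication(f"{BASE}/a", f"{BASE}/b"),
    ):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request object to `urllib.request.urlopen` would actually send it; a real cluster would normally also require credentials.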

In either case, CouchDB 2.0 makes sure the task runs on only one node in the cluster. A persistent replication runs on the node where the first shard of the replication document is located. This is a nice performance optimization: if the document is updated, only a node-local changes feed is needed to notify the replicator code of the update. A replication posted to the “_replicate” endpoint is assigned to a cluster node based on a hash of its source and target parameters. In both cases replication tasks should be uniformly distributed across the cluster, and with each newly added node users will see a performance improvement.
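To illustrate why hashing the source and target spreads tasks evenly, here is a minimal sketch. It is not CouchDB's actual placement code: the choice of SHA-256, the modulo step, and the node names are all assumptions made for the example.

```python
import hashlib


def pick_node(source, target, nodes):
    """Map a (source, target) pair to one node via a stable hash.

    Illustration only: the hash function and the modulo step are
    assumptions, not CouchDB's real placement logic.
    """
    key = f"{source}\n{target}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    # Sort so that every caller agrees on the order of the node list.
    return sorted(nodes)[index]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # hypothetical node names
    counts = {n: 0 for n in nodes}
    for i in range(3000):
        counts[pick_node(f"http://host/src{i}", f"http://host/dst{i}", nodes)] += 1
    print(counts)  # each node ends up with roughly a third of the tasks
```

Because the result depends only on the parameters and the node list, any node can compute the same answer without coordination.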

When the cluster configuration changes, for example because nodes are added or removed, the placement of persistent replication tasks is re-evaluated and some replications might end up running on a different node. This happens automatically and is transparent to the user. However, replications created via the “_replicate” endpoint keep running on the node where they initially started and are not moved to new nodes; this is in keeping with their transient nature.

Remote vs. Local

An interesting aspect of replication in a cluster is how sources and targets are handled. In CouchDB 1.x, both “local” and “remote” sources and targets could be used. A local one is specified using just the database name, and refers to a database local to the server. A remote one uses a full URL to refer to the database, which could be on the same server or in another part of the world. Because of clustering in 2.0, a “local” database has different semantics: it means a database which is not clustered and lives only on the node where the replication task is running. These databases are usually accessed via the node-local API endpoint (default port 5986) and most likely are not what users would want to access directly. In other words, in CouchDB 2.0 it is in most cases better to use full URLs when specifying sources and targets, even when referring to databases on the same cluster.
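One way to follow this advice is to qualify bare database names before writing them into a replication document. The helper below is a small sketch; its name and the cluster address used in the demo are hypothetical.

```python
from urllib.parse import quote, urlsplit


def as_full_url(db, cluster_url):
    """Return db unchanged if it is already a URL, else qualify it."""
    if urlsplit(db).scheme in ("http", "https"):
        return db
    # Percent-encode the name so it stays a single path segment.
    return f"{cluster_url.rstrip('/')}/{quote(db, safe='')}"


if __name__ == "__main__":
    cluster = "http://localhost:5984"  # hypothetical clustered endpoint
    print(as_full_url("a", cluster))
    print(as_full_url("http://example.org:5984/b", cluster))
```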

Multiple Replicator Databases

One of the configuration changes in 2.0 is that it is no longer possible to change the name of the replicator database; it is always “_replicator”. On the other hand, it is now possible to have multiple replicator databases. Any database whose name ends with the “/_replicator” suffix is considered a replicator database, and is monitored and processed just like the main “_replicator”. This allows greater flexibility, for example by having a temporary database called “dev/_replicator” for testing or other experiments. When finished with it, one can just delete “dev/_replicator” and all of its replications will be canceled and cleaned up from the system.
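The naming rule can be written down in a couple of lines. This is a sketch of the rule as described above, not CouchDB's own code; the function name and the address in the demo are hypothetical. Note that the “/” in such a name has to be percent-encoded as %2F when it appears in a request path.

```python
from urllib.parse import quote


def is_replicator_db(name):
    """True for "_replicator" itself or any name ending in "/_replicator"."""
    return name == "_replicator" or name.endswith("/_replicator")


if __name__ == "__main__":
    for name in ("_replicator", "dev/_replicator", "my_replicator"):
        print(name, is_replicator_db(name))
    # Path to use when creating or deleting the temporary database:
    print("http://localhost:5984/" + quote("dev/_replicator", safe=""))
```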

Another configuration change is that the default checkpoint interval for replications went up from 5 seconds to 30 seconds. As replications make progress, they periodically write checkpoints to both the target and source databases. In 1.x this happens every 5 seconds by default. In 2.0, because a cluster will usually run a larger number of replications, the default has been increased to 30 seconds. This is just a default, and the setting is configurable via the “checkpoint_interval” parameter.
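As a hedged sketch of overriding the default, the snippet below renders a configuration fragment. It assumes the parameter lives in the “[replicator]” section and is expressed in milliseconds, as described in the CouchDB configuration reference; check the documentation for your version before relying on it. The helper name is hypothetical.

```python
import configparser
import io


def replicator_ini(checkpoint_seconds):
    """Render a [replicator] section with the interval in milliseconds."""
    config = configparser.ConfigParser()
    # Assumption: the value is given in milliseconds.
    config["replicator"] = {"checkpoint_interval": str(checkpoint_seconds * 1000)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(replicator_ini(30))
```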

A New Way to Filter

An exciting new feature in 2.0 is the ability to use Mango selectors for filtering. This allows for more consistent and efficient filtering of documents, compared to the traditional 1.x replication filters, which are written in JavaScript. To use this capability just add a “selector” field to the replication document with the Mango query selector as the value.

For example:

{
"_id": "r",
"continuous": true,
"selector": {
  "_id": {
    "$gte": "2"
  }
},
"source": "http://adm:pass@localhost:15984/a",
"target": "http://adm:pass@localhost:15984/b"
}

replicates only documents with ids greater than or equal to “2”. Of course, JavaScript-based filters continue to be supported.
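To make the effect of the selector in the example above concrete, the sketch below evaluates it against a few sample ids. It covers only the “$gte” operator and uses Python's plain string comparison, which merely approximates CouchDB's collation; the sample documents and the function name are hypothetical.

```python
def matches(doc, selector):
    """Evaluate a tiny subset of Mango: {field: {"$gte": value}} only."""
    for field, condition in selector.items():
        for op, operand in condition.items():
            if op != "$gte":
                raise NotImplementedError(f"operator {op} is not covered by this sketch")
            if field not in doc or not doc[field] >= operand:
                return False
    return True


if __name__ == "__main__":
    selector = {"_id": {"$gte": "2"}}
    docs = [{"_id": i} for i in ["1", "10", "2", "20", "3", "a"]]
    print([d["_id"] for d in docs if matches(d, selector)])
    # prints ['2', '20', '3', 'a']
```

Document ids are strings, so “10” sorts before “2” and is not replicated by this selector.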

Replication Everywhere

Saving the best for last: perhaps the nicest “feature” is that the base replication protocol has not changed. It is possible to replicate between a CouchDB 2.0 cluster and CouchDB 1.x instances. One can still use the variety of custom replication topologies for which CouchDB is known (push, pull, and bidirectional replication) and, of course, continue to replicate with our in-browser sister project, PouchDB. In addition, what used to be a single-machine node can now be replaced by a fault-tolerant and scalable cluster.

Nick Vatamaniuc is a software engineer at Cloudant and an Apache CouchDB committer.


You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and files with the git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.

CouchDB Weekly News, August 11, 2016

Releases

Apache CouchDB 2.0 Release Candidate 3:

You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and files with the git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.

Major Discussions

[PROPOSAL] CouchDB 2.0 log to ./var/log/couchdb.log by default (see thread)

Joan Touzet opened a PR to change CouchDB 2.0’s current behavior of logging only to stderr, and is requesting feedback from stakeholders.

Releases in the CouchDB Universe

PouchDB

Opinions and other News in the CouchDB Universe

… and in the PouchDB Universe

CouchDB Use Cases, Questions and Answers

Stack Overflow:

no public answer yet:

PouchDB Use Cases, Questions and Answers

Use Case:

  • Agent 008, Decoupled Offline Drupal 8 using PouchDB and React

Stack Overflow:

no public answer yet:

For more new questions and answers about CouchDB, see these search results and about PouchDB, see these.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help us with the work on the new CouchDB website? Get in touch on our new website mailing list and join the website team! – www@couchdb.apache.org
  • The CouchDB advocate marketing programme is just getting started. Join us in CouchDB’s Advocate Hub!
  • CouchDB has a new wiki. Help us move content from the old to the new one!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help moving the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list on l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to welcome you on board!

Events

Job opportunities for people with CouchDB skills

Time to relax! With the Olympics

  • “De Lima was leading the Olympic marathon in 2004 when he was attacked by a protestor near the end of the race. He ended up finishing third, but the graceful way he handled the disappointment won him plaudits around the world for his sportsmanship.” – What a shot! 40 amazing photos from the Olympics
  • “Look, I’m an unabashed lover of the Olympics. They are a delight to watch and enjoy. Relax. Have some fun. Because the Olympics are great.” – Don’t be a curmudgeon. The Olympics are awesome.
  • “And the 22-year-old, who is called ‘grandma’ by her teammates, had one of the most unforgettable performances of the day. One six-second clip of the opening to her ‘insane’ floor routine has been retweeted more than 18,000 times.” – Aly Raisman’s Amazing Olympics Floor Routine is Going Viral
  • “Ellis said he never asked Willock to help – the pair has simply gotten into a conversation about his son was competing.  Willock asked Hill if he’d attend if he had a ticket and the crowd funding idea took off from there, despite the two being strangers. Willock was also able to use her work connections; she works for a travel company, to help arrange the surprise trip.” – This Woman Sent Her Uber Driver To Rio After Discovering His Amazing Olympic Connection
  • “Just as the Olympics attract the world’s best athletes, the Games also lure many of the world’s greatest photographers to capture their superhuman feats. Their tools are many: multiple exposures, robotic underwater cameras, remotes hanging from rafters, and good old-fashioned sideline shooting. The result is a daily flood of thousands upon thousands of images.” – The Most Spectacular Moments from the Rio Olympics So Far

… and also in the news