CouchDB Weekly News, August 11, 2016

Releases

Apache CouchDB 2.0 Release Candidate 3:

You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and the files with the git hash in their name are builds off of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.

Major Discussions

[PROPOSAL] CouchDB 2.0 log to ./var/log/couchdb.log by default (see thread)

Joan Touzet opened a PR to change the 2.0 behavior of logging only to stderr, and is requesting feedback from stakeholders.

Releases in the CouchDB Universe

PouchDB

Opinions and other News in the CouchDB Universe

… and in the PouchDB Universe

CouchDB Use Cases, Questions and Answers

Stack Overflow:

no public answer yet:

PouchDB Use Cases, Questions and Answers

Use Case:

  • Agent 008, Decoupled Offline Drupal 8 using PouchDB and React

Stack Overflow:

no public answer yet:

For more new questions and answers about CouchDB, see these search results and about PouchDB, see these.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help us with the work on the new CouchDB website? Get in touch on our new website mailing list and join the website team! – www@couchdb.apache.org
  • The CouchDB advocate marketing programme is just getting started. Join us in CouchDB’s Advocate Hub!
  • CouchDB has a new wiki. Help us move content from the old to the new one!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help moving the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list on l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to welcome you on board!

Events

Job opportunities for people with CouchDB skills

Time to relax! With the Olympics

  • “De Lima was leading the Olympic marathon in 2004 when he was attacked by a protestor near the end of the race. He ended up finishing third, but the graceful way he handled the disappointment won him plaudits around the world for his sportsmanship.” – What a shot! 40 amazing photos from the Olympics
  • “Look, I’m an unabashed lover of the Olympics. They are a delight to watch and enjoy. Relax. Have some fun. Because the Olympics are great.” – Don’t be a curmudgeon. The Olympics are awesome.
  • “And the 22-year-old, who is called ‘grandma’ by her teammates, had one of the most unforgettable performances of the day. One six-second clip of the opening to her ‘insane’ floor routine has been retweeted more than 18,000 times.” – Aly Raisman’s Amazing Olympics Floor Routine is Going Viral
  • “Ellis said he never asked Willock to help – the pair has simply gotten into a conversation about his son was competing.  Willock asked Hill if he’d attend if he had a ticket and the crowd funding idea took off from there, despite the two being strangers. Willock was also able to use her work connections; she works for a travel company, to help arrange the surprise trip.” – This Woman Sent Her Uber Driver To Rio After Discovering His Amazing Olympic Connection
  • “Just as the Olympics attract the world’s best athletes, the Games also lure many of the world’s greatest photographers to capture their superhuman feats. Their tools are many: multiple exposures, robotic underwater cameras, remotes hanging from rafters, and good old-fashioned sideline shooting. The result is a daily flood of thousands upon thousands of images.” – The Most Spectacular Moments from the Rio Olympics So Far

… and also in the news

Feature: Compaction

This is the sixth in a series of blog posts introducing the Apache CouchDB 2.0 release. Read parts one, two, three, four and five in the series.

One way CouchDB averts data corruption is by only updating database files via append operations, never mutating existing data. While this method has numerous advantages, it tends to use a lot of disk space relative to a more traditional update-in-place DBMS. With every change to a database — be it insertion of new documents, update, or deletion — CouchDB internal b-trees and headers also need to be partially updated to incorporate any changes. These updates are added at the end of the database file, so CouchDB files can grow very quickly, and can contain a lot of unreferenced data, AKA “garbage.”

To free the system from this “garbage,” CouchDB uses a process called Compaction. Compaction works by copying the most recent revision of every document (while keeping some small metadata info of previous revisions) to a new compacted file, and should be run periodically to recover this wasted disk space. While it can be useful for databases where only new documents are inserted, it is especially beneficial for update-heavy databases where documents have many revisions.
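The idea can be sketched with a toy model. This is a minimal illustration, not CouchDB's actual file format or compactor: the class `AppendOnlyStore` and its methods are hypothetical names, the "file" is an in-memory list, and deletions and b-tree updates are not modeled.

```python
class AppendOnlyStore:
    """Toy append-only store: every write appends a record, nothing is overwritten."""

    def __init__(self):
        self.log = []      # stands in for the database file: (doc_id, rev, body) records
        self.latest = {}   # doc_id -> position in the log of the current revision

    def put(self, doc_id, body):
        # An update never touches the old record; it appends a new revision.
        prev = self.latest.get(doc_id)
        rev = 1 if prev is None else self.log[prev][1] + 1
        self.log.append((doc_id, rev, body))
        self.latest[doc_id] = len(self.log) - 1

    def compact(self):
        """Copy only the current revision of each document into a fresh log."""
        new_log = []
        new_latest = {}
        for doc_id, index in self.latest.items():
            new_latest[doc_id] = len(new_log)
            new_log.append(self.log[index])
        # The compacted log replaces the original one.
        self.log, self.latest = new_log, new_latest


store = AppendOnlyStore()
for i in range(5):
    store.put("doc-a", {"count": i})   # five revisions of the same document
store.put("doc-b", {"count": 0})

size_before = len(store.log)
store.compact()
size_after = len(store.log)
print(size_before, size_after)
```

In this toy, the update-heavy document accounts for almost all of the reclaimed space, which matches the point above about databases where documents have many revisions.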

IOQ

In previous versions of CouchDB, all database operations had equal priority in their access to I/O. Thus, longer running compaction tasks would have the same priority as latency-sensitive interactive requests from an application. Moreover, compaction tasks requiring a lot of I/O would noticeably impact the performance of interactive requests, resulting in significantly increased request latencies.

To prioritize different types of requests in their access to I/O, Cloudant developed an IOQ application, which has been added to CouchDB 2.0. Every database request requiring an I/O operation first goes through IOQ, and is put into one of two queues: one for interactive requests, another one for compaction requests. By default, ten I/O requests can be served concurrently; all other outstanding requests are put into the queues. The next request to be served is chosen either from the interactive queue or from the compaction queue, with a ratio of 100:1 (default ratio). This prioritizes concurrent interactive requests and substantially lessens the impact of compaction on them.

The ratio and concurrency parameters for ioq can be configured in the default.ini file.

[ioq]
ratio = 0.01
concurrency = 10
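To make the effect of the ratio setting concrete, here is a rough sketch of a two-queue scheduler. It is not the IOQ implementation: it assumes that `ratio` acts as the probability of preferring the compaction queue on each pick, it ignores the concurrency limit, and the function name `next_request` is made up for this example.

```python
import random
from collections import deque


def next_request(interactive, compaction, ratio, rng):
    """Pick the next queued request, preferring compaction with probability `ratio`."""
    prefer_compaction = rng.random() < ratio
    if prefer_compaction:
        first, second = compaction, interactive
    else:
        first, second = interactive, compaction
    # Fall back to the other queue if the preferred one is empty.
    if first:
        return first.popleft()
    if second:
        return second.popleft()
    return None


rng = random.Random(0)  # seeded so repeated runs behave the same way
interactive = deque(f"read-{i}" for i in range(1000))
compaction = deque(f"compact-{i}" for i in range(1000))

served = [next_request(interactive, compaction, 0.01, rng) for _ in range(1000)]
compaction_served = sum(1 for r in served if r.startswith("compact"))
print(compaction_served, len(served) - compaction_served)
```

With both queues full, only a small share of the served requests come from the compaction queue. When the interactive queue is empty, compaction requests are served without waiting.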

Size and speed optimizations

While CouchDB committer and Cloudant lead architect Paul Davis contends that compaction as a concept is “dead simple,” doing it in the most straightforward way can leave a lot of room for improvement. The basic process involves walking the database for all docs by order of their last update (seq_tree), and copying all related data to a new file, which ultimately replaces the original db file. At one time the id_tree (all docs by order of their ids) was written directly to the new file, but Adam Kocoloski observed that writing to the id_tree in the order of the seq_tree could cause excessive garbage to be generated since the id_tree writes would be out of order.

This observation ultimately resulted in a major optimization in which the id_tree is written to a temp (.compact.meta) file. At the end of compaction, that id_tree is copied back to the compacted (.compact.data) file in order, which can result in greatly reduced size and compaction time.

Overall, this technique works well, and has been used by Cloudant in production for several years. The one caveat we’ve found so far is that it’s sometimes possible to create temp files with the last header buried several GB from the end of the file. If the compaction is interrupted for some reason (like a reboot), when it resumes it needs to find the most recently written header, which can take a lot of time if it’s deeply buried. We are working on techniques to speed up the location of buried headers, but it may also be possible to improve the underlying algorithm to prevent headers from being buried too deeply in the first place.
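The cost of a buried header can be illustrated with a toy backward scan. This sketch assumes a simplified layout in which a header can only start on a fixed block boundary and is flagged by a marker byte of 1. The block size, the marker value, and the helpers `find_last_header` and `make_file` are assumptions for illustration, not a description of the real file format or recovery code.

```python
BLOCK_SIZE = 4096  # assumed block size for this toy layout


def find_last_header(data, block_size=BLOCK_SIZE):
    """Scan backward from the end; return (offset, blocks_scanned) of the last header."""
    if not data:
        return None, 0
    block = (len(data) - 1) // block_size  # index of the last block in the file
    scanned = 0
    while block >= 0:
        scanned += 1
        offset = block * block_size
        if data[offset] == 1:  # marker byte: this block starts a header
            return offset, scanned
        block -= 1
    return None, scanned


def make_file(blocks_after_header):
    """Build a toy file: 3 data blocks, one header block, then more data blocks."""
    header = bytes([1]) + bytes(BLOCK_SIZE - 1)
    data_block = bytes(BLOCK_SIZE)  # starts with marker byte 0
    return data_block * 3 + header + data_block * blocks_after_header


shallow = find_last_header(make_file(2))
deep = find_last_header(make_file(2000))
print(shallow, deep)
```

Both files hold their header at the same offset, but the second scan has to step over every block written after it. The work grows with the amount of data appended since the last header, which is why a header buried several GB deep makes resuming slow.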

Compaction is a shard operation

In CouchDB 2.0, compaction is a shard operation, as every shard is an individual CouchDB database. Cluster-wide or node-wide manual compaction through a single HTTP request is not implemented, as compacting all shards of a db on all nodes at once would significantly impair the database’s performance, even with IOQ controlling I/O access. Compaction is therefore left to admins, and should be run through the backdoor port 5986.

Manual database compaction

An example of an HTTP request for compacting the shard 00000000-1fffffff of the “test” db on node1:

curl -H "Content-Type: application/json" -X POST \
http://localhost:15986/shards%2F00000000-1fffffff%2Ftest.1470075898/_compact

where “test.1470075898” is the name of the couch file on this shard.
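The slashes in the shard path are sent percent-encoded as %2F, so that the whole path is treated as one database name. A small sketch of building such a URL follows; the helper name `shard_compact_url` is made up here, and the host, port, shard range and file name are simply the values from the example above.

```python
from urllib.parse import quote


def shard_compact_url(host, port, shard_range, db_file):
    """Build the _compact URL for one shard, percent-encoding the slashes in its path."""
    path = f"shards/{shard_range}/{db_file}"
    # safe="" makes quote() encode "/" as %2F instead of leaving it alone.
    return f"http://{host}:{port}/{quote(path, safe='')}/_compact"


url = shard_compact_url("localhost", 15986, "00000000-1fffffff", "test.1470075898")
print(url)
```

This only builds the URL. The request itself still has to be sent as a POST with the Content-Type: application/json header, as in the curl command above.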

Manual view compaction

Similar to databases, views are also shard based, and view compaction operations should be run on the backend port 5986.

For a view stored in design doc: “_design/app” on the shard 80000000-9fffffff of the database “test”, the request would be:

curl -H "Content-Type: application/json" -X POST

Currently, the view compaction feature is not fully implemented, and will only compact views for shards that contain a design doc. For example, an attempt to compact the view on the shard 00000000-1fffffff, which doesn’t contain the design doc “_design/app”, will cause the following error:

curl -H "Content-Type: application/json" -X POST 
{"error":"not_found","reason":"missing"}

There is an open JIRA issue for this, and this will be fixed in the future.

Automatic compaction

Automatic compaction in CouchDB 2.0 works similarly to CouchDB 1.6, using the same configurations.

For compacting views, the compaction daemon has the same problem as the manual compaction of views: it will only compact views on shards that contain design documents.

Jay Doane is a software developer at IBM working on Cloudant Local (the on-prem version of the database), and Cluster Elasticity.

Mayya Sharipova is a software developer at IBM Cloudant focusing on integrations of CouchDB database with Apache Lucene and Spark.

