Feature: Compaction

This is the sixth in a series of blog posts introducing the Apache CouchDB 2.0 release. Read parts one, two, three, four, and five in the series.

One way CouchDB averts data corruption is by only updating database files via append operations, never mutating existing data. While this method has numerous advantages, it tends to use a lot of disk space relative to a more traditional update-in-place DBMS. With every change to a database, be it an insertion of new documents, an update, or a deletion, CouchDB's internal B-trees and headers also need to be partially rewritten to incorporate the change. These updates are appended to the end of the database file, so CouchDB files can grow very quickly and can contain a lot of unreferenced data, also known as “garbage.”

To free the system from this “garbage,” CouchDB uses a process called compaction. Compaction works by copying the most recent revision of every document (along with a small amount of metadata about previous revisions) to a new, compacted file, and should be run periodically to recover the wasted disk space. While it can be useful even for databases where only new documents are inserted, it is especially beneficial for update-heavy databases where documents have many revisions.

IOQ

In previous versions of CouchDB, all database operations had equal priority in their access to I/O. Thus, longer running compaction tasks would have the same priority as latency-sensitive interactive requests from an application. Moreover, compaction tasks requiring a lot of I/O would noticeably impact the performance of interactive requests, resulting in significantly increased request latencies.

To prioritize different types of requests in their access to I/O, Cloudant developed an IOQ application, which has been added to CouchDB 2.0. Every database request requiring an I/O operation first goes through IOQ and is put into one of two queues: one for interactive requests, the other for compaction requests. By default, ten I/O requests can be served concurrently; all other outstanding requests wait in the queues. The next request to be served is chosen from either the interactive queue or the compaction queue, at a default ratio of 100:1. This prioritizes concurrent interactive requests and substantially lessens the impact of compaction on them.

The ratio and concurrency parameters for ioq can be configured in the default.ini file.

[ioq]
ratio = 0.01
concurrency = 10
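
These settings can also be inspected or tweaked at runtime through the configuration HTTP API, which in CouchDB 2.0 is exposed on each node's local port rather than the clustered one. A sketch, assuming the first dev-cluster node on port 15986 (5986 on a standalone install); the 0.001 value is purely illustrative:

# Read the current IOQ settings from the node-local port
curl http://localhost:15986/_config/ioq

# Give compaction an even smaller share of I/O (value is illustrative)
curl -X PUT http://localhost:15986/_config/ioq/ratio -d '"0.001"'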

Size and speed optimizations

While CouchDB committer and Cloudant lead architect Paul Davis contends that compaction as a concept is “dead simple,” doing it in the most straightforward way leaves a lot of room for improvement. The basic process involves walking the database for all documents in order of their last update (the seq_tree) and copying all related data to a new file, which ultimately replaces the original database file. At one time the id_tree (all documents in order of their IDs) was written directly to the new file, but Adam Kocoloski observed that writing to the id_tree in the order of the seq_tree could generate excessive garbage, since the id_tree writes would be out of order.

This observation ultimately resulted in a major optimization in which the id_tree is written to a temporary (.compact.meta) file. At the end of compaction, that id_tree is copied back into the compacted (.compact.data) file in order, which can greatly reduce both file size and compaction time.

Overall, this technique works well, and has been used by Cloudant in production for several years. The one caveat we’ve found so far is that it’s sometimes possible to create temp files with the last header buried several GB from the end of the file. If the compaction is interrupted for some reason (like a reboot), when it resumes it needs to find the most recently written header, which can take a lot of time if it’s deeply buried. We are working on techniques to speed up the location of buried headers, but it may also be possible to improve the underlying algorithm to prevent headers from being buried too deeply in the first place.

Compaction is a shard operation

In CouchDB 2.0, compaction is a shard operation, as every shard is an individual CouchDB database. Cluster-wide or node-wide manual compaction through a single HTTP request is not implemented, as compacting all shards of a database on all nodes at once would significantly impair the database's performance even with IOQ in control. Thus, compaction tasks are left to admins, and should go through the backdoor port 5986.
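
Because every shard is a database in its own right, you can see the shard-level databases that live on a node by listing them on that node's backdoor port. A minimal sketch, assuming the first dev-cluster node on port 15986:

# Lists node-local databases, including entries like
# shards/00000000-1fffffff/test.1470075898
curl http://localhost:15986/_all_dbs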

Manual database compaction

An example of an HTTP request for compacting the shard 00000000-1fffffff of the “test” database on node1:

curl -H "Content-Type: application/json" -X POST \
http://localhost:15986/shards%2F00000000-1fffffff%2Ftest.1470075898/_compact

where “test.1470075898” is the name of the couch file on this shard.
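
The compaction request returns immediately and the work continues in the background. One way to watch its progress (a sketch, using the same dev-cluster ports as above) is to poll the _active_tasks endpoint on the clustered port, which lists running jobs with a progress percentage:

# Running compactions show up with type "database_compaction"
curl http://localhost:15984/_active_tasks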

Manual view compaction

Similar to databases, views are also shard-based, and view compaction operations should likewise be run against the backdoor port 5986.

For a view stored in the design doc “_design/app” on the shard 80000000-9fffffff of the database “test”, the request would be:

curl -H "Content-Type: application/json" -X POST

Currently, the view compaction feature is not fully implemented, and it will only compact views for shards that contain the design doc. For example, an attempt to compact the view on the shard 00000000-1fffffff, which doesn't contain the design doc “_design/app”, will return the following error:

curl -H "Content-Type: application/json" -X POST 
{"error":"not_found","reason":"missing"}

There is an open JIRA issue for this, and it will be fixed in a future release.

Automatic compaction

Automatic compaction in CouchDB 2.0 works similarly to CouchDB 1.6, using the same configuration options.
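
For reference, here is a sketch of what that configuration looks like; the section and key names follow the 1.6-era compaction daemon, and the thresholds shown are illustrative, not tuning advice:

[compaction_daemon]
check_interval = 300
min_file_size = 131072

[compactions]
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]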

For compacting views, the compaction daemon has the same problem as the manual compaction of views: it will only compact views on shards that contain design documents.

Jay Doane is a software developer at IBM working on Cloudant Local (the on-prem version of the database) and Cluster Elasticity.

Mayya Sharipova is a software developer at IBM Cloudant focusing on integrating CouchDB with Apache Lucene and Spark.


You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and files with a git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.

Release Candidates

This is the fifth in a series of blog posts introducing the Apache CouchDB 2.0 release. Read parts one, two, three, and four in the series.

Today I’d like to talk to you about the CouchDB 2.0 Release Candidates (RCs). On a regular schedule, the CouchDB Project Management Committee (PMC) releases RCs. These releases represent the completion of years of work towards CouchDB 2.0, and deserve your full attention. Very shortly, the RC cycle will be done – and your opportunity to report any last-minute problems you encounter will close. Please help us release the best possible CouchDB 2.0 we can by testing these release candidates thoroughly.

IMPORTANT

To our valued CouchDB application and library developers: please, please run your software against each of the options below. We hope that minimal changes will be necessary to your application, but if any issues do surface, fix them in your software so that it is ready to go with CouchDB 2.0.

If you encounter any issues that break your application irrevocably, please report them to us. You can do so through our JIRA bug tracker, or if you have questions, contact us at the user mailing list or via text chat.

Testing an RC yourself

To try out an RC, you can install it as a single node (a la CouchDB 1.x), as a 3-node development cluster, or in an n-node configuration. First, download and unpack the relevant apache-couchdb-2.0.0-RC#.tar.gz package. You'll then need to compile the source code (UNIX and OS X) by following the instructions in the INSTALL.UNIX.md file. If you have all of the prerequisites installed, this is as simple as:

$ ./configure
$ make release

On Windows, simply run the installer and follow the instructions. Be sure to install to a path with no spaces, such as C:\CouchDB.

1-node configuration

After building CouchDB, run rel/couchdb/bin/couchdb (or ensure the service is running on Microsoft Windows). This will start up a 1-node CouchDB instance that logs to stderr. You can then access the brand new Fauxton web interface at http://localhost:5984/_utils/index.html.
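
Before opening Fauxton, you can check from the command line that the node is up; a running instance answers with a JSON welcome message (output abbreviated here):

curl http://localhost:5984/
{"couchdb":"Welcome","version":"2.0.0",...}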

You will immediately want to create the _users database. Click on the Create Database button at the top-left and enter _users as the database name.
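
If you prefer the command line, the same database can be created with a single PUT request:

# Create the _users system database
curl -X PUT http://localhost:5984/_users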

Finally, verify your installation by clicking on the Verify link on the left-hand side of the screen, then click the green Verify Installation button. Every check should come back with a check mark. If you encounter any failures, please report them (see the bottom of this post).

3-node configuration

Each CouchDB RC ships with a script that runs a 3-node development cluster, optionally with an haproxy load-balancing front end. To run this script, change to the RC directory and run:

$ dev/run -n 3 --with-admin-party-please [--with-haproxy --haproxy=/path/to/haproxy]

The script will start 3 nodes, one each at ports 15984, 25984 and 35984. If you add the --with-haproxy option, the haproxy load balancer will run at port 5984, load balancing requests across all 3 nodes.

Proceed to Fauxton with your web browser at http://localhost:15984/_utils/index.html (or http://localhost:5984/_utils/index.html if you are running haproxy), and validate your install as with the 1-node install. There is no need to create the _users database with this script.
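
You can also confirm that all three nodes have joined the cluster by querying any node's _membership endpoint; both lists in the response should contain all three node names:

# Returns the cluster's all_nodes and cluster_nodes lists
curl http://localhost:15984/_membership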

n-node configuration

CouchDB 2.0 can also be configured with an arbitrary number of nodes. You can use the dev/run script with the -n option to launch any number of nodes, as shown in the example below. Alternately, you can install each node on a separate machine and configure the cluster yourself using the _cluster_setup endpoint. Though this is out of scope for this blog post, you can learn more by reading the source of the dev/run script and searching for the cluster_setup, cluster_setup_with_admin_party, enable_cluster, add_node and finish_cluster functions.
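
For instance, to bring up a five-node development cluster (the node count is arbitrary):

$ dev/run -n 5 --with-admin-party-please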

 

Joan Touzet is a committer and PMC member for Apache CouchDB, as well as the point of contact for the CouchDB Code of Conduct.


You can download the latest release candidate from http://couchdb.apache.org/release-candidate/2.0/. Files with -RC in their name are special release candidate tags, and files with a git hash in their name are builds of every commit to CouchDB master.

We are inviting the community to thoroughly test their applications with CouchDB 2.0 release candidates. See the testing and setup instructions for more details.