The Road to CouchDB 3.0: Smarter Compaction Daemon

This is a post in a series about the Apache CouchDB 3.0 release. Check out the other posts in this series.

Much like automatic view index building, database compaction is a maintenance task that needs to run periodically. Unlike automatic view index building, though, compaction has had a dedicated daemon in CouchDB since 1.x, and that Compaction Daemon saw significant improvements in 2.x.

For 3.x, we are taking this to a whole new level.

The 1.x/2.x compaction daemon works in the most primitive way:

for each database/shard:
  start compaction
  for each design document in database:
    start compaction

That is it. There are some configuration options that control when compactions can run, but since the introduction of ioq in 2.x, those have mostly become obsolete.
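To make the loop above concrete, here is a minimal Python sketch of the same traversal expressed as calls against CouchDB's HTTP compaction endpoints (`POST /{db}/_compact` and `POST /{db}/_compact/{ddoc}`). The daemon itself runs inside CouchDB and does not go through HTTP, so treat this purely as an illustration. The function name and the sample database and design document names are made up.

```python
from urllib.parse import quote


def compaction_requests(databases):
    """Yield (method, path) pairs in the order the naive loop visits them.

    `databases` maps a database name to a list of design document names
    (without the "_design/" prefix). Both are hypothetical inputs.
    """
    for db, design_docs in databases.items():
        db_path = "/" + quote(db, safe="")
        # Compact the database file itself.
        yield ("POST", f"{db_path}/_compact")
        # Then compact the view indexes of every design document in it.
        for ddoc in design_docs:
            yield ("POST", f"{db_path}/_compact/{quote(ddoc, safe='')}")


# Hypothetical example data: two databases, one of them without views.
plan = list(compaction_requests({"orders": ["by_date", "totals"], "users": []}))
for method, path in plan:
    print(method, path)
```

Note that nothing in this traversal looks at how much a database or index would actually benefit from compaction. Everything is visited, in order, every time.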

In CouchDB 3.0, we are introducing Smoosh, again a contribution from Cloudant.

Smoosh lets you define channels for different operations in CouchDB. Some databases and view indexes might need compaction more eagerly than others, and you can now configure this whichever way you need to slice it.
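As a rough mental model of channels, the following Python sketch routes files to channels based on how much space compaction could reclaim. It is not Smoosh's implementation. The "ratio" and "slack" measures are modeled on the priority types described in the Smoosh documentation, while the class names, the thresholds, and the first-match routing rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MIB = 1024 * 1024


@dataclass
class FileSizes:
    name: str
    file: int    # bytes the file occupies on disk
    active: int  # bytes of live data inside that file


def ratio_priority(s: FileSizes) -> float:
    # How many times larger the file is than its live data.
    return s.file / s.active if s.active else 0.0


def slack_priority(s: FileSizes) -> float:
    # Absolute number of bytes compaction could reclaim.
    return float(s.file - s.active)


@dataclass
class Channel:
    name: str
    priority: Callable[[FileSizes], float]
    min_priority: float  # hypothetical threshold for entering the channel


def route(item: FileSizes, channels: List[Channel]) -> Optional[str]:
    """Return the first channel whose threshold the item meets, else None."""
    for channel in channels:
        if channel.priority(item) >= channel.min_priority:
            return channel.name
    return None


# Hypothetical channel setup and sample files.
channels = [
    Channel("ratio_dbs", ratio_priority, 2.0),
    Channel("slack_dbs", slack_priority, 64 * MIB),
]
logs = FileSizes("logs", file=300 * MIB, active=100 * MIB)
archive = FileSizes("archive", file=1000 * MIB, active=900 * MIB)
tiny = FileSizes("tiny", file=10 * MIB, active=9 * MIB)
```

In this model, `logs` is three times larger than its live data and lands in the ratio channel, `archive` has a low ratio but 100 MiB of reclaimable space and lands in the slack channel, and `tiny` is not worth compacting at all.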

This should make operating CouchDB a lot smoother.
See the documentation for all available options.

The Road to CouchDB 3.0: Automatic View Index Warming

Querying in CouchDB has always been a little different from querying in other databases. One such difference is index creation and updating. In most other databases, an index is usually created upon definition, and updated when new data arrives. In CouchDB, an index is created and updated on demand, at the time a query is made against it.

The underlying reason is a performance trade-off: in other databases, you are encouraged to have as many indexes as you need, but no more, because each additional index makes inserting and updating any data more resource intensive. In CouchDB, on the other hand, you can define as many indexes as you like, because only the ones that are actually used are built, and that happens at query time.

The trade-off in particular is the following: processing many database updates at once is a lot more efficient than processing them one by one. However, if an index has not been used in a while, it can take quite some time to process all updates that have happened in the meantime. So CouchDB gains data insertion speed at the cost of query latency.
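The following Python sketch is a toy model of that behavior, not CouchDB's actual indexer. Writes only append to a change log, and the index catches up in one batch when it is queried. The class name, the `(doc_id, doc)` change format, and the map function are assumptions for illustration.

```python
class LazyIndex:
    """Toy index that is only brought up to date when it is queried."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = {}     # doc id -> key emitted by map_fn
        self.last_seq = 0  # how many changes have been indexed so far

    def query(self, changes):
        """Catch up on `changes`, then return (sorted rows, batch size).

        `changes` is an append-only list of (doc_id, doc) pairs that stands
        in for the database's update sequence.
        """
        pending = changes[self.last_seq:]
        for doc_id, doc in pending:
            # A later change to the same document replaces its old row.
            self.rows[doc_id] = self.map_fn(doc)
        self.last_seq = len(changes)
        rows = sorted(self.rows.items(), key=lambda kv: (kv[1], kv[0]))
        return rows, len(pending)


changes = []
index = LazyIndex(lambda doc: doc["age"])

# Writes are cheap: nothing touches the index here.
changes.append(("alice", {"age": 31}))
changes.append(("bob", {"age": 25}))

rows, processed = index.query(changes)        # first query indexes both docs
changes.append(("alice", {"age": 32}))
rows2, processed2 = index.query(changes)      # only one new change to process
rows3, processed3 = index.query(changes)      # already up to date
```

The size of the pending batch is what determines how long a query has to wait, which is why an index that has not been queried for a long time responds slowly the first time.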

Over the lifetime of CouchDB 1.x and 2.x, users have built their own little cron jobs that periodically query all indexes, so that each real query has at most a small gap of database updates to bridge. This makes query response times more predictable.
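Such a warming job could look roughly like the Python sketch below, which queries each view with `limit=0` so that the index is brought up to date without transferring any rows. The server address, the list of views, and the function names are hypothetical, and a real job would also need authentication and error handling.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def warmup_urls(base_url, views):
    """Yield one view query URL per (db, ddoc, view) triple.

    limit=0 asks for no rows, but the query still makes CouchDB update
    the index first.
    """
    query = urlencode({"limit": 0})
    for db, ddoc, view in views:
        path = "/".join(
            [quote(db, safe=""), "_design", quote(ddoc, safe=""),
             "_view", quote(view, safe="")]
        )
        yield f"{base_url}/{path}?{query}"


def warm_all(base_url, views, timeout=300):
    """Request every URL in turn; meant to be called from a cron job."""
    for url in warmup_urls(base_url, views):
        with urlopen(url, timeout=timeout) as response:
            response.read()


# Hypothetical setup: a local node and two views to keep warm.
VIEWS = [("orders", "reports", "by_date"), ("users", "lookup", "by_email")]
urls = list(warmup_urls("http://localhost:5984", VIEWS))
# A cron entry would run a script that calls:
#   warm_all("http://localhost:5984", VIEWS)
```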

CouchDB 3.0 introduces Ken, an automatic background indexing service that does this for you. There is no need to keep maintaining those view update cron jobs. Ken has been on duty at Cloudant for a long time, and is finally available in open source CouchDB as of 3.0.
See the documentation for more details.