Drupal ♥ CouchDB

For most websites and software there are three pillars: code, configuration, and content. Code is always the easy one: stick it in Git, then deploy it anywhere with full revision history and accountability. Historically, Drupal has struggled with the other two. Drupal 8, launched in late 2015, solved configuration by adding configuration management: configuration is stored in YAML files, so it can be version controlled alongside the code. That leaves content as the last pillar to solve.

Previously, Drupal offered two main options: add content directly in production, or use the Deploy module, which implemented a custom REST API to copy content to another Drupal site. Because that API was custom, it was poorly documented, bugs were common, and it was difficult to integrate with. The most common advice was simply to add content in production. This worked for many sites, but it was not an option where content had to pass through multiple legal or editorial steps.

During the development of Drupal 8, Dick Olsson, the maintainer of the Deploy module, started looking for a better solution. After a lot of research, it became clear that the content storage and replication problems had already been solved by CouchDB. Together with Andrei Jechiu and Tim Millwood, Dick developed a suite of contributed Drupal modules that implement the CouchDB replication protocol.

Data storage
Drupal doesn’t store revisions for all content, and even for revisionable content, creating revisions is not enforced. The first step, therefore, was altering the schema so that revisions could be stored for all content. We also needed to store several other things to be compatible with CouchDB, such as a revision hash, a sequence ID, and a revision tree. Another issue was that Drupal has many types of content, and modules can define new ones, so without looping through every content type and then every piece of content, we had no way of knowing which UUID belonged to which content. To solve this, we created a UUID index in a simple key-value store. Many of these features were built into the Multiversion module.
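To make these storage requirements concrete, here is a minimal Python sketch of the data that has to be tracked. It is an illustration under stated assumptions, not Multiversion's code: the `Storage` and `RevisionTree` classes and `make_rev()` are hypothetical, and the revision hash (MD5 over the parent revision and the JSON body) is a simplification of how CouchDB derives revision IDs. What it does follow is CouchDB's `<generation>-<hash>` revision format, a sequence number that increases on every write, and a UUID index mapping each UUID to its entity type and ID.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class RevisionTree:
    """Hypothetical revision tree: maps each rev ID to its parent rev ID."""
    parents: dict = field(default_factory=dict)

    def add(self, rev, parent=None):
        self.parents[rev] = parent

    def history(self, rev):
        """Walk from a revision back to the root, newest first."""
        chain = []
        while rev is not None:
            chain.append(rev)
            rev = self.parents[rev]
        return chain


def make_rev(parent_rev, body):
    """Build a CouchDB-style "<generation>-<hash>" revision ID.

    Assumption: hashing the parent rev plus the JSON body with MD5 is a
    simplification, not the exact algorithm CouchDB or Multiversion uses.
    """
    generation = 1 if parent_rev is None else int(parent_rev.split("-", 1)[0]) + 1
    payload = json.dumps({"parent": parent_rev, "body": body}, sort_keys=True)
    return f"{generation}-{hashlib.md5(payload.encode()).hexdigest()}"


class Storage:
    """Hypothetical store tracking revisions, a sequence ID and a UUID index."""

    def __init__(self):
        self.seq = 0            # increases on every write, like a changes sequence
        self.uuid_index = {}    # key-value store: uuid -> (entity_type, entity_id)
        self.trees = {}         # uuid -> RevisionTree
        self.current = {}       # uuid -> latest rev ID

    def save(self, uuid, entity_type, entity_id, body):
        parent = self.current.get(uuid)
        rev = make_rev(parent, body)
        self.trees.setdefault(uuid, RevisionTree()).add(rev, parent)
        self.current[uuid] = rev
        self.uuid_index[uuid] = (entity_type, entity_id)
        self.seq += 1
        return rev, self.seq


store = Storage()
r1, _ = store.save("a1b2", "node", 7, {"title": "Hello"})
r2, seq = store.save("a1b2", "node", 7, {"title": "Hello, world"})
print(r2.split("-")[0], seq)                        # 2 2
print(store.uuid_index["a1b2"])                     # ('node', 7)
print(store.trees["a1b2"].history(r2) == [r2, r1])  # True
```

The UUID index is what lets a replication request for a document ID be answered with a single lookup instead of a scan over every content type.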

API endpoints
Drupal 8 core ships with a REST module that provides a number of API endpoints. We built on these to provide all the endpoints needed for compatibility with the CouchDB protocol, including the headers and document attributes the protocol requires. Drupal content is normalized into (and denormalized from) valid JSON-LD documents. Together, the Replication and RELAXed Web Services modules expose these CouchDB-compatible API endpoints.
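The normalization step can be sketched as a pair of functions that move a Drupal entity into and out of a CouchDB-style document. The `_id`, `_rev` and `_revisions` attributes (with `start` and newest-first `ids`) follow CouchDB's document format; the flat entity dict, the `@type` key and the function names are assumptions for illustration, not the exact output of the RELAXed Web Services normalizers.

```python
def normalize(entity, rev_history):
    """Turn a hypothetical Drupal entity dict into a CouchDB-style document.

    rev_history lists rev IDs newest first, e.g. ["2-bbb", "1-aaa"].
    The "@type" key and entity layout are illustrative assumptions.
    """
    latest = rev_history[0]
    start = int(latest.split("-", 1)[0])
    doc = {
        "_id": entity["uuid"],
        "_rev": latest,
        # CouchDB's _revisions format: latest generation plus hashes, newest first.
        "_revisions": {
            "start": start,
            "ids": [rev.split("-", 1)[1] for rev in rev_history],
        },
        "@type": entity["type"],
    }
    doc.update(entity["fields"])
    return doc


def denormalize(doc):
    """Reverse normalize(): split CouchDB metadata from the entity fields."""
    reserved = {"_id", "_rev", "_revisions", "@type"}
    start = doc["_revisions"]["start"]
    history = [f"{start - i}-{h}" for i, h in enumerate(doc["_revisions"]["ids"])]
    entity = {
        "uuid": doc["_id"],
        "type": doc["@type"],
        "fields": {k: v for k, v in doc.items() if k not in reserved},
    }
    return entity, history


entity = {"uuid": "a1b2", "type": "node", "fields": {"title": "Hello"}}
doc = normalize(entity, ["2-bbb", "1-aaa"])
print(doc["_revisions"])  # {'start': 2, 'ids': ['bbb', 'aaa']}
```

Carrying the full revision history in `_revisions` is what lets the receiving side rebuild the revision tree rather than just the latest version.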

Replication
Initially, the native CouchDB replicator was used to replicate from Drupal to CouchDB, from CouchDB to Drupal, and from Drupal to Drupal. Tests were then set up with PouchDB, allowing Drupal to be used as a backend for decoupled web apps.

As part of Google Summer of Code 2015, a student worked on a PHP implementation of the CouchDB replicator. This was integrated into the Drupal modules, allowing replication between CouchDB-compatible endpoints from the Drupal UI without needing CouchDB itself.
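Whichever replicator is used, CouchDB, PouchDB or the PHP one, the replication protocol follows the same basic steps: read the last checkpoint, fetch the source's changes since then, ask the target which revisions it is missing (`_revs_diff`), copy those revisions across without generating new ones (`_bulk_docs` with `new_edits=false`), and record a new checkpoint. The Python sketch below runs those steps against a hypothetical in-memory `Database` class. It simplifies the protocol: checkpoints are kept only on the target, every revision appears in the changes list, and batching, attachments and conflict handling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory stand-in for a CouchDB-compatible endpoint."""
    docs: dict = field(default_factory=dict)         # (doc_id, rev) -> body
    changes: list = field(default_factory=list)      # [(seq, doc_id, rev)]
    checkpoints: dict = field(default_factory=dict)  # replication_id -> last seq

    def write(self, doc_id, rev, body):
        # Like _bulk_docs with new_edits=false: store the given rev as-is.
        self.docs[(doc_id, rev)] = body
        self.changes.append((len(self.changes) + 1, doc_id, rev))

    def changes_since(self, since):
        # Like GET /_changes?since=...
        return [c for c in self.changes if c[0] > since]

    def revs_diff(self, wanted):
        # Like POST /_revs_diff: report the revisions this database lacks.
        missing = {}
        for doc_id, revs in wanted.items():
            lacking = [r for r in revs if (doc_id, r) not in self.docs]
            if lacking:
                missing[doc_id] = lacking
        return missing


def replicate(source, target, replication_id):
    """One simplified replication pass; returns the number of revisions copied."""
    since = target.checkpoints.get(replication_id, 0)
    changes = source.changes_since(since)
    if not changes:
        return 0
    wanted = {}
    for _, doc_id, rev in changes:
        wanted.setdefault(doc_id, []).append(rev)
    missing = target.revs_diff(wanted)
    copied = 0
    for doc_id, revs in missing.items():
        for rev in revs:
            target.write(doc_id, rev, source.docs[(doc_id, rev)])
            copied += 1
    # Record how far we got so the next run can resume from here.
    target.checkpoints[replication_id] = changes[-1][0]
    return copied


staging = Database()
live = Database()
staging.write("a1b2", "1-aaa", {"title": "Hello"})
staging.write("a1b2", "2-bbb", {"title": "Hello, world"})
print(replicate(staging, live, "staging-to-live"))  # 2
staging.write("c3d4", "1-ccc", {"title": "New page"})
print(replicate(staging, live, "staging-to-live"))  # 1
```

Running it again with the same replication ID only copies what changed since the stored checkpoint, and `revs_diff` prevents revisions the target already holds from being sent twice.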

Conclusion
Drupal now has an awesome content staging solution: users can stage content on one site, then replicate it to a live site. Content can also be backed up to, or stored in, a CouchDB database, and users can decouple their front end by replicating content to PouchDB.

Further reading
Drupal Deploy – The team behind this work.
Multiversion – Used for Drupal storage updates.
RELAXed Web Services – Provides the API endpoints.
Deploy – The UI for replicating between sites.
PHP CouchDB Replicator – Replicate using PHP.
