PouchDB & CouchDB: An interview with Nolan Lawson

With so many databases out there to choose from, it’s hard to know which one will work best with your project’s infrastructure. Nolan Lawson, software developer and core maintainer of the popular JavaScript database PouchDB, understands firsthand the importance of examining a database’s tradeoffs before adding it to your stack. He recently offered us some of his database insights.

How did you hear about CouchDB, and why did you choose to use it?

In 2012 I was working for Health On the Net, which is a Geneva-based NGO focusing on healthcare-related tech. Mostly what we did was certify health websites as abiding by a specific ethical code, but we also built a lot of websites and apps for clients like the European Commission and Swiss organizations like Santé Romande.

One of these was a greenfield project called Khresmoi, where we had an opportunity to build a health-based search engine using our database of certified health websites. The main architect of the project had already chosen the core technologies, but he had also accepted a job in the US, so I was his replacement. The project was built on Solr/Lucene, Perl, jQuery, and a weird database I had never heard of before called CouchDB.

I’m not really sure why he had chosen CouchDB, but it was extremely ill-suited for the project at hand. Essentially we were crawling websites and storing the entire content, along with some metadata, in CouchDB. We did this several times a day, and every time a page was updated, we simply overwrote the existing documents. We weren’t using CouchDB sync at all, and we weren’t checking to see if the content had changed before writing a new revision.
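The missing "has the content changed?" check can be sketched in a few lines. This is only an illustration under stated assumptions: the `content_sha256` field and the `should_write` helper are hypothetical names, not part of the Khresmoi crawler or of CouchDB itself.

```python
import hashlib


def content_digest(body: str) -> str:
    """Return a stable fingerprint of the crawled page content."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def should_write(stored_doc, new_body: str) -> bool:
    """Decide whether a crawl result needs a new revision.

    `stored_doc` is the previously saved document (or None if the page is new).
    The `content_sha256` field is a hypothetical convention for this sketch.
    """
    if stored_doc is None:
        return True
    return stored_doc.get("content_sha256") != content_digest(new_body)


stored = {
    "_id": "https://example.org/",
    "content_sha256": content_digest("<p>hello</p>"),
}

unchanged = should_write(stored, "<p>hello</p>")
changed = should_write(stored, "<p>changed</p>")
brand_new = should_write(None, "<p>new page</p>")
print(unchanged, changed, brand_new)
```

Skipping the write when the digest matches means an unchanged page produces no new revision, so repeated crawls do not grow the database on their own.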

Since of course CouchDB is all about revisions, this meant that the size of the database kept blowing up. Our machines would get overloaded with tens of gigabytes of data. The original architect hadn’t foreseen any of these problems, so I had to learn from scratch what CouchDB was, and how to do things like “compaction” on a regular basis to keep the database from ballooning.
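For reference, compaction is triggered over HTTP with a `POST` to the database's `_compact` endpoint. The sketch below only builds the request and does not send it; the server address and the `crawl_pages` database name are assumptions for illustration, and a real call needs a running server and admin credentials.

```python
import urllib.request


def build_compaction_request(base_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) the request that asks CouchDB to compact a database."""
    url = f"{base_url.rstrip('/')}/{db_name}/_compact"
    return urllib.request.Request(
        url,
        data=b"",  # the endpoint takes no meaningful body
        headers={"Content-Type": "application/json"},  # required by the endpoint
        method="POST",
    )


# Hypothetical local server and database name.
req = build_compaction_request("http://localhost:5984", "crawl_pages")
print(req.get_method(), req.full_url)

# To actually run compaction (needs a live server and admin rights):
# urllib.request.urlopen(req)
```

Running something like this on a schedule is one way to do compaction "on a regular basis".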

We also had a lot of partners in the Khresmoi project who were very interested in aggregated views on our metadata, so I also had to learn how to performantly execute map/reduce queries, and keep those from growing out of control as well. It was pretty sink-or-swim, and to be honest I really disliked CouchDB at first, and I was always looking for opportunities to replace it with something else.

By learning all the rough edges of CouchDB, though, I eventually gained an appreciation for what CouchDB was actually good at: sync. It also impressed upon me the importance of understanding the tradeoffs of a database before using it in a project.

Did you have a specific problem that CouchDB solved?

In my mind, CouchDB has two killer features: sync and HTTP. We weren’t using either one in this project. The Perl crawler stored webpage data in CouchDB, and CouchDB was never exposed to the frontend via HTTP; it was just ferried into a Solr search database. This was also in the days before attachments, so we were storing all content as base64 strings.

What CouchDB did do fairly well was that we could do map/reduce queries on the data and then send a simple, queryable URL to our partners so that they could work with the data. It was also easy to set up authentication so that, for instance, only those with a username and a password could read it, but they couldn’t write it. The downside was that the views took a long time to build up; usually a partner would request a view on the data, and I’d say, “Okay, it’ll be done after the weekend.”
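To make the "simple, queryable URL" concrete, here is a minimal sketch of a design document with one map/reduce view and the URL a partner could query. The design document name, view name, `language` field, and database name are all hypothetical; only the `_design/.../_view/...` URL shape and the built-in `_count` reduce come from CouchDB.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical design document: counts crawled pages per language.
design_doc = {
    "_id": "_design/stats",
    "views": {
        "pages_by_language": {
            # Map functions are JavaScript source, stored as a string.
            "map": "function (doc) { if (doc.language) { emit(doc.language, 1); } }",
            # _count is one of CouchDB's built-in reduce functions.
            "reduce": "_count",
        }
    },
}


def view_url(base_url, db, ddoc, view, **params):
    """Build the URL for querying a view.

    Parameters such as group, key, and limit take JSON-encoded values,
    so each value is passed through json.dumps before URL encoding.
    """
    path = f"{base_url.rstrip('/')}/{quote(db)}/_design/{quote(ddoc)}/_view/{quote(view)}"
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return f"{path}?{query}" if query else path


url = view_url(
    "http://localhost:5984", "crawl_pages", "stats", "pages_by_language", group=True
)
print(url)
```

The first query against a new view has to index every document in the database, which is why a large dataset can take a long time before the URL returns anything.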

For the folks who are unsure of how they could use CouchDB, because there are a lot of databases out there, could you explain the use case?

CouchDB’s superpower is sync. Sometimes I even try to explain it to people by saying, “CouchDB isn’t a database; it’s a sync engine.” It’s a way of efficiently transferring data from one place to another, while intelligently managing conflicts and revisions. It’s very similar to Git. When I make that analogy, the light bulb often goes off.

Where this often fails is that folks may have an existing datastore, and they just want some sync mechanism on top of that. For instance, they have a MySQL or a MongoDB database, and they just want PouchDB to sync to that instead of syncing to CouchDB. The reason this doesn’t work, which is often hard to grasp, is that those other databases don’t have a built-in concept of revisions. For instance, when you delete a row or an object, it’s just gone. CouchDB, by contrast, keeps a tombstone around so that it can remember what was deleted.

The analogy I would give, for people who struggle to understand why they can’t just slap CouchDB replication on top of Mongo or MySQL, is that it’s like saying, “Hey, I love Git, and the Git client is really cool, but can I use it with my FTP server?” Obviously that doesn’t work – an FTP server is just a flat filesystem, with no concept of branches or revisions. It’s exactly the same with CouchDB.
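A toy model shows why the tombstone matters for sync. This is a deliberately simplified sketch: real CouchDB tracks revision trees with hashed revision IDs, whereas the integer revision counter, the class names, and the `replicate` helper here are assumptions made for illustration.

```python
class PlainStore:
    """A key-value store without revisions: a delete leaves no trace."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


def copy_rows(source, target):
    """Naive sync: copy whatever rows currently exist on the source."""
    target.rows.update(source.rows)


class RevisionStore:
    """Toy store that keeps a revision counter and a tombstone per document."""

    def __init__(self):
        self.docs = {}  # doc id -> {"rev": int, "deleted": bool, "body": dict or None}

    def put(self, doc_id, body):
        rev = self.docs.get(doc_id, {"rev": 0})["rev"] + 1
        self.docs[doc_id] = {"rev": rev, "deleted": False, "body": body}

    def delete(self, doc_id):
        # The document is not removed; it is replaced by a tombstone revision.
        rev = self.docs[doc_id]["rev"] + 1
        self.docs[doc_id] = {"rev": rev, "deleted": True, "body": None}

    def get(self, doc_id):
        doc = self.docs.get(doc_id)
        return None if doc is None or doc["deleted"] else doc["body"]


def replicate(source, target):
    """One-way sync: copy every document whose revision is newer on the source."""
    for doc_id, doc in source.docs.items():
        current = target.docs.get(doc_id)
        if current is None or current["rev"] < doc["rev"]:
            target.docs[doc_id] = dict(doc)


# Without revisions, the deletion never reaches the second copy.
plain_a, plain_b = PlainStore(), PlainStore()
plain_a.put("page-1", {"title": "Hello"})
copy_rows(plain_a, plain_b)
plain_a.delete("page-1")
copy_rows(plain_a, plain_b)
print("plain copy still has row:", "page-1" in plain_b.rows)

# With tombstones, the deletion is just another revision to replicate.
server, phone = RevisionStore(), RevisionStore()
server.put("page-1", {"title": "Hello"})
replicate(server, phone)
server.delete("page-1")
replicate(server, phone)
print("replicated copy sees:", phone.get("page-1"))
```

In the plain store, a missing row is indistinguishable from a row that was never there, so the second copy cannot learn about the delete. The tombstone gives the deletion something to travel in.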

What would you say are the top three benefits of using CouchDB?

Sync, reliability, and simplicity. As J. Chris Anderson has said, CouchDB doesn’t aim to be the Ferrari of databases; it wants to be the Honda Accord of databases. (See my old blog post on the subject.)

The append-only file format means that you can just kill -9 a running CouchDB process and your data is still recoverable. It never gets corrupted. Also the HTTP/REST interface is very easy to use; you can use something like curl or Postman to learn how it works. When I was learning CouchDB, I would often just put some sample data into a database using Futon, and then I’d play around with URL parameters until I understood how it was working.
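The general idea behind append-only durability can be illustrated with a toy log. This is not CouchDB's actual on-disk format; the JSON-lines layout and the function names are assumptions chosen to keep the sketch short. The point is only that existing bytes are never rewritten, so a crash can damage at most the record being appended.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one JSON record as a single line; earlier bytes are never touched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def recover(path):
    """Replay the log, stopping at the first incomplete or unparseable line."""
    state = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.endswith("\n"):
                break  # torn final write: the process died mid-append
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break
            state[record["id"]] = record["value"]
    return state


path = os.path.join(tempfile.mkdtemp(), "toy.log")
append_record(path, {"id": "a", "value": 1})
append_record(path, {"id": "a", "value": 2})

# Simulate a kill -9 in the middle of writing a third record.
with open(path, "a", encoding="utf-8") as f:
    f.write('{"id": "b", "val')

recovered = recover(path)
print(recovered)
```

Everything written before the interrupted append is still readable, and the half-written record is simply ignored on recovery.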

What other tools are you using in your infrastructure? Have you discovered anything that pairs well with CouchDB?

Well, as a co-maintainer of PouchDB, I obviously have to plug PouchDB here. PouchDB makes it trivially easy to sync between CouchDB on the server and IndexedDB, WebSQL, or LevelDB on the client. A lot of this can be credited to how well-thought-out CouchDB is as a whole.

There are other tools I find useful, though, like Postman, which is a neat tool for debugging HTTP APIs. I’ve also written a tool called pouchdb-dump-cli, which can be used to “dump” an entire CouchDB or PouchDB database to a text file, which can then be loaded back using pouchdb-load. Of course the classic backup tool for CouchDB is called cp (i.e., just copy the .couch file), but pouchdb-dump/pouchdb-load can be nice for portability and to make it easy to inspect the full contents of a database.

What are your future plans with your project? Any cool plans or developments you want to promote?

Absolutely, we’ve got a lot of work going into PouchDB at the moment. Future improvements we plan to make are:

  • Greater customizability: a smaller core JavaScript package for those who don’t need polyfills, legacy support, niche features, etc.
  • A more performant secondary index system
  • The purge API, which is the major piece of CouchDB functionality that is still unsupported by PouchDB
  • Faster replication – there is still some low-hanging fruit in the replication algorithm where we can optimize the back-and-forth and speed up replication

For more about CouchDB, visit couchdb.org or follow us on Twitter at @couchdb. To learn more about PouchDB, visit pouchdb.com or follow the official project Twitter account, @pouchdb.

Have a suggestion on what you’d like to hear about next on the CouchDB blog? Email us!

CouchDB Weekly News, March 30, 2017

Major Discussions

Stabilizing our automated builds – help needed! (see thread)

The team working on improving the CI workflows had their first all-platform Jenkins pass on Monday! All 12 test platforms succeeded (well, 10, since 2 combinations are skipped). But more help is needed, especially from Erlang folks.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help us with the work on the new CouchDB website? Get in touch on our new website mailing list and join the website team! – www@couchdb.apache.org
  • The CouchDB advocate marketing programme is just getting started. Join us in CouchDB’s Advocate Hub!
  • CouchDB has a new wiki. Help us move content from the old to the new one!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to welcome you on board!

Time to relax!

  • “In it, a claymation person with a bulbous, almost baboon-like ass lolls in a seductive moment, whispering soft affirmations directly to you. They lovingly sketch your face through the computer screen, glancing upward and making the sort of direct eye contact that forces you to at first look away, then gaze back, deeply. You feel known by this stranger.” – This short, creepy video is pure, undiluted internet weirdness
  • “Anxiety is something we all experience, to a greater or lesser extent. Whether it’s as stressful as a job interview or relationship break-up, or as everyday as arranging a delivery or sorting out a phone bill, it’s easy to let worries eat away at us. These quotes, from writers, thinkers and visionaries – might help take your heart rate down.” – 15 calming quotes to put everyday worries in perspective
  • “Aromatherapy is as old as nature itself, but humans have been using the art and science of aromatherapy therapeutically for at least 6000 years. There is plenty of archaeological evidence to suggest that aromatherapy oils were regularly used in the ancient temples of Egypt, Greece and Rome. Our ancient ancestors must have observed that the scents of flowers, trees and other plants had an impact on their stress levels, anxiety, sleep, mood, pain and more.” – Aromatherapy for Beginners: Scents to Uplift, Balance and Calm
  • “The trouble starts in elementary school. Most very young kids are horizontal breathers, says Vranich. When they inhale, it looks like there’s a balloon in their bellies — air enters and expands the biggest part of their lungs. But once in the classroom, they pick up the bad posture that comes with sitting all day. And slumping crushes your diaphragm muscle and blocks the lower lungs from expanding.” – 3 Breathing Techniques That Will Help You Feel Less Stressed ASAP
  • “The idea of producing ambient melodies by slowing down a piece of music is old hat by this point. Remember how Justin Bieber’s “Baby” sounded like Sigur Rós when slowed down by 800 percent? Well, whether it be the Windows 95, 98, 2000, or XP theme, the results all sound like the sunnier sides of a Stars Of The Lid or Dead Texan drone. That said, check out the shadow that blankets the Windows 2000 version around the two-minute mark. This thing’s got layers.” – Let’s listen to some remarkably soothing, slowed-down Windows start-up sounds
