Longtime fans of the CouchDB project will probably recall the name Paul Davis from the work of merging BigCouch into the Apache CouchDB project back in 2013. They might also recall the epic pre-tweetstorm-era tweets documenting his and Rob Newson’s progress, thoughtfully captured here for all to enjoy. Paul has been with the project for a very long time and has served on the Project Management Committee (PMC) for over five years.
He recently shared some of his thoughts and experiences about working on CouchDB.
Do you want to talk about your background, or how you got involved in CouchDB?
I found Apache CouchDB while working as a bioinformatician at New England Biolabs. My favorite saying back then was that in biology the only rule without exception is that there’s an exception to every rule (Ribosomal slippage, I’m looking at you!). I learned SQL from a couple of very smart DBA programmers during school, so I had a pretty good handle on the relational model. But during my years in bioinformatics, each time I designed a database schema it was a trade-off between simplicity (and thus not comprehensively representing a dataset) and complexity (to the point that it was difficult and slow to use for the common 90% of the data). Luckily, somewhere around the summer of 2008 I stumbled across this new wave of database technologies that came to be known collectively as “NoSQL.” It was a bit of a Zen moment when I realized that maybe not having a schema was the solution to my constant rage-against-the-database moments.
I ended up finding CouchDB through Hacker News and decided that the community was a wonderful assortment of characters. While I applied a lot of ideas from NoSQL and CouchDB to bioinformatics, I never did end up using it directly for my work. However, I found the project and the community so fun to be a part of that I kept contributing and was eventually elected as a committer. Over time I was seduced by the startup world and left bioinformatics to join Cloudant in 2011 as employee number 11 or so. Cloudant was eventually acquired by IBM, where I continue to work on CouchDB.
What areas of the project do you work on?
Over time I’ve made contributions to nearly every part of Apache CouchDB, but by far my largest contributions have focused on the database core and all of its related nooks and crannies.
Could you talk more about what you’re currently working on in database core?
My big project right now is the pluggable storage engine API. The main work of this change was to define an abstraction layer that was high level enough to allow for new and interesting storage engines, yet low level enough that a storage engine didn’t have to reimplement a large swath of the database logic. Once the basic API was designed, it mostly became a matter of moving a decent amount of legacy code around to fit the new API and then writing a test suite for storage engine behaviors that any storage engine can reuse.
This work is currently under review, so I’m mostly responding to comments and fixing a few minor bugs that were discovered, as well as improving the internal documentation so that others can experiment with writing their own storage engines. I’ve already seen developers put together initial implementations of an ETS-based ephemeral storage engine and a RocksDB-based storage engine, so it’s been pretty neat to see people experimenting with it before it’s even been merged.
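To make the idea of an abstraction layer plus a shared behavior test suite a little more concrete, here is a minimal sketch in Python. It is purely illustrative: CouchDB itself is written in Erlang, and its real storage engine API looks different. Every name below (`StorageEngine`, `write_doc`, `read_doc`, `changes_since`, `InMemoryEngine`, `run_engine_suite`) is hypothetical, and the in-memory engine only loosely mirrors the spirit of an ephemeral, ETS-style backend.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple


class StorageEngine(ABC):
    """Hypothetical engine interface (NOT CouchDB's real API).

    Only low-level storage primitives live here; higher-level database
    logic would stay in a shared layer that calls these methods.
    """

    @abstractmethod
    def write_doc(self, doc_id: str, rev: str, body: dict) -> int:
        """Persist one revision and return the new update sequence."""

    @abstractmethod
    def read_doc(self, doc_id: str) -> Optional[Tuple[str, dict]]:
        """Return (rev, body) for the latest stored revision, or None."""

    @abstractmethod
    def changes_since(self, seq: int) -> List[Tuple[int, str]]:
        """Return (seq, doc_id) pairs with seq greater than `seq`, in order."""


class InMemoryEngine(StorageEngine):
    """Ephemeral engine: all data lives in Python dicts and is lost on exit."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[str, dict]] = {}
        self._seq_by_id: Dict[str, int] = {}
        self._update_seq = 0

    def write_doc(self, doc_id, rev, body):
        self._update_seq += 1
        self._docs[doc_id] = (rev, dict(body))  # store a copy of the body
        self._seq_by_id[doc_id] = self._update_seq
        return self._update_seq

    def read_doc(self, doc_id):
        return self._docs.get(doc_id)

    def changes_since(self, seq):
        # Each document appears once, at its most recent update sequence.
        pairs = [(s, d) for d, s in self._seq_by_id.items() if s > seq]
        return sorted(pairs)


def run_engine_suite(make_engine: Callable[[], StorageEngine]) -> int:
    """Reusable behavior checks that any engine can run via a factory.

    Returns the number of checks that passed; a failure raises AssertionError.
    """
    checks = 0
    engine = make_engine()
    assert engine.read_doc("missing") is None
    checks += 1
    s1 = engine.write_doc("a", "1-x", {"v": 1})
    s2 = engine.write_doc("b", "1-y", {"v": 2})
    assert s2 > s1  # update sequences must increase
    checks += 1
    assert engine.read_doc("a") == ("1-x", {"v": 1})
    checks += 1
    s3 = engine.write_doc("a", "2-z", {"v": 3})
    assert engine.changes_since(0) == [(s2, "b"), (s3, "a")]
    checks += 1
    assert engine.changes_since(s3) == []
    checks += 1
    return checks


if __name__ == "__main__":
    print(run_engine_suite(InMemoryEngine))
```

The design point the sketch tries to capture is that the test suite only talks to the abstract interface, so a second backend (say, one wrapping an on-disk key-value store) could be checked by passing its own factory to `run_engine_suite` without writing new tests.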
What’s a recent development/event/aspect of the project that you’re excited about?
There’s currently a pull request open for adding a pluggable storage engine API to CouchDB. Developers have started reviewing and testing it, and so far the feedback has been positive. Once that lands, I think there will be a lot of fun experimentation and new storage backends for new types of deployments. I’m really looking forward to seeing what people come up with there.
What do you think are the top three benefits of using CouchDB as a database solution?
- No schema! The relational model is great for relational data, but not all structured data fits that model.
- Application data model consistency. Most people think of this as replication, but one of the things that I’ve always liked about CouchDB is how an application can reuse the same data model regardless of programming language or deployment environment.
- Erlang is a wonderful language for operations. The ability to open up a shell on a node that’s misbehaving to diagnose bugs and misbehaviors is invaluable.
What do you look forward to in the future of CouchDB?
In the short term, there are a few features that could prove to be pretty exciting (pluggable storage engines and clustered purge). However, the one thing that still keeps me going is the community around the project. Meeting new people as they join and keeping up with old friends after they’ve moved on is still one of the best parts about working on CouchDB.
What advice do you have for someone who just discovered CouchDB?
Come say hello on IRC or Slack! Also, probably the most common piece of advice I have for people learning CouchDB is that if it seems complicated, you’re overthinking it. I can’t count the number of times I’ve seen someone have an “Aha!” moment and then say, “I can’t believe it’s that simple!”
Have a suggestion on what you’d like to hear about next on the CouchDB blog? Email us!