CouchDB Developer Profile: Paul Davis

Longtime fans of the CouchDB project will probably recall the name Paul Davis from the work of merging BigCouch into the Apache CouchDB project a few years back in 2013. They might also recall the epic pre-tweetstorm era tweets documenting his and Rob Newson’s progress, thoughtfully captured here for all to enjoy. Paul has been with the project for a very long time, and has served on the Project Management Committee (PMC) for over 5 years.

He recently shared some of his thoughts and experiences about working on CouchDB.

Do you want to talk about your background, or how you got involved in CouchDB?

I found Apache CouchDB while working as a bioinformatician at New England Biolabs. My favorite saying back then was that in biology the only rule without exception is that there’s an exception to every rule (Ribosomal slippage, I’m looking at you!). I learned SQL from a couple of very smart DBA programmers during school, so I had a pretty good handle on the relational model. But during my years in bioinformatics, each time I designed a database schema it was a trade-off between simplicity (and thus not comprehensively representing a dataset) and complexity (to the point that it was difficult and slow to use for the 90% of the data). Luckily, somewhere around the summer of 2008 I stumbled across this new wave of database technologies that came to be known collectively as “NoSQL.” It was a bit of a zen moment when I realized that maybe not having a schema is the solution to my constant rage-against-the-database moments.

I ended up finding CouchDB through Hacker News and decided that the community was a wonderful assortment of characters. While I applied a lot of ideas from NoSQL and CouchDB to bioinformatics I never did end up using it directly for my work. However I found the project and the community so fun to be a part of that I ended up continuing to contribute and was eventually elected as a committer. Over time I was seduced by the startup world and left bioinformatics to join Cloudant in 2011 as employee 11, or so. Eventually Cloudant was acquired by IBM where I continue to work on CouchDB.

What areas of the project do you work on?

Over time I’ve made contributions to nearly every part of Apache CouchDB, but by far and away my largest contributions are all focused on the database core and all of its related nooks and crannies.

Could you talk more about what you’re currently working on in database core?

My big project right now is the pluggable storage engine API. The main work of this change was to go through and define an abstraction layer that was both high level enough that it allowed for new and interesting storage engines, yet low level enough that a storage engine didn’t have to reimplement a large swath of the database logic. Once the basic API was designed it turned mostly into a matter of moving a decent amount of legacy code around to fit into the new API and then writing a test suite for storage engine behaviors that can be reused by any storage engine.
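
The shape of that design, a narrow engine interface plus one test suite that any engine can be run against, can be sketched roughly as follows. This is an illustration in Python, not CouchDB's actual Erlang API: the names `StorageEngine`, `MemoryEngine`, and `check_engine_contract` are hypothetical, and a real engine would also deal with revisions, sequences, and compaction, which are left out here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class StorageEngine(ABC):
    """Hypothetical minimal contract a storage backend must satisfy."""

    @abstractmethod
    def write_doc(self, doc_id: str, body: dict) -> None: ...

    @abstractmethod
    def read_doc(self, doc_id: str) -> Optional[dict]: ...

    @abstractmethod
    def delete_doc(self, doc_id: str) -> None: ...

    @abstractmethod
    def doc_count(self) -> int: ...


class MemoryEngine(StorageEngine):
    """Ephemeral engine that keeps everything in a dict (illustration only)."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def write_doc(self, doc_id: str, body: dict) -> None:
        # Copy so callers cannot mutate stored data afterwards.
        self._docs[doc_id] = dict(body)

    def read_doc(self, doc_id: str) -> Optional[dict]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def delete_doc(self, doc_id: str) -> None:
        self._docs.pop(doc_id, None)

    def doc_count(self) -> int:
        return len(self._docs)


def check_engine_contract(make_engine: Callable[[], StorageEngine]) -> None:
    """Reusable behaviour checks; pass any zero-argument engine factory."""
    engine = make_engine()
    assert engine.doc_count() == 0
    engine.write_doc("a", {"n": 1})
    assert engine.read_doc("a") == {"n": 1}
    engine.write_doc("a", {"n": 2})  # overwriting must not add a second doc
    assert engine.doc_count() == 1
    engine.delete_doc("a")
    assert engine.read_doc("a") is None
    assert engine.doc_count() == 0


# The same checks can be pointed at any other engine implementation.
check_engine_contract(MemoryEngine)
```

The point of the sketch is the split: database logic talks only to the interface, and a new backend earns trust by passing the shared checks rather than by bringing its own tests.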

As discussed, this work is currently being reviewed so I’m mostly responding to comments and fixing a few minor bugs that were discovered, as well as improving the internal documentation so that others can experiment with writing their own storage engines. I’ve already seen developers get an initial implementation of an ETS-based ephemeral storage engine, along with a RocksDB-based storage engine, so it’s been pretty neat to see people experimenting with it before it’s even been merged.

What’s a recent development/event/aspect of the project that you’re excited about?

There’s currently a pull request open for adding a pluggable storage engine API to CouchDB. So far the reviews are all positive, as developers have started reviewing and testing it. Once that lands I think there will be a lot of fun experimentation and new storage backends for new types of deployments. It’ll be quite a lot of fun to see what people come up with there.

What do you think are the top three benefits of using CouchDB as a database solution?

  1. No schema! The relational model is great for relational data, but not all structured data fits that model.
  2. Application data model consistency. Most people think of this as replication but one of the things that I’ve always liked about CouchDB is how an application can reuse the same data model regardless of programming language or deployment environment.
  3. Erlang is a wonderful language for operations. The ability to open up a shell on a node that’s misbehaving to diagnose bugs and misbehaviors is invaluable.

What do you look forward to in the future of CouchDB?

In the short term there are a few features that could prove to be pretty exciting (pluggable storage engines and clustered purge). However, the one thing that still keeps me going is the community around the project. Meeting new people as they join and keeping up with old friends after they’ve moved on is still one of the best parts about working on CouchDB.

What advice do you have for someone who just discovered CouchDB?

Come say hello on IRC or Slack! Also, probably the most common piece of advice I have for people learning CouchDB is that if it seems complicated, you’re overthinking it. I can’t count the number of times I’ve seen someone have an “Aha!” moment and then say, “I can’t believe it’s that simple!”

 

For more about CouchDB visit couchdb.org or follow us on Twitter at @couchdb

Have a suggestion on what you’d like to hear about next on the CouchDB blog? Email us!

CouchDB Developer Profile: Joan Touzet

If you’re following the Apache CouchDB dev mailing list, then you’ve probably been seeing a lot of updates about testing recently. At the forefront of that effort is Joan Touzet. Joan is a long-time committer and PMC member for Apache CouchDB, as well as the point of contact for the CouchDB Code of Conduct. Getting people excited about using CouchDB and making it easier for them to use are two aspects of the project that Joan is very enthusiastic about.

Joan recently offered us some insights into the CouchDB project from her perspective.

Do you want to talk about your background, or how you got involved in CouchDB?

A fellow graduate student introduced me to CouchDB while I was working on systems for student work support and analysis. We used it to extract things students posted on class forum software and then ran various analyses over it, like latent semantic analysis.

I started using the replication feature early on to sync data between multiple servers and my workstation, which was super easy!

I started working at Cloudant shortly after that – I was employee #20 and did a number of things, like devops, development, support and field work. I left Cloudant about a year after they were bought out by IBM.

What areas of the project do you currently work on?

Since returning to active work on the project a few months ago, I’ve been focused on testing, packaging and project management work.

Our test suite has two parts: unit tests written in the Erlang eunit framework, and integration/API tests written in JavaScript. We run these tests regularly in two continuous integration systems: Travis and Jenkins. I’ve been following up on some intermittent failures in these tests.

Packaging is something that got left behind for our 2.0 release, but only in the interest of time. With community sponsorship, I’ve been able to release beta Debian, Ubuntu, CentOS and Windows packages for 2.0. I’m working now to automate this process so that packages can be built with each successful run of our test suite for our major release branches as well as the master branch.

In the future, I hope to extend this to other CouchDB community contributions, such as Cloudant’s full-text and geo search open source addons. Reducing the amount of effort it takes for us to put new releases out the door, and for people to use those releases, is a passion of mine.

What’s a recent development of the project that you’re excited about?

CouchDB was one of the first Apache projects to use git. We’re also now one of the first projects to leverage GitHub more actively than before, beta testing a new integration provided by the Apache Software Foundation’s infrastructure team.

The new integration allows us to manage pull requests and issues directly on GitHub, rather than separately through the traditional JIRA setup. I’ve been spearheading the effort to move onto GitHub issues. We finally went ‘live’ with it about 2 weeks ago.

On a daily basis I triage and curate filed issues. I’m really excited about how much easier this will make it for community members to interact with the developers!

What would you say are the top three benefits of using CouchDB as a database solution?

  1. Sync. It’s the “killer feature” of CouchDB. Whether you’re doing offline-first client development, running a clustered database or distributing data between various server installations, CouchDB’s master-master replication is better than anything I’ve ever used. It Just Works(tm).
  2. Ease-of-use. Every programming language has an HTTP library, and almost all have a JSON library, too. That’s all you need to be successful with CouchDB. Sure, there are language-specific CouchDB client libraries, but in general I don’t find them necessary. It’s just so easy to get going with CouchDB.
  3. Powerful and versatile secondary indexing capabilities. Yeah, it’s a mouthful, but you can do all sorts of interesting things. For simple indexing, we have the Mango library in 2.0. If you want to get more complex, you can write a JavaScript Map/Reduce function. And there are open source addons for full-text search and robust geo indexing, too. I’ve yet to find a kind of indexing that I’ve been unable to do with all these tools at my disposal.
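
To make the second and third points concrete, here is a minimal sketch that uses only Python's standard `urllib` and `json` modules. It builds (but does not send) requests for storing a document, running a Mango query through the `_find` endpoint, and triggering a one-off replication through `_replicate`. The server address `http://localhost:5984`, the database names, the document contents, and the helper name `couch_request` are assumptions made for illustration; authentication is omitted.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumed address of a local CouchDB node


def couch_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request for CouchDB's HTTP API without sending it."""
    return urllib.request.Request(
        url=f"{BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Store a document: PUT /{db}/{docid}
put_doc = couch_request(
    "PUT", "/notes/first-note", {"type": "note", "title": "hello", "words": 120}
)

# Mango query: POST /{db}/_find with a selector
find = couch_request(
    "POST",
    "/notes/_find",
    {"selector": {"type": "note", "words": {"$gt": 100}}, "limit": 10},
)

# One-off replication: POST /_replicate
# (assumes both databases already exist on the node)
replicate = couch_request(
    "POST",
    "/_replicate",
    {"source": f"{BASE}/notes", "target": f"{BASE}/notes_backup"},
)

# To actually send one of these against a running server:
#   with urllib.request.urlopen(find) as resp:
#       print(json.load(resp)["docs"])
```

Nothing here is specific to Python: any language with an HTTP client and a JSON encoder could produce the same three requests.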

What do you look forward to in the future of CouchDB?

The CouchDB PMC has put together a wishlist for the next few years of CouchDB that includes some great new features relating to our clustering ability (making it easier to scale and administrate), improving Mango’s support for additional search types (“joins”, reduces, bitwise operations, document validation), pluggable storage engines, selective sync, and a “mobile first” replication protocol leveraging HTTP/2.

We even have some ideas to make it easier to contribute to CouchDB, such as figuring out how we could support Elixir plugins. The future is very bright!

What advice do you have for someone who just discovered CouchDB?

  1. If you know SQL, don’t approach CouchDB like a database that you know. Step back from your “5 normal forms” and start afresh. Check out some of the community frameworks around CouchDB (like Hoodie) that make it easy to write applications.
  2. Hang out in our Slack or IRC channels to ask questions.
  3. Don’t be afraid to get your feet wet!
