Creating a CouchDB Search Engine: Tobias Gesellchen Interview Part 1

Tobias Gesellchen talked to us about how he first discovered CouchDB and what his experience has been employing it in applications. This is the first of two use cases that Gesellchen will be sharing.

How did you hear about CouchDB, and why did you choose to use it?

I don’t remember when or where I first heard about CouchDB, but I remember many examples of small blogs, note-taking apps, and other so-called CouchApps. I liked the idea of a minimal setup and infrastructure.

In the end I didn’t like the workflow when maintaining CouchApps, but the ease of use via the built-in HTTP API along with sensible defaults made me stick to CouchDB rather than other database systems like MySQL. I didn’t care so much about the NoSQL movement, but more about simplicity. MongoDB didn’t even exist at that time.

Did you have a specific problem that CouchDB solved?

When I started to use CouchDB I had just graduated from my computer science studies, so I didn’t have an actual use case or problem to solve. In the beginning, I only toyed with a private movie library.

The first real-life use case was a search engine for upcoming media topics, which I’ve maintained as a side project for several years without major issues.

For the folks who are unsure of how they could use CouchDB, because there are a lot of databases out there, could you explain the use case?

Similar to a typical search engine, users can find past and upcoming media (print and online titles) by their topics. They can perform a full-text query, but also find documents by more specific properties (e.g. publishing dates).

Everything is synced to Elasticsearch, since CouchDB lacks full-text search. Apart from the media data, CouchDB also contains user accounts and other configuration. CouchDB serves as the source of truth: from the user’s perspective, CouchDB is write-only to keep everything up to date, while Elasticsearch is read-only. The application ensures that every write to the database is also written to the search index.

In other words: we follow CouchDB’s decision to focus on the A and P dimensions of the CAP theorem (availability and partition tolerance).
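The write path described above can be sketched roughly as follows. This is only an illustration, not the project's actual code (which runs on a JVM/Spring Boot stack): the URLs, the `media` database and index names, and the helper names `couch_put`, `search_put`, and `save` are all assumptions. Dropping underscore-prefixed fields before indexing is likewise a design choice of this sketch.

```python
import json
from urllib.parse import quote
from urllib.request import Request

COUCH_URL = "http://localhost:5984"   # assumption: local CouchDB
SEARCH_URL = "http://localhost:9200"  # assumption: local Elasticsearch


def _json_put(url, payload):
    """Build a PUT request carrying a JSON body."""
    return Request(url, data=json.dumps(payload).encode("utf-8"), method="PUT",
                   headers={"Content-Type": "application/json"})


def couch_put(db, doc):
    """Request that stores the document in CouchDB, the source of truth."""
    return _json_put(f"{COUCH_URL}/{db}/{quote(doc['_id'], safe='')}", doc)


def search_put(index, doc):
    """Request that stores the same document in the search index."""
    # Keep CouchDB bookkeeping fields such as _id and _rev out of the indexed body.
    source = {k: v for k, v in doc.items() if not k.startswith("_")}
    return _json_put(f"{SEARCH_URL}/{index}/_doc/{quote(doc['_id'], safe='')}", source)


def save(doc, send):
    """Write to CouchDB first; index only if that write did not raise."""
    send(couch_put("media", doc))
    send(search_put("media", doc))
```

In real use, `send` could be `urllib.request.urlopen`, which raises on HTTP error responses, so a rejected CouchDB write never reaches the index. A failed index write after a successful CouchDB write would still need a retry or re-sync step, which this sketch leaves out.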

What would you say is the top benefit of using CouchDB?

With CouchDB’s simple and intuitive HTTP API, documents can be managed very easily. Even though the real application runs on a JVM/Spring Boot stack, it’s convenient to perform queries on the console using only curl, sometimes spiced with jq. In other words: a binary or more complex protocol would make debugging and development much harder.
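To give a feel for how little is needed, here is a small sketch of the same kind of ad-hoc query done from Python's standard library instead of curl and jq. The server URL and the `media` database name are assumptions; `_all_docs` with `include_docs` is part of CouchDB's HTTP API, and `docs_from` plays the role of a jq filter such as `.rows[].doc`.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

COUCH_URL = "http://localhost:5984"  # assumption: local CouchDB


def all_docs_url(db, limit):
    """URL listing up to `limit` documents of a database, bodies included."""
    query = urlencode({"include_docs": "true", "limit": limit})
    return f"{COUCH_URL}/{quote(db, safe='')}/_all_docs?{query}"


def docs_from(response):
    """Pull the document bodies out of an _all_docs response."""
    return [row["doc"] for row in response["rows"]]


def fetch_docs(db, limit=5):
    """Run the query against a live server and return the documents."""
    with urlopen(all_docs_url(db, limit)) as resp:
        return docs_from(json.load(resp))
```

Calling `fetch_docs("media")` needs a running CouchDB; the two helper functions can be tried without one.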

What tools are you using in addition for your infrastructure? Have you discovered anything that pairs well with CouchDB?

For me, Elasticsearch is the most important supplement to CouchDB: it also uses a plain HTTP API and JSON documents, so it becomes very easy to keep both systems in sync. Other tools worth mentioning: Ansible for provisioning and deployment (Python modules are easy to adapt) and Docker for process encapsulation.

Although I first relied on the popular Ektorp as a Java library, I switched to a simple adapter layer written in less than 500 lines of Groovy code.

What are your future plans with your project? Any cool plans or developments you want to promote?

The described project is a straightforward use case without scalability issues and with only slow growth. On the infrastructure level there won’t be much happening in the near future. In fact, I’m very happy with the setup, so there’s no need to change anything. CouchDB simply does its job and requires minimal maintenance. The only planned upgrade is from the currently running version 1.x to a more recent 2.x release.

That said: such a simple use case helps with learning CouchDB’s behavior and new features (e.g. Mango queries or the clustered setup). My great experience with CouchDB meant I trusted it for a more advanced and challenging project at my main employer, Europace, but that’s another story.

If you have any questions about this use case, or if you just want to chat, you can get in touch with Tobias on Twitter @gesellix.

Use cases are a great avenue for sharing useful technical information. Please consider joining the fun! Additionally, if there’s something you’d like to see covered on the CouchDB blog, we would love to accommodate. Email us!

For more about CouchDB, visit couchdb.org or follow us on Twitter at @couchdb.

 

 
