CouchDB Weekly News, July 20, 2017

Releases

Releases in the CouchDB Universe

PouchDB

Opinions and other News in the CouchDB Universe

CouchDB Use Cases, Questions and Answers

Use Case:

  • time2relax: a Python CouchDB driver (requests under the hood)

Stack Overflow:

no public answer yet:

PouchDB Use Cases, Questions and Answers

Stack Overflow:

no public answer yet:

For more new questions and answers about CouchDB, see these search results and about PouchDB, see these.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Do you want to help us with the work on the new CouchDB website? Get in touch on our new website mailing list and join the website team! – www@couchdb.apache.org
  • The CouchDB advocate marketing programme is just getting started. Join us in CouchDB’s Advocate Hub!
  • CouchDB has a new wiki. Help us move content from the old to the new one!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help moving the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list on l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to welcome you on board!

Events

Job opportunities for people with CouchDB skills

Time to relax!

  • “When shooting a documentary, the vast majority of what you film gets edited out of the final production. But instead of letting thousands of hours of breathtaking footage go to waste, the creators of Planet Earth II have re-purposed hundreds of breathtaking mountaintop shots into this 10-hour-long relaxation tool.” – Chill Out With 10 Full Hours of Relaxing Planet Earth Mountain Footage
  • “As adults, our summers tend to be no different from any other potentially stressful time of the year. That doesn’t mean there aren’t still perks of the hot season, though.” – Summer’s best ways to de-stress (that aren’t a vacation)
  • “A recent study from the University of Sydney found that women who ate five to seven daily servings of fruits and vegetables slashed their risk for stress by 23 percent compared with those who ate fewer or none.” – These Surprising Foods Could Help You De-Stress
  • “It’s important to establish what’s going on, to work out their current situation, and make sure they’re getting what they need. If their mental health issues are going to impact their work longterm, you, as a boss, needs to know about it – and you need to create a plan together on how you’re going to work around it.” – How bosses can make their workplace more mental health friendly

… and also in the news

Submit news to the CouchDB Weekly

Reach out to us with your news suggestions by sending us an email or by contacting us on Twitter @CouchDB.

A CouchDB User Story: chatting with Assaf Morami

In our interview with Assaf Morami, he talked to us about how his usage of CouchDB for an internal project for his organization’s intranet. Assaf’s challenge was unique in that his project could not use clustering effectively as it had to be entirely in one machine.

Assaf’s machine supported nearly 6TB with around 2 billion documents across ~20 DBs, serving right around 100k reads/day and 20-50GB writes/day. This led him to “debug the hell out of it” resulting in this document: Linux tuning for better CouchDB performance.

Assaf went on the tell us more of why he chose to use CouchDB and how it has best helped support his project’s needs.

How did you hear about CouchDB, and why did you choose to use it?

I initially encountered CouchDB on Google.

I had inherited a project that was using Apache SOLR as its main database, but back then (April 2016) it had about 100GB of data, so all was well. The only person with write access to the database was me, so all we needed from SOLR is to be very quick while reading, and it was.

But then, I got 1.2TB of zipped, highly nested, schemaless JSONs to index. SOLR has this neat feature: “Schemaless Mode” which basically just creates an index (=schema entry) for each new field it discovers.

I had to use this mode because all fields with a value of sha1 string had to be fast to query, and the field names were randomly generated (weird, I know).

Because the field names were random, SOLR would create new schema entries all the time, which led it to be extremely slow and unstable.

SOLR would also flatten the input JSONs (e.g. {"a":{"b":1}} => {"a.b":1}) which was very annoying for us. After a couple of weeks and not a lot of GBs indexed, we experienced a big power outage. SOLR took 5 days to recover from this incident (checksum on init? data recovery?), so our systems wasn’t operational for that time span. This was UNACCEPTABLE!

I started googling for a schemaless DB that could support deeply nested JSONs. I ruled out MongoDB because of bad past experience, very slow queries on a 10GB collection with indexes. I also ruled out Elasticsearch because of Lucene. I figured Lucene’s many files and file edits is what caused the long recovery time after the power outage.

I specifically googled “schemaless db” and “mongodb vs”, it was here that I came across CouchDB.

I started reading the documentation and it got me hooked on the “just relax”, “there is no turn off switch, just kill the process” and the ability to build indexes programmatically, so I could recurse into the objects and emit values that match the sha1 regex.

What would you say is the top benefit of using CouchDB?

Durability. Since the SOLR saga, we’ve experienced a few more power outages, hard disk failures and filesystem corruptions (at least 2 of each; yeah, our infrastructure can be better).

Amongst all the panic and horror, I was smiling.

After power outages CouchDB has a zero recovery time. If a hard disk had died or the filesystem got corrupted, CouchDB would just reacquire the lost data by synchronizing from a replica or replicating a backup.

What tools are you using in addition for your infrastructure? Have you discovered anything that pairs well with CouchDB?

  • couchimport
  • obviously.
  • jq.
  • curl.

What are your future plans with your project? Any cool plans or developments you want to promote?

Yes, I have found a neat trick to import an archive full of JSON files.

I also plan to add a section about client http keep-alives, to my document detailing my results for seeking better CouchDB performance on Linux systems. I’ve found out that using HTTP keep-alives to access CouchDB can drastically improve CouchDB’s performance, as it doesn’t need to build and destroy TCP connections between interactions with clients. For example, while using Node.js’ request or request-promise package we’d turn on "require ('http').globalAgent.keepAlive = true" and pass "forever: true" with each request.

 

Use cases are a great avenue for sharing useful technical information, let us know how you use CouchDB! Additionally, if there’s something you’d like to see covered on the CouchDB blog, we would love to accommodate. Email us!

For more about CouchDB visit couchdb.org or follow us on Twitter at @couchdb