Offline-First, Real-Time Tools for Small Businesses: How Hisab Uses CouchDB

We chatted with Chirag Moradiya to learn how CouchDB’s replication features, teamed with a custom Spring Boot integration, create the backbone for their real-time and offline-first app, Hisab. It’s a comprehensive tool for small and micro businesses that manages everything from inventory and sales to accounting, with the advantage of offline, real-time functionality.
We learned about the architectural decisions that made CouchDB the clear database choice, and what else Chirag has paired with it.

Which of CouchDB’s distinct features have been essential to you achieving your goal(s)?

Replication and Map-Reduce Query.

How did you hear about CouchDB, and why did you choose to use it?

Real-time updates and local-first were two important considerations in the architecture. We didn’t find anything better than CouchDB + PouchDB. Firebase was there for the real-time data flow, but it was not local-first, so we chose CouchDB and have been using it for around 8 years now.
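For readers new to the CouchDB + PouchDB pairing, here is a minimal sketch of what continuous two-way sync can look like with PouchDB's sync API. It is not Hisab's code (the interview describes their own DreamDB middleware further down); the function name, database name, and URL are hypothetical, and the PouchDB constructor is passed in as a parameter so the sketch carries no hard dependency.

```javascript
// Starts continuous two-way replication between a local and a remote database.
// `PouchDB` is injected by the caller; in an app it would typically come from
// require("pouchdb"). All names used here are illustrative assumptions.
function startLiveSync(PouchDB, localName, remoteUrl, onChange) {
  const local = new PouchDB(localName);  // stored on the device, usable offline
  const remote = new PouchDB(remoteUrl); // CouchDB database reached over HTTP

  // live: keep replicating as changes happen.
  // retry: try to reconnect after the connection drops.
  const handle = local.sync(remote, { live: true, retry: true });
  handle.on("change", onChange);

  return { local, handle };
}
```

An app would call something like startLiveSync(PouchDB, "hisab-local", "http://localhost:5984/hisab", refreshUi), then read and write only against the returned local database, leaving replication to move the data in the background.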

Did you have a specific problem that CouchDB solved?

Both local real-time and local-first were essential problems for us to solve.

For the folks who are unsure of how they could use CouchDB — because there are a lot of databases out there — could you explain the use case?

We use CouchDB as the main database for all the application data. In other applications we use Firestore, and earlier we used Firebase Real-time Database; CouchDB is better in terms of performance and scalability.

What would you say is the top benefit of using CouchDB?

Performance, as well as the unique and open replication/sync protocol.

What tools are you using in addition for your infrastructure? Have you discovered anything that pairs well with CouchDB?

1. Spring Boot Integration: We have developed a library that provides Spring Reactor APIs for interacting with the CouchDB server. It also provides repository-style interfaces and abstract classes, which simplifies adding a new repository for an entity.

2. DreamDB: It’s a middleware between CouchDB and PouchDB. It eliminates the boilerplate code needed to implement real-time and local-first queries using CouchDB and PouchDB. It also provides a way to write documents into CouchDB in a local-first manner. The DreamDB server is written in Node.js and the DreamDB client runs in the browser. The two communicate through a single WebSocket connection, which lets the browser talk to any number of databases without hitting the browser’s limit on the maximum number of connections. It works well with multiple browser tabs and multiple devices (mobile, desktop) of a user.

We are going to open-source both of these, so others can use them freely and contribute more features. But due to time constraints and a very small team, this has not been done on time.

I will try working extra hours at nights and weekends to make this possible!

What are your future plans with your project? Any cool plans or developments you want to promote?

We are going to add a read-security feature to DreamDB, which is most needed in our use case.

Thank you, Chirag, for sharing your story with us and for your contributions. We’ll be following your plans and look forward to sharing your tools as they come out.

Use cases are a great avenue for sharing useful technical information; let us know how you use CouchDB! Additionally, if there’s something you’d like to see covered on the CouchDB blog, we would love to accommodate. Email us!

For more about CouchDB visit couchdb.apache.org or follow us on Mastodon: @couchdb@fosstodon.org 

A CouchDB User Story: chatting with Assaf

In our interview with Assaf, he talked to us about his usage of CouchDB for an internal project on his organization’s intranet. Assaf’s challenge was unique in that his project could not use clustering effectively, as it had to run entirely on one machine.

Assaf’s machine supported nearly 6TB with around 2 billion documents across ~20 DBs, serving right around 100k reads/day and 20-50GB writes/day. This led him to “debug the hell out of it” resulting in this document: Linux tuning for better CouchDB performance.

Assaf went on to tell us more about why he chose to use CouchDB and how it has best helped support his project’s needs.

How did you hear about CouchDB, and why did you choose to use it?

I initially encountered CouchDB on Google.

I had inherited a project that was using Apache SOLR as its main database, but back then (April 2016) it had about 100GB of data, so all was well. The only person with write access to the database was me, so all we needed from SOLR is to be very quick while reading, and it was.

But then, I got 1.2TB of zipped, highly nested, schemaless JSONs to index. SOLR has this neat feature: “Schemaless Mode” which basically just creates an index (=schema entry) for each new field it discovers.

I had to use this mode because all fields with a value of sha1 string had to be fast to query, and the field names were randomly generated (weird, I know).

Because the field names were random, SOLR would create new schema entries all the time, which led it to be extremely slow and unstable.

SOLR would also flatten the input JSONs (e.g. {"a":{"b":1}} => {"a.b":1}), which was very annoying for us. After a couple of weeks and not a lot of GBs indexed, we experienced a big power outage. SOLR took 5 days to recover from this incident (checksum on init? data recovery?), so our systems weren’t operational for that time span. This was UNACCEPTABLE!

I started googling for a schemaless DB that could support deeply nested JSONs. I ruled out MongoDB because of a bad past experience: very slow queries on a 10GB collection with indexes. I also ruled out Elasticsearch because of Lucene. I figured Lucene’s many files and file edits were what caused the long recovery time after the power outage.

I specifically googled “schemaless db” and “mongodb vs”, and that is where I came across CouchDB.

I started reading the documentation and it got me hooked on the “just relax”, “there is no turn off switch, just kill the process” and the ability to build indexes programmatically, so I could recurse into the objects and emit values that match the sha1 regex.
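To make the “recurse into the objects and emit values that match the sha1 regex” idea concrete, here is a minimal sketch of such a map function. It is not Assaf’s actual view code: the regex, the helper names, and the sample document are assumptions, and emit is passed in as a parameter so the sketch runs outside CouchDB.

```javascript
// Matches a 40-character hexadecimal string, the usual text form of a SHA-1 digest.
const SHA1_RE = /^[0-9a-f]{40}$/i;

// Walks any JSON value and calls `visit` for every string that looks like a SHA-1.
function walk(value, visit) {
  if (typeof value === "string") {
    if (SHA1_RE.test(value)) visit(value.toLowerCase());
  } else if (Array.isArray(value)) {
    value.forEach((item) => walk(item, visit));
  } else if (value !== null && typeof value === "object") {
    Object.keys(value).forEach((key) => walk(value[key], visit));
  }
}

// Same shape as a CouchDB map function: called once per document, emits rows.
// Inside a design document `emit` is a global supplied by the query server;
// here it is a parameter so the function can be tried locally.
function mapSha1(doc, emit) {
  walk(doc, (sha1) => emit(sha1, null));
}

// Local dry run with a hypothetical document that has random field names.
const rows = [];
mapSha1(
  { _id: "doc1", xk3f: { q9: "da39a3ee5e6b4b0d3255bfef95601890afd80709" }, note: "not a hash" },
  (key, value) => rows.push({ key, value })
);
console.log(rows);
```

In a real design document the helper would live inside a single function(doc) { ... } expression, and depending on the JavaScript engine behind the CouchDB installation, older syntax (var, function expressions) may be required. Because the view key is the hash itself, a lookup by SHA-1 works no matter which randomly named field held the value.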

What would you say is the top benefit of using CouchDB?

Durability. Since the SOLR saga, we’ve experienced a few more power outages, hard disk failures and filesystem corruptions (at least 2 of each; yeah, our infrastructure can be better).

Amongst all the panic and horror, I was smiling.

After power outages CouchDB has a zero recovery time. If a hard disk had died or the filesystem got corrupted, CouchDB would just reacquire the lost data by synchronizing from a replica or replicating a backup.

What tools are you using in addition for your infrastructure? Have you discovered anything that pairs well with CouchDB?

  • couchimport
  • jq
  • curl

What are your future plans with your project? Any cool plans or developments you want to promote?

Yes, I have found a neat trick to import an archive full of JSON files.

I also plan to add a section about client HTTP keep-alives to my document detailing my results for seeking better CouchDB performance on Linux systems. I’ve found that using HTTP keep-alives to access CouchDB can drastically improve CouchDB’s performance, as it doesn’t need to build and destroy TCP connections between interactions with clients. For example, while using Node.js’ request or request-promise package we’d turn on "require('http').globalAgent.keepAlive = true" and pass "forever: true" with each request.
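The same keep-alive idea can be sketched with only Node.js’ built-in http module, using a dedicated agent instead of the request package settings mentioned above. This is an illustration rather than Assaf’s setup: the host, port, socket limit, and function names are assumptions.

```javascript
const http = require("http");

// One shared agent that keeps idle sockets open so later requests can reuse them.
// The maxSockets value is an arbitrary example, not a tuned recommendation.
const keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 16 });

// Builds request options for a CouchDB endpoint; host and port are placeholders.
function couchRequestOptions(path) {
  return {
    host: "localhost",
    port: 5984,
    path,
    method: "GET",
    agent: keepAliveAgent, // every request goes through the shared agent
  };
}

// Fetches a path and resolves with the parsed JSON body.
function getJson(path) {
  return new Promise((resolve, reject) => {
    const req = http.request(couchRequestOptions(path), (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { body += chunk; });
      res.on("end", () => {
        try { resolve(JSON.parse(body)); } catch (err) { reject(err); }
      });
    });
    req.on("error", reject);
    req.end();
  });
}
```

With a CouchDB server listening on the placeholder address, consecutive calls such as getJson("/_all_dbs") would go through the shared agent, which can reuse an idle socket instead of opening a new TCP connection each time.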

 


For more about CouchDB visit couchdb.org or follow us on Twitter at @couchdb