Automated Fishing Reports: The Charter & Haul App with Offline Sync

Over the course of 2019, New Zealand rolled out electronic catch and position reporting requirements, first for commercial fishing vessels and later for amateur charter vessels, to record each day’s journey and haul. For the Ministry for Primary Industries this is an important way to keep track of quotas, but doing it on paper is both time-consuming and easy to get wrong. By reducing error-prone manual submissions, automation has helped the ministry make better-informed decisions around fishing sustainability and monitoring.

Martin Junek developed eCatch in 2018 in anticipation of the rollout, and eCatch is now contracted by Fisheries New Zealand to provide tools for both commercial and amateur fishing vessels.
Read on to learn how CouchDB’s replication protocol lets eCatch automate reporting even when vessels drift out of network coverage.

How did you hear about CouchDB, and why did you choose to use it?

I had used CouchDB on another project before. We needed a database that was easy to sync with a mobile app, easy to host in Docker, and that we would have complete control over. Cost was also a major consideration for a starting business.

Did you have a specific problem that CouchDB solved?

I wanted to create an offline-first app that syncs to CouchDB: collecting data offline and syncing it to a central CouchDB when the device comes online again.

CouchDB’s _changes feed feature plays a big role in eCatch: 

  • We collect the data that is synchronised to CouchDB. Then there’s a service that waits for anything new to appear in the database and sends it off to the Ministry for Primary Industries via their API. So we basically listen to the changes feed on the global database, and that tells us there’s a new document.
  • We also have a Postgres database, and whatever lands in CouchDB gets sent off to Postgres, where it’s easier for us to query.
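
The pattern Martin describes maps directly onto CouchDB’s continuous _changes feed. The sketch below is a minimal illustration in Python, assuming a local CouchDB, a hypothetical ecatch-global database, and a placeholder forward_to_mpi() function standing in for the ministry’s API client:

```python
import json
import requests

COUCH = "http://admin:password@localhost:5984"   # assumed local CouchDB
DB = "ecatch-global"                             # hypothetical database name


def forward_to_mpi(doc):
    """Placeholder for the client that submits a report to the ministry's API."""
    print("would forward report", doc["_id"])


def follow_changes(since="now"):
    # The continuous changes feed keeps the HTTP connection open and streams
    # one JSON object per line as new documents (or edits) arrive.
    params = {
        "feed": "continuous",
        "include_docs": "true",
        "since": since,
        "heartbeat": 30000,
    }
    with requests.get(f"{COUCH}/{DB}/_changes", params=params,
                      stream=True, timeout=None) as resp:
        for line in resp.iter_lines():
            if not line:                 # heartbeat newlines keep the socket alive
                continue
            change = json.loads(line)
            if "doc" in change and not change.get("deleted"):
                forward_to_mpi(change["doc"])


if __name__ == "__main__":
    follow_changes()
```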

For the folks who are unsure of how they could use CouchDB — because there are a lot of databases out there — could you explain the use case?

We needed an app that collects data offline (electronic catch reporting for commercial fishing vessels) and then syncs the data with a central database. 

An important requirement is the ability to separate data for each device/user, so there is no chance of an accidental data leak; the data the app collects is considered sensitive.

What would you say is the top benefit of using CouchDB?

Definitely the sync feature, which works out of the box. It’s easy to host and maintain, and it has been very stable the whole time. Also listening to the _changes feed to handle new data, which is then integrated into third-party systems.

It’s a common problem with SaaS apps that a bug in the code can expose somebody else’s data. CouchDB solves that simply because there’s a different database for every user, so I don’t have to worry about that much, which is a big plus.
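
The database-per-user isolation Martin credits here is a well-known CouchDB pattern: create one database per user and let CouchDB’s _security object enforce membership. A minimal sketch, with the admin credentials, naming scheme and username all hypothetical:

```python
import requests

COUCH = "http://admin:password@localhost:5984"   # assumed admin credentials


def create_user_db(username):
    """Create a dedicated database for one user and restrict access to them."""
    db = f"userdb-{username}"                    # hypothetical naming scheme

    resp = requests.put(f"{COUCH}/{db}")
    if resp.status_code not in (201, 202, 412):  # 412 = database already exists
        resp.raise_for_status()

    # The _security object is enforced by CouchDB itself, so an application
    # bug cannot hand one user's documents to another user.
    security = {
        "admins":  {"names": [], "roles": []},
        "members": {"names": [username], "roles": []},
    }
    requests.put(f"{COUCH}/{db}/_security", json=security).raise_for_status()


create_user_db("skipper-042")
```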

What additional tools are you using in your infrastructure? Have you discovered anything that pairs well with CouchDB?

Docker for hosting, as well as PostgreSQL. We feed all data into Postgres for easier querying/data analysis. We also use PouchDB in React Native; this requires a third-party adapter, which is a missing link in the CouchDB/PouchDB ecosystem. One of my biggest concerns is that its author might stop maintaining it.

What are your future plans with your project? Any cool plans or developments you want to promote?

We’d like to expand the app into other areas of the industry.

If you’re curious, visit eCatch on Google Play and the App Store to get an idea of how users interact with it.

Thank you, Martin, for giving us a behind-the-scenes look at eCatch and how it helps New Zealand’s seas. We wish you happy surfing!

Use cases are a great avenue for sharing useful technical information — let us know how you use CouchDB. Additionally, if there’s something you’d like to see covered on the CouchDB blog, we would love to accommodate. Email us!

For more about CouchDB visit couchdb.apache.org or follow us on Mastodon: @couchdb@fosstodon.org.

Building Offline-First Knowledge Management: How CouchDB Powers zettel.io  

Simon Lucy has spent 40 years as a consultant helping teams with their systems, software and architecture, and served as the BBC’s Head of Platform until 2013. His work as Lead Architect at the scientific publisher Elsevier drew him further down the knowledge management rabbit hole and eventually led him to create his own app.

We caught up with Simon to learn about zettel.io, what CouchDB does to make it happen, and the moments of database resilience he encountered along the way.

Which of CouchDB’s distinct features have been essential to you achieving your goal(s)?

  • Document database using JSON and controllable schemas
  • Painless clustering and replication (this really is painless now)
  • MapReduce
  • Filtering on queries
  • Full-text search

Easy-to-configure replication settings and expanded sharding features in newer versions of CouchDB make it much simpler to achieve your desired clustering and replication behaviour.
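
In recent CouchDB versions, a replication like the ones Simon relies on can be declared as a document in the _replicator database, which the server then supervises for you. A minimal sketch, with all hostnames, credentials and database names hypothetical:

```python
import requests

COUCH = "http://admin:password@localhost:5984"   # assumed admin endpoint

# A document in the _replicator database declares a replication job that
# CouchDB supervises and restarts for you; "continuous" keeps source and
# target converging as changes arrive.
replication = {
    "_id": "kasten-main-to-backup",                                # hypothetical job name
    "source": "http://admin:password@localhost:5984/kasten-main",  # hypothetical databases
    "target": "http://admin:password@backup.example.org:5984/kasten-main",
    "continuous": True,
    "create_target": True,
}

resp = requests.post(f"{COUCH}/_replicator", json=replication)
resp.raise_for_status()
print(resp.json())   # {'ok': True, 'id': 'kasten-main-to-backup', ...}
```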

How did you hear about CouchDB, and why did you choose to use it?

In late 2008 I began working at the BBC in Future Media on what was the Forge Project, the about-to-go-live development platform, and CouchDB had been chosen as essentially a key-value store. That was early in CouchDB’s development as well, version 0.6 I think. It was generally very successful for the purpose, though it wasn’t used for its main features as a document store or a MapReduce database.

Around 2018/19 I began planning zettel.io, a Zettelkasten for managing and manipulating notes and documents from any source. I chose CouchDB as a highly scalable and distributable document store that was both lightweight and searchable.

Having an open schema was also important, as there could be structural differences between documents that could be managed without a great deal of pain. The final significant feature was that its API was RESTful, which fitted into the overall architecture.

During Simon’s time at the BBC, over 300 applications were migrated between 2009 and 2011 with scalable CouchDB as the key-value store. Larger apps, like iPlayer, have since moved to dedicated platforms and continued to develop.

Did you have a specific problem that CouchDB solved?

It’s a fit for the document architecture. A Zettelkasten is generally a box of cards, each card is a note and may be related to other notes in the box by metadata written on the card, or coloured stickers or whatever. There can be multiple boxes.

A Kasten in zettel.io is a CouchDB database; notes are imported into the Kasten and along with them is all the original metadata, including any tags, that were with the original note in whatever system it was created in.

Because a Kasten is a very light kind of database (it’s a file, essentially), it’s cheap to copy documents (notes) from one Kasten to another, or to move them. Each note has a home Kasten; when a note is copied, only a light copy with the fields needed for organising is used, and the copy records the original location (its _id is maintained across all Kasten, or databases).

CouchDB’s flexible (“plastic”) data architecture, which does away with a lot of internal document metadata, makes it not just possible but easy to keep many copies of a document and to remix document relationships, or to curate and re-curate in a knowledge management scenario. CouchDB’s append-only nature means the original document can always be found and is never edited in place, which makes corruption all but impossible and zettel.io extremely resilient.
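
The light-copy scheme Simon describes can be pictured as a little HTTP against two databases: read the note from its home Kasten, keep only the organising fields, and write it into the target Kasten under the same _id. A rough sketch, with the field names and Kasten names invented for illustration:

```python
import requests

COUCH = "http://localhost:5984"                      # assumed local cluster
ORGANISING_FIELDS = ("title", "tags", "created")     # hypothetical field set


def light_copy(note_id, home_kasten, target_kasten):
    """Copy only the organising fields of a note into another Kasten,
    keeping the same _id and remembering where the full note lives."""
    note = requests.get(f"{COUCH}/{home_kasten}/{note_id}").json()

    stub = {field: note[field] for field in ORGANISING_FIELDS if field in note}
    stub["_id"] = note["_id"]             # the _id is shared across all Kasten
    stub["home_kasten"] = home_kasten     # pointer back to the full document

    requests.put(f"{COUCH}/{target_kasten}/{stub['_id']}", json=stub)


light_copy("note-2024-0001", "kasten-inbox", "kasten-projects")
```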

For the folks who are unsure of how they could use CouchDB — because there are a lot of databases out there — could you explain the use case?

It’s entirely a document database; there are no transactional processes. There are states and statuses, and there are other document types for things like lists of notes. For instance, notes may have no particular order, be in date order, or be in any order arranged by the user, and that ordering is saved as a name and a list of _ids in that order.
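
A user-arranged ordering, as Simon describes it, is itself just another document: a name plus the note _ids in the chosen order. A hypothetical example (the field names and id convention are assumptions, not zettel.io’s actual schema):

```python
import requests

COUCH = "http://localhost:5984"
KASTEN = "kasten-projects"                    # hypothetical Kasten (database)

# A user-arranged ordering is just another document: a name plus the
# note _ids in the order the user put them.
ordering = {
    "_id": "order:reading-list",              # hypothetical id convention
    "type": "order",
    "name": "reading-list",
    "note_ids": ["note-2024-0007", "note-2024-0003", "note-2024-0011"],
}

requests.put(f"{COUCH}/{KASTEN}/{ordering['_id']}", json=ordering)
```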

There’s an entirely separate CouchDB cluster for the web frontend. That holds each user’s session, which is just the state of the OAuth session. It also stores all the queries that are made; they eventually age out, but each gets its own URL because each response is just another CouchDB document. All the web frontend has to do is save the response, and the response is also treated as a Kasten with some behind-the-scenes conducting, so the results can be copied to another Kasten or a new one.

There’s also a database for storing some system config; largely it’s used as a WebFinger server. I don’t use CouchDB’s _users database other than for my own use; all the users have their own user record in the main data cluster database. Rights management is handled by the application, not CouchDB.

A syntax parser makes SQL search possible, and Clouseau enables full-text search of both documents and PDF content. Later, Simon plans to implement the ability to search across databases by searching each one and merging the results.
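
Searching across databases and merging results could be as simple as running the same query against each Kasten’s search index and sorting the combined rows by relevance. A sketch against CouchDB’s Clouseau-backed _search endpoint, with the design document and index names invented for illustration:

```python
import requests

COUCH = "http://localhost:5984"
SEARCH_PATH = "_design/search/_search/notes"   # hypothetical design doc and index


def search_kasten(kasten, query):
    """Run a full-text query against one Kasten's Clouseau-backed search index."""
    resp = requests.get(f"{COUCH}/{kasten}/{SEARCH_PATH}",
                        params={"q": query, "limit": 25})
    resp.raise_for_status()
    return resp.json()["rows"]


def search_all(kastens, query):
    """Search several Kasten and merge the hits by relevance score."""
    hits = []
    for kasten in kastens:
        for row in search_kasten(kasten, query):
            score = row["order"][0]            # default sort order is relevance
            hits.append((score, kasten, row["id"]))
    return sorted(hits, reverse=True)


print(search_all(["kasten-inbox", "kasten-projects"], "zettelkasten"))
```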

What would you say is the top benefit of using CouchDB?

Specifically, it’s a very close match to the data model, and it naturally fits the way zettel.io accumulates documents and versions them. Although an imported note is largely immutable, new features can improve the extraction of elements, improving the semantics without breaking the model.

Generally, it’s a document database that takes care of all of the annoyances.

Recently I had a system run out of disk space, and whilst there were failures, nothing was catastrophic, largely because the design is RESTful from top to bottom, including CouchDB. Once disk space was added, everything ran as expected. Resilience, Distribution, Sharding, Eventual Consistency!

Simon designed the document _ids with static prefixes corresponding to the instance that created them, effectively turning CouchDB into a distributed database and enabling offline-first functionality, so the app works equally well as a web service or offline. If you’re curious about your own use case, libraries like nanoid give added flexibility to ID generation.
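
An instance-prefixed _id scheme of the kind described above might look like the following; the prefix value and format are purely illustrative:

```python
import uuid

INSTANCE_PREFIX = "kasten-eu-01"     # hypothetical static prefix for this instance


def new_note_id():
    """Build a document _id whose static prefix identifies the instance that
    created it, so ids never collide when databases are replicated or merged
    and a note's origin stays visible at a glance."""
    return f"{INSTANCE_PREFIX}:{uuid.uuid4().hex}"


print(new_note_id())   # e.g. kasten-eu-01:3f2a9c0d...
```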

What additional tools are you using in your infrastructure? Have you discovered anything that pairs well with CouchDB?

I still use a Cloudant Python library as the layer between my code and CouchDB; it’s frozen and deprecated now, so I should plan a way off it. It’s only used for basic full-text search; some other queries are made directly. All of my code is Python, and the model of treating a JSON structure as a dictionary determines the way everything works.

For help in manipulating collections of files, zips, PDFs, attachments and images etc. I use Apache Tika.

I don’t use CouchDB to store attachments; they go into S3, but anything that’s extractable as text is stored as text with the note.
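
Putting those last two answers together, the ingestion step might look roughly like this: upload the raw file to S3, run it through Tika, and store the extracted text on the note document. The bucket name and document fields are assumptions for illustration:

```python
import boto3
import requests
from tika import parser              # Apache Tika's Python client

COUCH = "http://localhost:5984"
BUCKET = "zettel-attachments"        # hypothetical S3 bucket


def attach_file(kasten, note_id, path):
    """Upload the raw file to S3 and keep any extractable text on the note
    document itself, so full-text search can see it without bloating CouchDB."""
    key = f"{note_id}/{path.rsplit('/', 1)[-1]}"
    with open(path, "rb") as fh:
        boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=fh)

    extracted = parser.from_file(path)         # Tika handles PDFs, Office docs, etc.

    note = requests.get(f"{COUCH}/{kasten}/{note_id}").json()
    note["attachment_key"] = key                            # hypothetical fields
    note["extracted_text"] = (extracted.get("content") or "").strip()
    requests.put(f"{COUCH}/{kasten}/{note_id}", json=note)
```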

What are your future plans with your project? Any cool plans or developments you want to promote?

Finish the manual so it’s usable. zettel.io allows the user to manipulate and graph their notes, so there’s a desire to create useful knowledge graphs. At the moment I’m in the maintenance hell of moving both Python version and OS distribution, largely because of AWS policies.

A specific and more immediate improvement on the CouchDB side is to change the full-text search from Clouseau to Nouveau.

Thank you for sharing your story and architecture insights with the CouchDB community, Simon. Stay up to date with Simon and his projects on Mastodon: @simon_lucy@mastodon.social.

Use cases are a great avenue for sharing useful technical information; let us know how you use CouchDB. Additionally, if there’s something you’d like to see covered on the CouchDB blog, we would love to accommodate. Email us!

For more about CouchDB visit couchdb.apache.org or follow us on Mastodon: @couchdb@fosstodon.org.