The Little Things (1): Do Not Delete

CouchDB takes data storage extremely seriously. This usually means we work hard to make sure that the CouchDB storage modules are as robust as we can make them. Sometimes though, we go all the way to the HTTP API to secure against accidental data loss, saving users from their mistakes, rather than dealing with hard drives and kernel caches that usually stand in the way of safe data storage.

The scenario:

To delete a document in CouchDB, you issue the following HTTP request:

DELETE /database/docid?rev=12345 HTTP/1.1

A common way to program this looks like this:

http.request('DELETE', db + '/' + docId + '?rev=' + docRev);

So far, so innocent. Every so often, though, users came to us complaining that this code had deleted their whole database.

It turns out that the above code creates a request that deletes the whole database if the docId variable isn’t set correctly, for example when it is an empty string. The request then looks like this:

DELETE /database/?rev=12345 HTTP/1.1

It looks like an honest mistake, once you check the CouchDB log file, but good old CouchDB would just go ahead and delete the database, ignoring the ?rev= value.

We thought this was a good opportunity to help users avoid accidentally losing their data. So since late 2009 (yes, this is an oldie, but it came up in a recent discussion and we thought it was worth writing about :), CouchDB will not delete a database if it sees that a ?rev= parameter is present. Such a request looks like a malformed document delete, as database deletions have no business requiring a ?rev=.
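The rule can be sketched in a few lines of JavaScript. This is only a model of the idea, not CouchDB's actual code (CouchDB is written in Erlang): the function name `checkDatabaseDelete`, the shape of the returned object, and the 400 status are all assumptions made for illustration.

```javascript
// Hypothetical model of the guard: refuse a database DELETE that carries ?rev=.
function checkDatabaseDelete(method, requestTarget) {
  // Parse against a dummy base so that relative request targets work.
  const url = new URL(requestTarget, 'http://localhost');
  // Path segments without empty entries: "/database/" -> ["database"].
  const segments = url.pathname.split('/').filter(Boolean);
  const targetsDatabase = method === 'DELETE' && segments.length === 1;

  if (targetsDatabase && url.searchParams.has('rev')) {
    // Assumed response: treat it as a malformed document delete.
    return { allowed: false, status: 400 };
  }
  return { allowed: true };
}
```

With this sketch, `checkDatabaseDelete('DELETE', '/database/?rev=12345')` is refused, while a plain `DELETE /database` and a proper `DELETE /database/docid?rev=12345` both pass through.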

One can make an easy argument that the code sample is fairly shoddy and we’d agree. But we are not here to argue how our users use our database beyond complying with the API and recommended use cases. And if we can help them keep their data, that’s a win in our book.
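For what it's worth, a more defensive client could refuse to build the URL at all when the document ID is missing. The following is a minimal sketch, not an official client API: `documentDeleteUrl` is a hypothetical helper, and it only builds the URL that would then be handed to whatever HTTP library is in use.

```javascript
// Build the URL for a document DELETE, refusing inputs that would
// silently turn it into a request against the database itself.
function documentDeleteUrl(db, docId, docRev) {
  if (typeof docId !== 'string' || docId.length === 0) {
    throw new Error('docId must be a non-empty string');
  }
  if (typeof docRev !== 'string' || docRev.length === 0) {
    throw new Error('docRev must be a non-empty string');
  }
  // Encode both values so characters such as "/" or "?" cannot change the path.
  return db + '/' + encodeURIComponent(docId) + '?rev=' + encodeURIComponent(docRev);
}
```

Calling `documentDeleteUrl('/database', 'docid', '12345')` returns `/database/docid?rev=12345`, while an empty or undefined `docId` throws before any request is sent.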

Continuing down this thought, we thought we could do one better. You know that to delete a document, you must pass the current rev value, as you see above. This ensures that we don’t accidentally delete the document without knowing that someone else may have added an update to it that we don’t actually want to delete. It’s CouchDB’s standard multi-version concurrency control (MVCC) mechanism at work.
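To make the revision check concrete, here is a toy in-memory model. It is a sketch under simplifying assumptions, not CouchDB's storage code: `ToyDocStore` is a hypothetical class, revisions are plain counters kept as strings, and updates are accepted without a rev so the example stays short. The 409 status mirrors the conflict response CouchDB gives for a stale rev.

```javascript
// Toy in-memory model of a revision check; not CouchDB's storage code.
class ToyDocStore {
  constructor() {
    this.docs = new Map(); // docId -> { rev, body }
  }

  // Create or update a document and return its new revision.
  put(docId, body) {
    const current = this.docs.get(docId);
    // Hypothetical revision scheme: a plain counter kept as a string.
    const rev = String(current ? Number(current.rev) + 1 : 1);
    this.docs.set(docId, { rev, body });
    return rev;
  }

  // Delete only when the caller presents the current revision.
  delete(docId, rev) {
    const current = this.docs.get(docId);
    if (!current) return { ok: false, status: 404 };
    // A stale rev means someone else updated the document in the meantime.
    if (current.rev !== rev) return { ok: false, status: 409 };
    this.docs.delete(docId);
    return { ok: true, status: 200 };
  }
}
```

If one client reads a document, a second client updates it, and the first then tries to delete it with the rev it read earlier, the delete is rejected instead of silently discarding the newer update.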

Databases don’t have revisions like documents, and deleting a database is a simple HTTP DELETE /database away. Databases do, however, have a sequence ID. It’s the ID you get from the changes feed: a number that starts at 0 when the database is created and increments by 1 each time a document is added, updated or deleted. Each state of the database has a single sequence ID associated with it.

Similar to a rev, we could require the latest sequence ID to delete a database, as in:

DELETE /database?seq_id=6789

We would then deny database deletes that don’t carry the latest seq_id. We think this is a decent idea, but unfortunately, it would break backwards compatibility with older versions of CouchDB and a good amount of code in the field, so we are hesitant to add this feature. In addition, sequence IDs change a little when BigCouch finally gets merged, so we’d have to look at this again then.
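As a thought experiment, the proposal could be modelled like this. Everything here is hypothetical: CouchDB does not implement this check, and `ToyDatabase`, `saveDoc`, `removeDoc` and `destroy` are names invented for the sketch. The sequence counter follows the description above (starts at 0, goes up by 1 per document add, update or delete).

```javascript
// Toy model of the seq_id proposal; CouchDB does not implement this check.
class ToyDatabase {
  constructor() {
    this.seq = 0; // starts at 0 when the database is created
    this.docs = new Map();
    this.deleted = false;
  }

  // Adding or updating a document bumps the sequence by 1.
  saveDoc(docId, body) {
    this.docs.set(docId, body);
    return ++this.seq;
  }

  // Deleting an existing document bumps the sequence by 1 as well.
  removeDoc(docId) {
    if (!this.docs.delete(docId)) return this.seq; // nothing changed
    return ++this.seq;
  }

  // Hypothetical: only delete the database when the caller knows the latest sequence.
  destroy(seqId) {
    if (seqId !== this.seq) return { ok: false, currentSeq: this.seq };
    this.deleted = true;
    return { ok: true };
  }
}
```

A caller holding an outdated sequence ID would be turned away and told the current one, much like a document delete with a stale rev.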

In the meantime, we have the protection against simple coding errors, and we are happy that our users now keep their hard-earned data more often.

CouchDB Weekly News, April 3

Major Discussions

Vote on release of Apache CouchDB 1.5.1-rc.1 (will be released as Apache CouchDB 1.5.1 — see thread)

The vote passed.

Importing CSV data into a CouchDB document using Python-Cloudant module (discussion still open — see thread)

Approaches brought up: (1) transforming a CSV file into a JSON file with Python; (2) rc_scv, a direct rcouch extension; (3) using Google Refine and Max Ogden’s refine uploader (see visual example here); (4) CSV2Couch. Further information can also be found in this blog post on “Using Python with Cloudant” and the newer Cloudant-Python interface.

CouchDB 1.6.0 proposals (see thread)

Discussion around the open blocker and how to deal with it, and around re-cutting 1.6.x from master. Releasing 1.5.1 will take precedence.

Poll around Erlang whitespace standards (see thread; the poll is still open)

Joan Touzet: “I know many of you are fed up with not being able to auto format in your favourite editor and match the CouchDB Erlang coding standards, or receiving pull requests that are formatted poorly. I’d like to fix that with an appropriate whitespace standard, and supplementary plugins for vi and Emacs that will just Do The Right Thing and let us all stop worrying about whitespace corrections in pull requests.”

There’s currently a poll around this topic which is still open.

Multiple Concurrent Instances of CouchDB on Mac OS X 10.7.5 (see thread)

Approaches: (1) on a powerful enough machine, Vagrant could be used to spin up a few CouchDB VMs, e.g. with CouchDB-Vagrant; (2) using Docker; (3) Node-Multicouch (used in Hoodie); or (4) this script to configure isolated instances. It should be possible to point couchdb to the CouchDB commands inside the .app.

CouchDB Universe

Releases in the CouchDB Universe

  • Wilt 3.0.0 – a browser/server based CouchDB API library based on the SAG CouchDB library for PHP
  • contentful-2-couchdb – a proof of concept for easy data export to CouchDB and replication use
  • Availability of MariaDB 10 announced
  • PouchDB released a new website
  • PouchDB 2.1.0 release includes e.g. (all release notes here):
    • Support optional leveldown builds
    • Replication performance improvements
    • Fix for localStorage detection in Chrome Apps
    • Improved error reporting from replicator et al.

Opinions

Use Cases, Questions and Answers

Getting involved in CouchDB

If you want to get into working on CouchDB, here’s a list of beginner tickets you can get started with. These are issues around our ongoing Fauxton implementation. If you have any questions or need help, don’t hesitate to contact us in the couchdb-dev IRC room (#couchdb-dev); Garren (garren) and Sue (deathbear) are happy to help. We’d appreciate having you!

New Committers and PMC Members

… and also in the news

Posted on behalf of Lena Reinhard.