CouchDB Weekly News, December 04, 2014

Major Discussions

Infinite resets hanging views (see thread)

Question: a user tried to run a temporary view on a big-ish database (> 1000 docs). The view seemed to hang forever, and they were not sure whether something was wrong with their view or with their settings. The view looked like this:

    // map function
    function(doc) {
      emit(doc, doc.tweets);
    }

Answers / approaches:

  1. Think of `map` as a way to create secondary indexes.
    1. An index on the whole document doesn’t make much sense.
    2. Moreover, if at query time you only want data that lives in the same document, map/reduce isn’t needed at all. Just create a `show` function.
  2. I would go further and say: Never emit the full doc in a map function.
    You can get the full doc by using the include_docs flag when querying the
    view. If you need a key that only refers to a single doc, use the _id of
    the doc.
  3. Don’t write your own reduce functions; use CouchDB’s built-in Erlang reduce functions (see http://wiki.apache.org/couchdb/Built-In_Reduce_Functions). You will save yourself a lot of headaches, and you will not match the performance of these functions. In your case, _count is probably what you want, though you probably don’t even need a reduce function with such an index (as Aurelien essentially pointed out). Also handle the case where doc.tweets doesn’t exist in your map function:
    emit(doc._id, doc.tweets || []);
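
Put together, the advice above could look roughly like the following sketch. It is only an illustration: the design document name `_design/tweets` and the view name `by_id` are made up, and the `emit` stub stands in for the function CouchDB itself provides inside the view server, so that the map function can be tried locally.

```javascript
// Assumption: CouchDB provides `emit` inside its view server; this stub
// stands in for it so the map function can be exercised locally.
const rows = [];
const emit = (key, value) => rows.push({ key, value });

// Map source as it would be stored in a design document (anonymous function).
// It emits the doc's _id as key instead of the whole document.
const mapSource = "function (doc) { emit(doc._id, doc.tweets || []); }";

// Hypothetical design document; the names "tweets" and "by_id" are made up.
const designDoc = {
  _id: "_design/tweets",
  views: {
    by_id: { map: mapSource, reduce: "_count" } // built-in reduce, not hand-written
  }
};

// Compile the source locally and run it over two sample documents.
const mapFn = new Function("emit", "return (" + mapSource + ");")(emit);
mapFn({ _id: "user-1", tweets: ["hello", "world"] });
mapFn({ _id: "user-2" }); // no tweets field: falls back to []

console.log(rows);
```

When the full document is needed, querying the view with the include_docs flag returns it without storing a copy of every document in the index.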

Allow user-defined views (see thread)

Question: a user was thinking about a platform based on CouchDB where each set of users would get their own CouchDB database to store and query data. They want to let users define custom queries in two ways: through a form that builds a query and translates it into a JS view, and by writing custom views directly in JS, that is, their own map/reduce functions. The users would also be able to import different sets of data. However, they were unsure about possible DoS attacks through endless loops inside a function, or through emitting too much data.

Approaches:

  • os_process_timeout applies to view servers (the default is 5 seconds), so you can configure a timeout for view queries. How to limit CPU or disk space usage is less clear, although inspecting _stats might shed some light.
  • If viable for the application, it may also be possible to initially have a process where new queries written are vetted by a human before they are run. The advantage of this is two-fold, namely:
    • i) you’ll be able to move on and prove your concept quickly
    • ii) while doing this, you may learn enough (and things may change enough) for you to automate the vetting process.
  • The worst case will always be RCE (remote code execution), since you are going to allow everyone to execute arbitrary code on your server. The JavaScript query server is only as safe as the SpiderMonkey sandbox. If you want to use custom query servers, things get worse: neither the Python, Erlang, Clojure, nor any other query server I know of supports sandboxing, which means a view function can do anything.
  • The idea may work for Mango, Cloudant’s view query DSL, since its allowed operations are very limited. But then you’ll face another problem: disk space will run out very quickly, since a single index file may be much bigger than the database itself.
  • For now, the simplest and most secure way to allow custom user views is to let users replicate your database to their own CouchDB instance, where they can do whatever they need.
  • Another approach: use CouchDB for whatever you need on the server side, and when a user needs to run a custom query, run that query on a client-side copy of the data in PouchDB. This wouldn’t work with very large data sets, but would work on smaller data sets.
  • Alternatively, don’t allow the user to write the view. Build a UI that helps create the view function: specify a set of operations that can be run on the data and construct the view from those. This is essentially a query builder for CouchDB.
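
To make the query-builder approach a little more concrete, here is a minimal sketch of how a form submission might be translated into a map function. Everything in it is hypothetical and not taken from the thread: the spec shape (`field`, `op`, `value`, `keyField`), the operator whitelist, and the helper name `buildMapSource`. The idea is only that users pick from a fixed set of operations and never supply JavaScript themselves; it does not by itself limit index size.

```javascript
// Hypothetical query builder: turns a small, whitelisted filter spec into
// the source of a CouchDB map function.
const OPERATORS = new Map([["eq", "==="], ["gt", ">"], ["lt", "<"]]);
const FIELD_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function buildMapSource(spec) {
  // Only plain identifiers are accepted, so no code can be smuggled in.
  if (!FIELD_NAME.test(spec.field) || !FIELD_NAME.test(spec.keyField)) {
    throw new Error("invalid field name");
  }
  const op = OPERATORS.get(spec.op);
  if (!op) throw new Error("unsupported operator");

  const t = typeof spec.value;
  const okValue = t === "string" || t === "boolean" ||
    (t === "number" && Number.isFinite(spec.value));
  if (!okValue) throw new Error("unsupported value");

  // The comparison value is serialized as a literal, never pasted in raw.
  const literal = JSON.stringify(spec.value);
  return "function (doc) { if (doc." + spec.field + " " + op + " " + literal +
    ") { emit(doc." + spec.keyField + ", null); } }";
}

// Local check with a stub for CouchDB's `emit` (an assumption of this sketch).
const rows = [];
const emit = (key, value) => rows.push({ key, value });
const source = buildMapSource({ field: "followers", op: "gt", value: 100, keyField: "_id" });
const mapFn = new Function("emit", "return (" + source + ");")(emit);
mapFn({ _id: "a", followers: 250 }); // matches the filter
mapFn({ _id: "b", followers: 3 });   // filtered out

console.log(source);
console.log(rows);
```

The generated source would be stored as the `map` of a view in a design document. Emitting `null` as the value keeps the index small, and the full documents can still be fetched with include_docs when querying.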

Releases in the CouchDB Universe

Opinions and other News in the CouchDB Universe

Use Cases, Questions and Answers

No public answer yet:

For more new questions and answers about CouchDB, see these search results.

Get involved!

If you want to get into working on CouchDB:

  • We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
  • Can you help with Web Design, Development or UX for our Admin Console? No Erlang skills required! – Get in touch with us.
  • Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list at l10n@couchdb.apache.org or ping Andy Wenk (awenkhh on IRC).

We’d be happy to have you on board!

Events

Time to relax!

… and also in the news
