Is it possible to measure disk space of documents in a view? (see thread)
Question: is there a way to measure the amount of disk space used by the documents returned by a particular view? As an example, let’s say all documents in a user’s database are tagged with a userId and the user has a view that returns all documents by userId. Now the user would want to measure the amount of space used by userId=100. Is this possible?
- A possible approach is to find a reliable (or good-enough) way to calculate an arbitrary document’s disk size, emit that value in the map function (alongside whatever is already being emitted), and sum it in the reduce function. Attachments are harder to account for, though.
- In any case, users would also have to account for the append-only nature of the storage, which incurs an overhead of up to whatever the compaction threshold is (the respondent’s databases and views usually grow to around twice their real size before they’re compacted).
- It’s not hard to walk a JSON object recursively and compute how many bytes it would probably occupy as JSON. Then users would need to add in the encoded_length of each attachment.
- But after this, users run into implementation details like: are document bodies stored using some kind of compression (like Snappy)? Are they even stored as JSON at all, vs. serialized Erlang terms? And what about conflicts — if a doc is in conflict, users really need to add up the size of each conflicting revision, but can’t access the other revisions from a map function.
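The estimate discussed above can be sketched outside CouchDB as follows. This is a minimal illustration, not the respondents' code: the function names are hypothetical, the `userId` field comes from the question's example, and the attachment stubs are assumed to carry `length` (and, when requested with att_encoding_info, `encoded_length`). It measures only the compact JSON form plus attachment sizes, so it ignores on-disk compression, old revisions, conflicts and B-tree overhead, exactly the caveats raised in the thread.

```python
import json
from collections import defaultdict


def estimate_doc_bytes(doc):
    """Rough size estimate: compact UTF-8 JSON body plus attachment sizes."""
    # Leave the attachment stubs out of the body so they are not counted twice.
    body = {k: v for k, v in doc.items() if k != "_attachments"}
    body_bytes = len(
        json.dumps(body, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    )
    att_bytes = 0
    for stub in doc.get("_attachments", {}).values():
        # Prefer encoded_length (assumed present for compressed attachments),
        # otherwise fall back to the plain length.
        att_bytes += stub.get("encoded_length", stub.get("length", 0))
    return body_bytes + att_bytes


def bytes_by_user(docs):
    """Mimic the map step (emit userId, size) followed by a sum reduce."""
    totals = defaultdict(int)
    for doc in docs:
        if "userId" in doc:
            totals[doc["userId"]] += estimate_doc_bytes(doc)
    return dict(totals)


# Hypothetical sample documents.
docs = [
    {"_id": "a", "userId": 100, "note": "hi"},
    {"_id": "b", "userId": 100,
     "_attachments": {"pic.png": {"content_type": "image/png", "length": 2048}}},
    {"_id": "c", "userId": 7},
]
print(bytes_by_user(docs))
```

In a real view the same arithmetic would live in the map function, with the sum done by the reduce step; the result is an approximation of logical size, not of the bytes actually occupied on disk.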
Manually deleting a _design directory (see thread)
Question: A user has a CouchDB instance which they are no longer using for data processing. That is, they need the data to be there, but the views are no longer needed, since they have moved the data processing to another server. Now the user would like to free the space used by the views (currently nearly 5 GB) and wants to know:
- Can they simply delete the design directories? (rm -rf .*_design)
- Will this affect the documents themselves?
- Is it possible to do this in a running CouchDB instance?
- Will this really free up disk space, or does CouchDB keep view file handles open, so that a restart is needed?
The user is aware that doing so will still leave the _design documents in the databases, and that triggering those documents will recreate the views. This is no problem at the moment; the user just wants to free up some disk space quickly.
- It is possible to delete these files, but it’s not recommended, since the CouchDB process may still hold open file descriptors on them. It’s better to delete the design document in CouchDB and clean up the outdated views.
- Deleting .*_design removes the view indexes. If a design document still exists in the database and CouchDB doesn’t hold a file descriptor on any of these files, it will rebuild the index from scratch on the next request to a view.
- If this is done while CouchDB is running, users need to restart it so that it releases the file descriptors. Alternatively, they can find the file descriptors in /proc/<couchdbpid>/fd and close them via gdb.
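The recommended route (delete the design document, then clean up outdated views) maps onto two HTTP calls: a DELETE on the design document and a POST to the database's _view_cleanup endpoint. The sketch below only builds those requests so it can be read and run without a server. The base URL, database name, design document name and revision are hypothetical placeholders, and a real server would also need admin credentials.

```python
import urllib.request
from urllib.parse import quote


def build_cleanup_requests(base_url, db, ddoc_name, ddoc_rev):
    """Build (but do not send) the two requests for the recommended cleanup."""
    db_url = f"{base_url.rstrip('/')}/{quote(db, safe='')}"
    # 1. Delete the design document; CouchDB requires its current revision.
    delete_ddoc = urllib.request.Request(
        f"{db_url}/_design/{quote(ddoc_name, safe='')}?rev={quote(ddoc_rev, safe='')}",
        method="DELETE",
    )
    # 2. Ask CouchDB to remove index files no design document refers to.
    view_cleanup = urllib.request.Request(
        f"{db_url}/_view_cleanup",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return [delete_ddoc, view_cleanup]


# Hypothetical values; nothing is sent here.
for req in build_cleanup_requests("http://localhost:5984", "mydb", "stats", "3-abc"):
    print(req.get_method(), req.full_url)
```

Passing each request to urllib.request.urlopen would perform the call. Because CouchDB removes the index files itself, this avoids the open-file-descriptor problem described above.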
Releases in the CouchDB Universe
- pouchdb-replication-stream – a streaming replication protocol for CouchDB & PouchDB
- boxspring 0.0.2 – a collection of Backbone Model classes for interacting with CouchDB compatible with Browser and Server-side execution
- poms 0.0.3 – interface to POMS CouchDB API
- Ethermap – a realtime collaborative, version controlled map editor, and now Open Source
- rlx 0.1.255 – command line interface for CouchDB
- couch-joiner 0.1.2 – command line utility for manipulating CouchDB documents
- couch-web 1.0.5 – a boilerplate for CouchDB webapps
- clojure-clutch 0.4.0 – CouchDB client for Clojure
- PouchDB 3.0.5
Opinions and other News in the CouchDB Universe
Use Cases, Questions and Answers
Q&A: How does CouchDB store my items?
Question: How does CouchDB store all my items?
Answer: CouchDB’s storage engine is based on the venerable B-tree, and the on-disk format is appended to on each update. There are some excellent posts out there with a lot more detail:
- CouchDB Docs: The Key to your Data
- Damien Katz: CouchDB Technical Overview
- Ricky Ho: CouchDB Implementation
- More CouchDB reading: btree:lookup
- KodeKabuki: CouchDB naked
For B-trees in general and some interesting variants, read up on:
- Wikipedia: B-trees
- Cornell University, Department of Computer Science: B-Trees
- Jonathan J Hunt: B-trees are the new black
- Research: a practical distributed B-tree
- Minuet: a scalable distributed Multiversion B-Tree
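To illustrate what "appended to on each update" means in the answer above, here is a toy sketch of an append-only store. It is a flat log with an in-memory index, not a B-tree and not CouchDB's actual file format; the class and method names are hypothetical. It shows only why such a file keeps growing until it is compacted.

```python
import json


class AppendOnlyStore:
    """Toy append-only log: every update appends, nothing is overwritten."""

    def __init__(self):
        self.log = bytearray()  # stands in for the database file
        self.index = {}         # doc id -> (offset, length) of the latest version

    def put(self, doc_id, doc):
        record = json.dumps(doc).encode("utf-8")
        self.index[doc_id] = (len(self.log), len(record))
        self.log += record      # older versions stay in the log

    def get(self, doc_id):
        offset, length = self.index[doc_id]
        return json.loads(self.log[offset:offset + length].decode("utf-8"))

    def live_bytes(self):
        """Bytes occupied by the latest version of each document."""
        return sum(length for _, length in self.index.values())

    def compact(self):
        """Copy only the latest version of each document into a fresh log."""
        fresh = AppendOnlyStore()
        for doc_id in self.index:
            fresh.put(doc_id, self.get(doc_id))
        self.log, self.index = fresh.log, fresh.index


store = AppendOnlyStore()
store.put("a", {"v": 1})
store.put("a", {"v": 2})  # appended; the first version is now dead space
print(len(store.log), store.live_bytes())
store.compact()
print(len(store.log), store.live_bytes())
```

Before compaction the log is larger than the live data; afterwards the two sizes match.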
More Q&A from this week
- Stack Overflow: Confused with CouchDB and Couchbase
- Stack Overflow: PouchDB exclude design documents when using autogenerated uuid
- Stack Overflow: How to handle image uploading in CouchDB?
- Stack Overflow: Triple join in CouchDB?
- Stack Overflow: Using couchdb for caching and scalability during high traffic periods
- Stack Overflow: Sorting values in CouchDB
- Stack Overflow: how to enable google chrome browser to save passwords for couchdb futon
No public answer yet:
- Stack Overflow: Is there spring-data for CouchDB?
- Stack Overflow: Significance of Consistent http document store of CouchDb
- Stack Overflow: How to retrieve all documents in couchdb database without causing out of memory
- Stack Overflow: How do I only allow access to one document for each client using CouchDB (Cloudant)?
- Stack Overflow: Find One Query not running in CouchDB
For more new questions and answers about CouchDB, see these search results.
If you want to get into working on CouchDB:
- We have an infinite number of open contributor positions on CouchDB. Submit a pull request and join the project!
- Do you want to help move the CouchDB docs translation forward? We’d love to have you in our L10n team! See our current status and the languages we’d like to provide CouchDB docs in on this page. If you’d like to help, don’t hesitate to contact the L10n mailing list on firstname.lastname@example.org or ping Andy Wenk (awenkhh on IRC).
We’d be happy to have you!
- Jenn Schiffer (IRC nick: jennmoneydollars; Twitter; Apache ID: jenn) has been elected as a CouchDB committer (see thread). Welcome to CouchDB, Jenn!
- September 11, Berlin, Germany: Time to relax! CouchDB Hack Night
- September 11, Zurich, Switzerland: NoSQL & DB Management in the Cloud w/ Livestream of Keynote @Cassandra Summit SF
- September 15, Calgary, Canada: Introduction to Cloudant, a fully-managed NoSQL database-as-a-service (DBaaS)
- September 17, London, United Kingdom: September meetup – Cloudant and developing on IBM Bluemix
Job opportunities for people with CouchDB skills
- Manager, Software Development – Integration Services, Atlanta, GA, USA
- Full Stack Node developer, New York, NY, USA
- DevOps Cloud Architect, Los Angeles, CA, USA
- Agile Developer Java, Paris, France
- Software Engineer C#/.Net, Begbroke, UK
Time to relax!
- “There’s no stress that can’t be eased by the cuddles of an office dog.” – 5 reasons you should get an office dog
- “We can take some credit for why the event was so successful, but I also believe that a lot of great things happened by chance and in my experience it’s the things that you didn’t plan for that teaches you the most. So now I’ll tell you what we did and what we learned.” – How to create a tech event where everyone feels welcome
… and also in the news
- All you need is L*** (talk slides with notes)
- Junior Designers vs. Senior Designers: The Year of the Looking Glass
- “CouchDB is love. CouchDB is life. My last few weeks playing with couch and pouchDB have been super fun.
#javcascript #chromeos” (@therocco, on Twitter)