BurryFS: Digg as a filesystem

Setup instructions

  1. Install the Fuse library and its development packages
  2. Install SWIG
  3. Download burryfs.tar.gz
  4. Compile it:
    gzip -dc burryfs.tar.gz | tar xvf -
    cd burryfs
    ./setup.sh
    
  5. Mount it and enjoy:
    ./start_burryfs /tmp/burryfs
    ls /tmp/burryfs

What it does

Thanks to Fuse, the Digg API and of course BurryFS, you can now view Digg as a Linux file system:

$ ls /tmp/burryfs
config  stories  users
$ cd /tmp/burryfs/stories/topics/programming/popular
$ ls
A_Computer_Science_Degree_Doesn_t_Hurt_Much
Let_s_Build_a_Grid_Webdesign
LifeHacker_How_to_build_a_Firefox_extension
The_Google_Maps_Street_View_Team_Pic
# ...
$ cat A_Computer_Science_Degree_Doesn_t_Hurt_Much/diggs
1286
$ ls Let_s_Build_a_Grid_Webdesign/user/submissions
Facebook_verification_outside_of_USA_Sort_it_out
Google_blacklist_sheds_light_on_phishing_tactics
# ...

How it works

BurryFS is a Chicken Scheme program that interacts with Fuse (the userspace filesystem API, merged into the mainline Linux kernel in 2.6.14) to present Digg content as a file system. Fuse delivers file system requests through callbacks, and Scheme functions cannot serve as C callbacks, so I wrote a simple inversion-of-control layer: it serializes each Fuse request over an internal socket and waits for a reply from Scheme. At the other end, Scheme sits in an event loop, unpacking requests, fetching information via the Digg API and sending replies. This is essentially the approach taken by fusewrapper, except that fusewrapper wraps the low-level (inode-based) Fuse API, while I chose to wrap the high-level, path-based API. Like fusewrapper, BurryFS uses SWIG for the Scheme-C interface.
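To make the inversion-of-control idea concrete, here is a minimal Python sketch of the same pattern. It is not BurryFS's actual C/Scheme glue: the length-prefixed JSON wire format, the function names and the single-outstanding-request lock are all assumptions made for illustration. One side turns a callback into a message and blocks; the other side runs an event loop that unpacks requests, dispatches them and sends replies.

```python
import json
import socket
import struct
import threading

# Hypothetical wire format: a 4-byte big-endian length, then a JSON body.
def send_msg(sock, obj):
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the socket")
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))

def make_callback(sock, lock):
    """Callback side (C in BurryFS): serialize a request, then block for the reply."""
    def callback(op, path):
        with lock:  # simplification: one outstanding request at a time
            send_msg(sock, {"op": op, "path": path})
            return recv_msg(sock)
    return callback

def event_loop(sock, handlers):
    """Handler side (Scheme in BurryFS): unpack requests, dispatch, send replies."""
    while True:
        try:
            request = recv_msg(sock)
        except EOFError:
            return  # the callback side closed the socket: shut down
        handler = handlers.get(request["op"])
        reply = handler(request["path"]) if handler else {"error": "ENOSYS"}
        send_msg(sock, reply)

# Usage: a placeholder getattr handler instead of real Digg API queries.
def getattr_handler(path):
    return {"type": "dir"} if path == "/" else {"error": "ENOENT"}

c_side, scheme_side = socket.socketpair()
loop = threading.Thread(target=event_loop, args=(scheme_side, {"getattr": getattr_handler}))
loop.start()
call = make_callback(c_side, threading.Lock())
print(call("getattr", "/"))         # {'type': 'dir'}
print(call("getattr", "/missing"))  # {'error': 'ENOENT'}
c_side.close()
loop.join()
scheme_side.close()
```

Because the Scheme side handles several requests concurrently, a real implementation would tag each message with a request ID instead of serializing everything behind one lock as this sketch does.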

Chicken implements continuation-based (cooperative) lightweight threads; BurryFS spawns one such thread per Fuse request, so a request that is waiting on the Digg API does not hold up the others when several requests arrive at once.
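The benefit of one lightweight thread per request can be illustrated with Python generators standing in for Chicken's threads. This is only an analogy: Chicken builds its threads on continuations, whereas this sketch uses generators, explicit `yield`s and a hypothetical round-robin scheduler. Each `yield` marks a point where a handler would be waiting and lets another request make progress.

```python
from collections import deque

def handle_request(name, steps, log):
    """One request handler as a generator. Each `yield` marks a point where the
    handler would be waiting (for example on the network) and gives up control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield
    log.append(f"{name}:done")

def run_cooperatively(tasks):
    """Round-robin scheduler: resume each task until its next yield, then rotate."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)

log = []
# One lightweight "thread" per incoming request.
run_cooperatively([handle_request("slow", 3, log), handle_request("fast", 1, log)])
print(log)  # ['slow:0', 'fast:0', 'slow:1', 'fast:done', 'slow:2', 'slow:done']
```

The short request completes while the long one is still in progress, which is the effect described above; in this sketch the interleaving comes entirely from the explicit yields.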

To reduce the load on Digg and improve performance, BurryFS caches Digg content for 300 seconds. You can change this by writing to config/digg/cache/timeout, but please do not set it to a low value.
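A time-based cache like the one described can be sketched as follows. The class and parameter names are hypothetical, and the assumption that writing to config/digg/cache/timeout simply updates a `timeout` value like this one is mine, not taken from the BurryFS source. The clock is injectable so the expiry logic can be exercised without waiting five minutes.

```python
import time

class TTLCache:
    """Caches the result of `fetch(key)` for `timeout` seconds."""

    def __init__(self, fetch, timeout=300.0, clock=time.monotonic):
        self.fetch = fetch        # e.g. a function that queries the Digg API
        self.timeout = timeout    # seconds; 300 as in the text
        self.clock = clock        # injectable so the expiry logic is easy to test
        self._entries = {}        # key -> (value, time it was fetched)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.timeout:
            return entry[0]       # still fresh: no request to Digg
        value = self.fetch(key)   # missing or expired: fetch again
        self._entries[key] = (value, now)
        return value

# Usage with a fake clock and a fake fetch function.
fake_time = [0.0]
fetches = []
def fake_fetch(key):
    fetches.append(key)
    return f"{key} (fetch #{len(fetches)})"

cache = TTLCache(fake_fetch, clock=lambda: fake_time[0])
print(cache.get("programming/popular"))  # programming/popular (fetch #1)
fake_time[0] = 299.0
print(cache.get("programming/popular"))  # programming/popular (fetch #1)
fake_time[0] = 300.0
print(cache.get("programming/popular"))  # programming/popular (fetch #2)
```

The second call is served from the cache; only once the full timeout has elapsed does the cache go back to Digg. A very low timeout would turn almost every file access into an API request, which is why it should not be set too low.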

When you're done, use fusermount -u /tmp/burryfs to unmount the file system.

Organization

BurryFS keeps story data in stories/titles/story_name/. However, since the number of stories is large (and potentially unbounded), you can't list stories/titles: the directory has execute but not read permission. If you know the title of a story, you can go directly to its directory. Alternatively, you can look in stories/topics/topic_name/popular (or upcoming); those directories contain symlinks to story directories in stories/titles.

Similarly, user data is kept in users/, but you can't list that directory. You either have to know a username, or follow a user symlink from within a story directory.
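The layout above can be summarized as a getattr-style lookup. In this sketch the function name, its arguments and the exact permission bits are hypothetical; the only details taken from the text are that stories/titles and users/ are searchable but not readable, and that entries under a topic's popular or upcoming directory are symlinks into stories/titles. The sets and dictionary stand in for Digg API queries.

```python
import posixpath
import stat

# Hypothetical permission bits. The text only states that stories/titles and
# users/ have execute but not read permission.
LISTABLE_DIR = stat.S_IFDIR | 0o555
SEARCH_ONLY_DIR = stat.S_IFDIR | 0o111
SYMLINK = stat.S_IFLNK | 0o777

def lookup(path, titles, users, listings):
    """getattr-style lookup returning (mode, symlink target or None).

    titles:   set of story names known to exist (stand-in for a Digg API query)
    users:    set of known usernames
    listings: dict mapping (topic, "popular" or "upcoming") to a list of story names
    """
    parts = [p for p in path.split("/") if p]
    if parts in ([], ["stories"], ["stories", "topics"], ["config"]):
        return LISTABLE_DIR, None
    if parts in (["stories", "titles"], ["users"]):
        return SEARCH_ONLY_DIR, None  # children reachable by name, no listing
    if len(parts) == 3 and parts[:2] == ["stories", "titles"] and parts[2] in titles:
        return LISTABLE_DIR, None
    if len(parts) == 2 and parts[0] == "users" and parts[1] in users:
        return LISTABLE_DIR, None
    if len(parts) >= 3 and parts[:2] == ["stories", "topics"]:
        topics = {topic for topic, _ in listings}
        if len(parts) == 3 and parts[2] in topics:
            return LISTABLE_DIR, None
        if len(parts) >= 4 and (parts[2], parts[3]) in listings:
            if len(parts) == 4:
                return LISTABLE_DIR, None
            if len(parts) == 5 and parts[4] in listings[(parts[2], parts[3])]:
                # From stories/topics/<topic>/<kind>/, three levels up is stories/.
                return SYMLINK, posixpath.join("../../../titles", parts[4])
    raise FileNotFoundError(path)
```

For example, looking up stories/topics/programming/popular/Let_s_Build_a_Grid_Webdesign would yield a symlink whose relative target, ../../../titles/Let_s_Build_a_Grid_Webdesign, resolves to stories/titles/Let_s_Build_a_Grid_Webdesign. A relative target keeps the links valid wherever the file system is mounted.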

The directory config/ exposes some configuration parameters (most of them read-only).

Fuse security

By default, Fuse doesn't let user A access a file system mounted by user B, even when A is root. The reason is an interesting security problem: the Fuse daemon (running as user B) would get to see data belonging to A, such as the paths A accesses and the contents A writes, without A having allowed that.

You can override this using the allow_other Fuse option, but it's only available if you're root or if you enable user_allow_other in /etc/fuse.conf.
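The rule in the previous paragraph can be written down as a small check. This is a sketch that encodes only the rule stated above (root, or `user_allow_other` present in /etc/fuse.conf); the function names are hypothetical, and fusermount's own parsing of fuse.conf may differ in details.

```python
import os

def allow_other_permitted(fuse_conf_text, euid):
    """True if a mount with allow_other should be accepted: the caller is root,
    or the fuse.conf text enables user_allow_other."""
    if euid == 0:
        return True
    for line in fuse_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line == "user_allow_other":
            return True
    return False

def check_this_system(conf_path="/etc/fuse.conf"):
    """Apply the rule to the current process and the local fuse.conf (Unix only)."""
    try:
        with open(conf_path) as f:
            text = f.read()
    except FileNotFoundError:
        text = ""  # no config file: only root may use allow_other
    return allow_other_permitted(text, os.geteuid())
```

A commented-out `#user_allow_other` line, as often shipped in the default file, does not count; the line has to be uncommented for non-root users to pass allow_other.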
Dan Muresan