Omnigia

July 25, 2007

R6RS ratification guesstimate results

Filed under: scheme — Dan Muresan @ 11:42 am

I have gone through the R6RS electorate and tried to guess each elector’s vote based on their statement of interest. It’s been a fun exercise. Most of the statements give clear hints (such as sentences like “R6RS will” + positive action, or Felix’s all-caps screaming battle cry). Other statements offer partial clues (such as insisting on minimalism, or deploring the lack of a standard library). Yet others are completely opaque (some of them reading like CVs packed with the author’s numerous accomplishments), in which case I ignored them.

OK, the results: R6RS will probably pass with a 70% approval rate.

I’d be interested in seeing any other predictions, especially if anybody has closely followed the r6rs-discuss list and has some time to burn. The official results are still more than a month off.

As most of you know, I strongly oppose R6RS. I will limit myself to asking those electors who complained about Scheme’s lack of structs and hash tables to go read the SRFI list and pay attention to SRFI-9 and SRFI-69. If your Scheme does not support one of the widespread SRFIs, you should ask your friendly implementor to reconsider. Also, the fact that R6RS brings some good things does not mean we need to compromise on the bad things; it’s always possible to restart the standardization effort and pick the gems out of the brown stuff.

June 6, 2007

BurryFS: a file system written in Scheme

Filed under: scheme, linux — Dan Muresan @ 10:27 pm

I’ve released BurryFS, a file system based on Fuse and implemented in Chicken Scheme. BurryFS interacts with Fuse (the userspace filesystem API, part of the mainline Linux kernel since 2.6.14) to organize Digg content as a file system. Since the Fuse API relies on callbacks to deliver file system requests, and Scheme functions cannot serve as C callbacks, I have written a simple inversion-of-control layer that serializes Fuse requests over an internal socket and waits for replies from Scheme. At the other end, Scheme sits in an event loop, unpacking requests, reading information via the Digg API and sending replies. Since Chicken implements cooperative (lightweight) threads, complete with TCP support, BurryFS performance should be high even with multiple parallel requests.
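To give a rough idea of what such an inversion-of-control layer looks like, here is a minimal sketch in C. It is not the BurryFS code: the wire format (struct request, struct reply), the function names and the "/popular" path are all made up for illustration, and a forked child stands in for the Scheme event loop. The point is only the shape: the C side writes a request to a socket and blocks until the other end replies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical fixed-size wire format, for illustration only. */
struct request { uint32_t op; char path[64]; };
struct reply   { int32_t status; };

/* Write exactly n bytes, looping over short writes. */
static int write_all(int fd, const void *buf, size_t n) {
    const char *p = buf;
    while (n > 0) {
        ssize_t k = write(fd, p, n);
        if (k <= 0) return -1;
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Read exactly n bytes, looping over short reads. */
static int read_all(int fd, void *buf, size_t n) {
    char *p = buf;
    while (n > 0) {
        ssize_t k = read(fd, p, n);
        if (k <= 0) return -1;
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* C side: what a callback would do. Ship the request, block for the reply. */
static int forward_request(int fd, uint32_t op, const char *path) {
    struct request req;
    struct reply rep;
    memset(&req, 0, sizeof req);
    req.op = op;
    strncpy(req.path, path, sizeof req.path - 1);
    if (write_all(fd, &req, sizeof req) != 0) return -1;
    if (read_all(fd, &rep, sizeof rep) != 0) return -1;
    return rep.status;
}

/* Stand-in for the event loop at the other end: serve a single request.
   The "answer" is simply the length of the requested path. */
static int serve_one(int fd) {
    struct request req;
    struct reply rep;
    if (read_all(fd, &req, sizeof req) != 0) return -1;
    req.path[sizeof req.path - 1] = '\0';
    rep.status = (int32_t)strlen(req.path);
    return write_all(fd, &rep, sizeof rep);
}

/* Round trip over a socket pair, with a child process playing the server. */
static int demo_roundtrip(void) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {                      /* child: the "Scheme" side */
        close(sv[0]);
        _exit(serve_one(sv[1]) == 0 ? 0 : 1);
    }
    close(sv[1]);                        /* parent: the callback side */
    int status = forward_request(sv[0], 1, "/popular");
    close(sv[0]);
    waitpid(pid, NULL, 0);
    assert(status == 8);                 /* strlen("/popular") */
    return status;
}
```

Calling demo_roundtrip() from main performs one request/reply exchange. A real layer would carry variable-length payloads and run the server loop indefinitely, but the blocking write-then-read pattern on the C side stays the same.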

For more information (and downloads), see the BurryFS homepage.

May 20, 2007

SWIG, Chicken and TinyCLOS

Filed under: scheme, oop, ffi — Dan Muresan @ 7:33 pm

Note: this is a fairly technical post; if you have no interest in FFIs, you may still find the @ TinyCLOS macro useful.

When dealing with large C libraries, SWIG (the wrapper generator) can be a mixed blessing. On the one hand, it’s a pleasure to work with wrapped C libraries from a dynamic language; on the other hand, generating the right wrappers can require significant time and effort, often with nothing to show for the plumbing work until the interface is complete.

In my case, the accessors and modifiers for C structures have been the most painful, initially at least. The library was full of complex, nested records of the following sort:

struct msg {
  int op;
  struct {
    char *name;
    union {
      int start;
      char *dst;
    } args;
  } req;
};

SWIG treats struct msg and its innards as separate objects; in Chicken, if you want to get to msg.req.args.start, you have to type a monstrosity like (msg-req-args-start-get (msg-req-args-get (msg-req-get msg))) (with bonus points for longer identifiers or deeper structures, of course).

The verbosity grows quadratically, and after a short while I started investigating the TinyCLOS mapping option. When invoked with the -proxy option, SWIG generates wrapper classes for C structures. This is enormously helpful: the previous incantation becomes (slot-ref (slot-ref (slot-ref msg 'req) 'args) 'start), which in real cases is a lot shorter due to, um, linear verbosity. To modify fields, you use slot-set!.

This was still too much typing, so I introduced the @ macro with which you can simply write (@ msg req args start), or (@ msg req name = "flush"):

(define-syntax @
  (syntax-rules (=)
    ((_ o) o)
    ((_ o slot = v) (slot-set! o 'slot v))
    ((_ o slot . slots) (@ (slot-ref o 'slot) . slots))))

Finally, relief. In retrospect, I find it hard to believe that nobody solved this problem before; maybe there’s some “standard” macro for this purpose, but I haven’t found it.

This isn’t the end of the saga, though. As soon as I moved back from experimenting to the actual library, I was greeted by a salvo of errors indicating that SWIG/TinyCLOS has probably never been used for large applications. Specifically, SWIG translates a composite structure name such as my_class into either <my-class> or <my_class> depending on the context. Presumably, SWIG/TinyCLOS was only tested on the traditional OOP toy examples (Shape, Pos etc.).

Fortunately this is easily fixed with perl -ne 'if (/<.*>/) { s/_/-/g; print } else { print }'. Older versions of SWIG also add an unnecessary (and harmful) (declare (uses tinyclos)) to the Scheme wrappers, but this is also easily excised.

The great news is that after all these machinations, as well as others not described here (involving callbacks and typemaps), SWIG/TinyCLOS seems to work without a hitch. I have had no problems using a large C library from a long-running Chicken program: writing the code was a lot of fun (compared to the SWIG saga), and, more importantly, there were no crashes. Has anybody else played with SWIG / Chicken / TinyCLOS?

May 11, 2007

Why the C preprocessor is a good thing

Filed under: c, macros — Dan Muresan @ 8:04 am

Yesterday, Christian Kienle argued that the C preprocessor is a bad thing. When a language lacks closures and garbage collection and forces static typing without type inference on its users, you would think that a moderately powerful feature like preprocessor macros would get some respect, at least in these times of programming-language renaissance when there are so many good alternatives.

First of all, I believe that Paul Graham’s advice holds true in any language: macros should only be used when nothing else will do. But when that happens, avoiding macros leads to contorted or verbose solutions.

Let’s look at Christian’s arguments:

Debugging preprocessor macros is hard

It’s true that most debuggers can’t map compiled code back to the original macros. However, most debugging is (or should be) done outside debuggers, and debugging would be hard without the preprocessor:

  • the preprocessor provides the __FILE__ and __LINE__ macros. Yes, they could be predefined identifiers, just like C99’s __func__, but that’s actually a less flexible solution: since C concatenates adjacent string literals, you can write "error in " __FILE__, but you can’t do that with __func__
  • assert can only be written as a macro, since it needs to stringify the condition being tested
  • Without macros, you’d be forced to invoke logging primitives like in Java:
    if (logger.isDebugEnabled()) {
      logger.debug(expensive_function ());
    }
  • using macros and RAII in C++, you can write a tracing system
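
The first three points can be sketched in a few lines of C. The names below (STR, HERE, CHECK, LOG_DEBUG, debug_enabled, expensive_function) are invented for this illustration and do not come from any particular library; CHECK is a simplified cousin of assert that reports a failure instead of aborting.

```c
#include <assert.h>
#include <stdio.h>

/* Two-step stringification, so that __LINE__ is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Adjacent string literals are concatenated, giving "file.c:42" at compile time. */
#define HERE __FILE__ ":" STR(__LINE__)

/* CHECK stringifies its condition, as assert does; unlike assert, it reports
   the failure on stderr and yields 0 instead of aborting. */
#define CHECK(cond) \
    ((cond) ? 1 : (fprintf(stderr, "%s: check failed: %s\n", HERE, #cond), 0))

static int debug_enabled = 0;     /* hypothetical run-time logging switch */
static int expensive_calls = 0;   /* counts calls, for demonstration only */

static const char *expensive_function(void) {
    expensive_calls++;
    return "large state dump";
}

/* The argument expression is evaluated only when logging is enabled,
   so the call site needs no explicit "if (enabled)" guard. */
#define LOG_DEBUG(expr) \
    do { if (debug_enabled) fprintf(stderr, "%s: %s\n", HERE, (expr)); } while (0)

/* Exercises the macros; returns the number of expensive_function calls. */
static int demo_logging(void) {
    expensive_calls = 0;
    debug_enabled = 0;
    LOG_DEBUG(expensive_function());   /* skipped: the call is never made */
    debug_enabled = 1;
    LOG_DEBUG(expensive_function());   /* evaluated and printed with file:line */
    assert(expensive_calls == 1);
    return expensive_calls;
}
```

A plain function could not do what LOG_DEBUG does here: its argument would be evaluated before the function ever got a chance to look at debug_enabled.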

Preprocessor macros are not type-safe

True, and it’s the closest thing that C/C++ have to type inference. Christian doesn’t actually show how this supposed type-unsafety can bite you, but instead points to the next reason and suggests that you use templates in C++ (or NSNumbers in Objective-C). I don’t know about Objective-C, but

template <class T1, class T2> T1 min (T1 x, T2 y)
{ return x < y ? x : y; }

looks pretty verbose to me.

Preprocessor macros often lead to side effects

What this means is that macro arguments can appear multiple times in the macro-expansion:

#define MAX( a, b ) ((a) > (b) ? (a) : (b))

If one of the arguments is an expression with side effects (such as x++ or a function call that modifies some state), then we have a bug. This is true, but

  • programming with side effects is not a good practice. Even if you don’t have the luxury to program in a functional language, you should still strive to minimize reliance on side-effects
  • macros are usually given capitalized names, like MAX, just so they scream at you when you are about to type MAX (x++, f (y))
  • if one of the arguments is a function f(), but f has no side effects, the compiler may be able to optimize away redundant multiple invocations
  • you get what you pay for — this is not Lisp, after all.
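
For readers who have never been bitten, here is a small sketch of the double evaluation in question. The helper names (next_id, max_int, demo_max) are invented for the example; MAX is the usual textbook definition.

```c
#include <assert.h>

/* Textbook macro: whichever argument wins the comparison is evaluated twice. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

static int calls = 0;                          /* counts next_id invocations */
static int next_id(void) { return ++calls; }   /* a function with a side effect */

/* Function version: each argument is evaluated exactly once,
   at the price of being tied to a single type. */
static int max_int(int a, int b) { return a > b ? a : b; }

/* How many times does next_id run inside MAX(next_id(), 0)? */
static int calls_made_by_macro(void) {
    calls = 0;
    (void)MAX(next_id(), 0);
    return calls;
}

/* Same question for the function version. */
static int calls_made_by_function(void) {
    calls = 0;
    (void)max_int(next_id(), 0);
    return calls;
}

/* The x++ case: x is incremented twice and the result is the second value read. */
static void demo_max(void) {
    int x = 5;
    int m = MAX(x++, 3);
    assert(m == 6);
    assert(x == 7);
}
```

Here calls_made_by_macro() returns 2 and calls_made_by_function() returns 1. The conditional operator has a sequence point after its first operand, so the macro's behaviour is well defined; it is just not what the caller meant.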

Of the three arguments against macros, only the last one is actually a serious objection; and just because the C preprocessor is too weak doesn’t mean you shouldn’t use it when necessary.

Finally, for fun, I’d like to point you to some macro magic:

If you have other cool examples, feel free to add them in the comments.

April 30, 2007

Scheme object systems: POS

Filed under: scheme, oop — Dan Muresan @ 9:35 pm

I’m no OOP fan (much less a fan of single-dispatch OOP), but sometimes I miss the implicit lexical scope that single-dispatch provides for methods. Take something as simple as

class Rect {
  int top, left, bottom, right;
  int area () const {
    return (top - bottom) * (right - left);
  }
};

Most Scheme object systems (see for example the Chicken OOP section) turn the area() method body into something tedious along the lines of


(* (- (slot-ref self 'top) (slot-ref self 'bottom))
   (- (slot-ref self 'right) (slot-ref self 'left)))

or, with objects implemented as closures,

(* (- (self 'top) (self 'bottom)) ...)

Until recently, I thought that short of codewalker-based macros, nothing could restore the terseness of single-dispatch methods.

Well, I’ve discovered Blake McBride’s POS (portable object system). With POS, you can write

(define-class Rect (ivars top left bottom right)
  (imeths
    (set-top! (self val) (set! top val)) ...
    (get-area (self) (* (- top bottom) (- right left)))))

POS is a set of pure R5RS macros, and correctly interacts with other syntax-rules macros (e.g. macros can appear within method bodies). The trick is not only to represent the objects as closures, but to also expand method bodies inside the closure:

;; not the actual POS expansion -- just an illustration
(define (make-rect)
  (let ((top #f) (left #f) (bottom #f) (right #f))
    (define self
      (lambda (meth-name . args)
        (case meth-name
          ((set-top!)
           (apply (lambda (self val) (set! top val))
                  (cons self args))) ...
          ((get-area)
           (apply (lambda (self)
                    (* (- top bottom) (- right left)))
                  (cons self args))))))

    self))

This way, methods can access instance variables as plain identifiers. Each object is a dispatch function that closes over those variables.

POS is very useful, and I plan to add default getters and setters, as well as a way to convert between the closure representation and a-lists. This should help with persistence, among other things.

POS has a couple of extra features (inheritance, access to the parent object, class methods) but really is a light-weight system. The major downside is that methods (and instance variables) can no longer be added dynamically, since it’s impossible to inject code (or data) into a closure.

Update: see the comments for yet another way of simulating an implicit “this” argument.

April 15, 2007

Fixing the Courier and Exim SSL certificates

Filed under: imap, linux — Dan Muresan @ 12:07 am

Most hosting accounts come with cPanel, and by implication Exim and Courier under the hood. Some people access their mail using the cPanel webmail interface (usually via https://example.com:2096), but if you need to send more than the occasional e-mail, you probably want to set up Outlook or Thunderbird to connect to the IMAP server.

Sometimes, the hosting company won’t have a canonical host name and matching SSL certificate for your domain, which will lead to endless security warnings in Thunderbird. If you’ve got shared hosting, there’s not much you can do (short of opening a support ticket and hoping for the best), but if you are a VPS customer, here’s how to fix your problem: first, edit /usr/lib/courier-imap/etc/imapd.cnf (in particular, set the correct hostname in the CN=… line). Then, run Courier’s mkimapdcert. This will generate the file /usr/lib/courier-imap/share/imapd.pem, which combines a key and certificate and is used by the Courier IMAP server. Next, copy and paste the RSA private key (including the delimiter lines) from the PEM file to /etc/exim.key, and similarly the certificate (the second section in the PEM file) to /etc/exim.crt.

When you start Thunderbird, it will complain that it can’t verify the certificate (to avoid this you’d have to pay a Certificate Authority like Verisign or Thawte, but we’re not doing that today). Choose to accept the certificate permanently. Voilà, no more warnings.

March 16, 2007

Ubuntu: make the world a better place by holding users hostage?

Filed under: linux, vmware — Dan Muresan @ 1:31 pm

Note: to the many people who just want to fix their problems and don’t care about politics — scroll to the end of this post.

As I was having trouble getting the VMWare MUI to work on Ubuntu, I came upon a bugzilla thread that solved my original problems, but made me very concerned about the Ubuntu developer team. The discussion highlights serious problems with their mentality, priorities, and attitude.

The controversy centers around the default Bourne shell, /bin/sh, which executes scripts in Linux (expert readers may skip this paragraph and the next two). For as long as anyone remembers, Linux distros have provided GNU Bash, an “embrace and extend” version of the original sh (the behaviour of sh is actually standardised in POSIX 1003.2). So /bin/sh was a symlink to /bin/bash — yet bash has extensions that would not work in a standards-compliant sh.

Now, scripts get to choose which shell they run under: the first line in any shell script must read something like #!/path/to/shell. But authors want their scripts to run on as many systems as possible, and the only cross-UNIX shell is /bin/sh — if you required /bin/bash, your script might not run on Solaris.

The problem is that many scripts only see actual usage on Linux, and since there has never really been a “bare” sh around, many scripts inadvertently rely on bash-only features. Everything worked though, and no one complained — until last year, that is.

In June 2006, Ubuntu registered a “feature specification” to use dash rather than bash as /bin/sh. Apparently, dash is faster and needs less memory, so mostly for these reasons the change was approved for Edgy. But dash also strives to be “more catholic” than bash (though it has its sins too), so not every bash script runs on dash. Since Debian had previously conducted a shell script audit to rid packages of bash-isms, this wasn’t immediately noticed. However, outside packages were never reviewed, and complaints started piling up as new users (and upgraders) flocked to Ubuntu Edgy after its final release.

At that point, a previously obscure bug started gaining entries and visibility. The developers’ response was not what you’d expect for a distro backed by Canonical and self-styled “Linux for human beings”:

there are no plans to change the default configuration back to bash […] If vendors are distributing software that expects /bin/sh to be bash, then that software is broken. Please take it up with them.

So the users are supposed to notice the breakage, carefully debug the scripts to learn that the bug is due to bash-isms, complain to the authors and wait for the fix to arrive. If the users are not programmers, they’re out of luck. All this for software that ran just fine previously, mind you.

Of course, Ubuntu could easily fix this bug, retaining the speed improvement without inconveniencing users: revert sh to bash, and change Ubuntu packages to use dash. But I suppose that would mean conceding users were right from the start (and thus losing face).

Is this going to be a Jeff Johnson moment? What really scared me were comments by someone who claims to be a non-developer (strangely enough, the only non-developer to support the official policy):

Bashisms are bad. They need to be fixed […] Sometimes you have to do things the hard way to make the world a better place. I think we have begun down a slippery slope towards eradication of bashisms. They never would have gone away if it was just ‘the right thing to do’, but now if you write broken scripts you give up support for a major distro.

So, making the world a better place involves taking the userbase hostage, costing thousands of people anywhere from 30 minutes to a couple of hours, and expecting them to do your bidding (i.e. persuade third parties to conform to some lousy standard that sported incompatible changes several times in a decade)? I really hope this is not what the developer team is secretly thinking, but the fact that there are exactly two replies from a single developer, in spite of the mounting frustration expressed in tens of comments, doesn’t look good. In any case, causing lost productivity that ranges somewhere into the hundreds of thousands of dollars is a remarkable accomplishment, only not one to be proud of.

Update: to those who just want to fix this problem without downgrading Ubuntu: either run dpkg-reconfigure dash or, more brutally,

ln -sf /bin/bash /bin/sh

February 21, 2007

Scsh-regexp and SISCweb URLs

Filed under: scheme, scsh-regexp, siscweb — Dan Muresan @ 7:52 pm

A new version of scsh-regexp, the SCSH regular expression API port to Chicken and SISC, is available. All API functions (except regexp manipulation such as uncase) are implemented, and there are a few bugfixes (including one that prevented compilation under SISC); there is still no SRE support.

For the moment, you will have to download this release from Google Code; the Chicken egg should be updated to release 0.2 in a day or two.

If you’d like to see a port of scsh-regexp to your favorite Scheme, feel free to let me know (or contribute some code), as I am unlikely to make any further changes otherwise (save fixing potential bugs, should any surface). A Guile port would be the easiest.

And now, a fun exercise: scsh-regexp can handle pretty URLs in SISCweb applications. More specifically, we are going to sidestep the publish method by passing all requests to a single handler, which will discriminate between various URLs using the match-cond macro; here’s what a CMS application’s entry point might look like:

(publish "/*" url-dispatcher)
(define (url-dispatcher request)
  (define path (request/get-path-info))
  (define (m url)
    (regexp-search (posix-string->regexp (string-append "^" url "$")) path))
  (match-cond
   ((m "/archives/([0-9]+)/([0-9]+)") (_ year month)
    (list-stories year month))
   ((m "/story/([^/]*)") (_ story-slug)
    (send-story story-slug))
   ...
   (else (send-404))))

match-cond is described in the SCSH manual, but briefly it goes through a list of clauses (just like cond), and when a match is found, it binds variables to the specified sub-matches (which correspond to parenthesized chunks in the regexp). Note that the URLs now contain regexp-style wildcards, instead of SISCweb’s usual servlet-style wildcards.

This approach can be made cleaner with macros, or, as I will describe in a future post, by using an upcoming feature of SISCweb 0.5 — (publish "/some/url/.*" handler 'regexp)

February 8, 2007

VMWare: when two OSs access the same partition

Filed under: linux, vmware — Dan Muresan @ 7:42 am

Probably the most convenient way to run Windows under Linux is to start with a dual-boot setup, then create (in Linux) a VMWare Server virtual machine based on the physical Windows partition. This ensures that you don’t have to reinstall Windows and your favorite applications.

But with great convenience comes great danger. When you power on the virtual machine, it will boot into GRUB (or LILO) which will ask which OS you want to run. No problem you’ll say, select Windows, it’s just a small inconvenience. Until the day your fingers err. Or, if GRUB has a timeout, the day you run to get a cup of water and come back to witness Linux booting. That means that the virtual machine and the host OS are now accessing the same partitions simultaneously.

The various VMWare tutorials strongly caution you to avoid this situation, which will likely result in data loss. But maybe you are wondering just how bad things can go (at least I always have). Well, about a month ago, facing a complete Linux re-install, I found the perfect opportunity to experiment. I had two Linux partitions (a JFS root and an EXT3 volume). So I powered up the virtual machine into Linux, and let it run its course, after which I rebooted.

The results? Surprisingly, the root JFS partition came out from fsck unscathed. That’s right, there were no errors, and nothing in /lost+found. The EXT3 partition, by contrast, was destroyed beyond repair (it started with a bad superblock, and went downhill from there as I tried to recover). Unfazed, I decided to try again (after reformatting my EXT3 partition). The same thing happened. I have no idea why, and I wouldn’t necessarily conclude that JFS is safer, but if you ever have the chance (or misfortune) to experiment, let me know how it goes…

And now, on to something more useful: how do you prevent such disasters? The answer is to force the VMWare partition to boot from a virtual floppy disk that makes the correct OS choice automatically (it could be GRUB with a single-item boot menu, or an NTLDR-based solution). Scott Bronson’s VMWare tutorial shows how to do this. Unfortunately, his method is rather inconvenient, requiring several reboots. So what follows is a simpler solution that replaces steps 3-10 from his Set up the Boot Disk section:

dd if=/dev/zero of=bootdisk.img bs=1k count=512
mke2fs -F bootdisk.img
mount -oloop bootdisk.img /mnt
mkdir -p /mnt/boot/grub
cp /boot/grub/stage[12] /mnt/boot/grub/

cat >/mnt/boot/grub/grub.conf <<EOF
timeout=3
title=Windows
root            (hd0,0)
chainloader     +1
makeactive
EOF

umount /mnt

grub --device-map=/dev/null <<EOF
device (fd0) bootdisk.img
root (fd0)
setup (fd0)
quit
EOF

The rest of Scott’s tutorial still applies — in particular, setting up different hardware profiles is important. How important? I’ll let you know next time I’m stuck with a complete Windows reinstall…

January 30, 2007

Compressed filesystem using SquashFS and AutoFS

Filed under: linux — Dan Muresan @ 9:42 am

When installing a modern Linux distribution on older computers, one problem you may face is the lack of disk space. I ran into this last week, while helping a friend install Ubuntu on an antique laptop with a 2 GB hard drive. The obvious starting point is a minimalist installation: Ubuntu Alternate CD (my choice), Arch Linux, or a few others. The good news is that your system doesn’t have to stay minimalistic if you know how to tailor the distribution.

One way to save space is to use data compression. It’s possible to keep parts of the filesystem compressed on disk and have Linux decompress them on the fly when they’re needed. This idea is as old as Stacker / DoubleSpace, but for Linux we need to do more work, as there’s no stable read-write compressed filesystem as of this writing (though you may want to watch Johan Parent’s compFUSEd as it matures).

First, install the tools: squashfs, a compressed file system that yields better performance than the traditional cramfs, and autofs, to mount and unmount compressed directories automatically. Next, if you’ve never used a compressed filesystem, it helps to play with squashfs a bit:

# log in as root or type "sudo bash"
mksquashfs /tmp dummy.squashfs
mount -o loop dummy.squashfs /mnt
ls /mnt           # should be identical to /tmp
touch /mnt/x  # won't work, squashfs is read-only

This example creates a squash file system (in the file dummy.squashfs) that mirrors the contents of /tmp and mounts it (using loop, since it’s an ordinary file and not a block device) on /mnt. As the last command demonstrates, you can’t write in a squashfs, so you’ll want to compress directories that are normally not modified (so /tmp would actually be a bad choice, and so would be any user home directory, /var etc.)

Now, to work. Let’s set autofs up (this only needs to be done once):

cd /etc
echo '/var/autofs/squash /etc/auto.z --timeout=300' >>auto.master
echo '* -fstype=squashfs,loop :/opt/squashfs/&.squashfs' >>auto.z
/etc/init.d/autofs restart

The first echo tells autofs to read /etc/auto.z (and to unmount auto-mounted directories once they have been unused for 300 seconds); the second one says that whenever someone accesses /var/autofs/squash/DIR (where DIR is an arbitrary name), autofs should try to mount /opt/squashfs/DIR.squashfs automatically.

Next, set your sights on a large, read-only directory — say /usr/lib/mozilla-thunderbird. Here’s the plan:

  1. convert relative symlinks: for f in `find /usr/lib/mozilla-thunderbird -type l`; do t=`readlink -f $f`; rm $f; ln -s $t $f; done
  2. create a compressed filesystem: mksquashfs /usr/lib/mozilla-thunderbird /opt/squashfs/mozilla-thunderbird-lib.squashfs
  3. remove the original directory: rm -rf /usr/lib/mozilla-thunderbird
  4. replace the directory with a symbolic link: ln -s /var/autofs/squash/mozilla-thunderbird-lib /usr/lib/mozilla-thunderbird

You may wonder why the first step is necessary. The answer is that /usr/lib/mozilla-thunderbird contains some relative links (things like ../share/icons) that would break when the directory is relocated to /var/autofs/squash. So we use find to locate symlinks, readlink to read their target, and then rewrite these links.

That’s it. Whenever you access the compressed directory, it will be automounted:

ls /usr/lib/mozilla-thunderbird
mount

This method does have one disadvantage: if you ever upgrade Thunderbird, dpkg will follow the compressed directory symlink and try to write inside it (which will fail). You should remove the /usr/lib/mozilla-thunderbird symlink prior to an upgrade (and, presumably, re-compress once the upgrade completes).
