Portable SCSH regexps

Dan Muresan

In "The SRE regular-expression notation", Olin Shivers describes a "100% solution" for working with regular expressions in Scheme. Unfortunately, until now his solution has been implemented only in SCSH, while other Schemes have made do with various "80% solutions" (or worse). By releasing scsh-regexp, I aim to change that.

There are two parts to Olin's solution: the SRE notation and the regexp API. The SRE notation is a Scheme representation of POSIX regular expression strings; for example, (: (| upper ("aeiou") digit) (* digit)) is equivalent to "[A-Zaeiou0-9][0-9]*". See Summary SRE syntax in the SCSH manual. In essence, the SRE notation is syntactic sugar.
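To make the equivalence concrete, here is a minimal sketch that compiles the same pattern both ways and searches a string with each. It assumes an environment where the SCSH bindings rx, posix-string->regexp, regexp-search and match:substring are all available (as documented in the SCSH manual); the names sre-version, posix-version and first-match are made up for this illustration, and whether a given port supports the rx macro should be checked against that port.

```scheme
;; Assumes the SCSH regexp bindings are loaded, e.g. SCSH itself,
;; or Chicken after (require-extension scsh-regexp).

;; The SRE from the text, compiled with the rx macro.
(define sre-version
  (rx (: (| upper ("aeiou") digit) (* digit))))

;; The equivalent POSIX string, compiled with posix-string->regexp.
(define posix-version
  (posix-string->regexp "[A-Zaeiou0-9][0-9]*"))

;; Hypothetical helper: return the first substring of STR matched
;; by RE, or #f if there is no match.
(define (first-match re str)
  (let ((m (regexp-search re str)))
    (and m (match:substring m 0))))

;; Both calls should print the same substring.
(display (first-match sre-version "route 66"))
(newline)
(display (first-match posix-version "route 66"))
(newline)
```

The SRE form is longer for a pattern this small, but it nests and composes like any other Scheme datum, which is where the notation pays off.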

The SCSH regexp API provides a standardised way to search strings for regular expressions, manipulate matches, and perform replacements. Here's a (trivial) example of listing dates found in a text document:

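;; Matches dates written as month/day/year, with one submatch per field.
(define date-re
  (posix-string->regexp "([0-9][0-9]?)/([0-9][0-9]?)/([0-9]+)"))
;; Print the whole match followed by its month, day and year submatches.
(define (process-match date-match)
  (let-match date-match (whole m d y)
    (for-each display (list whole " = month:" m " day:" d " year:" y "\n"))))
(define text "Nothing happened on 04/01/2002 or 04/01/2003")
;; Call process-match once for every date found in text.
(regexp-for-each date-re process-match text)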
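The example above only lists matches. As a hedged sketch of the replacement side of the API, the following rewrites each date in year-month-day order using regexp-substitute/global, as documented in the SCSH manual. It assumes that procedure is available in the environment at hand; iso-text is a name invented for this illustration.

```scheme
;; Assumes the SCSH regexp bindings are loaded, e.g. SCSH itself,
;; or Chicken after (require-extension scsh-regexp).

(define date-re
  (posix-string->regexp "([0-9][0-9]?)/([0-9][0-9]?)/([0-9]+)"))

(define text "Nothing happened on 04/01/2002 or 04/01/2003")

;; Passing #f as the port makes regexp-substitute/global return a string.
;; For each match: 'pre copies the text before the match, the integers
;; 3, 1 and 2 insert the year, month and day submatches, and 'post
;; continues the substitution on the text after the match.
(define iso-text
  (regexp-substitute/global #f date-re text
                            'pre 3 "-" 1 "-" 2 'post))

(display iso-text)
(newline)
```

Items in the substitution list may also be procedures that receive the match object, which is useful when the replacement has to be computed rather than rearranged.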

It turns out that the SCSH regexp API can be implemented in terms of the basic primitives offered by various Schemes. As a start, I have ported the regexp API to Chicken and SISC. SISC was the more interesting case, as it has no regexp support, but offers access to the underlying Java VM.


For Chicken, the package is distributed as an "egg"; install it using chicken-setup scsh-regexp.egg, then load it using (require-extension scsh-regexp).
For SISC, download the Chicken egg (which is really just a tar.gz archive) and extract the main directory:
tar xvfz scsh-regexp.egg scsh-regexp
To load it, cd into the extracted directory, start SISC and type
(require-extension (lib scsh-regexp/scsh-regexp))

Should the Chicken repository be unavailable, you can also download scsh-regexp from Google Code.


Please submit bug reports and feature requests using the tracker on the scsh-regexp Google Code project page.


So what's left to do?

I'm unlikely to work on these (or other) enhancements unless I hear from users, so, again, you are encouraged to get in touch.