scsh-regexp

Scsh-regexp and SISCweb URLs

A new version of scsh-regexp, the SCSH regular expression API port to Chicken and SISC, is available. All API functions (except regexp manipulation such as uncase) are implemented, and there are a few bugfixes (including one that prevented compilation under SISC); there is still no SRE support.

For the moment, you will have to download this release from Google Code; the Chicken egg should be updated to release 0.2 in a day or two.

If you'd like to see a port of scsh-regexp to your favorite Scheme, feel free to let me know (or contribute some code), as I am unlikely to make any further changes otherwise (save fixing potential bugs, should any surface). A Guile port would be the easiest.

And now, a fun exercise: scsh-regexp can handle pretty URLs in SISCweb applications. More specifically, we are going to sidestep the publish method by passing all requests to a single handler, which will discriminate between various URLs using the match-cond macro; here's what a CMS application's entry point might look like:

;; Route every request to a single dispatcher.
(publish "/*" url-dispatcher)

(define (url-dispatcher request)
  (define path (request/get-path-info))
  ;; Match the whole path against a POSIX regexp built from url.
  (define (m url)
    (regexp-search (posix-string->regexp (string-append "^" url "$")) path))
  (match-cond
   ((m "/archives/([0-9]+)/([0-9]+)") (_ year month)
    (list-stories year month))
   ((m "/story/([^/]*)") (_ story-slug)
    (send-story story-slug))
   ...
   (else (send-404))))

match-cond is described in the SCSH manual, but briefly: it goes through a list of clauses (just like cond), and when a clause matches, it binds variables to the specified sub-matches (which correspond to parenthesized chunks in the regexp). Note that the URLs now contain regexp-style wildcards instead of SISCweb's usual servlet-style wildcards.
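To try the binding behaviour outside a servlet, here is a minimal sketch that uses only regexp-search, posix-string->regexp and match-cond, so it should run in plain scsh (and, assuming the port behaves the same way, under scsh-regexp). The procedure name describe-path and the tagged lists it returns are made up for illustration; they are not part of any API.

```scheme
;; Sketch only: describe-path and its return values are hypothetical.
(define (describe-path path)
  ;; Anchor the pattern so that it must match the whole path.
  (define (m pattern)
    (regexp-search (posix-string->regexp (string-append "^" pattern "$"))
                   path))
  (match-cond
   ;; The first variable is bound to the entire match,
   ;; the remaining ones to the parenthesized groups, in order.
   ((m "/archives/([0-9]+)/([0-9]+)") (whole year month)
    (list 'archive year month))
   ((m "/story/([^/]*)") (whole slug)
    (list 'story slug))
   ;; No pattern matched.
   (else (list 'not-found path))))

(describe-path "/archives/2006/11")
(describe-path "/story/hello-world")
(describe-path "/about")
```

The sub-matches come back as strings, so a real handler would still need string->number (or similar) before treating year and month as numbers.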

This approach can be made cleaner with macros or, as I will describe in a future post, by using an upcoming feature of SISCweb 0.5: (publish "/some/url/.*" handler 'regexp).

Regular expressions in Scheme

In my last post, I mentioned generating the R5RS identifier list by scraping the HTML version of the R5RS standard. I decided to use Scheme for the job, and quickly learned that Chicken and SISC lack adequate regexp support (SISC has no support at all, apart from letting you interface with the underlying JVM). Eventually, I settled upon SCSH, as it has a powerful regexp API, as well as good shell integration.

The resulting SCSH script took forever to run (to be fair, I added code to separate procedure names from macro names, and didn't bother optimizing beyond the naive O(n^2) algorithm). I started to miss Chicken's speed. The SCSH regexp API looked reasonably easy to port. I ended up writing both a Chicken and a SISC emulation layer (the latter based on java.util.regex). I am planning to add a pregexp backend as well, which would extend regexp support to any R5RS system.

Have a look at the scsh-regexp project for details, examples and news.