Omnigia

May 19, 2008

Debian / Ubuntu packaging: Zorba XQuery

Filed under: linux, c, debian, xquery, zorba — Dan Muresan @ 1:57 pm

Today I uploaded Ubuntu source and binary (Gutsy and Hardy) packages for Zorba, the new C++ streaming XQuery processor. The Ubuntu PPA system (Personal Package Archives) is a great service; without it, you’d need to host an APT repository in order to conveniently distribute packages that are not (yet) part of Debian or Ubuntu (especially since a Debian source package is actually three files).

In fact, my source package works in Debian unstable too; as there is no custom Debian Sid APT repository (Ubuntu PPA only serves Ubuntu distros), here’s what you need to do to build and install it:

  • dget the .dsc file (which pulls the original tarball and a .diff.gz as well)
  • run pbuilder build zorbaxquery_0.9.1-3.dsc as root (apt-get install pbuilder and run pbuilder create first if you don’t have it set up)
  • retrieve the .deb files from /var/cache/pbuilder/result/
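
As a rough sketch, the three steps can be wrapped in a small shell function. The URL in the usage line is a placeholder (substitute the real location of the .dsc file), build_from_dsc is a made-up name, and by default the function only prints the commands instead of running them.

```shell
# Sketch: print (default) or execute the steps for building a Debian
# source package with pbuilder. The runner argument is an assumption of
# this sketch: "echo" prints each command, "env" executes it.
build_from_dsc() {
    dsc_url=$1
    run=${2:-echo}            # default: dry run
    dsc_file=${dsc_url##*/}   # file name part of the URL

    $run dget "$dsc_url"                      # fetches .dsc, .orig.tar.gz, .diff.gz
    $run sudo pbuilder build "$dsc_file"      # builds in a clean chroot
    $run ls /var/cache/pbuilder/result/       # default pbuilder output directory
}
```

Calling build_from_dsc http://example.org/zorbaxquery_0.9.1-3.dsc prints the three commands; adding env as a second argument runs them for real.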

It would be really nice if someone set up a PPA-like service for Debian, at least for repositories of source packages. I realize that setting up a cluster of build boxes is possible only with someone like Canonical behind it. But the storage required for source packages could be quite small: if the *.orig.tar.gz “link” dynamically retrieved an archive hosted elsewhere (a webapp could do this, trading space for bandwidth), such repositories could be quite compact (the .dsc and .diff.gz files are usually tiny). Alternatively, this scheme might work with a modified apt that could follow HTTP redirects.

April 30, 2008

gdb: examining complex c++ objects

Filed under: linux, c, gdb — Dan Muresan @ 8:25 am

I’ve been doing quite a bit of C++ programming (and, alas, debugging) for a project lately. One endless source of annoyance in C++ (at least in Linux) is the impedance mismatch between the compiler (gcc) and the debugger (gdb). C++ is notoriously hard to compile (and even just to parse). gdb does a bit of name-demangling, but quickly finds itself out of its depth with complex C++ features (like heavy template usage). This is, after all, an old problem: even with C programs, debugging macro-laden code is painful.

But I’m not going to get into the details of that; today I’m going to show you how to deal with a lesser annoyance, namely examining STL objects. For example, if you use gdb’s standard print (or p) command, strings look like a mess, and long ones are truncated:

#include <iostream>
#include <string>

using namespace std;

int main () {
  string s = "";
  for (int i = 0; i < 5; i++)
    s += "This is a very, very long line.\n";
  s = s + s;
  cout << s;
  return 0;
}

~$ gdb testprog
(gdb) b 12
Breakpoint 1 at 0x8048c18: file testprog.cc, line 12.
(gdb) r
Breakpoint 1, main () at testprog.cc:12
12	  cout << s;
(gdb) p s
$1 = {static npos = 4294967295,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> =
    {<No data fields>}, <No data fields>},
    _M_p = 0x804b3ac "This is a very, very long line.\nTh"...}}

(OK, I had to truncate the string manually here for readability; the exact value of the maximum width isn’t the issue.) A much better way to examine strings is to use gdb’s printf command on the appropriate member of the STL object:

(gdb) printf "%s\n", s._M_dataplus._M_p

This displays the actual string, without encoding newlines as \n. An even better way is to define a printstr command that you can reuse in future gdb sessions; create or edit the file ~/.gdbinit and add the following snippet:

define printstr
  printf "`%s'\n", $arg0._M_dataplus._M_p
end

This will allow you to simply say printstr s whenever you need to examine a string. Of course, this definition relies upon GCC’s internal representation of a std::string, which may change from time to time.
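
If you set up several machines, a small shell helper can add the definition to a gdb init file without duplicating it. This is only a sketch: install_printstr is a made-up name, and the target file defaults to ~/.gdbinit.

```shell
# Append the printstr definition to a gdb init file, unless a
# "define printstr" line is already present there.
install_printstr() {
    gdbinit=${1:-$HOME/.gdbinit}
    if [ -f "$gdbinit" ] && grep -q '^define printstr$' "$gdbinit"; then
        return 0              # already installed, nothing to do
    fi
    # The quoted 'EOF' keeps the shell from expanding $arg0 or the backtick.
    cat >> "$gdbinit" <<'EOF'
define printstr
  printf "`%s'\n", $arg0._M_dataplus._M_p
end
EOF
}
```

Running install_printstr twice leaves a single copy of the macro in the file.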

After developing this gdb macro, I discovered Dan Marinescu’s excellent STL Views gdb scripts, which add support for examining vectors, maps, sets (and, yes, strings). The idea is the same. If you spend any significant time inside gdb, this is an invaluable tool.

It’s probably a good idea to take this further and create similar printer functions for all complex (and frequently examined) classes in C++ projects. GDB’s user-defined commands are extensively documented in the manual. You don’t need to put such commands in ~/.gdbinit; you can create a separate script and load it using source scriptname.gdb when needed.

February 19, 2008

Fixing SSH tab completion

Filed under: linux — Dan Muresan @ 10:42 am

While everyone is familiar with bash’s TAB completion for paths and filenames, fewer people know that since bash 2.04, TAB can complete arguments in many more contexts thanks to a feature called programmable bash completion. The default completions handle mount, cvs, ant, ssh and many others; it’s also possible to program your own extensions for whichever command seems to stress your typing.

For me, ssh completion (which completes based on known host names, and consequently learns new hosts as you ssh into them) has been one of the most useful plug-ins. Unfortunately, it stopped working a while ago (as of Ubuntu Breezy, to be precise). I have recently discovered the real cause: as a security feature, ssh now hashes host names instead of saving them directly into the known hosts file. This feature is meant to prevent worms from learning host names and spreading to them via ssh, except that there aren’t many Linux worms around now or in the foreseeable future. You can disable hashing by adding

HashKnownHosts no

to /etc/ssh/ssh_config (this affects only hosts added from then on; entries that are already hashed stay hashed). Alternatively, you can list some frequently used hosts in your ~/.ssh/config, e.g.:

Host example.com
Host vps
  Hostname 72.xxx.xxx.xxx

As you can see, you can also use your per-user ~/.ssh/config to provide handy aliases for some hosts.
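
To see whether hashing is what breaks completion on your machine, you can check how many entries in the known hosts file are hashed; such entries start with the marker |1|. The two helpers below are a sketch with made-up names, and they default to ~/.ssh/known_hosts.

```shell
# Count the hashed entries in a known_hosts file.
count_hashed_hosts() {
    grep -c '^|1|' "${1:-$HOME/.ssh/known_hosts}"
}

# List the host names and addresses stored in clear text, one per line.
# These are the only names a completion script can read from the file.
plain_host_names() {
    grep -v '^|1|' "${1:-$HOME/.ssh/known_hosts}" | cut -d' ' -f1 | tr ',' '\n'
}
```

If count_hashed_hosts reports most of your entries, completion has very little to work with, and listing hosts in the per-user config is the easier fix.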

January 30, 2008

Installing the BSD’s: impressions

Filed under: vmware, bsd — Dan Muresan @ 8:52 am

I have recently had the chance to play again with the 3 BSD’s, under the excuse of testing the portability of a piece of software. I was able to install all of them in separate virtual machines under VMWare Server. I’m summarizing my impressions below. Chiefly, I was surprised at the number of (admittedly minor) annoyances that NetBSD gave me (I recall having no such problems a couple of years ago when I first tried this OS).

FreeBSD:

  • Most desktop-friendly, easiest to install
  • Issues with the XFree86 to Xorg transition still cause problems in 6.3 (something the Linux distros solved long ago)

NetBSD:

  • Less friendly installer compared to FreeBSD, but still quite usable
  • Make sure to install the text and man sets
  • pkg_add doesn’t quite work with some mirrors (e.g. ftp.at.netbsd.org) because it insists on using a complex wildcard pattern to look for packages (package-*.t[bg]z). Some FTP servers don’t support this pattern, and return no results. Bottom line: pkg_add bash sometimes fails, but pkg_add bash-3.2.25 always works.
  • Sadly, IPv6 seems to be enabled in most applications, and for the most part it causes only delays and error messages, most annoyingly with pkg_add
  • Furthermore, the community seems to think IPv6 is the best thing since sliced bread. Search for “disable ipv6 netbsd”; you will find mostly unanswered forum messages, or when they are answered, the answer is along the lines of “why do you need to do this”.
  • Under VMWare, you must wait until NetBSD finishes booting up before you can press CTRL+ALT to release the mouse (and do other things in the host OS); otherwise, the console dies.
  • Seems the easiest to port to an embedded system (and runs on the largest number of platforms)

OpenBSD:

  • Somewhat resembles NetBSD
  • Least friendly installer (“dumb terminal” style, not even curses-enhanced)
  • You must install the xbase set (even if you’re not planning on using X) or else most packages won’t install later (including bash)
  • Default GCC version is the oldest (3.3.5); newer versions are available separately, but seem to come with different features (e.g. the ProPolice stack protector is not enabled)
  • Large number of security features that I did not try out

November 30, 2007

Chaining XPath queries in Mozilla

Filed under: xpath — Dan Muresan @ 4:52 am

If you search for xpath and mozilla, you will find a lot of pages telling you how to do a single query (I personally learned from Mark Pilgrim’s excellent Dive into Greasemonkey). What you will not find, though, is how to further refine the xpath results by chaining a second xpath query. For instance, suppose you have selected some rows out of a table:

var query = '//table[@id="scores"]/tr[@class="oddRow"]';
var results = document.evaluate (query, document, null,
  XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);

… and now you want to replace all text within bold tags with links (say a link whose target depends on the first cell of the row). How do you get to those bold tags? Some people assume that replacing the context item argument of the evaluate call is enough:

var row = results.snapshotItem (i);
var bold = document.evaluate ('//b', row, ...)

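and are surprised when this returns all bold tags within the entire document. A path that starts with // is absolute: it is evaluated from the root of the document, so the context item only serves to identify that document. You must instead use a relative location path, such as .//b for bold tags at any depth below the row (a bare b would match only the row’s direct children):

var bold = document.evaluate ('.//b', row, ...)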

Not a big deal, but I’ve seen this question being asked a few times.

Finally, here are some utility functions to help write xpath queries faster:

function xpath_raw (query, ctx, type) {
  type = type || XPathResult.ORDERED_NODE_SNAPSHOT_TYPE;
  ctx = ctx || document;
  // HTML pages are usually small; no point in using iterators
  // or UNORDERED results
  return document.evaluate(query, ctx, null, type, null);
}

// returns an array of DOM nodes
function xpath (query, ctx, type) {
  var res = xpath_raw (query, ctx, type);
  var l = [], i;
  for (i = 0; i != res.snapshotLength; i++)
    l.push (res.snapshotItem (i));
  return l;
}

// returns the first DOM node in the result set
function xpath_single (query, ctx) {
  var res = xpath_raw (query, ctx);
  return res.snapshotItem (0);
}

October 28, 2007

Mass-generating random file names

Filed under: linux, bluetooth — Dan Muresan @ 11:20 pm

After setting up Bluetooth on my phone and laptop, I was faced with another problem: the phone saves images using filenames of the form ImageXXX.jpg, which is OK the first time, but tends to conflict with older files later on as the “XXX” counter restarts from 0. One may think that “mktemp” is a solution, but unfortunately that command won’t let us choose the extension of the created file. Instead, on Ubuntu and Debian, use “tempfile”:

cd "/mnt/Memory card/Images/"
for f in *; do mv "$f" "$(tempfile -d "$DEST_DIR" --suffix=.jpg)"; done

tempfile will create unique file names in the destination directory and avoid race conditions at the same time (not really an issue in this case, but good to know.)
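
On systems without tempfile, a similar effect can be approximated with a plain mktemp template. The function below is only a sketch (move_with_random_names is a made-up name): mktemp reserves a unique stem, and the file is then moved to that stem plus the suffix. Unlike tempfile, the suffixed name itself is not created atomically, which is harmless for a one-off copy like this.

```shell
# Move every regular file from src_dir into dest_dir under a fresh
# random name that keeps the given suffix (default: .jpg).
move_with_random_names() {
    src_dir=$1
    dest_dir=$2
    suffix=${3:-.jpg}

    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        # mktemp creates an empty placeholder with a unique random stem.
        stem=$(mktemp "$dest_dir/img.XXXXXX") || return 1
        mv "$f" "$stem$suffix" || return 1
        rm -f "$stem"                # the placeholder is no longer needed
    done
}
```

For the phone above, this would be called as move_with_random_names "/mnt/Memory card/Images" "$DEST_DIR".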

October 24, 2007

Bluetooth in Ubuntu, the CLI way

Filed under: linux, bluetooth — Dan Muresan @ 11:12 pm

Note: updated for Hardy (2008-06-09)

Note: for the impatient, just use the script at the end of this post.

I recently received a Nokia 6300 as a birthday present. After playing with the cool 2-megapixel camera I wanted to save some of the pictures and clips to my laptop. The most convenient (and cheapest) way to network the phone and laptop is via Bluetooth.

Not knowing a thing about Bluetooth, I set out on a quest to learn more about this protocol and, more specifically, how to use it in Ubuntu. This is where my problems started: nowadays everybody assumes that you run either Gnome or KDE, and most tutorials I found on the topic have a Windows-esque technical level (run this from the menu, click that in the dialog). Only I don’t use a desktop environment at all: I run fluxbox, and to compound that “crime”, I turn off most “standard” services (dbus, hal, NetworkManager and whatnot).

Well, it turns out that to get Bluetooth, you need to start hcid, and this in turn absolutely, positively requires dbus:

service dbus start
hcid -s

At this point you can check your Bluetooth interface and scan for other devices:

laptop:~# hcitool dev
Devices:
	hci0	00:1C:26:F4:AF:C4
laptop:~# hcitool scan
Scanning ...
	00:1D:98:54:A2:CB	Dan Nokia 6300

After some failed experiments, I learned that I need to configure a PIN agent. Bluetooth uses a PIN code as a crude form of authentication. When either party attempts a connection, the phone prompts for a PIN. hcid on the laptop also needs a PIN, but since we’re outside GNOME land, the default agent (which pops a dialog to ask for a PIN) doesn’t work. A PIN agent is actually any program that outputs a string PIN: plus the actual PIN code. The simplest agent is something like

#!/bin/sh
exec /bin/echo PIN:0000

Save this to /usr/local/bin/pin-agent and make it executable (chmod +x), then simply type passkey-agent --default pin-agent &, which will inform hcid of the agent. Now we’re ready to connect to the phone, after apt-get install-ing the very cool obexfs package:

laptop:~# obexfs -b 00:1D:98:54:A2:CB -B 10 /mnt
laptop:~# ls /mnt
Graphics  Memory card  Received files  Themes  Video clips
Images    Music files  Recordings      Tones

where the address is the one reported by hcitool scan earlier. Presto, the phone’s clips and images are under /mnt! I was a little bummed that the contacts were NOT there, but I’ll figure that out some other time.

To save time, here’s a script you can use to automate the process:

#!/bin/sh
# Don’t forget to set up pin-agent
/etc/init.d/dbus start
hcid -s
# if you have no passkey-agent, see below
passkey-agent --default pin-agent &
# replace with your own address below
obexfs -b 00:1D:98:54:A2:CB -B 10 /mnt

Update: on Hardy (and possibly even earlier), passkey-agent is no longer installed by default. A simple hack (which, alas, may stop working later on) is to cd to /var/lib/bluetooth, mkdir -p a directory with the same name as your computer’s Bluetooth address (hcitool dev shows it), and in that directory create a file called pincodes. The pincodes file must contain one or more lines in the format

00:1D:98:54:A2:CB 0000

The first field must be the phone’s Bluetooth address (not the computer’s address). This is not the same as the directory name!
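
The pincodes hack can be scripted along these lines. This is a sketch under the layout described above: add_pincode is a made-up name, and the optional fourth argument exists only so that the function can be tried out somewhere other than /var/lib/bluetooth.

```shell
# Register a PIN for a remote device in the adapter's pincodes file.
add_pincode() {
    adapter_addr=$1                    # the computer's address (hcitool dev)
    phone_addr=$2                      # the phone's address (hcitool scan)
    pin=$3
    base_dir=${4:-/var/lib/bluetooth}

    # The directory is named after the adapter, not after the phone.
    mkdir -p "$base_dir/$adapter_addr" || return 1
    # Each line of pincodes holds the remote address followed by its PIN.
    printf '%s %s\n' "$phone_addr" "$pin" >> "$base_dir/$adapter_addr/pincodes"
}
```

With the addresses from this post, the call would be add_pincode 00:1C:26:F4:AF:C4 00:1D:98:54:A2:CB 0000, run as root.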

September 10, 2007

CPSCM: interfacing Javascript and Scheme

Filed under: scheme, cpscm — Dan Muresan @ 6:52 pm

Calling Javascript from Scheme just got easier in CPSCM:

(define v 10)
(define (f x) (+ x v))
(%cpscm:native "alert (" v ")")
;; Can pass Scheme variables and computations
(%cpscm:native "alert (" (f 5) ")")

All string arguments to %cpscm:native are copied verbatim to the output. Inner computations are still compiled as any normal Scheme code, and their results are passed to the native call via temporary variables. Each native call must correspond to a single JS statement.

The old method was to provide a CPS wrapper with the correct mangled name, e.g. to create a function callable as (fun 1 2) from Scheme:

var cpscmjsfun = cpscm__cpswrap (
  function fun (x, y) { return x + y; }
);

This still works, of course (as demonstrated in the DHTML bubble-sort example), but the new method adds convenience.

The main reason for %cpscm:native is the anticipated Emacs Lisp backend: users will surely want to call a myriad of elisp functions from Scheme, and writing a CPS stub for each of them would be impractical.

Note: the new code is in SVN, but I haven’t updated the online compiler demo webapp yet.

August 27, 2007

More on R6RS ratification

Filed under: scheme — Dan Muresan @ 10:24 pm

Given the preliminary results, it looks like R6RS will pass with a 66% margin (I predicted 70% a while ago). Strangely, official results haven’t been announced yet, in violation of the announced schedule. More strangely, everyone seems to be quiet on this delay; I suspect that the nays are hoping for some last-minute deliverance handed down from the Steering Committee, but I’m surprised that the ayes aren’t becoming impatient at this point.

The comments section provides a glimpse into the disastrous effects of a biased electoral process that requires justification from dissenters, but not from approvers. While most of the nays provide a detailed analysis of the draft (usually acknowledging its virtues where applicable), the “yes” camp, where it bothers at all to comment, seems to employ a pretty lax standard (including a few pearls that I won’t quote in order not to offend the authors).

One comment that I found sad, yet funny, states that “…Scheme needs a splash of ‘worse is better’ to move the language standard forward”; it comes from a voter from New Jersey, no less.

On a personal note, I almost messed up my ballot. I sent my vote just before the deadline from my registered voter address, but I forgot to change the default email-address field of the ballot. My vote was rejected; I sent a corrected ballot, but only after the deadline, so it wouldn’t count. I e-mailed Alan Bawden, who mentioned that “I did fix a lot of people’s ballots in trivial ways… [including]… unbalanced parentheses, but I drew the line at actually altering people’s claimed identities” (the email-address field apparently taking precedence over the originating address in the email envelope). Luckily, my vote was accepted after a short back-and-forth.

Update: the Steering Committee has ratified R5.97RS.

August 7, 2007

Emacs Lisp vs. Scheme: scoping and globals

Filed under: scheme, cpscm, emacs — Dan Muresan @ 3:46 am

I’ve been considering an elisp back-end for CPSCM (so that we can program Emacs in R5RS Scheme). I thought the lack of lexical scoping would prove a major stumbling block, but in the end it turns out that Elisp will be somewhat easier to support than Common Lisp. Here are the twists and turns (to evaluate Elisp code, go to the *scratch* buffer, paste the code and type C-x C-e):

  • Elisp has dynamic scope by default:
    (defun f () y)
    (let ((y 10)) (f))  ;; 10
    ;; lambda arguments are also dynamic
    (funcall (lambda (y) (f)) 11)  ;; 11
    
  • However, with (require 'cl) you get access to the (lexical-let …) macro, which does exactly what the name says (there is also a lexical-let*)
  • Using lexical-let, one can easily define lexical-lambda — here’s a simple version (optimized for minimal line lengths, not Lisp-ness)
    (defmacro lexical-lambda (args &rest body)
      (lexical-let* ((r '&rest) (g (lambda (x) (if (eq x r) x (gensym))))
                     (gvars (mapcar g args))
                     (bnd (mapcar* #'list args gvars)))
        `(lambda ,gvars
           (lexical-let ,(delete-if (lambda (b) (eq (car b) r)) bnd)
             ,@body))))

OK, so we’ve played catch up with Common Lisp and managed to work around dynamic scoping; here’s the beautiful part:

  • Elisp has sane(r) globals (from a Schemer’s POV, at least)

To those who haven’t bashed their heads against this problem, Common Lisp’s “normal” way of declaring globals (defvar / defparameter) makes variables “pervasively special” (i.e. dynamic) — meaning that

(defvar myvar 10)
(defun f () myvar)
(defun g () (let ((myvar 1)) (f)))
(g)  ;; => 1, not 10

This is not such a problem for Lisp, but what with Scheme being a Lisp-1, translation of global functions is problematic:

(define (f x) x)  ;; Scheme
;; Lisp translation -- broken
(defvar f (lambda (x) x))

There’s another standard-compliant way to simulate globals in Lisp (using symbol macros — search comp.lang.lisp for deflex); however this method requires you to define each global before referencing it, which would preclude mutually-recursive global functions:

(deflex f (lambda (x) (funcall g (- x 1))))  ;; broken: g undefined
(deflex g (lambda (x) (if (> x 0) (funcall f x) 0)))

There are other, more convoluted ways to implement “non-special” globals that have elicited endless (and inconclusive, as far as I could tell) threads on comp.lang.lisp, e.g. using (locally (declare (special myvar))). Finally, in many Lisps one can simply use (setq myvar …) and at most get a warning, but this is not standards-compliant.

As luck would have it, setq globals work in Elisp too, and the manual seems to indicate that this is intended semantics, not accident. So this will save me the pain of working around Common Lisp’s “special” variable rules (I’ve never found a satisfactory solution), which is why I’m happy about Elisp.

Of course there are areas that need work, e.g. an easier “FFI” to access Elisp functions from Scheme (currently, one has to define a CPS-style wrapper in the back-end with the proper mangled name to make a function callable from CPSCM). But I find the prospect of programming Emacs in Scheme a pretty good motivation…
