Omnigia

May 19, 2008

Debian / Ubuntu packaging: Zorba XQuery

Filed under: linux, c, debian, xquery, zorba — Dan Muresan @ 1:57 pm

Today I uploaded Ubuntu source and binary (Gutsy and Hardy) packages for Zorba, the new C++ streaming XQuery processor. The Ubuntu PPA system (Personal Package Archives) is a great service; without it, you’d need to host an APT repository in order to conveniently distribute packages that are not (yet) part of Debian or Ubuntu (especially since a Debian source package is actually three files).

In fact, my source package works in Debian unstable too; as there is no custom Debian Sid APT repository (Ubuntu PPA only serves Ubuntu distros), here’s what you need to do to build and install it:

  • dget the .dsc file (which pulls the original tarball and a .diff.gz as well)
  • run pbuilder build zorbaxquery_0.9.1-3.dsc (apt-get install and set up pbuilder if you don’t have it)
  • retrieve the .debs from /var/cache/pbuilder/result/
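
The three steps above could be scripted roughly as follows. This is only a sketch: build_from_dsc and run are hypothetical helper names, the URL you pass in is whatever location the .dsc is published at (the post does not give one), and it assumes devscripts (for dget) and an already initialized pbuilder. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: fetch a Debian source package and build it with pbuilder.
# With DRY_RUN=1 the commands are printed rather than executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_from_dsc URL-of-the-.dsc-file
build_from_dsc() {
    dsc_url=$1
    dsc_file=${dsc_url##*/}                    # strip the URL path, keep the file name
    run dget "$dsc_url" || return 1            # also pulls the .orig.tar.gz and .diff.gz
    run sudo pbuilder build "$dsc_file" || return 1
    run ls /var/cache/pbuilder/result/         # default location of the built .debs
}
```

For example, DRY_RUN=1 build_from_dsc http://example.org/zorbaxquery_0.9.1-3.dsc (a placeholder URL) shows what would be executed without touching the system.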

It would be really nice if someone set up a PPA-like service for Debian, at least for repositories of source packages. I realize that setting up a cluster of build boxes is possible only with someone like Canonical behind it. But the storage required for source packages could be quite small: if the *.orig.tar.gz “link” dynamically retrieved an archive hosted elsewhere (a webapp could do this, trading space for bandwidth), such repositories could be quite compact (the .dsc and .diff.gz files are usually tiny). Alternatively, this scheme might work with a modified apt that could follow HTTP redirects.

April 30, 2008

gdb: examining complex c++ objects

Filed under: linux, c, gdb — Dan Muresan @ 8:25 am

I’ve been doing quite a bit of C++ programming (and, alas, debugging) for a project lately. One endless source of annoyance in C++ (at least in Linux) is the impedance mismatch between the compiler (gcc) and the debugger (gdb). C++ is notoriously hard to compile (and even just parse). gdb does a bit of name-demangling, but quickly finds itself out of its depth with complex C++ features (like heavy template usage). This is, after all, an old problem: even with C programs, debugging macro-laden code is painful.

But I’m not going to get into the details of that; today I’m going to show you how to deal with a lesser annoyance, namely examining STL objects. For example, if you use gdb’s standard print (or p) command, strings look like a mess, and long ones are truncated:

#include <iostream>
#include <string>
using namespace std;

int main () {
  string s = "";
  for (int i = 0; i < 5; i++)
    s += "This is a very, very long line.\n";
  s = s + s;
  cout << s;
  return 0;
}

~$ gdb testprog
(gdb) b 12
Breakpoint 1 at 0x8048c18: file testprog.cc, line 12.
(gdb) r
Breakpoint 1, main () at testprog.cc:12
12	  cout << s;
(gdb) p s
$1 = {static npos = 4294967295,
  _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> =
    {<No data fields>}, <No data fields>},
    _M_p = 0x804b3ac "This is a very, very long line.\nTh"...}}

(OK, I had to truncate the string manually here for readability, but the exact value of the maximum width isn’t the issue.) A much better way to examine strings is to use gdb’s printf command on the appropriate member of the STL object:

(gdb) printf "%s\n", s._M_dataplus._M_p

This displays the actual string, without encoding newlines as \n. An even better way is to define a printstr command that you can reuse in future gdb sessions; create or edit the file ~/.gdbinit and add the following snippet:

define printstr
  printf "`%s'\n", $arg0._M_dataplus._M_p
end

This will allow you to simply say printstr s whenever you need to examine a string. Of course, this definition relies upon GCC’s internal representation of a std::string, which may change from time to time.

After developing this gdb macro, I discovered Dan Marinescu’s excellent STL Views gdb scripts, which add support for examining vectors, maps, sets (and, yes, strings). The idea is the same. If you spend any significant time inside gdb, this is an invaluable tool.

It’s probably a good idea to take this further and create similar printer functions for all complex (and frequently examined) classes in C++ projects. GDB’s user-defined commands are extensively documented in the manual. You don’t need to put such commands in ~/.gdbinit; you can create a separate script and load it using source scriptname.gdb when needed.
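
As a sketch of the separate-script approach, the snippet below writes the printstr definition into a standalone gdb command file. The file name stl-helpers.gdb and the function name write_gdb_helpers are made up for illustration, and the macro still depends on GCC’s internal _M_dataplus._M_p layout.

```shell
#!/bin/sh
# Write a reusable gdb command file. The default file name is an arbitrary choice.
# Usage: write_gdb_helpers [OUTPUT_FILE]
write_gdb_helpers() {
    out=${1:-stl-helpers.gdb}
    # The quoted 'EOF' keeps $arg0, the backtick and the backslash literal.
    cat >"$out" <<'EOF'
define printstr
  printf "`%s'\n", $arg0._M_dataplus._M_p
end
document printstr
Print a std::string (assumes the GCC libstdc++ layout).
end
EOF
}
```

Inside gdb, source stl-helpers.gdb loads the command; alternatively, gdb -x stl-helpers.gdb testprog loads it at startup.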

February 19, 2008

Fixing SSH tab completion

Filed under: linux — Dan Muresan @ 10:42 am

While everyone is familiar with bash’s TAB completion for paths and filenames, fewer people know that since bash 2.04, TAB can complete arguments in many more contexts thanks to a feature called programmable bash completion. The default completions handle mount, cvs, ant, ssh and many others; it’s also possible to program your own extensions for whichever command seems to stress your typing.

For me, ssh completion (which completes based on known host names, and consequently learns new hosts as you ssh into them) has been one of the most useful plug-ins. Unfortunately, it stopped working a while ago (as of Ubuntu Breezy, to be more precise). I recently discovered the real cause: as a security feature, ssh now hashes host names instead of saving them directly into the known hosts file. This feature is meant to prevent worms from learning host names and spreading to them via ssh, except there aren’t many Linux worms around now or in the foreseeable near future. You can disable hashing by adding

HashKnownHosts no

to /etc/ssh/ssh_config, or alternatively, you can list some frequently used hosts in your ~/.ssh/config, e.g.:

Host example.com
Host vps
  Hostname 72.xxx.xxx.xxx

As you can see, you can also use your per-user ~/.ssh/config to provide handy aliases for some hosts.
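
To check whether hashing is what breaks completion on a given machine, something like the following sketch can help. It relies on hashed known_hosts entries starting with the marker |1|; the function name known_hosts_summary is invented for this example.

```shell
#!/bin/sh
# Count hashed vs. plain entries in a known_hosts file.
# Hashed entries start with "|1|"; plain ones start with a host name.
# Usage: known_hosts_summary [FILE]
known_hosts_summary() {
    file=${1:-$HOME/.ssh/known_hosts}
    awk '
        /^\|1\|/   { hashed++; next }   # hashed host name
        /^#/       { next }             # comment
        /^[ \t]*$/ { next }             # blank line
                   { plain++ }          # anything else is a readable entry
        END { printf "hashed=%d plain=%d\n", hashed, plain }
    ' "$file"
}
```

Only the plain entries are of any use to tab completion. Turning hashing off affects new entries only; existing hashed lines stay hashed until those hosts are re-added.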

October 28, 2007

Mass-generating random file names

Filed under: linux, bluetooth — Dan Muresan @ 11:20 pm

After setting up Bluetooth on my phone and laptop, I was faced with another problem: the phone saves images using filenames of the form ImageXXX.jpg, which is OK the first time, but tends to conflict with older files later on as the “XXX” counter restarts from 0. One may think that “mktemp” is a solution, but unfortunately that command won’t let us choose the extension of the created file. Instead, on Ubuntu and Debian, use “tempfile”:

cd "/mnt/Memory card/Images/"
for f in *; do mv "$f" "$(tempfile -d "$DEST_DIR" --suffix=.jpg)"; done

tempfile will create unique file names in the destination directory and avoid race conditions at the same time (not really an issue in this case, but good to know.)
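
If tempfile isn’t available on your system, a similar effect might be had with newer GNU versions of mktemp, which accept a --suffix option. The sketch below assumes GNU mktemp; the function name move_with_unique_names and the img prefix are arbitrary choices for this example.

```shell
#!/bin/sh
# Move every regular file from a source directory to a uniquely named .jpg
# in a destination directory. Assumes GNU mktemp (for --suffix).
# Usage: move_with_unique_names SRC_DIR DEST_DIR
move_with_unique_names() {
    src=$1
    dest=$2
    for f in "$src"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        # mktemp creates an empty placeholder file, which reserves the name.
        new=$(mktemp --suffix=.jpg "$dest/imgXXXXXX") || return 1
        mv -- "$f" "$new"            # replaces the placeholder with the real file
    done
}
```

A call such as move_with_unique_names "/mnt/Memory card/Images" "$DEST_DIR" would mirror the one-liner above.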

October 24, 2007

Bluetooth in Ubuntu, the CLI way

Filed under: linux, bluetooth — Dan Muresan @ 11:12 pm

Note: updated for Hardy (2008-06-09)

Note: for the impatient, just use the script at the end of this post.

I recently received a Nokia 6300 as a birthday present. After playing with the cool 2-megapixel camera I wanted to save some of the pictures and clips to my laptop. The most convenient (and cheapest) way to network the phone and laptop is via Bluetooth.

Not knowing a thing about Bluetooth, I set out on a quest to learn more about this protocol and, more specifically, how to use it in Ubuntu. This is where my problems started: nowadays everybody assumes that you run either Gnome or KDE, and most tutorials I found on the topic have a Windows-esque technical level (run this from the menu, click that in the dialog). Only I don’t use a desktop environment at all: I run fluxbox, and to compound that “crime”, I turn off most “standard” services (dbus, hal, NetworkManager and whatnot).

Well, it turns out that to get Bluetooth, you need to start hcid, and this in turn absolutely, positively requires dbus:

service dbus start
hcid -s

At this point you can check your Bluetooth interface and scan for other devices:

laptop:~# hcitool dev
Devices:
	hci0	00:1C:26:F4:AF:C4
laptop:~# hcitool scan
Scanning ...
	00:1D:98:54:A2:CB	Dan Nokia 6300

After some failed experiments, I learned that I needed to configure a PIN agent. Bluetooth uses a PIN code as a crude form of authentication. When either party attempts a connection, the phone prompts for a PIN. hcid on the laptop also needs a PIN, but since we’re outside GNOME land, the default agent (which pops up a dialog to ask for a PIN) doesn’t work. A PIN agent is actually any program that outputs the string PIN: followed by the actual PIN code. The simplest agent is something like

#!/bin/sh
exec /bin/echo PIN:0000

Save this to /usr/local/bin/pin-agent and make it executable, then simply type passkey-agent --default pin-agent &, which will inform hcid of the agent. Now we’re ready to connect to the phone, after apt-get install-ing the very cool obexfs package:

laptop:~# obexfs -b 00:1D:98:54:A2:CB -B 10 /mnt
laptop:~# ls /mnt
Graphics  Memory card  Received files  Themes  Video clips
Images    Music files  Recordings      Tones

where the address is the one reported by hcitool scan earlier. Presto, the phone’s clips and images are under /mnt! I was a little bummed that the contacts were NOT there, but I’ll figure that out some other time.

To save time, here’s a script you can use to automate the process:

#!/bin/sh
# Don't forget to set up pin-agent
/etc/init.d/dbus start
hcid -s
# if you have no passkey-agent, see below
passkey-agent --default pin-agent &
# replace with your own address below
obexfs -b 00:1D:98:54:A2:CB -B 10 /mnt

Update: on Hardy (and possibly even earlier), passkey-agent is no longer installed by default. A simple hack (which, alas, may stop working later on) is to cd to /var/lib/bluetooth, mkdir -p a directory with the same name as your computer’s Bluetooth address (hcitool dev shows it), and in that directory create a file called pincodes. The pincodes file must contain one or more lines in the format

00:1D:98:54:A2:CB 0000

The first field must be the phone’s Bluetooth address (not the computer’s address). This is not the same as the directory name!
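
The pincodes hack can be wrapped in a small helper, sketched below. The function name add_pincode and its optional fourth argument (a base directory, handy for trying it out somewhere other than /var/lib/bluetooth) are inventions for this example.

```shell
#!/bin/sh
# Register a PIN for a phone in the per-adapter "pincodes" file.
# Usage: add_pincode ADAPTER_ADDR PHONE_ADDR PIN [BASE_DIR]
add_pincode() {
    adapter=$1
    phone=$2
    pin=$3
    base=${4:-/var/lib/bluetooth}
    # The directory is named after the computer's own Bluetooth address.
    mkdir -p "$base/$adapter" || return 1
    file=$base/$adapter/pincodes
    # Do nothing if the phone is already listed.
    if [ -f "$file" ] && grep -q "^$phone " "$file"; then
        return 0
    fi
    # Each line holds the phone's address followed by its PIN.
    echo "$phone $pin" >>"$file"
}
```

With the addresses from this post, that would be add_pincode 00:1C:26:F4:AF:C4 00:1D:98:54:A2:CB 0000, run as root.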

June 6, 2007

BurryFS: a file system written in Scheme

Filed under: scheme, linux — Dan Muresan @ 10:27 pm

I’ve released BurryFS, a file system based on Fuse and implemented in Chicken Scheme. BurryFS interacts with Fuse (the userspace filesystem API — merged into the Linux kernel since 2.6.14) to organize Digg content as a file system. Since the Fuse API relies on callbacks to deliver file system requests, and Scheme functions cannot serve as C callbacks, I have written a simple inversion-of-control layer that serializes Fuse requests over an internal socket and waits for replies from Scheme. At the other end, Scheme sits in an event loop, unpacking requests, reading information via the Digg API and sending replies. Since Chicken implements cooperative (lightweight) threads, complete with TCP support, BurryFS performance should be high even with multiple parallel requests.

For more information (and downloads), see the BurryFS homepage.

April 15, 2007

Fixing the Courier and Exim SSL certificates

Filed under: imap, linux — Dan Muresan @ 12:07 am

Most hosting accounts come with cPanel, and by implication Exim and Courier under the hood. Some people access their mail using the cPanel webmail interface (usually via https://example.com:2096), but if you need to send more than the occasional e-mail, you probably want to set up Outlook or Thunderbird to connect to the IMAP server.

Sometimes, the hosting company won’t have a canonical host name and matching SSL certificate for your domain, which will lead to endless security warnings in Thunderbird. If you’ve got shared hosting, there’s not much you can do (short of opening a support ticket and hoping for the best), but if you are a VPS customer, here’s how to fix your problem. First, edit /usr/lib/courier-imap/etc/imapd.cnf (in particular, set the correct hostname in the CN=… line). Then, run Courier’s mkimapdcert. This will generate the file /usr/lib/courier-imap/share/imapd.pem, which combines a key and certificate and is used by the Courier IMAP server. Next, copy and paste the RSA private key (including the delimiter lines) from the PEM file to /etc/exim.key, and similarly the certificate (the second section in the PEM file) to /etc/exim.crt.
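
Instead of copying and pasting, the two sections can be extracted with sed. The following is a sketch under the assumption that the PEM file uses the usual BEGIN/END delimiter lines; split_pem is a made-up function name.

```shell
#!/bin/sh
# Split a combined PEM file into its private key and certificate sections.
# Usage: split_pem COMBINED.pem KEY_OUT CERT_OUT
split_pem() {
    pem=$1
    key_out=$2
    cert_out=$3
    # Matches "BEGIN RSA PRIVATE KEY" as well as a plain "BEGIN PRIVATE KEY".
    sed -n '/-----BEGIN .*PRIVATE KEY-----/,/-----END .*PRIVATE KEY-----/p' "$pem" >"$key_out"
    # Any other sections in the file are ignored.
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p' "$pem" >"$cert_out"
}
```

For the files named above this would be split_pem /usr/lib/courier-imap/share/imapd.pem /etc/exim.key /etc/exim.crt. Check afterwards that the key file’s ownership and permissions still let Exim (and nobody else) read it.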

When you start Thunderbird, it will complain that it can’t verify the certificate (to avoid this you’d have to pay a Certificate Authority like Verisign or Thawte, but we’re not doing that today). Choose to accept the certificate permanently. Voilà, no more warnings.

March 16, 2007

Ubuntu: make the world a better place by holding users hostage?

Filed under: linux, vmware — Dan Muresan @ 1:31 pm

Note: to the many people who just want to fix their problems and don’t care about politics — scroll to the end of this post.

As I was having trouble getting the VMWare MUI to work on Ubuntu, I came upon a bugzilla thread that solved my original problems, but made me very concerned about the Ubuntu developer team. The discussion highlights serious problems with their mentality, priorities, and attitude.

The controversy centers around the default Bourne shell, /bin/sh, which executes scripts in Linux (expert readers may skip this paragraph and the next two). For as long as anyone remembers, Linux distros have provided GNU Bash, an “embrace and extend” version of the original sh (the behaviour of sh is actually standardised in POSIX 1003.2). So /bin/sh was a symlink to /bin/bash — yet bash has extensions that would not work in a standards-compliant sh.

Now, scripts get to choose which shell they run under: the first line in any shell script must read something like #!/path/to/shell. But authors want their scripts to run on as many systems as possible, and the only cross-UNIX shell is /bin/sh — if you required /bin/bash, your script might not run on Solaris.

The problem is that many scripts only see actual usage on Linux, and since there has never really been a “bare” sh around, many scripts inadvertently rely on bash-only features. Everything worked though, and no one complained — until last year, that is.
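
To make the distinction concrete, here is a small sketch of two common bash-only constructs next to portable equivalents that a POSIX sh (dash included) should accept. The function names are made up for this example.

```shell
#!/bin/sh
# bash-only:  [[ $1 == "$2"* ]]
# POSIX sh:   pattern matching with case
starts_with() {
    case $1 in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# bash-only:  for i in {1..3}; do echo $i; done
# POSIX sh:   an explicit counter with arithmetic expansion
count_to() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '%s\n' "$i"
        i=$((i + 1))
    done
}
```

A script that uses the bash-only forms under #!/bin/sh works as long as sh is bash, and fails with a syntax error or wrong output once sh is something stricter.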

In June 2006, Ubuntu registered a “feature specification” to use dash rather than bash as /bin/sh. Apparently, dash is faster and needs less memory, so mostly for these reasons the change was approved for Edgy. But dash also struggles to be “more catholic” than bash (though it has its sins too), so not every bash script runs on dash. Since Debian had previously conducted a shell script audit to rid packages of bash-isms, this wasn’t immediately noticed. However, outside packages were never reviewed, and complaints started piling up as new users (and upgraders) flocked to Ubuntu Edgy after its final release.

At that point, a previously obscure bug started gaining entries and visibility. The developers’ response was not what you’d expect for a distro backed by Canonical and self-styled “Linux for human beings”:

there are no plans to change the default configuration back to bash […] If vendors are distributing software that expects /bin/sh to be bash, then that software is broken. Please take it up with them.

So the users are supposed to notice the breakage, carefully debug the scripts to learn that the bug is due to bash-isms, complain to the authors and wait for the fix to arrive. If the users are not programmers, they’re out of luck. All this for software that ran just fine previously, mind you.

Of course, Ubuntu could easily fix this bug, retaining the speed improvement without inconveniencing users: revert sh to bash, and change Ubuntu packages to use dash. But I suppose that would mean conceding users were right from the start (and thus losing face).

Is this going to be a Jeff Johnson moment? What really scared me were comments by someone who claims to be a non-developer (strangely enough, the only non-developer to support the official policy):

Bashisms are bad. They need to be fixed […] Sometimes you have to do things the hard way to make the world a better place. I think we have begun down a slippery slope towards eradication of bashisms. They never would have gone away if it was just ‘the right thing to do’, but now if you write broken scripts you give up support for a major distro.

So, making the world a better place involves taking the userbase hostage, wasting anywhere from 30 minutes to a couple of hours of thousands of people’s time, and expecting them to do your bidding (i.e. persuade third parties to conform to some lousy standard that sported incompatible changes several times in a decade)? I really hope this is not what the developer team is secretly thinking, but the fact that there are exactly two replies from a single developer, in spite of the mounting frustration expressed in tens of comments, doesn’t look good. In any case, causing lost productivity that ranges somewhere into the hundreds of thousands of dollars is a remarkable accomplishment, only not one to be proud of.

Update: to those who just want to fix this problem without downgrading Ubuntu: either run dpkg-reconfigure dash or, more brutally,

ln -sf /bin/bash /bin/sh

February 8, 2007

VMWare: when two OSs access the same partition

Filed under: linux, vmware — Dan Muresan @ 7:42 am

Probably the most convenient way to run Windows under Linux is to start with a dual-boot setup, then create (in Linux) a VMWare Server virtual machine based on the physical Windows partition. This ensures that you don’t have to re-install Windows and your favorite applications.

But with great convenience comes great danger. When you power on the virtual machine, it will boot into GRUB (or LILO) which will ask which OS you want to run. No problem you’ll say, select Windows, it’s just a small inconvenience. Until the day your fingers err. Or, if GRUB has a timeout, the day you run to get a cup of water and come back to witness Linux booting. That means that the virtual machine and the host OS are now accessing the same partitions simultaneously.

The various VMWare tutorials strongly caution you to avoid this situation, which will likely result in data loss. But maybe you are wondering just how bad things can get (at least I always have). Well, about a month ago, facing a complete Linux re-install, I found the perfect opportunity to experiment. I had two Linux partitions (a JFS root and an EXT3 volume). So I powered up the virtual machine into Linux, and let it run its course, after which I rebooted.

The results? Surprisingly, the root JFS partition came out from fsck unscathed. That’s right, there were no errors, and nothing in /lost+found. The EXT3 partition, by contrast, was destroyed beyond repair (it started with a bad superblock, and went downhill from there as I tried to recover). Unfazed, I decided to try again (after reformatting my EXT3 partition). The same thing happened. I have no idea why, and I wouldn’t necessarily conclude that JFS is safer, but if you ever have the chance (or misfortune) to experiment, let me know how it goes…

And now, on to something more useful: how do you prevent such disasters? The answer is to force the VMWare partition to boot from a virtual floppy disk that makes the correct OS choice automatically (it could be GRUB with a single-item boot menu, or an NTLDR-based solution). Scott Bronson’s VMWare tutorial shows how to do this. Unfortunately, his method is rather inconvenient, requiring several reboots. So what follows is a simpler solution that replaces steps 3-10 from his Set up the Boot Disk section:

dd if=/dev/zero of=bootdisk.img bs=1k count=512
mke2fs -F bootdisk.img
mount -oloop bootdisk.img /mnt
mkdir -p /mnt/boot/grub
cp /boot/grub/stage[12] /mnt/boot/grub/

cat >/mnt/boot/grub/grub.conf <<EOF
timeout=3
title=Windows
root            (hd0,0)
chainloader     +1
makeactive
EOF

umount /mnt

grub --device-map=/dev/null <<EOF
device (fd0) bootdisk.img
root (fd0)
setup (fd0)
quit
EOF

The rest of Scott’s tutorial still applies — in particular, setting up different hardware profiles is important. How important? I’ll let you know next time I’m stuck with a complete Windows reinstall…

January 30, 2007

Compressed filesystem using SquashFS and AutoFS

Filed under: linux — Dan Muresan @ 9:42 am

When installing a modern Linux distribution on older computers, one problem you may face is the lack of disk space. I ran into this last week, while helping a friend install Ubuntu on an antique laptop with a 2 GB hard drive. The obvious starting point is a minimalist installation: Ubuntu Alternate CD (my choice), Arch Linux, or a few others. The good news is that your system doesn’t have to stay minimalistic if you know how to tailor the distribution.

One way to save space is to use data compression. It’s possible to keep parts of the filesystem compressed on disk and have Linux decompress them on the fly when they’re needed. This idea is as old as Stacker / DoubleSpace, but for Linux we need to do more work, as there’s no stable read-write compressed filesystem as of this writing (though you may want to watch Johan Parent’s compFUSEd as it matures).

First, install the tools: squashfs, a compressed file system that yields better performance than the traditional cramfs, and autofs, to mount and unmount compressed directories automatically. Next, if you’ve never used a compressed filesystem, it helps to play with squashfs a bit:

# log in as root or type "sudo bash"
mksquashfs /tmp dummy.squashfs
mount -o loop dummy.squashfs /mnt
ls /mnt           # should be identical to /tmp
touch /mnt/x  # won't work, squashfs is read-only

This example creates a squash file system (in the file dummy.squashfs) that mirrors the contents of /tmp and mounts it (using loop, since it’s an ordinary file and not a block device) on /mnt. As the last command demonstrates, you can’t write in a squashfs, so you’ll want to compress directories that are normally not modified (so /tmp would actually be a bad choice, and so would be any user home directory, /var etc.)

Now, to work: let’s set autofs up (this only needs to be done once):

cd /etc
echo '/var/autofs/squash /etc/auto.z --timeout=300' >>auto.master
echo '* -fstype=squashfs,loop :/opt/squashfs/&.squashfs' >>auto.z
/etc/init.d/autofs restart

The first line tells autofs to read /etc/auto.z (and to unmount auto-mounted directories once they have been unused for 300 seconds); the second one says that whenever someone accesses /var/autofs/squash/DIR (where DIR is an arbitrary name), autofs should try to mount /opt/squashfs/DIR.squashfs automatically.

Next, set your sights on a large, read-only directory — say /usr/lib/mozilla-thunderbird. Here’s the plan:

  1. convert relative symlinks: for f in `find /usr/lib/mozilla-thunderbird -type l`; do t=`readlink -f $f`; rm $f; ln -s $t $f; done
  2. create a compressed filesystem: mksquashfs /usr/lib/mozilla-thunderbird /opt/squashfs/mozilla-thunderbird-lib.squashfs
  3. remove the original directory: rm -rf /usr/lib/mozilla-thunderbird
  4. replace the directory with a symbolic link: ln -s /var/autofs/squash/mozilla-thunderbird-lib /usr/lib/mozilla-thunderbird

You may wonder why the first step is necessary. The answer is that /usr/lib/mozilla-thunderbird contains some relative links (things like ../share/icons) that would break when the directory is relocated to /var/autofs/squash. So we use find to locate symlinks, readlink to read their target, and then rewrite these links.
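
The one-liner in step 1 splits on whitespace, so it can trip over file names containing spaces. A somewhat more defensive variant is sketched below; absolutize_symlinks is a hypothetical name, and the sketch assumes GNU readlink (for -f) and GNU ln (for -n).

```shell
#!/bin/sh
# Rewrite every symlink under a directory to point at its absolute target.
# Handles spaces in names (though not newlines); dangling links are left alone.
# Usage: absolutize_symlinks DIR
absolutize_symlinks() {
    dir=$1
    find "$dir" -type l | while IFS= read -r link; do
        target=$(readlink -f "$link") || continue   # fully resolved absolute path
        [ -e "$target" ] || continue                # skip links that point nowhere
        ln -sfn "$target" "$link"                   # replace the link in place
    done
}
```

Running absolutize_symlinks /usr/lib/mozilla-thunderbird before step 2 would take the place of step 1.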

That’s it. Whenever you access the compressed directory, it will be automounted:

ls /usr/lib/mozilla-thunderbird
mount

This method does have one disadvantage: if you ever upgrade thunderbird, dpkg will follow the compressed directory symlink and try to write inside it (which will fail). You should remove the /usr/lib/mozilla-thunderbird symlink prior to an upgrade (and, presumably, re-compress once the upgrade completes).
