Briss, a PDF cropper and rearranger: 1/N-up layouts

I've been working on Briss2, a PDF cropper (and a fork of the original Briss project) that can rearrange two-column documents, trim excessive margins and perform other similar feats. The classic use case is converting two-column to single-column documents; this is a "1/2-up" conversion, i.e. the opposite of the familiar "two pages per sheet" (or 2-up) layout for printing. Cutting up columns, however, yields tall and narrow portrait documents that are even harder to read than normal portrait pages on many devices. That's why I favour a 1/4-up conversion (tear apart the columns, and also divide each column vertically), possibly followed by 2-up multiplexing (the end result being a landscape document with a single original column per page: the top half of the original on the left and the bottom half on the right). Another good layout is cutting a portrait document's pages into 3 landscape strips (e.g. for devices with low resolution).

The original Briss is unmaintained, so I created a fork on GitHub called briss2. Besides fixing a few annoying problems, this version adds tools for partitioning the page into crop rectangles more easily (with optional overlap, to handle split lines), as well as for creating reproducible layouts. The current version scratches most of my itches (I also had an undo/redo implementation that was unfortunately lost in a crash), which is why patches (or pull requests) are welcome.
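To make the partitioning idea concrete, here is a minimal Python sketch of how a page could be split into a grid of crop rectangles with some overlap. It is not Briss code: the function name `grid_rects`, its parameters and the PDF-style coordinate convention (origin at the bottom-left corner) are assumptions made for illustration only.

```python
def grid_rects(width, height, cols, rows, overlap=0.0):
    """Split a page into cols x rows crop rectangles.

    Rectangles are (left, bottom, right, top) tuples with the origin at the
    bottom-left corner. They are listed column by column, top to bottom
    within each column, i.e. in two-column reading order. Each rectangle is
    grown by `overlap` on every side and clipped to the page.
    """
    cell_w = width / cols
    cell_h = height / rows
    rects = []
    for col in range(cols):
        for row in range(rows):
            left = max(0.0, col * cell_w - overlap)
            right = min(width, (col + 1) * cell_w + overlap)
            top = min(height, height - row * cell_h + overlap)
            bottom = max(0.0, height - (row + 1) * cell_h - overlap)
            rects.append((left, bottom, right, top))
    return rects
```

With this sketch, `grid_rects(w, h, 2, 2)` corresponds to the 1/4-up layout and `grid_rects(w, h, 1, 3)` to the three landscape strips.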


Passphrase recovery from regex approximation

Due to the current memory-loss inducing holidays, you may end up forgetting your seldom-used (or recently-changed) login password, SSH or GPG passphrase. If you still have some recollection of what it looked like, one way back in is to generate a wordlist from a regular expression approximation, then feed it to a cracking tool like John the Ripper. Yes, this means cracking your own password.

Let's say your password was the hard-for-humans-to-remember Tr0ub4dor+3, but you don't remember the capitalization, which letters you l33ted, or which punctuation mark and suffix you used in a feeble attempt to slow down a potential attacker. You can generate a plausible wordlist using regldg, the regular expression grammar language dictionary generator:

regldg -m 15 -us 255 '[tT]r[0o]ub[4@a]d[0o]r.[0-9]!?' >/tmp/mywl.txt

Here -m 15 sets the maximum length, while -us 255 enables the regexp period to match all alphanumeric characters plus punctuation and symbols (the regexp universe). Then, download and install a “jumbo”-patched John the Ripper:

git clone
cd JohnTheRipper/src
make linux-x86-native
cd ../run

Now generate a password file using unshadow, ssh2john or gpg2john (you may want to delete irrelevant lines from the output):

./gpg2john ~/.gnupg/secring.gpg >/tmp/gpwd.txt

and crack the password:

./john --wordlist=/tmp/mywl.txt /tmp/gpwd.txt
./john --show /tmp/gpwd.txt

There's usually no point in enabling the --rules john option (or writing custom John rules) as they don't deal with the kind of variation pertinent to approximate password recall.


  • Be sure to delete your cracking script (if you saved the above commands to one) and your .bash_history, or better yet change your passphrase after recovering it.
  • Of course, there's also the BOFH method of recalling passwords by recreating the scenario which made you dream them up in the first place, but it may not always be applicable (or definitive in recovering all details).
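If regldg is not at hand, a pattern made only of character classes can be expanded with a few lines of Python. This is a rough sketch, not a regldg replacement: it assumes the candidate slots listed in `SLOTS` (including the digit 4 as a possible l33t "a"), and it assumes the "universe" is ASCII letters, digits and punctuation.

```python
import itertools
import string

# Assumed "universe" for the wildcard position: letters, digits, punctuation.
UNIVERSE = string.ascii_letters + string.digits + string.punctuation

# One entry per position; each entry lists the alternatives for that slot.
# The last slot makes the trailing "!" optional.
SLOTS = ["tT", "r", "0o", "u", "b", "4@a", "d", "0o", "r",
         UNIVERSE, string.digits, ["", "!"]]


def candidates(slots=SLOTS):
    """Yield every word obtained by picking one alternative per slot."""
    for combo in itertools.product(*slots):
        yield "".join(combo)


if __name__ == "__main__":
    # Print the wordlist, one candidate per line.
    for word in candidates():
        print(word)
```

Redirect the output to a file and pass that file to john as the wordlist.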

GStreamer video player with variable speed support: pb2player

I've recently had the chance to play with an older Nokia Maemo phone. Having a full Linux (Debian, no less) distro on a phone is quite a treat, which explains why the N900 has somewhat of a cult following. While trying to watch some video lectures, I discovered that the default media player (which is GStreamer-based) doesn't have any variable speed support. On the other hand, the players that do (vlc / mplayer and their front-ends) don't necessarily have hardware acceleration, because only GStreamer can access the DSP. As is the case with many embedded platforms, GStreamer enjoys special support.

As PyGST / PyGTK are friendly platforms, I ended up coding pb2player, a variable-speed GStreamer video player that works both on the desktop and on Maemo devices. I had to restrict myself to Python 2.5, which in practice wasn't too bad: there's no str.format(), and I had to use optparse instead of argparse (which is aggravated by GStreamer's buggy takeover of optparse once pygst is imported: one has to do all CLI processing before the import). The only cross-platform issue I've encountered is the appearance of the GTK FileChooserDialog, which leaves too little room for long file names and sometimes makes it difficult to navigate to the parent directory.
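The "parse first, import later" workaround can be sketched as follows. This is not pb2player's actual code: the `--rate` option and the function names are hypothetical, and the snippet avoids str.format() in the spirit of Python 2.5.

```python
from optparse import OptionParser


def parse_cli(argv):
    """Parse the command line before any GStreamer module is imported."""
    parser = OptionParser(usage="%prog [options] FILE...")
    # Hypothetical option, for illustration only.
    parser.add_option("-r", "--rate", type="float", default=1.0,
                      help="initial playback rate (default: %default)")
    options, files = parser.parse_args(argv)
    return options, files


def main(argv):
    options, files = parse_cli(argv)
    # Import the GStreamer bindings only now, so that they cannot
    # interfere with the option parsing above.
    import pygst
    pygst.require("0.10")
    import gst  # noqa: F401  (used by the player code that would follow)
    return options, files
```

The point is only the ordering: everything that touches the command line happens in `parse_cli`, and the imports are deferred to `main`.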

The player is basic, but quite useful and solid in my experience. It supports m3u playlists (a priority given the FileChooserDialog woes), faster / slower playback and random seeking within the current video. The variable speed playback is nice for skimming the boring parts of a clip. Pitch control (a la mplayer's scaletempo, so that the voices in your headphones don't start squealing at high playback rates) would be even nicer.

If you're going to try out pb2player, I suggest grabbing one of the releases or the stable GitHub pb2player v1.0.x branch. After finishing the first workable version I decided to test-drive Qt, which seems to have a much more usable file dialog on Maemo; therefore I refactored the app (loosely) according to MVA in order to be able to keep both front-ends (in the short term, because we all know how bridging different philosophies works out in the end). That experiment is not yet finished.

Once downloaded, you can run pb2player from its source folder, or you can build a Debian package to be installed elsewhere (but not on Maemo — see below). I put the Debian instructions in a mydebian folder, as having an actual debian/ in your source tree makes the package "native Debian", which gets in the way of real packagers most of the time. The make debuild target creates a .deb in /tmp based on mydebian.

To build and install a Maemo-suitable .deb, two more steps are required:

  1. Maemo wants .desktop files in a non-standard location (/usr/share/applications/hildon), so mydebian/rules must patch the default $(desktopdir)
  2. What is now called python-gobject-2 on an Ubuntu Precise system used to be called python-gobject in the past (a name which now refers to version 3). This is a result of Debian's moving-target naming policies. The PC-generated playbin2player package requires python-gobject-2. As a workaround, we can use equivs to build a dummy package that depends on the old python-gobject and provides python-gobject-2.

I've uploaded the debian/rules and the equivs control file for building python-gobject-2 (as well as the two resulting .deb packages) to this thread, so I won't include the details here. Upload and install python-gobject-2_1.0_all.deb and playbin2player_1.0.1-1_all.deb to your Maemo device and you can run pb2player.

Prevent overheating (and thermal shutdowns) with cputhermalfreqd

For some of us, relief from noisy fans, overheating laptops and thermal shutdowns is a can of compressed air away. For the hardware-impaired, though, there's now cputhermalfreqd, which I've written over the summer as the prudence required to keep my laptop online (stop the video player every so often, suspend for a while before burning a CD etc.) began to take its toll on my patience.

cputhermalfreqd is a daemon that controls the CPU speed using the cpufreq drivers. Unlike other similar daemons, it is
focused on slowing down your CPU once the system gets too hot, thus preventing inopportune thermal shutdowns (and thermal
shutdowns are never opportune).

cputhermalfreqd starts with a list of temperature thresholds (e.g. 23 18 12) which represent degrees below critical temperature. At every iteration (you can specify how often the daemon checks the sensors), cputhermalfreqd checks all temperature sensors and spots the sensor that is closest to its critical temperature (e.g. one of the CPU sensors or the hard disk). That remaining “thermal headroom” is used to locate a particular interval between the initialization thresholds and dial down the CPU to the corresponding speed step (e.g. if the headroom is 20, that would be the second-highest speed available in the system in our example.)
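The threshold lookup described above can be sketched in a few lines of Python. This is an illustration rather than the daemon's own code: the function names are made up, speed steps are numbered from 0 (fastest), and treating a headroom exactly equal to a threshold as belonging to the slower step is an assumption.

```python
def thermal_headroom(sensors):
    """Return the smallest distance to critical over all sensors.

    `sensors` is an iterable of (current_temp, critical_temp) pairs.
    """
    return min(critical - current for current, critical in sensors)


def speed_step(headroom, thresholds=(23, 18, 12)):
    """Map the thermal headroom to a speed step (0 = fastest).

    `thresholds` are degrees below critical, in decreasing order; they
    split the headroom axis into len(thresholds) + 1 intervals.
    """
    for step, limit in enumerate(thresholds):
        if headroom > limit:
            return step
    return len(thresholds)  # hotter than every threshold: slowest step
```

With the example thresholds, a headroom of 20 falls between 23 and 18, so `speed_step(20)` returns 1, the second-highest speed.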

Internally, the program discovers the available sensors by strace-ing the sensors command (from the lm-sensors package) as it boots up. It then regularly uses cpufreq-set to adjust the speed as necessary. I know, it's not necessarily classy, but it works and it's portable.

So, anyway, grab cputhermalfreqd from github. Or wait for the summer, if you're in the northern hemisphere and are currently using your laptop as a heating device...


Running a program with a specific environment, uid/gid and arguments: withidenvargs

In my Tomcat under daemontools hack, I alluded to withidenvargs, a small utility I have written to run a program in a fully-specified environment. This gives you more control than djb's envdir (no newline limitations or '\0' translation hacks in environment variables, plus the ability to specify uid/gid), though in principle I love djb's philosophy of using the filesystem as a parser. Unlike envdir, withidenvargs doesn't inherit the parent's exports.

withidenvargs takes 3 filenames as arguments

  1. an ids file (newline-separated list of real / effective uid, real gid, effective gid + sgid's — see below) as produced by perl -le '$, = "\n"; print $<, $>, 0+$(, $)'
  2. an environment file (null-separated list of assignments as produced by env -0)
  3. an arguments file (null-separated list of arguments starting with the program name, as in C's argv)
#!/usr/bin/env perl
use strict; use warnings;
use File::Slurp;
scalar @ARGV == 3 || die;
my ($fid_name, $fenv_name, $farg_name) = @ARGV;
my @ids = read_file ($fid_name);
@ids || die $!; scalar @ids >= 4 || die;
chomp @ids;
my ($uid, $euid, $gid, $egid) = @ids;
$/ = chr (0);  # the env and args files are null-separated
%ENV = ();     # start from an empty environment
# parenthesize open's arguments so that || die applies to open itself
open (my $fenv, "<", $fenv_name) || die $!;
while (<$fenv>) {
  chomp;  # strip the trailing null
  my ($k, $v) = split /=/, $_, 2; $ENV {$k} = $v;
  chdir ($v) if $k eq "PWD";
}
close ($fenv);
open (my $farg, "<", $farg_name) || die $!;
my @args = ();
while (<$farg>) { chomp; push @args, $_ }
close ($farg);
# handle empty sgid list -- setgroups ([])
$egid = "$egid $egid" unless $egid =~ / /;
if ($uid =~ /[^0-9]/) {  # translate to numeric id's if necessary
  # scalar context: a lookup by name returns the numeric id
  ($uid, $euid) = (scalar getpwnam ($uid), scalar getpwnam ($euid));
  $gid = getgrnam ($gid);
  $egid = join " ", map { scalar getgrnam ($_) } (split / /, $egid);
}
$( = "$gid"; $) = "$egid";
$< = $uid; $> = $euid;
#system ("id");  # check
exec @args;

Note: It is important to change the real and effective gid before the uid's, because otherwise the process may “lose root” and be unable to perform privileged operations.

Also interesting is Perl's way of setting the supplementary group IDs. A process can have, in addition to its real and effective gid, several supplementary gid's (sgid's); at system level they are controlled using the POSIX getgroups() and the non-POSIX setgroups(). A less-known form of $)-assignment accepts a list of gid's, the first of which signifies the egid, while the rest compose the sgid list. To specify an empty supplementary list, however, we must repeat the egid (leaving out the second egid changes the meaning of the $)-assignment to egid-only, which would leave the inherited sgid's unchanged, possibly leaking root privileges to the child process).

As a quirk, withidenvargs picks up the working directory from which it launches the program from the $PWD environment variable. This may or may not be what you want.
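For comparison, here is how the same three files could be consumed from Python. It is only a sketch of the idea, not a port of withidenvargs: the function names are made up, ids are assumed to be numeric, every environment record is assumed to contain an "=", and `launch` needs root to actually switch identities.

```python
import os


def read_nul_list(path):
    """Return the null-separated records of a file as a list of bytes."""
    with open(path, "rb") as handle:
        records = handle.read().split(b"\0")
    if records and records[-1] == b"":
        records.pop()  # a trailing separator yields one empty record
    return records


def read_ids(path):
    """Parse the ids file: uid, euid, gid, then 'egid sgid sgid ...'."""
    with open(path) as handle:
        uid, euid, gid, egid_line = handle.read().split("\n")[:4]
    egid, *sgids = [int(g) for g in egid_line.split()]
    return int(uid), int(euid), int(gid), egid, sgids


def read_env(path):
    """Parse 'env -0' style output; values may contain '=' and newlines."""
    return dict(record.split(b"=", 1) for record in read_nul_list(path))


def launch(ids_path, env_path, args_path):
    uid, euid, gid, egid, sgids = read_ids(ids_path)
    env = read_env(env_path)
    args = read_nul_list(args_path)
    if b"PWD" in env:
        os.chdir(env[b"PWD"])
    # Groups first, then gids, then uids: once the uid is dropped,
    # the process may no longer be allowed to change its groups.
    os.setgroups(sgids)
    os.setregid(gid, egid)
    os.setreuid(uid, euid)
    os.execvpe(args[0], args, env)
```

Unlike Perl's $)-assignment, `os.setgroups()` takes the supplementary list directly, so an empty list needs no special encoding here.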

Unwrapping control scripts part II: restoring the complete environment (Tomcat)

In the previous episode we dealt with restoring only a few variables, though there was the complication of two levels of indirection (service apache2 start and apache2ctl). When placing Tomcat under the control of daemontools, there is a single indirection (service tomcat7 start calls but the environment has more complex variable values and includes running under a different UNIX uid / gid as well.

The first step is to set up the fake init.d script and replace the simple /usr/bin/env environment dumper with something less easily fooled. env -0 is not confused by variable values that contain newlines or "=". We also record the real and effective user and group id's:

NAME=tomcat7
# rewrite /etc/init.d/ script
cp "/etc/init.d/${NAME}" "/tmp/initd_${NAME}_fake"
perl -pi.bak -e 's@(CATALINA_SH=).*@$1"/tmp/catalina_fake"@;s@(CATALINA_PID=")/var/run/@$1/tmp/@' \
  "/tmp/initd_${NAME}_fake"
# create stub
cat <<"EOF" | perl -pe "s@NAME@$NAME@g" >/tmp/catalina_fake
#!/bin/sh
# real / effective uid, real / effective gid + sgid's
perl -le '$, = "\n"; print $<, $>, 0+$(, $)'>"/tmp/NAME_id.txt"
# args, including program name, $0
perl -e '$\ = chr (0); print $0; print while defined ($_ = shift)' "$@" >"/tmp/NAME_args.txt"
# finally env
/usr/bin/env -0 >"/tmp/NAME_env.txt"
EOF
# execute fake init.d script
chmod a+x /tmp/catalina_fake "/tmp/initd_${NAME}_fake"
"/tmp/initd_${NAME}_fake" start >/dev/null 2>&1

We saved catalina's arguments (including the full path to the real to a tomcat7_args file for demonstration purposes; we will actually overwrite this file, because unlike the init.d script, we want to invoke run (which runs in the foreground), not start (which daemonizes). The last part of the “unwrapped” script extracts catalina's path from the saved CLI arguments and calls it in the appropriate environment, with the help of a little utility (withidenvargs) that I will describe in my next post:

CATALINA=$(perl -0e '$_ = <>; chomp; print' "/tmp/${NAME}_args.txt")
printf "%s\0%s\0" "$CATALINA" run >"/tmp/${NAME}_args.txt"
exec withidenvargs "/tmp/${NAME}_id.txt" "/tmp/${NAME}_env.txt" "/tmp/${NAME}_args.txt"

Unwrapping control scripts: Apache under daemontools

In Debian, if you start apache The Right Way, you're actually going through two indirection layers: /etc/init.d/apache2 start sets up some environment variables (e.g. by reading /etc/default/apache2) and eventually runs apache2ctl start — which again sets up some stuff and eventually runs apache2. You can't really safely skip either of them.

This poses some problems in case you want to run apache2 non-daemonized (in the foreground), say under the watchful eye of a process supervisor like daemontools (or runit, or s6, or any of the other clones / enhancements). We all know that apache never crashes and never segfaults, so there's no need to auto-restart it, but still.

We want to run apache2 in the exact environment that /etc/init.d/apache2 start and apache2ctl start create. You could stare at the scripts and extract environment variables by hand, but this is time-consuming and error-prone. The elegant way to replicate the actions of the scripts is to replace the final call to apache2 with a stub that saves the complete environment, and then exec /usr/sbin/apache2 in that environment from the daemontools run script. To achieve this, one can rewrite apache2ctl (call it apache2ctl_fake) to invoke our stub, then rewrite /etc/init.d/apache2 to invoke apache2ctl_fake instead of the real apache2ctl. The stub itself can simply use env to dump the environment into a file. Putting all this together, we get

exec 2>&1
NAME=apache2
# rewrite /etc/init.d/ script
cp "/etc/init.d/${NAME}" "/tmp/initd_${NAME}_fake"
perl -pi.bak -e \
  's@APACHE2CTL( start)@ENV /tmp/apache2ctl_fake$1@' \
  "/tmp/initd_${NAME}_fake"
# rewrite apache2ctl
{ echo '#!/bin/sh'; echo "APACHE_HTTPD=/tmp/${NAME}_fake";
  cat `which apache2ctl`; } >"/tmp/${NAME}ctl_fake"
# create stub
cat <<EOF >"/tmp/${NAME}_fake"
#!/bin/sh
/usr/bin/env >"/tmp/${NAME}_env.txt"
EOF
# execute fake init.d script
chmod a+x "/tmp/${NAME}_fake" "/tmp/${NAME}ctl_fake"
chmod a+x "/tmp/initd_${NAME}_fake"
"/tmp/initd_${NAME}_fake" start >/dev/null 2>&1
# prefix all environment assignments with export
perl -ni.bak -e 's/^/export /; print unless /^export PWD=/' \
  "/tmp/${NAME}_env.txt"
# load environment
. "/tmp/${NAME}_env.txt"
# call the real apache2
exec /usr/sbin/apache2 -k start -DNO_DETACH -DNO_DAEMONIZE

Note that the final crude “environment reload” trick only works for environment variables with no spaces in their values, because env does not quote assignments and/or escape quotes, i.e. it doesn't output VAR="value with \"nasty\" stuff". For more thorough handling one could generate output in the style of daemontools' envdir and use that tool to exec apache2.

No terse strtok() in Python

It turns out that in Python there's no generator version of str.split() (which produces a list of substrings). There are some recipes in this Stack Overflow answer, of which I suppose

import re

def split_iter(string):
  return (x.group(0) for x in re.finditer(r"[A-Za-z']+", string))

is "terse enough" (and much more functional, though it requires proper escaping of re-special characters). It's hard not to quip that C has had strtok (or, more reasonably, strtok_r) for a very long time.
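When the separator is a fixed string rather than a character class, the escaping issue can be avoided altogether by using str.find() instead of a regular expression. This is a different technique from the recipe above, sketched under the assumption that we want the same results as str.split(sep); the name `isplit` is made up.

```python
def isplit(text, sep):
    """Lazily yield the same pieces as text.split(sep)."""
    if not sep:
        raise ValueError("empty separator")
    start = 0
    while True:
        end = text.find(sep, start)
        if end == -1:
            yield text[start:]  # last piece, possibly empty
            return
        yield text[start:end]
        start = end + len(sep)
```

Since it is a generator, a caller can stop after the first few tokens without scanning the rest of the string, which is roughly what a strtok loop gives you in C.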