Unwrapping control scripts part II: restoring the complete environment (Tomcat)

In the previous episode we dealt with restoring only a few variables, though there was the complication of two levels of indirection (service apache2 start and apache2ctl). When placing Tomcat under the control of daemontools, there is a single indirection (service tomcat7 start calls the catalina script), but the environment has more complex variable values, and Tomcat also runs under a different UNIX uid / gid.

The first step is to set up the fake init.d script and replace the simple /usr/bin/env environment dumper with something less easily fooled. env -0 terminates each entry with a NUL byte, so it is not thrown off by variable values that contain newlines or "=". We also record the real and effective user and group IDs:

# rewrite /etc/init.d/ script
cp "/etc/init.d/${NAME}" "/tmp/initd_${NAME}_fake"
perl -pi.bak -e 's@(CATALINA_SH=).*@$1"/tmp/catalina_fake"@;s@(CATALINA_PID=")/var/run/@$1/tmp/@' \
  "/tmp/initd_${NAME}_fake"
# create stub
cat <<"EOF" | perl -pe "s@NAME@$NAME@g" >/tmp/catalina_fake
#!/bin/sh
# real / effective uid, real / effective gid + sgid's
perl -le '$, = "\n"; print $<, $>, 0+$(, $)'>"/tmp/NAME_id.txt"
# args, including program name, $0
perl -e '$\ = chr (0); print $0; print while defined ($_ = shift)' "$@" >"/tmp/NAME_args.txt"
# finally env
/usr/bin/env -0 >"/tmp/NAME_env.txt"
EOF
# execute fake init.d script
chmod a+x /tmp/catalina_fake "/tmp/initd_${NAME}_fake"
"/tmp/initd_${NAME}_fake" start >/dev/null 2>&1

We saved catalina's arguments (including the full path to the real catalina script) to a tomcat7_args file for demonstration purposes; we will actually overwrite this file because, unlike the init.d script, we want to invoke run (which runs in the foreground), not start (which daemonizes). The last part of the “unwrapped” script extracts catalina's path from the saved CLI arguments and calls it in the appropriate environment, with the help of a little utility (withidenvargs) that I will describe in my next post:

CATALINA=$(perl -0e '$_ = <>; chomp; print' "/tmp/${NAME}_args.txt")
printf "%s\0%s\0" "$CATALINA" run >"/tmp/${NAME}_args.txt"
exec withidenvargs "/tmp/${NAME}_id.txt" "/tmp/${NAME}_env.txt" "/tmp/${NAME}_args.txt"
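
As a side note on why the stub uses env -0 rather than plain env, here is a minimal sketch. The variable names TRICKY and FAKE are made up for the demonstration, and the -0 option is assumed to be the GNU coreutils one. With a value that contains both a newline and an "=", plain env output looks like two assignments, while env -0 still yields a single NUL-terminated record:

```shell
#!/bin/sh
# Hypothetical value containing both a newline and an "=".
TRICKY='first line
FAKE=second line'

# Plain env: the value spills onto a second line that looks like a
# separate assignment to a variable called FAKE.
plain_lines=$(env -i TRICKY="$TRICKY" /usr/bin/env | wc -l | tr -d ' ')

# env -0 (assumed: GNU coreutils): every entry ends in a NUL byte,
# so counting NULs counts the real number of variables.
nul_records=$(env -i TRICKY="$TRICKY" /usr/bin/env -0 | tr -cd '\000' | wc -c | tr -d ' ')

echo "plain env lines: $plain_lines, env -0 records: $nul_records"
```

On a system with GNU env this should report more lines than records, which is exactly the ambiguity a line-based parser cannot resolve.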

Unwrapping control scripts: Apache under daemontools

In Debian, if you start apache The Right Way, you're actually going through two indirection layers: /etc/init.d/apache2 start sets up some environment variables (e.g. by reading /etc/default/apache2) and eventually runs apache2ctl start — which again sets up some stuff and eventually runs apache2. You can't really safely skip either of them.

This poses some problems in case you want to run apache2 non-daemonized (in the foreground), say under the watchful eye of a process supervisor like daemontools (or runit, or s6, or any of the other clones / enhancements). We all know that apache never crashes and never segfaults, so there's no need to auto-restart it, but still.

We want to run apache2 in the exact environment that /etc/init.d/apache2 start and apache2ctl start create. You could stare at the scripts and extract environment variables by hand, but this is time-consuming and error-prone. The elegant way to replicate the actions of the scripts is to replace the final call to apache2 with a stub that saves the complete environment, and then exec /usr/sbin/apache2 in that environment from the daemontools run script. To achieve this, one can rewrite apache2ctl (call it apache2ctl_fake) to invoke our stub, then rewrite /etc/init.d/apache2 to invoke apache2ctl_fake instead of the real apache2ctl. The stub itself can simply use env to dump the environment into a file. Putting all this together, we get

exec 2>&1
# rewrite /etc/init.d/ script
cp "/etc/init.d/${NAME}" "/tmp/initd_${NAME}_fake"
perl -pi.bak -e \
  's@APACHE2CTL( start)@ENV /tmp/apache2ctl_fake$1@' \
  "/tmp/initd_${NAME}_fake"
# rewrite apache2ctl
{ echo '#!/bin/sh'; echo "APACHE_HTTPD=/tmp/${NAME}_fake";
  cat `which apache2ctl`; } >"/tmp/${NAME}ctl_fake"
# create stub
cat <<EOF >"/tmp/${NAME}_fake"
#!/bin/sh
/usr/bin/env >"/tmp/${NAME}_env.txt"
EOF
# execute fake init.d script
chmod a+x "/tmp/${NAME}_fake" "/tmp/${NAME}ctl_fake"
chmod a+x "/tmp/initd_${NAME}_fake"
"/tmp/initd_${NAME}_fake" start >/dev/null 2>&1
# prefix all environment assignments with export
perl -ni.bak -e 's/^/export /; print unless /^export PWD=/' \
  "/tmp/${NAME}_env.txt"
# load environment
. "/tmp/${NAME}_env.txt"
# call the real apache2
exec /usr/sbin/apache2 -k start -DNO_DETACH -DNO_DAEMONIZE

Note that the final crude “environment reload” trick only works for environment variables with no spaces or shell metacharacters in their values, because env neither quotes assignments nor escapes quotes, i.e. it doesn't output VAR="value with \"nasty\" stuff". For more thorough handling one could generate output in the style of daemontools' envdir and use that tool to exec apache2.
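
The envdir idea could be sketched roughly as follows. This is an illustration under assumptions, not a tested part of the run script: the function name env0_to_envdir is made up, and it expects a NUL-separated dump (as written by env -0) instead of the plain env output used above. It relies on two documented envdir behaviours: NUL bytes in a file are turned back into newlines, and a zero-length file means "unset this variable".

```shell
#!/bin/sh
# Convert a NUL-separated environment dump (as written by "env -0")
# into an envdir-style directory: one file per variable.
# env0_to_envdir is a hypothetical helper name.
env0_to_envdir() {
    dump=$1
    dir=$2
    mkdir -p "$dir" || return 1
    # Swap NUL record separators for newlines so that "read" can split
    # the records; newlines inside values are parked as \001 bytes.
    tr '\n\000' '\001\n' <"$dump" |
    while IFS= read -r entry; do
        # Skip malformed records (no "=" at all, or an empty name).
        case $entry in *=*) ;; *) continue ;; esac
        name=${entry%%=*}
        value=${entry#*=}
        [ -n "$name" ] || continue
        # envdir reads only the first line and maps NULs back to
        # newlines; the trailing newline keeps empty values from being
        # mistaken for "unset".
        printf '%s\n' "$value" | tr '\001' '\000' >"$dir/$name"
    done
}
```

A run script could then call something like env0_to_envdir "/tmp/${NAME}_env.txt" "/tmp/${NAME}_envdir" and finish with exec envdir "/tmp/${NAME}_envdir" /usr/sbin/apache2 followed by the usual options. Two caveats: envdir strips trailing spaces and tabs from values, and this sketch would mangle a value that already contains a \001 byte.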

Debian / Ubuntu packaging: Zorba XQuery

Today I uploaded Ubuntu source and binary (Gutsy and Hardy) packages for Zorba, the new C++ streaming XQuery processor. The Ubuntu PPA system (Personal Package Archives) is a great service; without it, you'd need to host an APT repository in order to conveniently distribute packages that are not (yet) part of Debian or Ubuntu (especially since a Debian source package is actually three files).

In fact, my source package works in Debian unstable too; as there is no custom Debian Sid APT repository (Ubuntu PPA only serves Ubuntu distros), here's what you need to do to build and install it:

  • dget the .dsc file (which pulls the original tarball and a .diff.gz as well)
  • run pbuilder build zorbaxquery_0.9.1-3.dsc (apt-get install and set up pbuilder if you don't have it)
  • retrieve the .deb's from /var/cache/pbuilder/result/
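
The three steps can be strung together in a small shell function. This is only a sketch: build_from_dsc and the RUN variable are made-up names, the use of sudo is an assumption (pbuilder needs root), and the result directory is assumed to be pbuilder's default. Setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: fetch a Debian source package and build it in a pbuilder chroot.
# build_from_dsc and RUN are hypothetical names; set RUN=echo for a dry run.
build_from_dsc() {
    dsc_url=$1
    dsc_file=${dsc_url##*/}    # file name part, e.g. zorbaxquery_0.9.1-3.dsc
    # dget fetches the .dsc plus the files it references.
    ${RUN-} dget "$dsc_url" &&
    # Assumption: pbuilder is already set up and is run through sudo.
    ${RUN-} sudo pbuilder build "$dsc_file" &&
    # Assumption: the default result directory is in use.
    ${RUN-} ls -l /var/cache/pbuilder/result/
}
```

For a dry run, call it in a subshell with a placeholder URL, for example ( RUN=echo; build_from_dsc "http://example.invalid/zorbaxquery_0.9.1-3.dsc" ). The URL here is not the real location of the package.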

It would be really nice if someone set up a PPA-like service for Debian, at least for repositories of source packages. I realize that setting up a cluster of build boxes is possible only with someone like Canonical behind it. But the required storage for source packages could be quite small: if the *.orig.tar.gz "link" dynamically retrieved an archive hosted elsewhere (a webapp could do this, trading space for bandwidth), such repositories could be quite compact (the .dsc and .diff.gz files are usually tiny). Alternatively, this scheme might work with a modified apt that could recognize HTTP redirects.