WordPress to Drupal 7 / PostgreSQL migration

As mentioned in the previous post, I decided to consolidate my data in a single database (PostgreSQL) and attempt to migrate the WordPress content to Drupal. Here's how it went (and how you could go about it).

First of all, I exported the WordPress data as a WXR (WordPress eXtended RSS) file. I then unpacked Drupal 7.10 under my web root folder.

To make Drupal truly useful as a blog platform, I needed to install a (fairly large) number of "modules" (the equivalent of WordPress plugins). The simplest way to install Drupal add-ons is to configure FTP access to its folder. Anonymous FTP will not work; Drupal needs an FTP user that can cd to its installation path. Luckily the FTP server need only listen on the loopback interface (127.0.0.1) — there's no need to open up a writable directory to the entire Internet.

The first modules to install were Token, Pathauto, Migrate, Migrate Extras and WordPress Migrate. The important ones are Pathauto, which will help preserve friendly URLs (instead of Drupal's cryptic internal URLs based on numerical node IDs), and of course WordPress Migrate.

After installing the modules, I enabled them and tried to use Administration > Content > WordPress Migration, only to run into a couple of SQL errors in the Migrate module. The first one is due to attempting to insert a float into an integer column (which MySQL unsurprisingly tolerates), and is fixed by a simple patch:

--- ./sites/all/modules/migrate/includes/base.inc.orig	2011-09-10 13:37:15.000000000 -0500
+++ ./sites/all/modules/migrate/includes/base.inc	2011-12-19 05:59:27.000000000 -0600
@@ -734,3 +734,3 @@
                        'process_type' => $newStatus,
-                       'starttime' => microtime(TRUE) * 1000,
+                       'starttime' => round (microtime(TRUE) * 1000),
                        'initialHighwater' => $this->getHighwater(),
@@ -767,3 +767,3 @@
           ->fields(array(
-            'endtime' => microtime(TRUE) * 1000,
+            'endtime' => round (microtime(TRUE) * 1000),
             'finalhighwater' => $this->getHighwater(),

The second error was caused by the non-portable MySQL syntax INSERT IGNORE in migrate/plugins/destinations/comment.inc; simply dropping the IGNORE fixed the problem.

After the WordPress import completed successfully, I was left with a site that contained all the original data, but looked nothing like the original blog. First of all, Drupal does not support a WordPress-style Categories menu out of the box; I had to install the Taxonomy Menu module for that. Then, for friendly taxonomy URLs, I configured Pathauto to generate links from the "vocabulary" (the old WordPress categories). I also installed Global Redirect, since otherwise feeds would still have internal Drupal names (and subscribers would be left in the dark). One nice thing was that per-post buttons for Digg, Reddit and other social news sites could be added automatically via the Service Links module.

This might seem like an easy enough process, but unfortunately gathering the required information is hampered by the sparseness of the documentation, the multiplicity of Drupal versions, and the evolving nature of the many modules required to tweak Drupal. For example, Pathauto used to support automatically renaming taxonomy feeds, but this feature later moved to Global Redirect, so many posts I found around the web were unhelpful (if not misleading). This kind of thing seems to be part of the experience of learning Drupal.

The final step was to create a Drupal theme that mimicked the appearance of the WordPress theme. I will talk about that in a separate post.


Migrating content to PostgreSQL

In a quest to reduce duplicate functionality on my VPS (and free up some RAM for better uses), I have decided to consolidate my database services on PostgreSQL. The problem: WordPress is very MySQL-centric; it does not currently support PostgreSQL (or other databases), and probably never will. This is understandable: MySQL's "be liberal in what you accept" philosophy encourages less-than-portable SQL to proliferate, and once you have a successful MySQL-bound codebase, it's hard to justify the cost of rewriting it for portability.

As it turns out, there is a plugin (PG4WP) that enables WordPress to run over PostgreSQL. It does so by replacing the wp-includes/wp-db.php database abstraction layer with a PG-specific one — which means the author must revise it for every new WordPress version. It currently supports WordPress up to 3.2.1, the previous WP release as of this writing.

As with any piece of software developed by a single author and used by a relatively limited audience, there are no guarantees about the future of the project. But even if that could be overlooked, there's a bigger problem: it's doubtful that other plugins will work with a PostgreSQL back-end, as few if any authors will test outside of MySQL.

I have therefore started to look for other CMS options that are more PostgreSQL-friendly (at least in their descriptions), while also having a large community and being under active development. After looking around I have decided to try Drupal, a popular PHP system that claims to support several databases.

I will talk about the technical details in another post; to make a long story short: it took more time and effort than I expected, but it worked. I was able to closely reproduce the original WordPress site in Drupal running over PostgreSQL.


sintvert, a real-time wave-to-MIDI server for known waveforms

It's nice to be able to input note events from a hardware MIDI-enabled keyboard, rather than from a "virtual" mouse-based (or computer keyboard-based) one like jack-keyboard or vmpk. However, with software support, even non-MIDI (audio-only) sources can be translated to MIDI.

I have used Paul Brossier's excellent aubio package with mixed success in the past. Pitch recognition is a complex problem, and for MIDI purposes, even a 3% error in the detected frequency can swing the output by a semitone. Since most voices likely to be available as audio inputs (e.g. from an old keyboard, or from a real instrument) will have some amount of LFO, it's hard to get an error-free reading, at least without using a large delay line in the detector.

One way of easing this problem is to train the pitch detector on the entire range of possible audio signals to be recognized. For a 3-octave instrument, for example, this means training on just 36 waveforms. The detector can thus extract features from all potential inputs during training, and then work reliably even with a small buffer (which translates to a shorter delay between the input signal and the detector output). Of course, since the training signal presumably needs to include at least a half-period of the waveform, low notes in the input range will require greater latency (e.g. C2 is about 65.4 Hz, which gives a period of about 15.3 ms and a half-period of about 15.3 / 2 = 7.65 ms).
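The semitone and latency figures above are easy to check with a few lines of shell. This is only a sketch of the standard equal-temperament formula (A4 = 440 Hz = MIDI note 69), not code taken from sintvert; the function names freq_to_midi and half_period_ms are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of sintvert or aubio.

# Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = note 69).
# int(x + 0.5) rounds to nearest for the positive values audible notes give.
freq_to_midi() {
    awk -v f="$1" 'BEGIN { printf "%d\n", int(69 + 12 * log(f / 440) / log(2) + 0.5) }'
}

# Half-period in milliseconds for a frequency in Hz.
half_period_ms() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", 1000 / (2 * f) }'
}
```

With these definitions, freq_to_midi 440 prints 69, while freq_to_midi 453.2 (a 3% overshoot) already prints 70; half_period_ms 65.4 prints 7.65.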

This is the approach I have taken in sintvert. It's a Jack application written in C, so it should be fairly easy to compile and install on a modern distro.


Memtest86+ from a USB stick, the easy way

My MSI Wind U100 has finally arrived (a few days after Christmas, rather than by Christmas as I had expected, but luckily still in 2008) and it has been exhibiting several strange Windows crashes. Since the U100 version I ordered comes with a "bonus" 1024 MB of RAM, which (by my understanding) is not installed by the OEM, but by the online store that sells the netbook, I naturally suspected memory problems and reached for Memtest86+.

Unfortunately Memtest86+ does not run from Windows or Linux as "normal" software does, because it needs to replace whatever OS exists and trash the memory as part of its job. For most folks the easiest way to run Memtest86+ is to burn the distributed ISO image and boot from the CD (or boot it off a floppy for the few that still have such a peripheral). But netbooks don't have CD/DVD drives. The only workable option then is to run Memtest86+ from a bootable USB disk.

There are many tutorials on how to create bootable USB disks using things like syslinux, isolinux, makebootfat, but most of these are boring to even skim, let alone put in practice. After looking around for a while I found a simpler solution:

  • Download unetbootin
  • Run it and install a FreeDOS image on the USB stick (unetbootin has a specific menu option for this; it will download the FreeDOS setup files for you automatically)
  • Download Memtest86+ — use the "Pre-Compiled EXE file for USB Key (Pure DOS)" version
  • Unpack the zip file (it should contain a single executable) to the root directory of your USB disk.
  • Ensure legacy USB support is enabled in BIOS
  • Boot off the USB stick and choose one of the LiveCD options (not "Install"!)
  • At the DOS prompt, change drives by typing "c:" (this will take you to the USB disk) and run the Memtest86+ executable (e.g. mt211.exe).

Memtest86+ will then run on your system for an hour or so (hopefully telling you nothing is wrong). At the end, you will need to reboot the computer, because (as mentioned above) Memtest86+ completely replaces any running OS. But you weren't likely to stick around in FreeDOS any longer anyway.

This still takes several steps, but it's light, non-error-prone GUI work and doesn't require handling a lot of disparate components like other methods. You can also use this method for running other DOS-only executables (e.g. legacy software or BIOS flashing programs).


Catch up with the audio and music tools on Ubuntu Hardy

Updates: per readers' request, I have added Hardy backports to all packages mentioned in this article (except Pulseaudio where the appropriate section isn't clear) in my PPA. I have also added newer ALSA packages.

Many Ubuntu users have chosen to stick with Hardy, the Long Term Support (LTS) release, supported until April 2011 (or 2013 for the server edition). But support mostly means security fixes, not necessarily updated versions of popular software; the Hardy Backports project doesn't necessarily keep pace with our favorite packages. In my case, I wanted the latest version of Rosegarden (1.7.2), the audio / MIDI / score editor and sequencer.

Fortunately, there is a way to enjoy fresh software while still postponing the dreaded dist-upgrade (or clean reinstall) marathon for as long as Hardy remains supported. The solution is to download source packages from later distributions, compile them on your system, and install the resulting binary packages.

You can either add entries to /etc/apt/sources.list and then run apt-get source ... (disruptive because it enables "future" versions for all packages), or use dget manually:

dget http://ftp.debian.org/debian/pool/main/p/pulseaudio/pulseaudio_0.9.10-3.dsc
dpkg-source -x pulseaudio_0.9.10-3.dsc

Either way, you then build and install the packages:

cd pulseaudio-*
debuild -uc -us -b
cd ..; dpkg -i # your packages here

If the build fails, you may need to install extra -dev libraries. Some good places to scout for updated versions include:

In all cases, you can go directly to a package by appending its name to the above URLs; once you get to a package page, you will find a link to the .dsc file you need to dget (as explained above) on the right side. Keep in mind that not all packages will compile on Hardy; you will have to experiment.

Back to audio packages: the reason I mentioned the Debian Sid pulseaudio package is that (unlike Ubuntu's) it includes module-jack-sink, which allows Pulseaudio to run on top of the low-latency jackd daemon. This means you can have Jack and some music applications (like a soft synth and Rosegarden) running perfectly, and still be able to watch YouTube videos (without having to kill and later restart jackd). As with any Pulseaudio on Hardy setup, you will still need libflashsupport in order for Mozilla to be able to connect to Pulseaudio (same with Opera).

I also recommend the following: jackd 0.1.116 (again from Debian Sid); Fluidsynth 1.0.8 (from Intrepid); qjackctl 0.3.4 (from Ubuntu Jaunty); and finally, rosegarden 1.7.2, for which I couldn't find any distribution to leech from — use the version in my PPA (obtained by updating the Intrepid package to the latest upstream release). The PPA page contains the instructions for enabling it in APT.


Resuming a file copy operation

If you ever need to interrupt (and then resume) a slow cp operation (e.g. from a USB stick or over NFS), you will appreciate cURL’s support for the file:// scheme:

curl -C - -O file:///media/memstick/file.avi

will resume the copy (and display a nice progress report as well).

Here are some alternatives that don't quite work:

  1. The more famous (and widely-installed) wget doesn't grok file:// URIs
  2. rsync reads both the source and destination files (in order to compute checksums), so there is no speedup
  3. Mr. Hartvig's clever recp script uses dd with a block size of 1, which is slow and CPU-intensive

Of course, you can use curl instead of cp to begin with, if you like the progress bar and don't mind the extra keystrokes.
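For machines without curl, the same resume idea can be sketched in plain shell: measure how much of the destination already exists, then append the rest of the source. This is not how curl or recp is implemented; resume_copy is a hypothetical helper, and it trusts that the partial destination is an intact prefix of the source (it does no checksumming).

```shell
#!/bin/sh
# resume_copy SRC DST: append to DST whatever part of SRC it is still missing.
# Assumption: DST, if it exists, is an intact prefix of SRC (e.g. an interrupted cp).
resume_copy() {
    src=$1
    dst=$2
    done_bytes=0
    if [ -f "$dst" ]; then
        # Number of bytes already copied.
        done_bytes=$(wc -c < "$dst")
    fi
    # tail -c +N starts output at byte N (1-based), so this skips
    # the bytes that are already in DST and appends only the remainder.
    tail -c "+$((done_bytes + 1))" "$src" >> "$dst"
}
```

A call such as resume_copy /media/memstick/file.avi file.avi picks up where the interrupted copy stopped; running it again on a complete file appends nothing.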


X, with and without an external monitor

As a laptop user, I often find myself switching between LCD-only, external-monitor-only, and dual-screen setups. Read below for a summary of how to achieve this flexibility under X (more specifically Xorg), both statically (via multiple configuration files, requiring X restarts) and dynamically (while X is running), along with some of the gotchas you will run into.

  1. Some static configurations
    You can have multiple xorg.conf configuration files, but they must all reside in /etc/X11. To start Xorg with a specific configuration file, use, for example:
    startx -- -config xorg.conf.external

    If you've already started X, you can also start a distinct X session by specifying a new display number:

    startx -- :1 -config xorg.conf.external
    • A configuration that disables the laptop screen: in the Device section of xorg.conf.*, add
              Option "monitor-LVDS" "LVDS"
      

      Also add a Monitor section:

      Section "Monitor"
              Identifier "LVDS"
              Option "Ignore" "True"
      EndSection
    • The same effect can be achieved using TwinView for NVIDIA cards:
      Section "Screen"
          Option         "TwinView" "True"
          Option         "MetaModes" "nvidia-auto-select, off"
      EndSection
      
    • To enable both screens, you can use a vanilla xorg.conf (as generated for example by sudo Xorg -configure); xrandr can then configure dual-head, as described in the next section. However, I have noticed that under this setup X disables the XVideo support (meaning, for example, a slower mplayer); I don't know if there's a way to avoid this problem.
  2. Dynamic configuration
    # disable laptop screen
    xrandr --output LVDS --off
    # switch back to laptop screen
    xrandr --output VGA --off
    xrandr --output LVDS --auto
    # dual-head (laptop + external)
    xrandr --output VGA --above LVDS
    # --left-of, --below etc. also work

    For the last xrandr command (dual-head), your combined external + laptop virtual screen resolution must not exceed the virtual desktop size. If not specified in xorg.conf, the X server pre-computes it at startup as the highest resolution of all monitors connected to your computer (i.e. if you start with your external monitor disconnected, the laptop's resolution; if the external monitor is connected at start-up, it will most likely dictate the virtual). Therefore you will most likely want to specify the virtual desktop size:

    Section "Screen"
      SubSection "Display"
        Depth 24
        Virtual 2048 2048
      EndSubSection
    EndSection

    However, as a further twist, some cards lose graphics acceleration capabilities when the virtual size is too high. If you notice your browser scrolling a page slower than normal, for example, this may be to blame.
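To pick a Virtual value that is large enough without being needlessly large, the arithmetic is simple: stacked outputs (--above, --below) need the larger width and the sum of the heights, while side-by-side outputs (--left-of, --right-of) need the sum of the widths and the larger height. The helper below is a hypothetical sketch of that calculation; virtual_size and its layout names are made up for illustration and are not part of xrandr.

```shell
#!/bin/sh
# virtual_size LAYOUT W1 H1 W2 H2
# LAYOUT is "stacked" (--above / --below) or "side" (--left-of / --right-of).
# Prints the minimum "WIDTH HEIGHT" for the Virtual line in xorg.conf.
virtual_size() {
    layout=$1 w1=$2 h1=$3 w2=$4 h2=$5
    case $layout in
        stacked)
            width=$(( w1 > w2 ? w1 : w2 ))
            height=$(( h1 + h2 ))
            ;;
        side)
            width=$(( w1 + w2 ))
            height=$(( h1 > h2 ? h1 : h2 ))
            ;;
        *)
            echo "unknown layout: $layout" >&2
            return 1
            ;;
    esac
    echo "$width $height"
}
```

For example, with a 1280x800 laptop panel and a 1680x1050 external monitor (made-up resolutions), virtual_size stacked 1280 800 1680 1050 prints 1680 1850.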


Debian / Ubuntu packaging: Zorba XQuery

Today I uploaded Ubuntu source and binary (Gutsy and Hardy) packages for Zorba, the new C++ streaming XQuery processor. The Ubuntu PPA system (Personal Package Archives) is a great service; without it, you'd need to host an APT repository in order to conveniently distribute packages that are not (yet) part of Debian or Ubuntu (especially since a Debian source package is actually three files).

In fact, my source package works in Debian unstable too; as there is no custom Debian Sid APT repository (Ubuntu PPA only serves Ubuntu distros), here's what you need to do to build and install it:

  • dget the .dsc file (which pulls the original tarball and a .diff.gz as well)
  • run pbuilder build zorbaxquery_0.9.1-3.dsc as root (apt-get install and set up pbuilder if you don't have it)
  • retrieve the .deb's from /var/cache/pbuilder/result/

It would be really nice if someone set up a PPA-like service for Debian, at least for repositories of source packages. I realize that setting up a cluster of build boxes is possible only with someone like Canonical behind it. But the required storage for source packages could be quite small: if the *.orig.tar.gz "link" dynamically retrieved an archive hosted elsewhere (a webapp could do this, trading space for bandwidth), such repositories could be quite compact (the .dsc and .diff.gz files are usually tiny). Alternatively, this scheme might work with a modified apt that could recognize HTTP redirects.
