Triumph of the tiny brain: Dan vs. Drupal / Panels

Posted on Thu 18 October 2012 in misc

A while ago I inherited responsibility for a Drupal 6 instance and a rather out-of-date server. (You know it's not good when your production operating system is so old that it is no longer getting security updates).

I'm not a Drupal person. I dabbled with Drupal years and years ago when I was heavily into PHP, but it never stuck with me. Every time I poked around at the database schema, with serialized objects stuck inside columns, I found something else that I wanted to work on instead. Thus, inheriting a Drupal instance wasn't something I had been looking forward to. As this production server was running a number of different services that were in use by our library, I went through a number of trial runs to ensure that the base packages wouldn't introduce regressions or outages. Fast-forward past a reasonably successful early-morning upgrade from Debian Lenny to Squeeze and I was able to start looking at addressing the Drupal instance that was also approximately 18 months out of date.

Initially, after I worked out the how-to of Drupal upgrades (in short: upgrade just Drupal core, then upgrade the modules), I thought all was well. I even got over the hump of realizing that our instance had had all of the modules dumped into Drupal's core directory, rather than sites/all/modules, and (even more impressively) got over the problem that the core bluemarine them had been hacked directly rather than having been separated out into a new custom theme. After working through those learning pains, I realized that somewhere in all of the Drupal and module upgrades, that something got "more secure" and started truncating IMG links to files with spaces in them at the first space. So "foo%20bar.jpg" was becoming "foo.jpg" and we were getting 404s everywhere.

Did I mention that I didn't notice this until I upgraded our production instance? Oh yes, I went through iteration after iteration of upgrades on the test server, and dutifully fixed up the problems that I found in the subset of content that I was testing against. I discovered and fixed problems like the production server content linking directly to the test server (slight copy-and-paste errors on the part of the content creators, I suppose). But I didn't notice all of the 404s, because who uploads images with spaces in their filename?

Turns out, everyone else in my library does that. Of course! And from what I was able to piece together via Google and browsing drupal.org, there was supposed to be some sanitization of the incoming filenames so that spaces would be normalized, etc. But either that wasn't introduced until well after our content had been created, or my predecessor had lightly hacked one of the modules, or Drupal itself, and hadn't bothered to use a source code repository to track those customizations. So, realizing that I needed to make some bulk changes, I went at it with a two-step plan:

  1. Create symbolic links for both the truncated filename and the spaces-normalized-to-underscores filenames. Creating symlinks for the truncated filenames would fix the 404s immediately, at the cost of some clash in the intended targets; there were plenty of Foo illustration.JPG and Foo info.JPG pairs of files, but like the Highlander, there can be only on Foo.JPG.
  2. Munge the database entries so that all of those now apparently insecure %20-containing filenames would become underscores.

If you're a Drupal user or a Drupal with Panels module user, you might know that the database schema suffers from some fairly horrible tricks being played on it. In this case, the Panels module creates a panels_pane table with a configuration TEXT column. Based on the name alone, it might seem odd that column is used to store the HTML content of the corresponding panel. Even odder to me is that this is not just a TEXT column, it's a column that expects a very particular structure - something like:

a:5:{s:11:"admin_title";s:5:"RACER";s:5:"title";s:0:"";s:4:"body";s:639:"<p><img width="225" height [...]}

Ah, nothing like storing an object within a single database column. Of particular interest was the result that I had when I tested updating the column value with a basic "replace(configuration, '%20', '_')" - the panel showed only n/a, presumably because the size (defined by the s properties in the object) for the "body" text property was no longer a match. That would be an instance of http://drupal.org/node/926448 - so okay, clearly I had to change tactics and update the entire object.

I tried quickly finding the Drupal way to do this: clearly there's an API and there must be some simple way to retrieve an object, change it's values, and update it so that the serialized object gets stored in the database and Drupal is happy. However, I couldn't find a simple tutorial, and trying #drupal on Freenode was unfortunately fruitless as well (although some people did try to suggest running REPLACE() at the database level, that was nice but they didn't recognize that that would actually damage things significantly).

So... out came the Perl, and here's what I hacked together:

#!/bin/perluse strict;use warnings;foreach (<DATA>) {    chomp();    my $i = 0;    my $body = 0;    my @fixed;    my @row = split /\t/;    my $pid = $row[1];    my $configuration = $row[0];    my @chunks = split /";s:/, $configuration;    foreach my $chunk (@chunks) {        if (!$i++) {            push @fixed, $chunk;            next;        }        if ($chunk =~ m/"body/) {            $body = 1;            push @fixed, $chunk;            next;        }        if ($body) {            my ($length, $content) = $chunk =~ m/^(\d+):"(.+)$/;          for (my $j = 0; $j < 50; $j++) {            $content =~ s{(/pictures/[^\./]+?)%20}{$1_}g;         }            $content =~ s{%20}{+}g;            $length = length($content);            $chunk = "$length:\"$content";            $body = 0;        }        push @fixed, $chunk;    }    print 'UPDATE panels_pane SET configuration = $ashes$' .         join('";s:', @fixed) . '$ashes$' . " WHERE pid = $pid;\n";}__DATA__

Against the trusty database (I ? PostgreSQL!), I ran COPY (SELECT configuration, pid FROM panels_pane WHERE configuration ~ '%20') TO 'conf_pids.out';, then slapped the Perl code on top and generated a load of UPDATE statements. It's far from my best Perl code, but it worked and once I gave up on doing things the Drupal way I was able to put it together in a handful of minutes. I now have a functional Drupal 6 instance again, updated such that there are no known security vulnerabilities with either core or the modules we're using, and there are no broken image links.

And now I need to begin working towards either grokking Drupal, or finding a content management system that my tiny brain can comprehend, because I don't want to have to go through these kinds of contortions again with future upgrades... Suggestions welcome!