Adding a new metadata format to Evergreen in a dozen lines of code

Posted on Mon 26 January 2009 in Libraries

Just like my last entry, this is a preview of one part of my upcoming session at the OLA SuperConference, Evergreen Exposed: Hacking the open source library system. We know from the last entry that Evergreen internally converts MARC21 to MODS to support item display, and it can also expose records as OAI, RDF, SRW, and HTML. Today, we're going to look at adding support for an entirely new metadata format to Evergreen.

Back in November 2008, George Duimovich wrote, "I would like to hear from anyone on the process for adding an additional supported format," specifically in the context of the FGDC metadata format for digital geospatial data. George did a great job of supporting his request: he included links to the metadata format itself, along with a pointer to an XSLT stylesheet that the inestimable Terry Reese had written and published for converting MARC21 to FGDC XML. His request has been burning at the back of my mind ever since, partly because I had quickly responded with the oh-so-helpful:

Assuming that we can get over the licensing hump, it should be a
relatively straightforward matter of dropping the transform into
Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm and
Open-ILS/src/perlmods/OpenILS/WWW/SuperCat/Feed.pm (using something
like MODS32 as a template).

Simple and straightforward, right? Well... yes and no. I had just gone through the process of adding MODS 3.2 support, because I needed its more granular treatment of URLs to fix an item display problem, so I was pretty comfortable with the code at the time. After a few months, though, that familiarity fades and you get to go through the discovery process all over again. (Oh, and about a week after the MODS 3.2 support went in, and Mike Rylander went the extra mile to update all of the indexes to use MODS 3.2, MODS 3.3 was released to the world. Sigh.)

Without further ado, here is the diff required to roughly support FGDC as a SuperCat format:

```diff
dbs@dbs-laptop:~/source/Evergreen-rel_1_4$ svn diff Open-ILS/src/perlmods/
Index: Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm
===================================================================
--- Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm   (revision 11952)
+++ Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm   (working copy)
@@ -143,6 +143,18 @@
     # and stash a transformer
     $record_xslt{rss2}{xslt} = $_xslt->parse_stylesheet( $rss_xslt );
 
+    # parse the FGDC xslt ...
+    my $fgdc_xslt = $_parser->parse_file(
+        OpenSRF::Utils::SettingsClient
+            ->new
+            ->config_value( dirs => 'xsl' ).
+        "/MARC21slim2FGDC.xsl"
+    );
+    # and stash a transformer
+    $record_xslt{fgdc}{xslt} = $_xslt->parse_stylesheet( $fgdc_xslt );
+    $record_xslt{fgdc}{docs} = 'http://www.fgdc.gov/metadata/csdgm/index_html';
+    $record_xslt{fgdc}{schema_location} = 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.xsd';
+
     register_record_transforms();
 
     return 1;
```
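Every new format repeats the same three steps seen in the patch: parse the stylesheet from the configured xsl directory, stash a transformer, and record the documentation and schema URLs. If you expect to add several formats, those steps could be folded into a single call. What follows is a minimal sketch in plain Perl, not Evergreen code: add_record_format is a hypothetical helper, the local %record_xslt stands in for SuperCat's hash of the same name, and the load code ref stands in for the $_parser->parse_file / $_xslt->parse_stylesheet pair so that the sketch runs without XML::LibXML, XML::LibXSLT, or an OpenSRF settings server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the %record_xslt hash that SuperCat.pm keeps in file scope.
my %record_xslt;

# Hypothetical helper (not part of Evergreen): register one metadata format.
# Required arguments:
#   name            - key used in %record_xslt (e.g. 'fgdc')
#   xsl_dir         - directory holding the stylesheets (in SuperCat.pm this
#                     comes from the dirs/xsl setting via SettingsClient)
#   file            - stylesheet file name within xsl_dir
#   docs            - URL of the format's documentation
#   schema_location - URL of the format's XML schema
#   load            - code ref that turns a stylesheet path into a transformer
sub add_record_format {
    my %args = @_;
    for my $key (qw(name xsl_dir file docs schema_location load)) {
        die "add_record_format: missing '$key'\n" unless defined $args{$key};
    }
    my $path = "$args{xsl_dir}/$args{file}";
    $record_xslt{ $args{name} } = {
        xslt            => $args{load}->($path),
        docs            => $args{docs},
        schema_location => $args{schema_location},
    };
    return $record_xslt{ $args{name} };
}

# Usage: the FGDC entry from the patch, with a stub loader that just echoes
# the path. Inside SuperCat.pm the loader would be something like
#   sub { $_xslt->parse_stylesheet( $_parser->parse_file( $_[0] ) ) }
add_record_format(
    name            => 'fgdc',
    xsl_dir         => '/openils/var/xsl',
    file            => 'MARC21slim2FGDC.xsl',
    docs            => 'http://www.fgdc.gov/metadata/csdgm/index_html',
    schema_location => 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.xsd',
    load            => sub { return "stub transformer for $_[0]" },
);

print "$_ => $record_xslt{$_}{xslt}\n" for sort keys %record_xslt;
```

The trade-off is a little indirection in exchange for one call per format; for a single format, the patch's inline stanza is just as clear.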

If you're still with me after that whack of code and you're counting, that's about 12 lines of code. Okay, I'm cheating: the diff doesn't include the MARC21 to FGDC stylesheet itself. For one thing, I'm still waiting to see a version of the stylesheet with a license attached to it. For another, do you _really_ want to see all that XSL? After you patch your copy of OpenILS/Application/SuperCat.pm, copy the MARC21 to FGDC stylesheet into /openils/var/xsl under the file name the patch expects (MARC21slim2FGDC.xsl), and restart the Evergreen Perl services, you'll be able to take advantage of the new functionality. That's it!

What's going on in this code? This patch against Open-ILS/src/perlmods/OpenILS/Application/SuperCat.pm enables SuperCat (and therefore unAPI) support for the new format. We just add an fgdc entry to %record_xslt, the hash of XSLT stylesheets that SuperCat knows about (the parsed transformer plus URLs for the format's documentation and schema), and the rest is visible in URLs like:
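Part of why a single hash entry is enough is that unAPI clients discover formats by asking the server rather than hard-coding them: request a record identifier without a format, and the server answers with a list of the formats it offers. The plain-Perl sketch below shows how a registry shaped like %record_xslt could drive such a listing, so that a new fgdc entry shows up automatically. It assumes the unAPI 1.0 response shape (a formats element containing format elements with name, type, and optional docs attributes) and uses a hypothetical record identifier; it illustrates the idea and is not SuperCat's actual unAPI code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A trimmed-down registry shaped like SuperCat's %record_xslt. Only the
# metadata fields are included, since listing formats never runs an XSLT.
my %record_xslt = (
    fgdc => {
        docs            => 'http://www.fgdc.gov/metadata/csdgm/index_html',
        schema_location => 'http://www.fgdc.gov/metadata/fgdc-std-001-1998.xsd',
    },
    marcxml => {},    # no docs URL recorded in this sketch
);

# Escape the characters that are unsafe inside a double-quoted XML attribute.
sub xml_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build an unAPI-style format list for one record identifier: one <format>
# element per registered format, in sorted (stable) order.
sub unapi_formats {
    my ( $id, $registry ) = @_;
    my $xml = '<formats id="' . xml_attr($id) . "\">\n";
    for my $name ( sort keys %$registry ) {
        my $attrs = 'name="' . xml_attr($name) . '" type="application/xml"';
        my $docs  = $registry->{$name}{docs};
        $attrs .= ' docs="' . xml_attr($docs) . '"' if defined $docs;
        $xml   .= "  <format $attrs/>\n";
    }
    return $xml . "</formats>\n";
}

# 'record-1' is a hypothetical identifier, not Evergreen's real unAPI ID syntax.
print unapi_formats( 'record-1', \%record_xslt );
```

Running it prints:

```
<formats id="record-1">
  <format name="fgdc" type="application/xml" docs="http://www.fgdc.gov/metadata/csdgm/index_html"/>
  <format name="marcxml" type="application/xml"/>
</formats>
```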

So who cares about this? Well, George cares, and perhaps (I'm guessing wildly here) it's because his organization has tools that can import FGDC, but also wants to maintain the data in its library catalogue because it loves MARC. That might be reason enough. Another reasonable use case would be to use the FGDC transform to populate spatial data tables built on the geospatial extensions offered by PostGIS, and to index those tables for lightning-fast retrieval of maps and map data that cover a given range of coordinates.
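At the core of that coordinate-range lookup is a bounding-box intersection test. In PostGIS you would store each map's footprint as a geometry, put a GiST index on that column, and let the && (bounding boxes intersect) operator answer the question. The plain-Perl sketch below performs the same test by hand. It assumes each record's extent has already been extracted from the FGDC output's westbc, eastbc, southbc, and northbc bounding-coordinate elements (decimal degrees), it ignores boxes that cross the 180th meridian, and the sample records and their coordinates are made up and approximate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two boxes intersect unless one lies entirely to one side of the other.
# Boxes use FGDC bounding-coordinate names, in decimal degrees, and are
# assumed not to cross the 180th meridian (so westbc <= eastbc).
sub boxes_overlap {
    my ( $rec, $query ) = @_;
    return $rec->{westbc}  <= $query->{eastbc}
        && $rec->{eastbc}  >= $query->{westbc}
        && $rec->{southbc} <= $query->{northbc}
        && $rec->{northbc} >= $query->{southbc};
}

# Hypothetical catalogue records with rough, illustrative extents.
my %maps = (
    'Ontario road map' =>
        { westbc => -95.2, eastbc => -74.3, southbc => 41.7, northbc => 56.9 },
    'Vancouver street map' =>
        { westbc => -123.3, eastbc => -122.9, southbc => 49.1, northbc => 49.4 },
);

# Query box: roughly the area around Sudbury, Ontario.
my %query = ( westbc => -81.5, eastbc => -80.5, southbc => 46.0, northbc => 47.0 );

my @hits = grep { boxes_overlap( $maps{$_}, \%query ) } sort keys %maps;
print "$_\n" for @hits;    # prints only "Ontario road map"
```

A linear scan like this is fine for a handful of records; the point of PostGIS is that the spatial index avoids checking every record.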

I'm sure the same approach could be used for other specialized metadata formats. This is just one example of why I'm sold on Evergreen's capability as a platform for the future of our library.