RDFa and schema.org all the library things

Posted on Fri 30 August 2013 in Libraries

TLDR: The Evergreen and Koha integrated library systems now express their record details in the schema.org vocabulary out of the box using RDFa.

Individual holdings are expressed as Offer instances, following the W3C Schema Bib Extension community group's proposal to model them in parallel with commercial sales offers. I have also published a branch that gives the same capabilities to the VuFind discovery layer.

In the spring of 2012, I took my first steps in the structured data world by teaching Evergreen 2.2 how to express some record details in schema.org. It was a small step towards taking the machine-readable data that we had made useful to humans on the record detail catalogue page and marking it up so that it was once again machine readable. At that time, Evergreen only knew how to map MARC data to two schema.org types (Book and MusicRecording--which should have been MusicAlbum, but I eventually fixed that) and a handful of attributes: name, ISBN, publisher, publication date, author, contributor, and keywords. Pretty barebones, but a start nonetheless.

I used the HTML5 microdata approach because I was new to structured data and microdata was what was demonstrated in all of the schema.org examples, so it seemed like the obvious choice. Over the last year, however, I realized that RDFa is a W3C standard for accomplishing the same goals as microdata, bolstered by an open community standards-making process, and featuring the ability to mix in properties and types from multiple vocabularies. I touched on this in my Evergreen 2013 conference presentation: Structured data: making metadata matter for machines. While RDFa Lite is extremely easy to get started with, I have been diving deeper into RDFa proper to make use of some of the more advanced properties, such as @about to work around unwanted chaining introduced by @href attributes.
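To give a feel for how little RDFa Lite asks of you, here is a minimal sketch in Python that renders a handful of record details as RDFa Lite. It is not the Evergreen template code: the `book_rdfa` function, the shape of the `record` dictionary, and the sample values are all hypothetical, and authors are emitted as plain text for brevity rather than as Person resources.

```python
from html import escape


def book_rdfa(record):
    """Render a dict of {schema.org property: [values]} as an RDFa Lite snippet.

    Hypothetical helper for illustration only; the record is typed as a
    schema.org Book and every value is emitted as a plain text literal.
    """
    # @vocab sets the default vocabulary; @typeof declares the item's type.
    lines = ['<div vocab="http://schema.org/" typeof="Book">']
    for prop, values in record.items():
        for value in values:
            # @property names the schema.org property for the element's text.
            lines.append(
                f'  <span property="{escape(prop)}">{escape(value)}</span>'
            )
    lines.append("</div>")
    return "\n".join(lines)


# Made-up sample data standing in for values extracted from a MARC record.
record = {
    "name": ["An Example Title"],
    "author": ["Example, Author"],
    "isbn": ["0000000000"],
    "keywords": ["Examples", "Metadata"],
}
print(book_rdfa(record))
```

Three attributes (`vocab`, `typeof`, and `property`) carry the whole example; the more advanced attributes only come into play once you start linking resources together.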

Over the last few weeks, I was able to concentrate on improving the schema.org mapping for Evergreen--introducing holdings as instances of the http://schema.org/Offer class, providing much more granular author and contributor data--and cut over to RDFa. While the tools at RDFa Tools were quite useful for debugging my efforts, I also have to thank the denizens of the #rdfa IRC channel (and Manu Sporny in particular) for patiently helping me understand some of my rookie mistakes. Ben Shum also kept me honest by patiently testing multiple iterations of my branches with the Google Rich Snippets tool and reporting any issues that he encountered; this led to my realization that using @resource and @about was necessary in some contexts.
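As a rough illustration of the holdings-as-Offers idea, the following Python sketch renders one holding as a table row typed as a schema.org Offer. The choice of properties (seller, sku, serialNumber, availability), the `offer_rdfa` function, and the keys of the `holding` dictionary are assumptions made for this example, not a copy of the actual Evergreen markup. The row is meant to sit inside an element that already declares the schema.org vocabulary and the record's type, so that `offers` attaches to the record.

```python
from html import escape

# schema.org availability values, referenced by URL.
IN_STOCK = "http://schema.org/InStock"
OUT_OF_STOCK = "http://schema.org/OutOfStock"


def offer_rdfa(holding):
    """Render one holding (a hypothetical dict) as an RDFa table row.

    Assumed mapping for illustration: owning library -> seller,
    call number -> sku, barcode -> serialNumber.
    """
    availability = IN_STOCK if holding["available"] else OUT_OF_STOCK
    return "\n".join([
        # The row is both the value of the record's "offers" property
        # and a new item of type Offer.
        '<tr property="offers" typeof="Offer">',
        f'  <td property="seller">{escape(holding["library"])}</td>',
        f'  <td property="sku">{escape(holding["call_number"])}</td>',
        f'  <td property="serialNumber">{escape(holding["barcode"])}</td>',
        # A <link> points availability at a URL while the human-readable
        # status stays in the cell text.
        f'  <td><link property="availability" href="{availability}" />'
        f'{escape(holding["status"])}</td>',
        "</tr>",
    ])


# Made-up holding data for demonstration.
print(offer_rdfa({
    "library": "Example Branch",
    "call_number": "QA 76 .E93",
    "barcode": "00000000000001",
    "available": True,
    "status": "Available",
}))
```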

Once I had worked out a decent mapping in Evergreen (a library system I have been contributing to for over six years now), I decided to tackle the VuFind discovery layer. VuFind uses a straightforward template system, and I was able to put together a branch that integrated schema.org as RDFa (details at VuFind bug 425), building on Eoghan Ó Carragáin's initial efforts. Once again I included holdings-as-Offers, as the Evergreen driver for VuFind made that easy enough to test. As part of my work, I contributed some enhancements for the Evergreen driver that have already been integrated into VuFind. The initial reception from the VuFind community was positive, although my branch arrived too late for the VuFind 2.1 release; if all goes well, it will be integrated for the VuFind 2.2 release. In the meantime, sites running VuFind that want schema.org structured data can integrate my branch themselves--and please provide feedback!

As I was on a roll, I also opted to tackle the Koha integrated library system. With some initial pointers from Galen Charlton and Chris Cormack to the XSLT-based templating system that Koha uses, I was able to implement schema.org with holdings-as-Offers in a matter of hours for the first iteration. Jared Camins then worked patiently with me as I added small commits to address issues that came up on the Evergreen side, but in under a week from start to finish the branch was signed off, passed QA, and pushed to master.

(It actually broke the build due to a coding violation--doh!--but that was quickly cleaned up.)

The upshot? We now have two library systems set to publish rich schema.org structured data--including holdings--in RDFa, out of the box by default, in their record detail pages on the Web, and a third system ready to go.

Let me simply say that I love the agility of open source software. So, for the future, I intend to tackle a few more library systems; digital repositories seem like they would be worthwhile targets. On that front, I have inquired on the DSpace developers' list about whether there is still interest in integrating schema.org (as had been expressed a year ago), but have not yet received a reply. Perhaps ArchivesSpace, or furthering the existing support on Islandora? Let me know if you're interested!