Wednesday, May 28. 2014
I've been at the 2014 Extended (formerly European) Semantic Web Conference (ESWC) in Anissaras, Greece for four days now. My reason for attending was to present my paper Seeding structured data by default in open source library systems (presentation) (paper).
It has been fantastic. As a librarian attending a conference dominated by computer science academics, I was met with genuine interest in my work on expressing schema.org metadata via RDFa in library catalogue web pages. Despite my relative inexperience in this milieu of PhD candidates and faculty, I was able to participate productively in the opening workshops and tutorials, and enjoyed the keynotes and panel sessions (I'll happily admit, though, that I was out of my depth in many of the main track sessions). And I was welcomed into (and enjoyed) social events and situations, something I tend to worry about when joining a new community.
After many discussions, and particularly after attending the cleverly named SALAD2014 (Services and Applications over Linked APIs and Data) workshop and the keynote on day one by Stefan Staab, my hope for many of the promises of the Semantic Web proper has been rekindled. What struck me was the awareness amongst attendees of the need for pragmatism (support for developers and actual results) beyond just the interesting academic research.
These are good people, and this is a good community. Conference: recommended!
Oh, and there's this...
Wednesday, April 2. 2014
Last week I had the fantastic experience of returning to the code4lib conference for the first time since 2008, and as a speaker to boot.
The title of my talk was Structured data NOW: seeding schema.org in library systems. I had given two talks the prior week on a substantially similar subject (teaching Koha, Evergreen, and VuFind how to express schema.org structured data via RDFa), but all three conferences had very different audiences. I felt great about my talks at LibTechConf and the Evergreen International Conference, but those were one hour and 45 minutes long, respectively. code4lib, on the other hand, schedules 20-minute slots; it is a veritable crucible for speakers. I remixed and rewrote my code4lib talk obsessively leading up to the conference, and ultimately ended up adding content to my overall message, which was obviously the wrong direction to take things... but before this audience of my peers, I felt an absolute need to explain why I had chosen to spend much of the past year and a half focused on RDFa and schema.org. That in turn forced me to cut a significant amount from the actual delivery, which meant that the audience didn't get the takeaway message that I actually wanted to deliver. One peer, in fact, described it as "a good refresher on microdata", which was almost exactly what I had wanted to avoid doing (microdata vs. RDFa aside) for this audience!
All caught up? Good! Now let's pretend that I had about ten more minutes; here's roughly what I wanted to impart:
Structured library information: given that schema.org offers the Library type, and library systems often contain information such as the hours of operation, contact information, physical address, and branch relationships, we can teach our library systems to express that as structured data. And good news: Evergreen (as of the 2.6 release) will do exactly that! Remember the start of the presentation, where I pointed at various map services that each had differing levels of knowledge about our libraries (often requiring a different social media account for each one)? Publishing your data in an openly accessible, standard format should make it possible for those map services (including OpenStreetMap) to do a better job of reflecting our presence in the world.
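As a rough illustration of what "expressing that as structured data" looks like, here is a minimal Python sketch that renders a branch record as an RDFa Lite fragment using the schema.org Library type. The branch values, the `library_rdfa` helper, and the choice of `parentOrganization` to express the branch relationship are assumptions for illustration; this is not Evergreen's actual template.

```python
from html import escape

# Hypothetical branch record; field names are illustrative assumptions,
# not Evergreen's actual data model.
branch = {
    "name": "Example Public Library - Main Branch",
    "telephone": "+1-555-0100",
    "street": "123 Main Street",
    "locality": "Exampleville",
    "hours": ["Mo-Fr 09:00-21:00", "Sa 10:00-17:00"],
    "parent_url": "https://library.example.org/",
}


def library_rdfa(b: dict) -> str:
    """Render a schema.org Library as an RDFa Lite HTML fragment."""
    # One <meta> per openingHours value, using schema.org's compact text syntax
    hours = "\n".join(
        f'  <meta property="openingHours" content="{escape(h)}">' for h in b["hours"]
    )
    return f"""<div vocab="http://schema.org/" typeof="Library">
  <h1 property="name">{escape(b['name'])}</h1>
  <span property="telephone">{escape(b['telephone'])}</span>
  <div property="address" typeof="PostalAddress">
    <span property="streetAddress">{escape(b['street'])}</span>,
    <span property="addressLocality">{escape(b['locality'])}</span>
  </div>
{hours}
  <link property="parentOrganization" href="{escape(b['parent_url'])}">
</div>"""


print(library_rdfa(branch))
```

Because the markup uses only generic schema.org terms, a crawler that understands schema.org could pick up the name, address, phone number, and hours without any library-specific knowledge.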
Thought experiment: Now that we're publishing our holdings in a commonly understood Offer format, and linking those holdings to the library that holds them, and (in the case of Evergreen) providing information about those libraries, when can we stop batch uploading MARC at irregular intervals just to create union catalogs? In fact, wouldn't we be able to build ILL systems that can do a much better, more competitive job once we're making this information openly available on the web?
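To make the holdings half of that thought experiment concrete, here is a hedged sketch of a single holding expressed as a schema.org Offer that points back at the library that holds it. Mapping the call number to `sku` and the barcode to `serialNumber` is an assumption for illustration, as is the `holding_offer_rdfa` helper; the `offeredBy` link to the library's URL is what would let an aggregator see which library owns which copy without a MARC batch load.

```python
from html import escape


def holding_offer_rdfa(call_number: str, barcode: str, library_url: str,
                       available: bool) -> str:
    """Render one holding as a schema.org Offer linked to its library.

    Assumes it is nested inside an element that already declares
    vocab="http://schema.org/" and the item's typeof (e.g. "Book").
    """
    availability = ("http://schema.org/InStock" if available
                    else "http://schema.org/OutOfStock")
    return (
        '<div property="offers" typeof="Offer">\n'
        f'  <span property="sku">{escape(call_number)}</span>\n'
        f'  <meta property="serialNumber" content="{escape(barcode)}">\n'
        f'  <link property="availability" href="{availability}">\n'
        # offeredBy as a resource link: the library's own URL is its identifier
        f'  <link property="offeredBy" href="{escape(library_url)}">\n'
        '</div>'
    )


print(holding_offer_rdfa("QA76.9 .S4", "3000123",
                         "https://library.example.org/branch/main", True))
```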
Sitemaps: Of course, to tell search engines and crawlers what pages are of interest and when they have been updated, you have to offer a sitemap. Fortunately, Koha, VuFind, and Evergreen (to a lesser extent) all support generation of sitemaps today.
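A sitemap is just an XML list of URLs with optional last-modified dates, following the sitemaps.org protocol. The sketch below, which assumes hypothetical catalogue record URLs and a hypothetical `build_sitemap` helper, shows roughly the kind of file a library system generates; it is not the actual Koha, VuFind, or Evergreen code.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records):
    """Build a sitemaps.org <urlset> from (url, last_modified_date) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # emit xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, modified in records:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # A plain YYYY-MM-DD date is a valid W3C Datetime value for <lastmod>
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://catalog.example.org/record/1", date(2014, 3, 20)),
    ("https://catalog.example.org/record/2", date(2014, 3, 21)),
]))
```

The protocol caps each sitemap file at 50,000 URLs, so a large catalogue would split its records across several files listed in a sitemap index.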
Quick union catalogues: As a proof of concept, I showed that we can build union catalogues on the backs of existing general search engines by creating a Google Custom Search Engine (CSE) that tied together the holdings of two different VuFind instances and an Evergreen instance under a single search box. It is as ugly as sin, but it took me all of about ten minutes to cobble together; Google had already crawled all of the pages, so I just had to tell it what hostnames and URL patterns I cared about. The CSE even gives you some limited support for directly querying the underlying structured data. Later on, Sean Aery from Duke gave a lightning talk that showed off how they had taken exactly this approach to provide a search interface for their finding aids and digital collections, and made it beautiful!
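The CSE in the proof of concept was set up by listing hostnames and URL patterns, not by writing code. If you wanted to query such an engine from your own application, Google's Custom Search JSON API takes the API key (`key`), the engine ID (`cx`), and the query (`q`) as URL parameters. The sketch below only builds the request URL; the key and engine ID values are placeholders, and `cse_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_query_url(api_key: str, engine_id: str, terms: str, start: int = 1) -> str:
    """Build a Custom Search JSON API request URL for one page of results.

    `start` is the 1-based index of the first result to return.
    """
    params = {"key": api_key, "cx": engine_id, "q": terms, "start": start}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


print(cse_query_url("API_KEY", "ENGINE_ID", "moby dick"))
```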
Quick union catalogues: in progress: As a firm believer in the importance of decentralization, I pointed at a simple in-progress Python script that would crawl sitemaps and extract structured data from all of the indexed pages. My intention was to provide a complete indexed solution with a simple web frontend, but I got a bit bogged down in first updating the Fedora packages for several of the dependencies, then tackling some bugs in the upstream libraries themselves. More to be done here!
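The script itself isn't reproduced here, but the core loop it describes (read a sitemap, fetch each listed page, pull out the structured data) can be sketched with only the Python standard library. This is a stand-in under loud assumptions: `PropertyCollector` is a deliberately crude scraper that records `property` attributes and their values, not a conforming RDFa processor, and all of the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.request import urlopen

# ElementTree's "{namespace}tag" prefix for sitemaps.org elements
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(sitemap_xml: bytes) -> list[str]:
    """Return every <loc> URL listed in a sitemaps.org <urlset>."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


class PropertyCollector(HTMLParser):
    """Crude RDFa Lite scraper: records (property, value) pairs in page order.

    NOT a conforming RDFa processor: it ignores vocab/prefix scoping, does not
    track which typeof subject a property belongs to, and keeps only the first
    non-blank text chunk of each element.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("property")
        if prop is None:
            return
        # Values carried in attributes, e.g. <meta content> or <link href>
        for key in ("content", "href", "src"):
            if a.get(key) is not None:
                self.pairs.append((prop, a[key]))
                return
        # property + typeof introduces a nested resource, not a text value
        if a.get("typeof") is None:
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


def extract_properties(html: str) -> list[tuple[str, str]]:
    """Run PropertyCollector over one HTML page."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs


def crawl(sitemap_url: str) -> dict[str, list[tuple[str, str]]]:
    """Fetch a sitemap, then each page it lists (requires network access)."""
    with urlopen(sitemap_url) as resp:
        urls = sitemap_urls(resp.read())
    results = {}
    for url in urls:
        with urlopen(url) as resp:
            page = resp.read().decode("utf-8", errors="replace")
        results[url] = extract_properties(page)
    return results
```

A real indexer would replace `extract_properties` with a proper RDFa parser that tracks subjects and vocabularies, and would throttle its requests politely.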
Hmm. Well, maybe I didn't fail to convey as much as I had feared. On the bright side, there was a great deal of interest in the SchemaBibEx "best practices and recommendations" documentation that I had promised we were working on... and today Richard Wallis described some of his work in this area. So that's a good thing. And even if some of the audience walked away from my talk with just an introduction to RDFa and schema.org, that put them in an extremely good position to enjoy and understand Sean's subsequent lightning talk.
Oh, and my admission of being a semantic web dropout (due to the complexity of content negotiation and heterogeneous vocabularies and billions of triples and RDF/XML) ended up being a perfect setup for the immediately following talk Next Generation Catalogue - RDF as a Basis for New Services by Anne-Lena Westrum and Asgeir Rekkavik from the Oslo Public Library, who basically said "Semantic web? Oh yeah, we can totally do that!" and proceeded to show their MARC2RDF and RDF2MARC workflows. Very cool stuff (and delightful scheduling by the conference program committee!)
Saturday, March 22. 2014
Yesterday at the 2014 Evergreen International Conference I presented Structured library data: holdings, libraries, and beyond, a talk about the work I've done specifically with Evergreen, making some of the connections with Koha and VuFind's capabilities. Lots of attendees seemed happy with the talk and the direction that we're going with Evergreen, and hopeful about the future relevance of our libraries' resources within general search engines, as well as all of the possibilities opened up by exposing this open data about our libraries (locations, hours, branch relationships, contact information) and their resources in a much more consumable form.
There was so much energy in the room, I could have talked for another hour... I love the Evergreen community!
Thursday, March 20. 2014
Two things of note:
It has been fun and invigorating to hear the responses of those who are seeing the results and direction of this work for the first time! More thoughts to come...
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 Canada License.