Wednesday, April 2. 2014
Last week I had the fantastic experience of returning to the code4lib conference for the first time since 2008, and as a speaker to boot.
The title of my talk was Structured data NOW: seeding schema.org in library systems. I had given two talks the prior week on a substantially similar subject (about teaching Koha, Evergreen, and VuFind how to express schema.org structured data via RDFa), but all three conferences had very different audiences. I felt great about my talks at LibTechConf and the Evergreen International Conference, but those were one hour and 45 minutes long respectively. code4lib, on the other hand, schedules 20-minute slots; it is a veritable crucible for speakers. I remixed and rewrote my code4lib talk obsessively leading up to the conference, and ended up adding content to my overall message, which was obviously the wrong direction to take things... but before this audience of my peers, I felt an absolute need to explain why I had chosen to spend much of the past year and a half focused on RDFa and schema.org. That, in turn, forced me to cut a significant amount from the actual delivery, which meant that the audience didn't get the takeaway message I actually wanted to deliver. One peer, in fact, described it as "a good refresher on microdata"--almost exactly what I had wanted to avoid doing (microdata vs. RDFa aside) for this audience!
All caught up? Good! Now let's pretend that I had about ten more minutes; here's roughly what I wanted to impart:
Structured library information: given that schema.org offers the Library type, and library systems often contain information such as hours of operation, contact information, physical address, and branch relationships, we can teach our library systems to express that as structured data. And good news: Evergreen (as of the 2.6 release) will do exactly that! So if you remember all the way back to the start of the presentation, where I pointed at various map services that had differing levels of knowledge about our libraries (each often requiring a different social media account), publishing your data in an openly accessible, standard format should make it possible for those map services (including OpenStreetMap) to do a better job of reflecting our presence in the world.
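Evergreen's actual output may differ in its details, but the idea looks roughly like this in RDFa (the library name, address, phone number, and hours below are all invented for illustration):

```html
<div vocab="http://schema.org/" typeof="Library">
  <h1 property="name">Example Branch Library</h1>
  <div property="address" typeof="PostalAddress">
    <span property="streetAddress">123 Main Street</span>,
    <span property="addressLocality">Sudbury</span>,
    <span property="addressRegion">ON</span>
  </div>
  Phone: <span property="telephone">705-555-0100</span>
  <time property="openingHours" datetime="Mo-Fr 09:00-17:00">
    Weekdays 9:00 a.m. - 5:00 p.m.
  </time>
</div>
```

Branch relationships can be expressed the same way, by linking a branch's page to the page describing its parent library system.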
Thought experiment: Now that we're publishing our holdings in a commonly understood Offer format, and linking those holdings to the library that holds them, and (in the case of Evergreen) providing information about those libraries, when can we stop batch uploading MARC at irregular intervals just to create union catalogs? In fact, wouldn't we be able to build ILL systems that can do a much better, more competitive job once we're making this information openly available on the web?
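To make the thought experiment concrete, a holding attached to its owning library might be marked up roughly as follows. The property choices here are my own assumptions (for example, using sku for the call number), and the seller URL is hypothetical:

```html
<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Open Data for Libraries</span>
  <div property="offers" typeof="Offer">
    <link property="availability" href="http://schema.org/InStock">
    Call number: <span property="sku">025.04 EXA</span>
    <link property="seller" href="http://example.org/library/branch/1">
  </div>
</div>
```

A harvester that understands schema.org could then follow the seller link back to the structured description of the library itself--no batch MARC load required.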
Sitemaps: Of course, to tell search engines and crawlers what pages are of interest and when they have been updated, you have to offer a sitemap. Fortunately, Koha, VuFind, and Evergreen (to a lesser extent) all support generation of sitemaps today.
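For reference, a sitemap is just an XML list of URLs with optional last-modified dates; a minimal one looks like this (the record URL is made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://catalogue.example.org/record/12345</loc>
    <lastmod>2014-03-28</lastmod>
  </url>
</urlset>
```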
Quick union catalogues: As a proof of concept, I showed that we can build union catalogues on the backs of existing general search engines by creating a Google Custom Search Engine (CSE) that tied together the holdings of two different VuFind instances along with an Evergreen instance under a single search box. It is as ugly as sin, but it took me all of about ten minutes to cobble together; Google had already crawled all of the pages, so I just had to tell it what hostnames and URL patterns I cared about. The CSE even gives you some limited support for directly querying the underlying structured data. Later on, Sean Aery from Duke gave a lightning talk that showed off how they had taken exactly this approach to provide a search interface for their finding aids and digital collections and made it beautiful!
Quick union catalogues: in progress: As a firm believer in the importance of decentralization, I pointed at a simple in-progress Python script that would crawl sitemaps and extract structured data from all of the indexed pages. My intention was to provide a complete indexed solution with a simple web frontend, but I got a bit bogged down in first updating the Fedora packages for several of the dependencies, then tackling some bugs in the upstream libraries themselves. More to be done here!
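My script isn't finished, but the two pieces such a crawler needs are easy to sketch: pull record URLs out of a sitemap, then pull RDFa property/value pairs out of each page. The sketch below uses only the standard library and deliberately ignores things a real crawler must handle (fetching over HTTP, RDFa vocab/typeof scoping, @content attributes, sitemap indexes):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    return [loc.text for loc in ET.fromstring(xml_text).iter(SITEMAP_NS + "loc")]

class RDFaProperties(HTMLParser):
    """Collect (property, text) pairs from RDFa @property attributes."""
    def __init__(self):
        super().__init__()
        self._prop = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Remember the most recent @property attribute we saw.
        self._prop = dict(attrs).get("property", self._prop)

    def handle_data(self, data):
        # Attach the next non-empty text node to that property.
        if self._prop and data.strip():
            self.pairs.append((self._prop, data.strip()))
            self._prop = None

sitemap = (
    '<?xml version="1.0"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>http://catalogue.example.org/record/1</loc></url>'
    '</urlset>'
)
page = '<div typeof="Book"><span property="name">Example Title</span></div>'

parser = RDFaProperties()
parser.feed(page)
print(sitemap_urls(sitemap))  # ['http://catalogue.example.org/record/1']
print(parser.pairs)           # [('name', 'Example Title')]
```

Feed the extracted pairs into any indexer you like and you have the bones of a decentralized union catalogue.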
Hmm. Well maybe I didn't miss conveying as much as I had feared. On the bright side, there was a great deal of interest in the SchemaBibEx "best practices and recommendations" documentation that I had promised we were working on... and today Richard Wallis described some of his work in this area. So that's a good thing. And even if some of the audience walked away from my talk with just an introduction to RDFa and schema.org, that put them in an extremely good position to be able to enjoy and understand Sean's subsequent lightning talk.
Oh, and my admission of being a semantic web dropout (due to the complexity of content negotiation and heterogeneous vocabularies and billions of triples and RDF/XML) ended up being a perfect setup for the immediately following talk Next Generation Catalogue - RDF as a Basis for New Services by Anne-Lena Westrum and Asgeir Rekkavik from the Oslo Public Library, who basically said "Semantic web? Oh yeah, we can totally do that!" and proceeded to show their MARC2RDF and RDF2MARC workflows. Very cool stuff (and delightful scheduling by the conference program committee!)
Saturday, March 22. 2014
Yesterday at the 2014 Evergreen International Conference I presented Structured library data: holdings, libraries, and beyond--a talk about the work I've done specifically with Evergreen and making some of the connections with Koha and VuFind's capabilities. Lots of attendees seemed happy with the talk and the direction that we're going with Evergreen, and expressed hope for the future relevance of our libraries' resources within normal search engines, as well as all of the possibilities opened up by exposing this open data about our libraries (locations, hours, branch relationships, contact information) and their resources in a much more consumable form.
There was so much energy in the room, I could have talked for another hour... I love the Evergreen community!
Thursday, March 20. 2014
Two things of note:
It has been fun and invigorating to hear the responses of those who are seeing the results and direction of this work for the first time! More thoughts to come...
Monday, February 24. 2014
Last week I drew the blue line from Sudbury to Ottawa you see in the above map by running MozStumbler on my phone as we headed out to celebrate Winterlude. One day, that line might help you figure out where you are on your FirefoxOS phone! Here's what's going on:
GPS triangulates your position from satellites, requires a line of sight to those satellites, and can take minutes to get a lock on your location. If you have a smartphone, you've probably noticed that running a maps application will return your location in seconds, not minutes; that's because modern smartphones use cell towers and wifi routers for triangulation purposes. Unlike GPS, your phone is usually continuously scanning for cell towers and wifi routers anyway, so the data is immediately available at no extra cost to your phone's battery or CPU.
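As a toy illustration of the idea (real positioning services use far more sophisticated models and enormous databases), you can guess a position as a centroid of the known transmitter locations you can hear, weighted by signal strength:

```python
def estimate_position(observations):
    """Guess (lat, lon) from nearby transmitters with known positions.

    `observations` is a list of (lat, lon, rssi_dbm) tuples for wifi
    access points / cell towers found in a lookup database. A stronger
    signal (less negative dBm) suggests a closer transmitter, so it
    gets exponentially more weight.
    """
    # Convert dBm (e.g. -40 strong .. -90 weak) into positive weights.
    weights = [10 ** (rssi / 10.0) for _lat, _lon, rssi in observations]
    total = sum(weights)
    lat = sum(w * o[0] for w, o in zip(weights, observations)) / total
    lon = sum(w * o[1] for w, o in zip(weights, observations)) / total
    return lat, lon

# A strong access point near Sudbury dominates a weak tower near Ottawa,
# so the estimate lands essentially on the Sudbury coordinates:
print(estimate_position([(46.49, -80.99, -40), (45.42, -75.70, -90)]))
```

The coordinates above are illustrative; the point is only that a database mapping transmitters to coordinates is what makes the near-instant fix possible.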
However, while the major smartphone operating system manufacturers have built databases that correlate cell towers and wifi routers with coordinates (and raised some privacy concerns while they were at it - Apple, Google), this data is not openly available. A new operating system, such as Mozilla's FirefoxOS, must licence a service such as Skyhook's, or build its own.
True to its open principles, Mozilla is building its own database of location information--the Mozilla Location Service--that aims "to provide an open service to provide location data" (that page needs wordsmithing but I digress). To collect the data, Mozilla offers an Android application called MozStumbler that you can run while you're out and about; it will build a collection of observations pairing coordinates with wifi access points and cell towers, and then upload it to Mozilla (either via your data connection, or later when you have wifi connectivity if you prefer). Currently you have to sideload the APK onto your phone; it is not available on the Google Play Store (although it is on F-Droid).
While the fledgling location API is already available, it remains to be seen how Mozilla will run this service: if, for example, it will make data dumps available, or if it will rate-limit calls to the service. But given Mozilla's long and laudable track record, it seems worthwhile to trust that they will do the right thing and help them build their database. They have a long way to go. Comparing Mozilla's stats to Skyhook's, Mozilla has collected observations about 0.7 million cell towers and 17.5 million wifi access points, vs. Skyhook's 30 million and 1 billion respectively.
So why not fire up MozStumbler on your phone? Hey, if a lowly guy from Sudbury can get into the top 200 data contributors in a little over a week, so can you!
This work is licensed under a Creative Commons Attribution-Share Alike 2.5 Canada License.