tl;dr: my slides are here, and the
slides from Jenn and Jason are also available from ALA Connect.
On Sunday, June 29th Jenn Riley, Jason Clark, and I presented at the ALCTS/LITA
jointly sponsored session Understanding schema.org. The
build-up to the session was pretty amazing; I was delighted to learn that Jason
and I had been working on pretty much parallel efforts over the past couple of
years. Jenn did a great job of organizing the session, and by the time we
started talking 276 people had indicated their interest in attending: that was
two more than those who had indicated an interest in attending the BIBFRAME
Forum Update scheduled in the same time slot. Our room was large and quite full.
Jenn started the session out strong by advancing her concept that libraries
need to target discovery elsewhere: that is, that there is no way
that libraries can compete directly with major search engines like Google,
Bing, and Yahoo, either through the discovery tools that we have to offer, our
presence in the consciousness of most of the population as the starting point
for discovery, or in the resources we can direct towards closing the huge gap
in technology, usability, and mindshare that the search engines have opened up
over the past two decades. But we can take steps to start working
with the search engines to enable our resources to be discovered and accessed
more directly by them.
That led quite naturally to my own part of the session, in which I talked about
my attempt to turn cataloguing's efforts to provide access points in our niche
catalogues into access points for the open web by publishing schema.org
structured data from library catalogues like Evergreen, Koha, and VuFind. I
started things out by pointing out the legacy of restrictive
robots.txt files that still live on in many catalogues today, then
worked through some basics like how sitemaps enable search engines--which
strive to provide relevant, useful results that matter to users in their
context at a particular place and time--to efficiently crawl just the most
recently changed pages of interest. Then I launched into the heart of the talk
that showed how catalogues that publish schema.org structured data can turn an
undifferentiated mass of presentation-oriented HTML and words into
machine-comprehensible entities: classes like Book and
Organization, connected by properties like publisher,
and with values for properties like author,
datePublished, and isbn.
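
To make that concrete, here is a minimal sketch of the kind of RDFa Lite markup a catalogue template might emit for a single record. It is not the actual Evergreen, Koha, or VuFind template code: the function name, the HTML layout, and the sample record are all made up for illustration, and only the schema.org type and property names come from the vocabulary itself.

```python
from html import escape


def book_rdfa(title, author, date_published, isbn, publisher):
    """Render one (hypothetical) catalogue record as RDFa Lite markup.

    The surrounding <div>/<span> layout is an assumption; a real
    catalogue would add these attributes to its existing templates.
    """
    return (
        '<div vocab="http://schema.org/" typeof="Book">\n'
        f'  <h1 property="name">{escape(title)}</h1>\n'
        f'  <span property="author">{escape(author)}</span>\n'
        f'  <span property="datePublished">{escape(date_published)}</span>\n'
        f'  <span property="isbn">{escape(isbn)}</span>\n'
        # The publisher is its own entity, linked from the Book.
        '  <span property="publisher" typeof="Organization">\n'
        f'    <span property="name">{escape(publisher)}</span>\n'
        '  </span>\n'
        '</div>'
    )


if __name__ == "__main__":
    # Sample values are invented, not taken from a real catalogue.
    print(book_rdfa("Cats & Dogs", "A. Author", "2014",
                    "9780000000002", "Example Press"))
```

Pasting the printed HTML into the RDFa playground should show a Book entity connected to an Organization entity through the publisher property.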
For this talk I used visualizations generated by the RDFa playground to illustrate the structured
data contained in some real examples of a production Evergreen system (thanks
to Bibliomation). Given that I'm normally a
text-and-talk kind of guy, the illustrations seemed to help out--particularly
in showing how holdings map quite readily to the Product /
Offer structure more commonly used by commercial enterprises to
reflect the prices, locations, and availability of their products.
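
As a rough sketch of that mapping, the snippet below turns a list of holdings into schema.org Offer entities attached to a Book. It uses JSON-LD rather than RDFa purely for brevity. The choice of sku for the call number and serialNumber for the barcode is my assumption about a reasonable mapping, and the field names of the input holdings are hypothetical.

```python
import json


def holding_to_offer(call_number, barcode, library_name, available):
    """Map one (hypothetical) holding record onto a schema.org Offer."""
    return {
        "@type": "Offer",
        "sku": call_number,        # assumption: call number as sku
        "serialNumber": barcode,   # assumption: barcode as serialNumber
        "availability": ("http://schema.org/InStock" if available
                         else "http://schema.org/OutOfStock"),
        # The link from the holding to the library that holds it.
        "seller": {"@type": "Library", "name": library_name},
    }


def book_with_offers(title, holdings):
    """Build a JSON-LD document for a Book and its holdings."""
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": title,
        "offers": [holding_to_offer(**h) for h in holdings],
    }


if __name__ == "__main__":
    # Invented sample data.
    holdings = [
        {"call_number": "QA76.9 .E93", "barcode": "30000000000001",
         "library_name": "Example Branch Library", "available": True},
    ]
    print(json.dumps(book_with_offers("Cats & Dogs", holdings), indent=2))
```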
Of course, the evolution from unstructured, to structured, to linked data had
its payoff beginning with the link from holdings to the libraries that hold the
resources. We have plenty more we can and must do, but unlike other efforts
which are still crystallizing and which will require significant architectural
work to happen before libraries can even begin trying out real systems, you
can use schema.org-enabled systems today. And adapting systems to
publish schema.org structured data only requires access to the HTML templates
for your system (which, hopefully, you have: otherwise you have bigger problems
to deal with!) and following the patterns that have already been established by
Evergreen, Koha, and VuFind.
Jason did a great job of showing a broader use case for schema.org, including
work he has led on digital collections such as embedding the
Recipe type in a book of recipes. He also covered some of the
evolution of the vocabulary, including the exciting possibilities introduced
by the Action type and potentialAction property for
describing RESTful APIs... which naturally led to an off-the-top-of-the-head
enumeration of such actions as BorrowAction and
LendAction that are perfect for libraries.
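
To give a feel for what that might look like, here is a speculative sketch of a Book advertising a BorrowAction through potentialAction, again as JSON-LD. BorrowAction, EntryPoint, and the lender, target, and urlTemplate properties are part of schema.org; the example.org URLs and the "/borrow" endpoint are invented, and no catalogue that I know of exposes this today.

```python
import json


def borrow_action(record_url, library_name):
    """Describe a (hypothetical) borrow endpoint for one record."""
    return {
        "@type": "BorrowAction",
        "lender": {"@type": "Library", "name": library_name},
        "target": {
            "@type": "EntryPoint",
            # Assumption: the catalogue exposes a /borrow URL per record.
            "urlTemplate": record_url + "/borrow",
        },
    }


def book_with_action(title, record_url, library_name):
    """Attach the action to the Book through potentialAction."""
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": title,
        "url": record_url,
        "potentialAction": borrow_action(record_url, library_name),
    }


if __name__ == "__main__":
    doc = book_with_action("Cats & Dogs",
                           "https://catalogue.example.org/record/123",
                           "Example Branch Library")
    print(json.dumps(doc, indent=2))
```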
Perhaps the best part of the session, however, was the insightful questions
from the audience (along with the genuinely enthusiastic response to our
talks). We had deliberately left 15 minutes for questions, and we were not
disappointed: from questions about how we move from structured data to more
linked data (I riffed on the Dodds/Davis Progressive
Enrichment linked data pattern, suggesting that we should be able to store
links for each field or value of interest directly in our MARC records), to
questions about what proprietary systems are doing this with schema.org today
(alas, none that I'm aware of, unless something has changed since February).
Credit where credit's due - Lorcan Dempsey has been talking about "discovery elsewhere" for nearly 7 years (http://orweblog.oclc.org/archives/001430.html), and it's possible others were before that. Not my concept, but one I subscribe to.
And a shout-out to Jeremy Frumkin who at his session (http://ala14.ala.org/node/14573) the next morning took the time to add nuance to this topic that I didn't have time to. Jeremy's take is that discovery also happens elsewhere; that if a user is on our web site and we can't get them to stuff we've failed. Fair enough. Our challenge is figuring out how to do this and still invest our resources where they can have the most impact.
Thanks Jenn! Yes, ten minutes is just enough to scratch the surface, but you did a great job in that time. I also thought I recognized the concept from elsewhere, but you did it great justice in our specific context.