Structuring library data on the web with schema.org: we're on it!
Until recently, adoption of schema.org structured data in traditional core library systems such as catalogues and institutional repositories had been disappointingly low. But there is still hope!
In this session, I'll briefly (re)introduce you to the schema.org vocabulary as expressed in microdata and RDFa and convince you that it matters to libraries, explain how our W3C Schema Bib Extend group participates in the schema.org development process, identify library systems that now publish schema.org structured data by default, and explore emerging possibilities for enhancing libraries' presence on the web with structured data.
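To make "schema.org as expressed in microdata" concrete, here is a minimal Python sketch that renders a catalogue record as schema.org Book microdata. The record dictionary, its keys (title, authors, isbn, date), and the function name book_to_microdata are hypothetical stand-ins for whatever a real catalogue stores; the properties themselves (name, author, isbn, datePublished) are genuine schema.org properties of Book and its parent types. RDFa Lite would express the same data with vocab="http://schema.org/", typeof, and property attributes in place of itemscope, itemtype, and itemprop.

```python
from html import escape


def book_to_microdata(record: dict) -> str:
    """Render a simple bibliographic record as schema.org Book microdata.

    The record keys used here are hypothetical; map them from your own data.
    """
    lines = ['<div itemscope itemtype="http://schema.org/Book">']
    lines.append(f'  <h1 itemprop="name">{escape(record["title"])}</h1>')
    for author in record.get("authors", []):
        # Each author is a nested Person item rather than a bare string.
        lines.append(
            '  <span itemprop="author" itemscope itemtype="http://schema.org/Person">'
            f'<span itemprop="name">{escape(author)}</span></span>'
        )
    if "isbn" in record:
        lines.append(f'  <span itemprop="isbn">{escape(record["isbn"])}</span>')
    if "date" in record:
        lines.append(
            f'  <span itemprop="datePublished">{escape(record["date"])}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(book_to_microdata({
        "title": "Example Title",
        "authors": ["Doe, Jane"],
        "isbn": "9780000000000",
        "date": "2013",
    }))
```

The point of the design is that the human-readable page and the machine-readable data are the same markup: the itemprop attributes simply label text that the catalogue was already displaying.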
While the semantic web surfaced in the popular consciousness around 2001 with Tim Berners-Lee's article in Scientific American, concrete efforts to populate the semantic web were largely academic exercises weighed down by RDF/XML and a focus on amassing millions and billions of "triples". Some vocabularies such as FOAF enjoyed pockets of real adoption, and lightweight structured data approaches such as microformats made progress through the mid-to-late 2000s in offering machine-readable data on the web. However, everything changed in 2011 when Google, Bing, and Yahoo published the schema.org vocabulary and announced their intention to offer richer search results for web pages that incorporated schema.org structured data.
While search engine optimization experts and web publishers quickly realized the value of schema.org, the library community has been more reserved in its adoption. The Evergreen open source integrated library system began publishing basic schema.org microdata in its catalogue with its 2.2.0 release in June 2012, and OCLC WorldCat also began publishing schema.org data in RDFa in June 2012. Until recently, those were effectively the only library systems to implement schema.org structured data, but those efforts helped inform the creation of the W3C Schema Bib Extend community group in September 2012. The group focuses on identifying best practices for using schema.org to describe bibliographic data and on proposing enhancements to the vocabulary. In this presentation, I will highlight the work of the W3C Schema Bib Extend group and outline the schema.org enhancement process to show that mere mortals like us can influence the development of structured data vocabularies used by major search engines.
In October 2013, library systems became much more credible participants in the schema.org community: Evergreen greatly improved its schema.org implementation, and Koha and VuFind added their own equally robust schema.org implementations. We will walk through examples of the rich structured data exposed by these systems to demonstrate how approachable schema.org can be for developers of other library systems. We will also see how a simple discovery layer can be built solely from the structured data extracted from these systems.
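As a rough sketch of the "discovery layer built solely from extracted structured data" idea, the Python below pulls schema.org microdata out of a page with the standard library's html.parser and runs a naive keyword search over the resulting records. It assumes reasonably well-formed markup and covers only the core of the microdata model (itemscope, itemtype, itemprop, plus attribute-valued elements such as meta and a); it ignores itemref and itemid. The sample page, its URL, and all function names are hypothetical, and this is not how Evergreen, Koha, or VuFind themselves are implemented.

```python
from html.parser import HTMLParser

# HTML elements that never get a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
# Elements whose itemprop value comes from an attribute instead of text.
VALUE_ATTRS = {"meta": "content", "a": "href", "link": "href",
               "img": "src", "time": "datetime"}


class MicrodataParser(HTMLParser):
    """Collects microdata items as nested {"type", "properties"} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []       # top-level items found in the page
        self.open_items = []  # items whose itemscope element is still open
        self.frames = []      # one frame per open non-void element

    @staticmethod
    def _add(item, props, value):
        # itemprop may hold several space-separated property names.
        for name in props.split():
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        props = attrs.get("itemprop")
        parent = self.open_items[-1] if self.open_items else None
        frame = {"tag": tag, "item": None, "text": None,
                 "props": props, "parent": parent}
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if props and parent is not None:
                self._add(parent, props, item)  # nested item, e.g. author -> Person
            else:
                self.items.append(item)         # not a property: top-level item
            frame["item"] = item
        elif props and parent is not None:
            attr = VALUE_ATTRS.get(tag)
            if attr and attr in attrs:
                self._add(parent, props, attrs[attr])
            elif tag not in VOID_TAGS:
                frame["text"] = []              # gather text until the end tag
        if tag in VOID_TAGS:
            return                              # no end tag will follow
        if frame["item"] is not None:
            self.open_items.append(frame["item"])
        self.frames.append(frame)

    def handle_data(self, data):
        for frame in self.frames:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(f["tag"] != tag for f in self.frames):
            return
        # Pop frames up to the matching start tag (tolerates unclosed <p> etc.).
        while self.frames:
            frame = self.frames.pop()
            if frame["item"] is not None:
                self.open_items.pop()
            if frame["text"] is not None:
                text = " ".join("".join(frame["text"]).split())
                self._add(frame["parent"], frame["props"], text)
            if frame["tag"] == tag:
                break


def extract_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


def first_text(item, prop):
    """First string value of a property, or '' if there is none."""
    return next((v for v in item["properties"].get(prop, []) if isinstance(v, str)), "")


def search(records, keyword):
    """Titles of records whose title or author names contain keyword."""
    keyword = keyword.lower()
    hits = []
    for item in records:
        fields = [first_text(item, "name")]
        fields += [first_text(a, "name") for a in item["properties"].get("author", [])
                   if isinstance(a, dict)]
        if any(keyword in f.lower() for f in fields):
            hits.append(first_text(item, "name"))
    return hits


PAGE = """
<div itemscope itemtype="http://schema.org/Book">
  <h1 itemprop="name">Example Title</h1>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Doe, Jane</span>
  </span>
  <meta itemprop="isbn" content="9780000000000">
  <a itemprop="url" href="http://catalogue.example.org/record/1">Permalink</a>
</div>
"""

if __name__ == "__main__":
    books = [i for i in extract_items(PAGE) if i["type"] == "http://schema.org/Book"]
    print(search(books, "doe"))  # → ['Example Title']
```

Because the search only relies on the schema.org markup, the same code could in principle run against pages from any catalogue that emits this kind of microdata; a real harvester would also need to fetch pages, handle itemref, and cope with much messier HTML.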
As an active member of the W3C Schema Bib Extend group and the developer behind the schema.org support in Evergreen, Koha, and VuFind, I will wrap up the session with some thoughts about how we are still only at the beginning of schema.org adoption and search result enrichment in the library world, and suggest some concrete steps that system and web developers in libraries can take to improve the state of structured data on their section of the web.
Most of this was written between 1:00 am and 3:00 am, and unfortunately that shows here and there. Given that most of you are way smarter than me, perhaps you can point out other library systems that now have schema.org baked into their web pages, or offer corrections? Also, if you're organizing a library technology conference or symposium and are interested in a similar presentation, get in touch with me!