Linked Data for Production - 2018 Workshop

Posted on Tue 08 May 2018 in Libraries

Bonus: Yesterday, "Wikidata: a platform for your library’s linked open data", an article I co-authored with Stacy Allison-Cassin, was published in the latest issue of the Code4Lib Journal.

LD4P Wikidata panel

Last week, I was honoured to be an invited speaker and participant at the Linked Data for Production (LD4P) - 2018 Workshop, held at Stanford University. LD4P represents the third round of Mellon Foundation funding for a set of partner organizations, including Stanford and Cornell, to advance the state of linked data adoption in libraries--and as the name states, the focus of this round was on putting linked data into production. My talk was just one of a block of talks on the potential role of Wikidata in library linked data adoption.

Surfacing Knowledge, Building Relationships: Indigenous Communities, ARL and Canadian Libraries

Stacy Allison-Cassin and Anna St. Onge, a librarian and an archivist respectively, both from York University, opened with acknowledgements of the traditional territories of both Stanford and York University. This situated their talk within the context of the outcomes of Canada's Truth and Reconciliation Commission, which challenge libraries, archives, and museums to address historical wrongs against Indigenous communities and omissions of Indigenous cultures. They are leading an ARL-sponsored project to enact meaningful change within their institution, and have chosen a case study focused on the Native Persons Area of the Mariposa Folk Festival from 1972 to 1978, a part of their archival collection that celebrates cultural expression.

One of the challenges they faced is that decisions about what is considered worthy of inclusion in settler-dominated authorities such as LC, VIAF, and LAC--decisions based on assumptions about the relative superiority of written and recorded works over oral traditions--have rendered Indigenous people almost invisible. Trying to reconcile names against VIAF to derive URIs resulted in very few matches: a 7% success rate. In this context, the ability to describe people and relationships in Wikidata and immediately obtain usable URIs offers a path forward for their project, and increases the chance that the data will be reused and the community broadened. Anna and Stacy covered an immense amount of difficult territory in a short time, and did a great job raising awareness of issues that projects may face as they cover topics outside the traditional colonial focus of academia.
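For the curious, the reconciliation step can be sketched in a few lines. The following is a minimal illustration, assuming VIAF's public AutoSuggest service and the Python requests library--not the project's actual code:

```python
import requests

def reconcile_name(name):
    """Look up a personal name in VIAF's AutoSuggest service and
    return candidate VIAF URIs (an empty list when nothing matches)."""
    resp = requests.get(
        "https://viaf.org/viaf/AutoSuggest",
        params={"query": name},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    # The JSON response carries a 'result' list (null when there is no
    # match); each hit includes a 'viafid' we can turn into a URI.
    hits = resp.json().get("result") or []
    return ["https://viaf.org/viaf/" + hit["viafid"] for hit in hits]
```

With a 7% match rate, a loop like this over their list of names would come back empty for the vast majority of the people they need to describe.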

Accessing Wikidata: read and write

My role was to follow Stacy and Anna with an overview of methods for accessing Wikidata programmatically, both to retrieve data and to update and load data on the platform. With just 15 minutes allocated, it was an impossible task--but I feel the resulting talk and accompanying slides did the topic justice, despite a few live-demo failures (one simply a matter of a slow network, the other the result of presenting on a borrowed laptop on which I was not logged into my Wikidata account). The reaction, stated both publicly and privately, was quite positive, and I am pleased to have played a part in advancing the potential for libraries to adopt and contribute to a large body of linked data, and to a community that many of them might otherwise not have considered.
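As a small taste of the "read" side, here is a minimal sketch along the lines of what I demoed--hedged, since it assumes only the public Wikidata Query Service and the requests library rather than reproducing my slides--fetching the VIAF ID recorded on Douglas Adams's item (Q42):

```python
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Fetch the VIAF ID (property P214) stored on Douglas Adams (Q42).
QUERY = "SELECT ?viaf WHERE { wd:Q42 wdt:P214 ?viaf }"

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-demo/0.1 (workshop example)"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["viaf"]["value"])  # the bare VIAF identifier string
```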

Wikidata in Libraries: a broader view

Alex Stinson of the Wikimedia Foundation bookended our talks with a general overview of Wikidata, including its (currently) 47M items and 2,400 properties representing different sources of authority IDs. He highlighted a number of libraries that were already engaging with Wikidata: using it as a source of data, contributing data to it, and (in the case of the National Library of Italy in Florence) building a catalogue on top of their own data loaded into an instance of Wikibase, the software on which Wikidata is built.
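Those authority-ID properties are themselves queryable. As a sketch (again assuming only the public Query Service), counting the properties Wikidata types as external identifiers gives a current figure for the number Alex cited:

```python
import requests

# Count the properties Wikidata types as external identifiers --
# the authority-ID properties described in the talk.
QUERY = """
SELECT (COUNT(?prop) AS ?n) WHERE {
  ?prop wikibase:propertyType wikibase:ExternalId .
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["n"]["value"])
```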

Discussion

There was a lively discussion after our talks, including concerns about ensuring that the quality of the data is maintained, as well as questions about where library data should (or should not) merge with Wikidata. I noted that there is a data contribution process that enables the existing Wikidata community to vet data before it goes in, and Alex added that contributors are expected to ensure their contributions will have a community to maintain them. Notability was also raised as a concern; Stacy responded that Wikidata differs from the English Wikipedia in that an item is notable if it supports content in other Wikimedia projects or in other Wikidata items. A Wikidata item that links to nothing, and to which nothing links, is a likely candidate for removal.

LD4P day 2

There was more Wikidata content on day 2 of the workshop.

OCLC embraces Wikibase

Bruce Washburn unveiled a pilot project that OCLC had been working on for some time, using Wikibase--which he called "great software"--as a ready-made platform supplying all the elements of a linked data stack: a human-friendly editor, autocompletion and full-text search, a built-in, performant triplestore with quality visualizations, and utilities such as pywikibot and QuickStatements for loading data.
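To give a flavour of the loading utilities he mentioned, here is a minimal pywikibot sketch--illustrative only, not OCLC's pilot code--that adds a single statement to the Wikidata sandbox item:

```python
import pywikibot

# Assumes a user-config.py with credentials for a Wikidata account;
# pywikibot can also be pointed at any other Wikibase instance.
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q4115189")  # the Wikidata sandbox item
item.get()  # load the item's existing labels and claims

# Add an 'instance of' (P31) statement pointing at 'human' (Q5).
claim = pywikibot.Claim(repo, "P31")
claim.setTarget(pywikibot.ItemPage(repo, "Q5"))
item.addClaim(claim, summary="pywikibot demo edit")
```

QuickStatements covers the same ground without any programming: it accepts batches of item-property-value lines pasted into a web form.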

UC Davis strips it all down to URIs

Carl Stahmer from UC Davis showed an experiment built almost entirely on URIs, including a Wikibase-based module for their local data, that relies on real-time lookups against all of the triplestores behind the URIs (LC, VIAF, Wikidata, etc.) to retrieve labels and associated relationships to support discovery. Given the oft-repeated concerns about the reliability of SPARQL endpoints, it is a brave approach; Carl justified it by stating that we as a community have to decide whether linked data is truly a viable solution or not, and added that the major endpoints had proven reliable in their experience.
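As a rough sketch of what such a real-time lookup might look like--my approximation, not UC Davis's code, and dereferencing the URI directly rather than hitting a SPARQL endpoint--here is a label fetch using rdflib:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS, SKOS

def fetch_labels(uri):
    """Dereference a linked-data URI and return its label strings."""
    g = Graph()
    g.parse(uri)  # HTTP content negotiation retrieves RDF for the URI
    subject = URIRef(uri)
    labels = []
    for predicate in (RDFS.label, SKOS.prefLabel):
        labels.extend(str(o) for o in g.objects(subject, predicate))
    return labels

# e.g. an id.loc.gov or VIAF URI; service coverage and uptime vary,
# which is exactly the reliability concern raised in the room.
print(fetch_labels("http://id.loc.gov/authorities/names/n79021164"))
```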