Linked Data: Introductory Resources
Posted on Sat 09 March 2019 in Linked Open Data
I was recently asked for a list of resources that would serve as good introductions for students interested in linked data concepts--not just in libraries, but in general, including knowledge graphs and inferences. It was a good opportunity to review sources that I used in the past, and to see what new sources might be appropriate. While there is a lot of exciting research literature, some of the core resources do not seem to have changed much in the past decade. If I'm wrong, please tell me what I've missed through one of my contact methods!
Update 2019-03-12: Aaron Bradley suggested the addition of some more good intro and Knowledge Graph entries. Thanks Aaron!
Introductions and overviews
Tim Berners-Lee, generally recognized as the creator of the World Wide Web and a co-creator of the Semantic Web, boils down the latter's overly complex implementation concerns into four simple, practical principles that he calls "linked data." This is the classic introduction.
His 2009 revision to these principles adds a fifth: to be considered "linked open data," the data must be distributed under an open license that does not impede its free reuse.
A self-guided, two-hour tutorial aimed at readers for whom programming is a means to their digital humanities research. This tutorial helps unpack some of the jargon found in other resources.
It introduces the Turtle and RDF/XML serializations, and touches on the SPARQL language for querying sets of linked data.
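To make the idea of querying a set of triples concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial and does not use a real RDF library; the example.org resources and the helper names (match, names_of_known) are hypothetical, and only the FOAF namespace is a real vocabulary. A None argument plays the role of a SPARQL variable.

```python
# Hypothetical sample data: the example.org resources are made up for illustration.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# A graph is just a set of (subject, predicate, object) triples.
triples = {
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None behaves like a variable."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def names_of_known(graph, person):
    """Join two patterns, roughly like the SPARQL query:

    SELECT ?name WHERE { <person> foaf:knows ?x . ?x foaf:name ?name }
    """
    names = []
    for _, _, friend in match(graph, s=person, p=FOAF + "knows"):
        for _, _, name in match(graph, s=friend, p=FOAF + "name"):
            names.append(name)
    return names


if __name__ == "__main__":
    print(names_of_known(triples, EX + "alice"))
```

A real SPARQL engine adds much more (optional patterns, filters, named graphs), but the core operation is this kind of pattern matching and joining over triples.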
This remains my recommendation as the best overall introduction to linked data. Even though the book is eight years old, and some of the products it refers to no longer exist, the fundamental concepts that it presents are sound.
Manu Sporny is a gifted technical communicator. In this 12-minute video, he introduces linked data with hand-drawn slides featuring a cute mouse and robot.
If you like presentations as a way of ingesting information, Ruben's slides will be right up your alley. They provide a high-level single-slide introduction to many concepts you are likely to encounter in the literature and in the wild, such as the N-Quads and TriG serializations, and expand on areas such as the JSON-LD serialization that are increasingly relevant.
Verborgh covers RDF Schema (RDFS) and the Web Ontology Language (OWL) before introducing SPARQL and rules-based reasoners. The slides include interactive elements to demonstrate the query and inference examples running against live data.
These slides are a great resource for refreshing your understanding of a concept that might slip your mind in the future.
Publishing linked data
Best practices and considerations for publishing linked data. This book is organized into five sets of linked data patterns:
- Identifier patterns
- Modelling patterns
- Publishing patterns
- Data management patterns
- Application patterns
Each pattern within the set is clearly named to support discussions between the designers of linked data for a given project.
The benefits of JSON-LD are that it can be embedded in a single <script> tag and that it is easily parsed by JavaScript in the browser client, whereas RDFa or microdata have to be embedded throughout the HTML template tags (making them prone to breaking through template updates) and require special JavaScript libraries for client-side parsing.
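As a rough illustration of why a single embedded block is easy to consume, the sketch below pulls JSON-LD out of an HTML page using only the Python standard library. JSON-LD is typically embedded in a script element with the type application/ld+json; the sample page, the book, and the names JsonLdExtractor and extract_jsonld are hypothetical and exist only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


# Hypothetical page: the book and author are invented for illustration.
PAGE = """<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Book",
  "name": "An Example Book",
  "author": {"@type": "Person", "name": "A. Author"}
}
</script>
</head><body><h1>An Example Book</h1></body></html>"""


def extract_jsonld(html):
    """Return a list of JSON-LD documents found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


if __name__ == "__main__":
    for doc in extract_jsonld(PAGE):
        print(doc["@type"], "-", doc["name"])
```

Note that this only parses the JSON; interpreting the @context and expanding terms to full IRIs is the job of a JSON-LD processor, which this sketch does not attempt.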
This site includes useful tools for trying out JSON-LD, including an interactive "playground", as well as links to further documentation such as videos by Manu Sporny--a skilled and entertaining communicator.
The current "editor's draft" for the JSON-LD 1.1 specification.
Although I would recommend JSON-LD for publishing linked data within HTML documents, Resource Description Framework in Attributes (RDFa) might be suitable for situations in which you can only control the HTML templates for a web site. This hands-on tutorial teaches you to express linked data in HTML by adding attributes such as @vocab, @about, and @property to HTML elements.
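To show how these three attributes combine into triples, here is a deliberately simplified sketch using Python's standard html.parser. It is not a conformant RDFa processor and is not from the tutorial: it ignores attribute scoping, nested markup inside a property element, prefixes, and every other RDFa attribute. The sample markup and the class name SimpleRdfaReader are hypothetical.

```python
from html.parser import HTMLParser


class SimpleRdfaReader(HTMLParser):
    """Toy reader: handles only vocab, about, and property with plain text content."""

    def __init__(self):
        super().__init__()
        self.vocab = ""        # base IRI that property names are appended to
        self.subject = ""      # the resource being described
        self._property = None  # full property IRI while inside a property element
        self._text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self._property = self.vocab + attrs["property"]
            self._text = []

    def handle_data(self, data):
        if self._property is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property.
        if self._property is not None:
            value = "".join(self._text).strip()
            self.triples.append((self.subject, self._property, value))
            self._property = None


# Hypothetical markup: the book IRI and values are invented for illustration.
PAGE = """<div vocab="http://schema.org/" about="http://example.org/book/1">
  <h1 property="name">An Example Book</h1>
  <p>Published in <span property="datePublished">2011</span>.</p>
</div>"""


def read_rdfa(html):
    """Return (subject, predicate, literal) triples found by the toy reader."""
    reader = SimpleRdfaReader()
    reader.feed(html)
    reader.close()
    return reader.triples


if __name__ == "__main__":
    for triple in read_rdfa(PAGE):
        print(triple)
```

The sketch also makes the maintenance concern visible: the data is spread across several elements of the template, so moving or deleting one element silently changes the triples the page publishes.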
Ruben is back with another presentation; this time he tackles mappings from relational databases and other formats to linked data, touches briefly on validation and provenance, and compares three major alternatives for publishing data in bulk:
- data dumps
- SPARQL server
- Triple Pattern Fragments
Knowledge graphs
This is the canonical paper that describes how Google created and extends its Knowledge Vault, the RDF underpinnings of the Google Knowledge Graph.
To distinguish knowledge graphs from ontologies and knowledge bases, the authors assert the following definition:
A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.
The introduction in this paper provides a good overview of the structure of knowledge graphs and the major examples at the time of publication: Cyc and OpenCyc, Freebase, Wikidata, DBpedia, YAGO, NELL, Google's Knowledge Graph, Google's Knowledge Vault, Yahoo!'s Knowledge Graph, Microsoft's Satori, and Facebook's Entities Graph.
The subsequent survey of approaches to completion and error detection in knowledge graphs may also be of interest.
Unpacks the definition of knowledge graph developed by Ehrlinger & Wöß (2016) with a plain language description.
Reasoning and inferences
Still the definitive text for introducing reasoning into linked data. The book shows how to construct inferencing rules through RDFS and OWL schemas, illustrated with practical, easy-to-follow examples. Chapter 14, on good and bad modeling practices, is quite enjoyable.
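For a feel of what an inferencing rule does, the following sketch applies two standard RDFS entailment rules (subclass transitivity, and type inheritance through rdfs:subClassOf) by naive forward chaining. It is my own illustration rather than an example from the book; the ex: classes and the function name rdfs_closure are hypothetical, and prefixed names are treated as plain strings instead of being expanded to full IRIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical schema and data, written as prefixed names for brevity.
asserted = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:CreativeWork"),
    ("ex:moby_dick", RDF_TYPE, "ex:Novel"),
}


def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == SUBCLASS:
                    # rdfs11: A subClassOf B, B subClassOf C => A subClassOf C
                    new.add((s, SUBCLASS, o2))
                elif p == RDF_TYPE:
                    # rdfs9: x type A, A subClassOf B => x type B
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


if __name__ == "__main__":
    for triple in sorted(rdfs_closure(asserted) - asserted):
        print("inferred:", triple)
```

The quadratic loop is fine for a handful of triples but would not scale; production reasoners use indexing and more careful rule evaluation, and OWL adds far richer rules than the two shown here.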
One of the book's limitations is that it touches on OWL 2, but does not go into depth. That said, a 2015 presentation by Hendler notes that OWL has not been broadly adopted, suggests some of the reasons why, and recommends possible paths forward.
In this presentation, the co-author of "Semantic Web for the Working Ontologist" delves into the reasons OWL has not been broadly adopted and suggests potential directions for the evolution of OWL to better support reasoning.