Linked Data: Introductory Resources

Posted on Sat 09 March 2019 in Linked Open Data

I was recently asked for a list of resources that would serve as good introductions for students interested in linked data concepts--not just in libraries, but in general, including knowledge graphs and inferences. It was a good opportunity to review sources that I used in the past, and to see what new sources might be appropriate. While there is a lot of exciting research literature, some of the core resources do not seem to have changed much in the past decade. If I'm wrong, please tell me what I've missed through one of my contact methods!

Update 2019-03-12: Aaron Bradley suggested the addition of some more good intro and Knowledge Graph entries. Thanks Aaron!

Introductions and overviews

Berners-Lee, T. (2009, June 18). Linked Data - Design Issues. Retrieved March 10, 2019, from https://www.w3.org/DesignIssues/LinkedData.html

Tim Berners-Lee, generally recognized as the creator of the World Wide Web and a co-creator of the Semantic Web, boils down the latter's overly complex implementation concerns into four simple, practical principles that he calls "linked data." This is the classic introduction.

His 2009 revision to these principles adds a fifth principle, that the data be distributed under an open license that does not impede its reuse for free, to be considered "linked open data."

Blaney, J. (2017). Introduction to the Principles of Linked Open Data. Programming Historian. Retrieved from https://programminghistorian.org/en/lessons/intro-to-linked-data

A self-guided, two-hour tutorial aimed at readers who treat programming as a means to their digital humanities research goals. This tutorial helps unpack some of the jargon found in other resources.

It introduces the Turtle and RDF/XML serializations, and touches on the SPARQL language for querying sets of linked data.
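
To make the idea of triples, serializations, and queries concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial: the example.org URIs, the Literal helper class, and the function names are all assumptions made for illustration. It writes each statement in N-Triples syntax (which is also valid Turtle) and answers one question by hand that SPARQL would express as a two-pattern query.

```python
# Hypothetical data: the example.org URIs are invented for illustration.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"


class Literal(str):
    """Marks an object as a plain string literal rather than a URI."""


# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "alice", FOAF + "name", Literal("Alice")),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", Literal("Bob")),
]


def term(t):
    """Format one term in N-Triples syntax."""
    if isinstance(t, Literal):
        escaped = t.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    return "<" + t + ">"


def to_ntriples(data):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(" ".join(term(t) for t in triple) + " ." for triple in data)


def names_of_people_known_by(data, person):
    """Hand-rolled equivalent of the SPARQL query:
    SELECT ?name WHERE { <person> foaf:knows ?friend . ?friend foaf:name ?name }
    """
    friends = [o for s, p, o in data if s == person and p == FOAF + "knows"]
    return [str(o) for s, p, o in data if s in friends and p == FOAF + "name"]


print(to_ntriples(triples))
print(names_of_people_known_by(triples, EX + "alice"))
```

A real project would use an RDF library and a SPARQL engine rather than list comprehensions; the point here is only that the data model underneath is a set of three-part statements.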

Heath, T., & Bizer, C. (2011). Linked Data: Evolving the Web into a Global Data Space (Vol. 1). Morgan & Claypool. Retrieved from http://linkeddatabook.com/editions/1.0/

This remains my recommendation as the best overall introduction to linked data. Even though the book is eight years old, and some of the products it refers to no longer exist, the fundamental concepts that it presents are sound.

Sporny, M. (2012). What is Linked Data? Retrieved from https://www.youtube.com/watch?v=4x_xzT5eF5Q&feature=youtu.be

Manu Sporny is a gifted technical communicator. In this 12-minute video, he introduces linked data with hand-drawn slides featuring a cute mouse and robot.

Verborgh, R. (2019, March). The Semantic Web & Linked Data. Retrieved from http://rubenverborgh.github.io/WebFundamentals/semantic-web/

If you like presentations as a way of ingesting information, Ruben's slides will be right up your alley. They provide a high-level single-slide introduction to many concepts you are likely to encounter in the literature and in the wild, such as the N-Quads and TriG serializations, and expand on areas such as the JSON-LD serialization that are increasingly relevant.

Verborgh covers RDF Schema (RDFS) and the Web Ontology Language (OWL) before introducing SPARQL and rules-based reasoners. The slides include interactive elements to demonstrate the query and inference examples running against live data.

These slides are a great resource for refreshing your understanding of a concept that might slip your mind in the future.

Publishing linked data

Dodds, L., & Davis, I. (2012). Linked Data Patterns. Retrieved from http://patterns.dataincubator.org/book/

Best practices and considerations for publishing linked data. This book is organized into five sets of linked data patterns:

Identifier patterns
Modelling patterns
Publishing patterns
Data management patterns
Application patterns

Each pattern within a set is clearly named, which supports discussion among the designers of linked data for a given project.

JSON-LD - JSON for Linking Data. (n.d.). Retrieved March 9, 2019, from https://json-ld.org/

The benefits of JSON-LD are that it can be embedded in a single <script> element and that it is easily parsed by JavaScript in the browser client, whereas RDFa and microdata have to be woven throughout the HTML template tags (making them prone to breaking when templates are updated) and require special JavaScript libraries for client-side parsing.
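
As a rough illustration of why a single block of JSON-LD is easy to consume, the sketch below pulls it out of a page using only the Python standard library. JSON-LD is conventionally embedded in a script element with the type application/ld+json; the page content, the example.org URL, and the JSONLDExtractor class name are assumptions invented for this example.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the book record and example.org URL are invented.
PAGE = """<html><head>
<title>Example record</title>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Book",
  "@id": "http://example.org/books/1",
  "name": "An Example Book"
}
</script>
</head><body><p>Example record</p></body></html>"""


class JSONLDExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.documents.append(json.loads("".join(self.buffer)))
            self.in_jsonld = False


extractor = JSONLDExtractor()
extractor.feed(PAGE)
print(extractor.documents[0]["name"])
```

Note that json.loads only parses the JSON. It does not perform JSON-LD processing such as expanding terms against the @context, for which you would want a dedicated JSON-LD library.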

This site includes useful tools for trying out JSON-LD, including an interactive "playground", as well as links to further documentation such as videos by Manu Sporny--a skilled and entertaining communicator.

JSON-LD 1.1 : A JSON-based Serialization for Linked Data. (2019, March 1). World Wide Web Consortium. Retrieved from https://w3c.github.io/json-ld-syntax/

The current "editor's draft" for the JSON-LD 1.1 specification.

Scott, D. (2014, December 1). RDFa with schema.org codelab. Retrieved March 10, 2019, from https://coffeecode.net/swib14/preconference/rdfa_exercises/

Although I would recommend JSON-LD for publishing linked data within HTML documents, Resource Description Framework in Attributes (RDFa) might be suitable for situations in which you can only control the HTML templates for a web site. This hands-on tutorial teaches you to express linked data in HTML by adding attributes such as @vocab, @about, and @property to HTML elements.
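
To show what those attributes express, here is a deliberately simplified sketch, not taken from the codelab. The HTML fragment, the example.org URL, and the SimpleRDFaReader class are assumptions made for illustration, and the reader is nowhere near a conforming RDFa processor: it assumes that every element carrying a property attribute contains only text.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical fragment: the book and example.org URL are invented.
HTML = """
<div vocab="http://schema.org/" typeof="Book" about="http://example.org/books/1">
  <span property="name">An Example Book</span>
  <span property="author">A. N. Author</span>
</div>
"""


class SimpleRDFaReader(HTMLParser):
    """Toy reader: handles only flat vocab/about/typeof/property markup."""

    def __init__(self):
        super().__init__()
        self.vocab = ""
        self.subject = None
        self.current_property = None
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "vocab" in attrs:
            self.vocab = attrs["vocab"]  # prefix for unqualified terms
        if "about" in attrs:
            self.subject = attrs["about"]  # what the statements describe
        if "typeof" in attrs and self.subject:
            self.triples.append((self.subject, RDF_TYPE, self.vocab + attrs["typeof"]))
        if "property" in attrs:
            self.current_property = self.vocab + attrs["property"]
            self.text = []

    def handle_data(self, data):
        if self.current_property:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.current_property:
            value = "".join(self.text).strip()
            self.triples.append((self.subject, self.current_property, value))
            self.current_property = None


reader = SimpleRDFaReader()
reader.feed(HTML)
for triple in reader.triples:
    print(triple)
```

The statements are scattered across the markup rather than gathered in one block, which is exactly why RDFa fits the case where the templates are the only thing you control.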

Verborgh, R. (2019, March). Linked Data Publishing. Retrieved from http://rubenverborgh.github.io/WebFundamentals/linked-data-publishing/

Ruben is back with another presentation; this time he tackles mappings from relational databases and other formats to linked data, touches briefly on validation and provenance, and compares three major alternatives for publishing data in bulk:

data dumps
SPARQL server
Triple Pattern Fragments
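
The third option is the least familiar of the three, so here is a toy sketch of the idea behind it: the server answers only single triple patterns, one page at a time, and the client does the work of joining the results. Everything in it is an assumption for illustration (the ex: data, the function names, the page size), and it uses plain function calls where a real deployment would use HTTP requests.

```python
# Hypothetical dataset; the ex: names are invented for illustration.
DATA = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:livesIn", "ex:cityA"),
    ("ex:carol", "ex:livesIn", "ex:cityB"),
]


def fragment(s=None, p=None, o=None, page=0, page_size=2):
    """Server side: answer one triple pattern (None is a wildcard), one page at a time."""
    matches = [
        t for t in DATA
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]
    start = page * page_size
    return {
        "triples": matches[start:start + page_size],
        "total": len(matches),  # count metadata helps clients plan joins
        "has_next": start + page_size < len(matches),
    }


def all_matches(s=None, p=None, o=None):
    """Client side: follow pages until the fragment is exhausted."""
    page, results = 0, []
    while True:
        frag = fragment(s, p, o, page=page)
        results.extend(frag["triples"])
        if not frag["has_next"]:
            return results
        page += 1


def cities_of_friends(person):
    """Client-side join of (person knows ?friend) with (?friend livesIn ?city)."""
    cities = []
    for _, _, friend in all_matches(s=person, p="ex:knows"):
        for _, _, city in all_matches(s=friend, p="ex:livesIn"):
            cities.append((friend, city))
    return cities


print(cities_of_friends("ex:alice"))
```

The trade-off is visible even at this scale: the server's job stays cheap and predictable, while the client makes more requests than it would against a SPARQL server that evaluates the whole query itself.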

Knowledge graphs

Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., … Zhang, W. (2014). Knowledge Vault: A Web-scale Approach to Probabilistic Knowledge Fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 601–610). New York, NY, USA: ACM. https://doi.org/10.1145/2623330.2623623

This is the canonical paper that describes how Google created and extends its Knowledge Vault, the RDF underpinnings of the Google Knowledge Graph.

Ehrlinger, L., & Wöß, W. (2016). Towards a Definition of Knowledge Graphs. In Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS’16) (Vol. 1695, p. 4). Leipzig, Germany.

To distinguish knowledge graphs from ontologies and knowledge bases, the authors assert the following definition:

"A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge."

Paulheim, H. (2016). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489–508. https://doi.org/10.3233/SW-160218

The introduction in this paper provides a good overview of the structure of knowledge graphs and the major examples at the time of publication: Cyc and OpenCyc, Freebase, Wikidata, DBpedia, YAGO, NELL, Google's Knowledge Graph, Google's Knowledge Vault, Yahoo!'s Knowledge Graph, Microsoft's Satori, and Facebook's Entities Graph.

The subsequent survey of approaches to completion and error detection in knowledge graphs may also be of interest.

Stichbury, J. (2017, May 10). WTF is a knowledge graph? Retrieved March 12, 2019, from https://hackernoon.com/wtf-is-a-knowledge-graph-a16603a1a25f

Unpacks the definition of knowledge graph developed by Ehrlinger & Wöß (2016) with a plain language description.

Reasoning and inferences

Allemang, D., & Hendler, J. A. (2011). Semantic Web for the working ontologist: Effective modeling in RDFS and OWL (2nd ed.). Waltham, MA: Morgan Kaufmann/Elsevier.

Still the definitive text for introducing reasoning into linked data. The book shows how to construct inferencing rules through RDFS and OWL schemas, illustrated with practical, easy-to-follow examples. Chapter 14, on good and bad modeling practices, is quite enjoyable.
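
For a taste of what inferencing through an RDFS schema means, here is a minimal sketch of my own, not an example from the book. It applies two of the standard RDFS entailment rules (rdfs9 and rdfs11, both concerning rdfs:subClassOf) over and over until nothing new can be derived. The ex: classes and the infer function are assumptions invented for illustration.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical schema and facts; the ex: names are invented for illustration.
asserted = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:CreativeWork"),
    ("ex:book1", RDF_TYPE, "ex:Novel"),
}


def infer(triples):
    """Apply two RDFS rules repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only pair a triple with a subClassOf statement about its object.
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == SUBCLASS:
                    # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == RDF_TYPE:
                    # rdfs9: an instance of a class is an instance of its superclasses
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


inferred = infer(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

Nobody stated that ex:book1 is a creative work; the reasoner derived it from the schema. Production reasoners use far more efficient strategies than this nested loop, and OWL adds many more rules, but the principle is the same.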

One of the book's limitations is that it touches on OWL 2, but does not go into depth. That said, a 2015 presentation by Hendler notes that OWL has not been broadly adopted, suggests some of the reasons why, and recommends possible paths forward.

Hendler, J. (2015, October). On Beyond OWL: challenges for ontologies on the Web. Retrieved from https://www.slideshare.net/jahendler/on-beyond-owl-challenges-for-ontologies-on-the-web

In this presentation, the co-author of "Semantic Web for the working ontologist" delves into the reasons OWL has not been broadly adopted and suggests potential directions for the evolution of OWL to better support reasoning.