What would you understand if you read the entire world wide web?

Posted on Mon 03 February 2014 in Linked Open Data

On Tuesday, February 4th, I'll be participating in Laurentian University's Research Week lightning talks. Unlike most five-minute lightning talk events in which I've participated, the time limit for each talk tomorrow will be one minute. Imagine 60 different researchers getting up to summarize their research in one minute each, and you have what is likely to be a brain-melting hour. Should be fun!

Here's a rough draft of what I'm planning to say (which, when read at an even cadence with decent intonation, comes out to exactly one minute):

What would you understand if you read the _entire_ world wide web?

As humans, we would understand a lot, because we can rely on the context, structure, and significance of elements of web pages to derive meaning.

The algorithms behind search engines adopt a similar approach, but struggle with ambiguity; when a web page mentions "Dan Scott", is it:

  • "Dan Scott" the character from the One Tree Hill TV show
  • "Dan Scott" the artist from the Magic: The Gathering card game
  • "Dan Scott" the Ontario academic professor from the University of Waterloo
  • "Dan Scott" the Ontario academic librarian from Laurentian University

schema.org is a vocabulary for embedding explicit meaning and intent within web pages that offers a way to disambiguate those entities.

My research is a collaborative effort--within the auspices of the World Wide Web Consortium--to define bibliographic extensions for schema.org where necessary, and to establish best practices based on concrete implementations in three different library systems.
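
To make the disambiguation idea a little more concrete, here is a minimal sketch in Python of how two of the "Dan Scott" entities above might be described with schema.org terms serialized as JSON-LD. The `Person`, `jobTitle`, `worksFor`, and `CollegeOrUniversity` terms come from the schema.org vocabulary; the `person` and `same_entity` helpers are hypothetical names invented for this illustration, and the job titles are assumptions based on the list above. It is not a description of how any search engine actually processes such markup.

```python
import json


def person(name, job_title, employer):
    """Build a schema.org Person description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@type": "CollegeOrUniversity", "name": employer},
    }


def same_entity(a, b):
    """Naive comparison (an assumption for illustration): two descriptions
    refer to the same entity only if every property matches."""
    return a == b


# Two people who share a name but differ in their explicit properties.
professor = person("Dan Scott", "Professor", "University of Waterloo")
librarian = person("Dan Scott", "Librarian", "Laurentian University")

# The JSON-LD that could be embedded in a web page's <script> element.
print(json.dumps(librarian, indent=2))

# The names collide, but the structured descriptions do not.
print(professor["name"] == librarian["name"])
print(same_entity(professor, librarian))
```

The plain text "Dan Scott" is identical in both cases; it is the explicit `jobTitle` and `worksFor` properties that give a machine something to tell the two apart.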