RDFa with schema.org codelab: Book - strings to things

By Dan Scott

About this codelab

In this codelab, you're going to take your catalogue page from being a structured data silo to linking out to other entities on the web.

Audience: Beginner

Prerequisites: To complete this codelab, you will need a basic familiarity with HTML. The exercises can be found in codelab.zip, with the solutions found in the rdfa_exercises subdirectory. There are frequent checkpoints throughout the codelab, so if you get stuck at any point, you can use the checkpoint file to resume and work through this codelab at your own pace.

Strings to things

So far you have described the page using types and properties that are inside the page itself. But if you have to update some information that is common to many of your pages, that could be painful to roll out... and even if you have an automated process for updating that information across all of your pages, there is no guarantee that anything extracting data from your site will extract all of the updates at one time.

Fortunately, the problem of providing one copy of information on the web was solved at the same time the web was created: via the simple power of the link! And structured data is no different; in fact, linked data is a term that has emerged over the past few years marking a more pragmatic approach to building a web of structured data than the somewhat classically academic semantic web.

The following principles of linked data were first articulated by Tim Berners-Lee in a 2006 design note:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  4. Include links to other URIs, so that they can discover more things.

Keep these principles in mind as you work through the following steps!
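
To make the "strings to things" idea concrete, here is a minimal sketch contrasting an author expressed as a plain string with an author expressed as a resource that links outward. The #book-a and #book-b identifiers are placeholders invented for this illustration; the VIAF link is the one used later in this codelab.

```html
<!-- A string: the author is only a literal that machines cannot follow -->
<div vocab="http://schema.org/" typeof="Book" resource="#book-a">
  <span property="author">Herzberg, Agnes M</span>
</div>

<!-- A thing: the author is a resource with its own URI (principles 1 and 2)
     that links to another URI where more can be discovered (principle 4) -->
<div vocab="http://schema.org/" typeof="Book" resource="#book-b">
  <span property="author" typeof="Person" resource="#author1">
    <link property="sameAs" href="http://viaf.org/viaf/64300918">
    <span property="name">Herzberg, Agnes M</span>
  </span>
</div>
```

In the second form, a client that wants to know more about the author has a link to follow instead of a string to guess at.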

Continue working with the HTML file that you have been editing so far, or for a fresh start, copy ../1_book/step3/check_d.html into a new file.

Link the authors to external pages

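There are many sources of identifiers for people on the web. Some sources that you may find familiar, all of which are used in the markup in this codelab, include:

  • Library of Congress authorities (http://id.loc.gov)
  • VIAF, the Virtual International Authority File (http://viaf.org)
  • Freebase (http://www.freebase.com)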

Assuming your underlying system has the ability to store and express identifiers, you can help the machines disambiguate and retrieve more information about your authors by linking to their identifiers from your catalogue page. Use the sameAs property to add links from your simple text representation of the authors of this book to external resources.

Hint: To save you time in looking up identifiers, here are a few for Agnes M Herzberg:

  • http://id.loc.gov/authorities/names/n84200859
  • http://viaf.org/viaf/64300918

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
    <td class="recordAuthor" property="author copyrightHolder" typeof="Person" resource="#author1">
      <a href="/Author?lookfor=%22Herzberg%2C+Agnes+M%22">
        <!-- @resource sets the subject back to #author1; without it, the
             enclosing <a href> would become the subject of the statements -->
        <span resource="#author1">
          <link property="sameAs" href="http://id.loc.gov/authorities/names/n84200859">
          <link property="sameAs" href="http://viaf.org/viaf/64300918">
          <span property="name">
            <span property="familyName">Herzberg</span>,
            <span property="givenName">Agnes M</span>
          </span>
        </span>
      </a>
    </td>
...

Note: While it might be tempting to use the url property, that is normally reserved for linking to a URL where the thing that is described is available (for example, linking to a downloadable podcast or e-book). In contrast, sameAs is used to link to a description of the thing.
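
As a rough illustration of the distinction drawn in this note, the following sketch uses both properties on one item. The e-book, its title, and the example.org URLs are all invented placeholders, not part of the exercise files.

```html
<div vocab="http://schema.org/" typeof="Book" resource="#ebook">
  <span property="name">An Example E-book</span>
  <!-- url: where the thing itself is available -->
  <a property="url" href="http://example.org/downloads/example-ebook.epub">Download</a>
  <!-- sameAs: a page elsewhere that describes the same thing -->
  <link property="sameAs" href="http://example.org/authority/works/12345">
</div>
```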

Create separate pages for the authors in your own system

Take a look at how the page has developed over time; there is now a lot of HTML markup just to describe the author, and you can imagine more markup if you were to express all of the "see from" and "see also" forms that might be contained in a local authority record. If your system uses local authority records, in fact, they are a perfect candidate for refactoring your markup. You can move the bulk of the markup from the bibliographic record display page into a separate page about the author, built on your local authority record. Once it is a separately displayed page, you can simply link to it from this page... as well as from any other pages that want to provide information about this author.

Create a new file named agnesMHerzberg.html in your text editor, and copy the @resource="#author1" markup into the file.

As the new file describes a single type, you can move the declaration of the type into the <body> element of the new page, and you can (optionally) remove the @resource attributes from the markup that you pasted into the file. Don't forget the @vocab declaration! Use your existing page as a template. Use the RDFa parsers to ensure that the markup in the new file expresses the same information as it did in the original file.

Repeat these steps to create davidFAndrews.html, using the @resource="#author2" markup as the source of interest.

Check your agnesMHerzberg.html
<!DOCTYPE html>
<html>
<head>
  <title>Agnes M Herzberg</title>
</head>

<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <link property="sameAs" href="http://id.loc.gov/authorities/names/n84200859">
  <link property="sameAs" href="http://viaf.org/viaf/64300918">
  <span property="name">
    <span property="familyName">Herzberg</span>,
    <span property="givenName">Agnes M</span>
  </span>
</body>
</html>

Check your davidFAndrews.html
<!DOCTYPE html>
<html>
<head>
  <title>David F Andrews</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <span property="name">
    <link property="sameAs" href="http://id.loc.gov/authorities/names/n84200861">
    <link property="sameAs" href="http://viaf.org/viaf/291650334">
    <link property="sameAs" href="http://www.freebase.com/m/0nfj58b">
    <span property="familyName">Andrews</span>,
    <span property="givenName">David F</span>
  </span>
</body>
</html>

Link to the author page

Now, replace the inline markup in the original page with a simple link to your new file. You still want to state that "Author Name" is the author of the book using the @property="author" assertion, but now you can either add that property directly to an <a> element that links to your new file, or use the @resource attribute to point to the external file instead of the internal markup. Either way, this signals to any RDFa parser that the linked resource, rather than the text of the element, is the value of the named property.

Note: "when the element contains the href (or src) attribute, @property is automatically associated with the value of this attribute rather than the textual content of the <a> element" (Adida, Ben; Birbeck, Mark; Herman, Ivan; Sporny, Manu. RDFa 1.1 Primer - Second edition). Using a @property attribute on the same element as a @resource attribute works in a similar fashion; the target of the @resource attribute is used as the value of the @property attribute.
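
The two options described above might look something like the following sketch for Agnes M Herzberg. Both fragments are assumed to sit inside the <body> that declares the Book, and they are alternatives, so use one or the other.

```html
<!-- Option 1: @property on the <a> element; the @href value becomes the author -->
<td class="recordAuthor">
  <a property="author" href="agnesMHerzberg.html#person">Herzberg, Agnes M</a>
</td>

<!-- Option 2: @property and @resource on the enclosing element; the <a>
     element keeps pointing at the catalogue search -->
<td class="recordAuthor" property="author" resource="agnesMHerzberg.html#person">
  <a href="/Author?lookfor=%22Herzberg%2C+Agnes+M%22">Herzberg, Agnes M</a>
</td>
```

Note that option 1 changes where the visible link takes your human users, while option 2 leaves the existing link untouched.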

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
  <tr valign="top">
    <th>Weitere Autoren:</th>
    <td class="recordSecAuthor" property="author copyrightHolder" resource="davidFAndrews.html#person">
      <a href="/Author?lookfor=%22Andrews%2C+David+F%22">Andrews, David F</a>
    </td>
  </tr>
...

Checkpoint: Your original HTML page should now look like ../1_book/step4/check_e.html and your new author HTML pages should look like ../1_book/step4/agnesMHerzberg.html and ../1_book/step4/davidFAndrews.html.

Augment the author page

Now that you have created an entirely separate author page, you can add much more information about the author; for example, you can include an email address, links to their personal web sites and social media accounts, a list of their publications and previous talks... far more information than you would have wanted to publish inline in the book itself.

Following the principles of linked data can lead not only to more efficient maintenance of information and (potentially) more useful results in search engines and other aggregators of data, but also to a better information design and experience for your users.

Use the Person properties to flesh out the "about this author" page with properties such as address, birthDate, email, follows, and telephone. Be adventurous, and remember to try to use nested types and ranges appropriately!
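
As a starting point, a fleshed-out author page might look something like this sketch. Every value here (the name, date, email address, telephone number, address, and example.org URL) is an invented placeholder rather than real data about either author.

```html
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <h1 property="name">Jane Q Example</h1>
  <p>Born: <time property="birthDate" datetime="1970-01-01">1 January 1970</time></p>
  <p>Email: <span property="email">jane@example.org</span></p>
  <p>Telephone: <span property="telephone">+1-555-0100</span></p>
  <!-- address can be a nested PostalAddress rather than a plain string -->
  <div property="address" typeof="PostalAddress">
    <span property="streetAddress">123 Example Street</span>,
    <span property="addressLocality">Exampleville</span>,
    <span property="addressRegion">ON</span>
  </div>
  <!-- the range of follows is Person, so link to a resource instead of a name -->
  <p>Follows: <a property="follows" href="http://example.org/people/john#person">John Example</a></p>
</body>
```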

Linking to authored works with @rev

If you tried creating a list of the works that David F Andrews has written, you might have been frustrated to find that, while the Person type has a performerIn property, there is no equivalent property for "author of". There is the author property for saying that a book was written by a given person, but it only works in one direction: the domain is CreativeWork and the range is Organization or Person.

Fortunately, rather than having to create two properties to cover both possible directions for relationships between two entities, RDFa allows you to use the @rev attribute to declare that the relationship direction for this particular property is the reverse of what it would be if you used @property.
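
The following sketch shows the same relationship stated from each side. The example.org URIs, the person, and the book title are placeholders invented for illustration. Both fragments should produce the same statement: the book at http://example.org/books/1 has the author http://example.org/people/1.

```html
<!-- On the book's page: @property states that this Book has this author -->
<div vocab="http://schema.org/" typeof="Book" resource="http://example.org/books/1">
  <a property="author" href="http://example.org/people/1">Jane Q Example</a>
</div>

<!-- On the person's page: @rev states the same relationship from the other
     side; the linked book is still the subject of "author" -->
<div vocab="http://schema.org/" typeof="Person" resource="http://example.org/people/1">
  <a rev="author" href="http://example.org/books/1">An Example Book</a>
</div>
```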

Go ahead and use the @rev attribute to enhance your description of David F Andrews to show that he authored the following books:

  1. Data: A Collection of Problems from Many Fields for the Student and Research Worker
  2. Robust Estimates of Location: Survey and Advances
  3. Symbolic Computation for Statistical Inference

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Person" resource="#person">
  <span property="name">
    <link property="sameAs" href="http://id.loc.gov/authorities/names/n84200861">
    <link property="sameAs" href="http://viaf.org/viaf/291650334">
    <link property="sameAs" href="http://www.freebase.com/m/0nfj58b">
    <span property="familyName">Andrews</span>,
    <span property="givenName">David F</span>
  </span>
  <h2>Publications:</h2>
  <ul>
    <li><a rev="author" typeof="Book"
        href="http://www.amazon.ca/Data-Collection-Problems-Student-Research/dp/0387961259"><span
        property="name">Data: A Collection of Problems from Many Fields for the
        Student and Research Worker</span></a>
    </li>
    <li><a rev="author" typeof="Book"
        href="http://www.amazon.ca/Robust-Estimates-Location-Survey-Advances/dp/0691081166"><span
        property="name">Robust Estimates of Location: Survey and Advances</span></a>
    </li>
    <li><a rev="author" typeof="Book"
        href="http://www.amazon.ca/Symbolic-Computation-Statistical-Inference-Andrews/dp/0198507054"><span
        property="name">Symbolic Computation for Statistical Inference</span></a>
    </li>
  </ul>
</body>
</html>

Note: As with any advanced technique, a given schema.org consumer might not understand the meaning of the @rev RDFa attribute.

Lessons learned

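In this exercise, you learned:

  • How to use the sameAs property to link the authors in your page to external identifiers
  • How to move author markup into separate pages and link to them with @href or @resource
  • How to use the @rev attribute to state a relationship in the reverse direction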

Subject headings

In this exercise, you will mark up subject headings. The first approach treats subject headings simply as keywords, which is appropriate for library systems that do not control subject headings or that do not expose the source for the subject headings. Then you will embellish your markup by treating the subject headings as part of an externally controlled vocabulary.

Marking up subject headings as keywords

Identifying subject headings in the catalogue page as simple text keywords can be useful for building a search engine that can provide relevance bumps based on the keywords, rather than relying on arbitrary text within the web page.

Find the subject headings in the page, mark them up using the schema.org keywords property, and check your work.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
        <tr valign="top">
          <th>Schlagwortketten:</th>
          <td class="recordSubjects" property="keywords">
            <a href='/Summon/Search?lookfor=%22Daten%22&type=SubjectTerms'>Daten</a> →
            <a href='/Summon/Search?lookfor=%22Beispielsammlung%22&type=SubjectTerms'>Beispielsammlung</a><br>
            <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a> →
            <a href='/Summon/Search?lookfor=%22Datenanalyse%22&type=SubjectTerms'>Datenanalyse</a> →
            <a href='/Summon/Search?lookfor=%22Aufsatzsammlung%22&type=SubjectTerms'>Aufsatzsammlung</a><br>
            <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a> →
            <a href='/Summon/Search?lookfor=%22Lehrbuch%22&type=SubjectTerms'>Lehrbuch</a><br>
            <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a><br>
          </td>
        </tr>
...

Marking up subject headings as things

While simple text keywords can be useful, we have learned that by linking to external entities, machines can disambiguate text and connect our work to the broader cloud of linked data.

Find matches for the subject headings in the page in the GND (Gemeinsame Normdatei, the German integrated authority file) of the SWB Online-Katalog. The results will include a GND number with a "Link zu diesem Datensatz in der GND" ("Link to this record in the GND"). Use that link to mark up your subject headings as external entities. Don't worry that the link resolves to what appears to be a catalogue page; this is because the server does content negotiation and is simply trying to make your browser happy by serving up human-readable HTML instead of machine-readable metadata. Machines should know how to request the format they require.

This time, use the about property, as it is intended to identify "the subject matter of the content", which is perfect for our purposes. Then check your work.

Note: You can also continue to mark up the text of the subject headings as keywords, if you like; these approaches are compatible and different clients may use different approaches to consuming the data that you offer.

Note: Library of Congress subject headings can be found at http://id.loc.gov.

Check your markup
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
        <tr valign="top">
          <th>Schlagwortketten:</th>
          <td class="recordSubjects" property="keywords">
    <a href='/Summon/Search?lookfor=%22Daten%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4135391-2">Daten</a> →
    <a href='/Summon/Search?lookfor=%22Beispielsammlung%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4144384-6">Beispielsammlung</a><br>
    <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a> →
    <a href='/Summon/Search?lookfor=%22Datenanalyse%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4123037-1">Datenanalyse</a> →
    <a href='/Summon/Search?lookfor=%22Aufsatzsammlung%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4143413-4">Aufsatzsammlung</a><br>
    <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a> →
    <a href='/Summon/Search?lookfor=%22Lehrbuch%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4123623-3">Lehrbuch</a><br>
    <a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
      property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a><br>
          </td>
        </tr>
...

Checkpoint: Your original HTML page should now look like ../1_book/step4/check_f.html.

Lessons learned

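In this exercise, you learned:

  • How to mark up subject headings as simple text with the keywords property
  • How to use the about property to link subject headings to entities in an externally controlled vocabulary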

Next codelab: Book - external descriptions

About the author

Dan Scott is a systems librarian at Laurentian University.

Informational resources

  • RDFa Lite (W3C Recommendation) - a marvel of technical writing, this is a specification written as a concise, extremely useful tutorial
  • schema.org - the source for the vocabulary types and definitions, although the examples all use microdata or JSON-LD instead of RDFa Lite
  • RDFa Primer (W3C Working Group Note) - a more in-depth RDFa tutorial that covers properties beyond RDFa Lite; the additional examples may help clarify how RDFa Lite works (really, you don't need anything beyond RDFa Lite!)
  • Heath, Tom; Bizer, Christian. Linked data: Evolving the Web into a Global Space - a book (freely available on the web) that goes into depth to cover the principles, patterns, and best practices for publishing linked data on the web

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.