In this codelab, you're going to take your catalogue page from being a structured data silo to linking out to other entities on the web.
Audience: Beginner
Prerequisites: To complete this codelab, you need a basic familiarity with HTML. The exercises can be found in codelab.zip, with the solutions in the rdfa_exercises subdirectory. There are frequent checkpoints throughout the codelab, so if you get stuck at any point, you can use the checkpoint file to resume and work through this codelab at your own pace.
So far, you have described the book using types and properties that live entirely inside the page itself. But if you need to update information that is common to many of your pages, rolling out the change could be painful. Even if you have an automated process for updating that information across all of your pages, there is no guarantee that anything extracting data from your site will pick up all of the updates at the same time.
Fortunately, the problem of maintaining a single copy of information on the web was solved when the web itself was created: with the simple power of the link! Structured data is no different. In fact, "linked data" is a term that has emerged over the past few years to describe a more pragmatic approach to building a web of structured data than the somewhat academic vision of the semantic web.
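The following principles of linked data were first articulated by Tim Berners-Lee in a 2006 design note:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.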
Keep these principles in mind as you work through the following steps!
Continue working with the HTML file that you have been editing so far, or for a fresh start, copy ../1_book/step3/check_d.html into a new file.
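There are many sources of identifiers for people on the web. Some sources that you may find familiar include the Library of Congress Name Authority File (id.loc.gov), the Virtual International Authority File (VIAF), and Freebase.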
Assuming your underlying system can store and express identifiers, you can help machines disambiguate your authors and retrieve more information about them by linking to their identifiers from your catalogue page. Use the sameAs property to add links from your simple text representation of the authors of this book to external resources.
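Hint: To save you time in looking up identifiers, here are a few for Agnes M Herzberg: http://id.loc.gov/authorities/names/n84200859 (Library of Congress) and http://viaf.org/viaf/64300918 (VIAF).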
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<td class="recordAuthor" property="author copyrightHolder" typeof="Person" resource="#author1">
<a href="/Author?lookfor=%22Herzberg%2C+Agnes+M%22">
<span resource="#author1">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n84200859">
<link property="sameAs" href="http://viaf.org/viaf/64300918">
<span property="name">
<span property="familyName">Herzberg</span>,
<span property="givenName">Agnes M</span>
</span>
</span>
</a>
</td>
...
Note: While it might be tempting to use the url property, that is normally reserved for linking to a URL where the thing that is described is available (for example, a downloadable podcast or e-book). In contrast, sameAs is used to link to a description of the thing.
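To make the distinction concrete, here is a minimal sketch for the book itself. Both example.org URLs are hypothetical placeholders: the first stands in for a downloadable e-book file (the thing itself), the second for a page elsewhere on the web that describes the same book.

```html
<body vocab="http://schema.org/" typeof="Book" resource="#book">
<!-- url: where the thing itself can be obtained (hypothetical e-book file) -->
<link property="url" href="http://example.org/ebooks/book.epub">
<!-- sameAs: a description of the same book elsewhere (hypothetical record) -->
<link property="sameAs" href="http://example.org/catalogue/record/12345">
<!-- other book markup as before -->
</body>
```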
Take a look at how the page has developed; there is now a lot of HTML markup just to describe the author, and you can imagine how much more markup you would need to express all of the "see from" and "see also" forms that might be contained in a local authority record. In fact, if your system uses local authority records, they are a perfect candidate for refactoring your markup. You can move the bulk of the author markup from the bibliographic record display page into a separate page about the author, built on your local authority record. Once that page exists, you can simply link to it from this page, as well as from any other pages that want to provide information about this author.
Create a new file named agnesMHerzberg.html in your text editor, and copy the @resource="#author1" markup into the file.
As the new file describes a single entity, you can move the declaration of its type into the <body> element of the new page, and you can (optionally) remove the @resource attributes from the markup that you pasted into the file. Don't forget the @vocab declaration! Use your existing page as a template.
Use the RDFa parsers to ensure that the markup in the new file
expresses the same information as it did in the original file.
Repeat these steps to create davidFAndrews.html, using the @resource="#author2" markup as the source of interest.
<!DOCTYPE html>
<html>
<head>
<title>Agnes M Herzberg</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n84200859">
<link property="sameAs" href="http://viaf.org/viaf/64300918">
<span property="name">
<span property="familyName">Herzberg</span>,
<span property="givenName">Agnes M</span>
</span>
</body>
</html>
<!DOCTYPE html>
<html>
<head>
<title>David F Andrews</title>
</head>
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n84200861">
<link property="sameAs" href="http://viaf.org/viaf/291650334">
<link property="sameAs" href="http://www.freebase.com/m/0nfj58b">
<span property="name">
<span property="familyName">Andrews</span>,
<span property="givenName">David F</span>
</span>
</body>
</html>
Now, replace the inline markup in the original page with a simple link to your new file. You still want to state that "Author Name" is the author of the book using the @property="author" assertion, but now you can either add that property directly to an <a> element that links to your new file, or use the @resource attribute to point to the external file instead of the internal markup. Either way, this signals to any RDFa parser that the value of the named property is the linked resource, which is described in that file.
Note: "when the element contains the href (or src) attribute, @property is automatically associated with the value of this attribute rather than the textual content of the <a> element" (Adida, Ben; Birbeck, Mark; Herman, Ivan; Sporny, Manu. RDFa 1.1 Primer - Second edition). Using a @property attribute on the same element as a @resource attribute works in a similar fashion; the target of the @resource attribute is used as the value of the @property attribute.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr valign="top">
<th>Weitere Autoren:</th>
<td class="recordSecAuthor" property="author copyrightHolder" resource="davidFAndrews.html#person">
<a href="/Author?lookfor=%22Andrews%2C+David+F%22">Andrews, David F</a>
</td>
</tr>
...
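The solution above puts @property on the <td> and points @resource at the new page. The other approach described earlier, putting the property directly on an <a> element that links to the author page, might look like the following sketch (the table header "Weitere Autoren" means "Other authors"). Because the <a> carries an href, RDFa uses the link target, not the link text, as the value of both author and copyrightHolder. This sketch assumes you are willing to point the visible link at the new author page instead of the catalogue's author search.

```html
<tr valign="top">
<th>Weitere Autoren:</th>
<td class="recordSecAuthor">
<!-- @property + @href: both properties take davidFAndrews.html#person as their value -->
<a property="author copyrightHolder" href="davidFAndrews.html#person">Andrews, David F</a>
</td>
</tr>
```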
Checkpoint: Your original HTML page should now look like ../1_book/step4/check_e.html and your new author HTML pages should look like ../1_book/step4/agnesMHerzberg.html and ../1_book/step4/davidFAndrews.html.
Now that you have created an entirely separate author page, you can add much more information about the author: for example, an email address, links to their personal websites and social media accounts, or a list of their publications and previous talks. That is far more information than you would want to publish inline in the book record itself.
Following the principles of linked data can lead not only to more efficient maintenance of information and (potentially) more useful results in search engines and other aggregators of data, but also to a better information design and experience for your users.
Use the Person properties to flesh out the "about this author" page with properties such as address, birthDate, email, follows, and telephone. Be adventurous, and remember to use nested types and ranges appropriately!
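As one possible starting point, here is a sketch of an expanded body for davidFAndrews.html. Every value in it is an invented placeholder (the street address, the date, the example.org email address, the 555 telephone number, and the follows link to the Agnes M Herzberg page are all hypothetical), so substitute verified information from your own authority records. Note that address takes a nested PostalAddress, that @content supplies a machine-readable date, and that follows points at another Person by URL rather than by name.

```html
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<span property="name">
<span property="familyName">Andrews</span>,
<span property="givenName">David F</span>
</span>
<!-- ALL VALUES BELOW ARE HYPOTHETICAL PLACEHOLDERS -->
<!-- address: a nested PostalAddress node with its own properties -->
<div property="address" typeof="PostalAddress">
<span property="streetAddress">123 Example Street</span>,
<span property="addressLocality">Example City</span>
</div>
<!-- birthDate: @content holds an ISO 8601 date; the visible text is for humans -->
<p>Born: <span property="birthDate" content="1900-01-01">1 January 1900</span></p>
<p>Email: <span property="email">author@example.org</span></p>
<p>Telephone: <span property="telephone">+1-555-0100</span></p>
<!-- follows: the range is Person, so link to that person's own page -->
<p>Follows: <a property="follows" href="agnesMHerzberg.html#person">Agnes M Herzberg</a></p>
</body>
```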
@rev
If you tried creating a list of the works that David F Andrews has written, you might have been frustrated to find that, while the Person type has a performerIn property, there is no equivalent property for "author of". There is the author property for saying that a book was written by a given person, but it only works in one direction: its domain is CreativeWork and its range is Organization or Person.
Fortunately, rather than requiring two properties to cover both possible directions of a relationship between two entities, RDFa allows you to use the @rev attribute to declare that, for this particular property, the direction of the relationship is the reverse of what it would be if you used @rel or @property.
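To see what @rev changes, compare two ways of stating the same fact. This is a minimal sketch assuming your catalogue page is saved as book.html (a hypothetical name), identifies the book as #book, and sits in the same directory as davidFAndrews.html. On the book page, @property points from the book to the person; on the person page, @rev points at the book but reverses the direction, so both should produce the same statement: the book has the author David F Andrews.

```html
<!-- book.html (hypothetical name): subject #book, author points to the person -->
<body vocab="http://schema.org/" typeof="Book" resource="#book">
<a property="author" href="davidFAndrews.html#person">Andrews, David F</a>
</body>

<!-- davidFAndrews.html: @rev flips the direction, so this also states
     "book.html#book has author davidFAndrews.html#person" -->
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<a rev="author" href="book.html#book">Book record</a>
</body>
```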
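Go ahead and use the @rev attribute to enhance your description of David F Andrews to show that he authored the following books: Data: A Collection of Problems from Many Fields for the Student and Research Worker; Robust Estimates of Location: Survey and Advances; and Symbolic Computation for Statistical Inference.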
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Person" resource="#person">
<link property="sameAs" href="http://id.loc.gov/authorities/names/n84200861">
<link property="sameAs" href="http://viaf.org/viaf/291650334">
<link property="sameAs" href="http://www.freebase.com/m/0nfj58b">
<span property="name">
<span property="familyName">Andrews</span>,
<span property="givenName">David F</span>
</span>
<h2>Publications:</h2>
<ul>
<li><a rev="author" typeof="Book"
href="http://www.amazon.ca/Data-Collection-Problems-Student-Research/dp/0387961259"><span
property="name">Data: A Collection of Problems from Many Fields for the
Student and Research Worker</span></a>
</li>
<li><a rev="author" typeof="Book"
href="http://www.amazon.ca/Robust-Estimates-Location-Survey-Advances/dp/0691081166"><span
property="name">Robust Estimates of Location: Survey and Advances</span></a>
</li>
<li><a rev="author" typeof="Book"
href="http://www.amazon.ca/Symbolic-Computation-Statistical-Inference-Andrews/dp/0198507054"><span
property="name">Symbolic Computation for Statistical Inference</span></a>
</li>
</ul>
</body>
</html>
Note: As with any advanced technique, a given schema.org consumer might not understand the meaning of the @rev RDFa attribute.
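If you are concerned that consumers will ignore @rev, one hedge (a sketch, not part of the exercise's solution) is to avoid it entirely: in the Publications list of davidFAndrews.html, make each list item its own Book and state author from the book's side, pointing back at #person with @property. This relies only on @typeof, @resource, and @property, at the cost of a little more markup per book.

```html
<ul>
<!-- The Book is the subject here, so a plain @property can point at the author -->
<li typeof="Book" resource="http://www.amazon.ca/Robust-Estimates-Location-Survey-Advances/dp/0691081166">
<span property="name">Robust Estimates of Location: Survey and Advances</span>
<!-- #person resolves to davidFAndrews.html#person on this page -->
<link property="author" href="#person">
</li>
</ul>
```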
In this exercise, you learned how to use the @property and @href attributes to link to data on another page, and how to use @rev to express the inverse relationship of a property between two entities.
In this exercise, you will mark up subject headings. The first approach treats subject headings simply as keywords, which is appropriate for library systems that do not control subject headings or that do not expose the source of their subject headings. Then you will enrich the markup by treating the subject headings as part of an externally controlled vocabulary.
Identifying subject headings in the catalogue page as simple text keywords can be useful for building a search engine that can provide relevance bumps based on the keywords, rather than relying on arbitrary text within the web page.
Find the subject headings in the page, mark them up using the schema.org keywords property, and check your work.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr valign="top">
<th>Schlagwortketten:</th>
<td class="recordSubjects" property="keywords">
<a href='/Summon/Search?lookfor=%22Daten%22&type=SubjectTerms'>Daten</a> →
<a href='/Summon/Search?lookfor=%22Beispielsammlung%22&type=SubjectTerms'>Beispielsammlung</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a> →
<a href='/Summon/Search?lookfor=%22Datenanalyse%22&type=SubjectTerms'>Datenanalyse</a> →
<a href='/Summon/Search?lookfor=%22Aufsatzsammlung%22&type=SubjectTerms'>Aufsatzsammlung</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a> →
<a href='/Summon/Search?lookfor=%22Lehrbuch%22&type=SubjectTerms'>Lehrbuch</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'>Statistik</a><br>
</td>
</tr>
...
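Because @property sits on the <td>, the whole cell's text content, arrows and all, becomes a single keywords value. If you would rather emit one value per subject heading, one possible variation is to wrap each link in its own <span property="keywords">. Avoid putting the property on the <a> itself: as the RDFa Primer note quoted earlier explains, @property on an element with an href takes the link target, not the link text, as its value. Avoid nesting the span inside the <a> as well: in RDFa 1.1, an href on an element without @property or @rel can change the subject of the markup nested inside it (presumably why the earlier author markup re-declares resource="#author1" inside its link).

```html
<td class="recordSubjects">
<!-- Each span's text content becomes one keywords value on #book -->
<span property="keywords"><a href='/Summon/Search?lookfor=%22Daten%22&type=SubjectTerms'>Daten</a></span> →
<span property="keywords"><a href='/Summon/Search?lookfor=%22Beispielsammlung%22&type=SubjectTerms'>Beispielsammlung</a></span><br>
<!-- remaining headings follow the same pattern -->
</td>
```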
While simple text keywords can be useful, we have learned that linking to external entities helps machines disambiguate text and connects our work to the broader cloud of linked data.
Find matches for the subject headings in the page in the GND of the SWB Online-Katalog. The results will include a GND number with a "Link zu diesem Datensatz in der GND" ("Link to this record in the GND"). Use that link to mark up your subject headings as external entities. Don't worry that the link resolves to what appears to be a catalogue page; this is because the server does content negotiation and is simply trying to make your browser happy by serving up human-readable HTML instead of machine-readable metadata. Machines should know how to request the format they require.
This time, use the about property, which is intended to identify the subject matter of the content: perfect for our purposes. Then check your work.
Note: You can also continue to mark up the text of the subject headings as keywords, if you like; these approaches are compatible and different clients may use different approaches to consuming the data that you offer.
Note: Library of Congress subject headings can be found at http://id.loc.gov.
<!DOCTYPE html>
...
<body vocab="http://schema.org/" typeof="Book" resource="#book">
...
<tr valign="top">
<th>Schlagwortketten:</th>
<td class="recordSubjects" property="keywords">
<a href='/Summon/Search?lookfor=%22Daten%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4135391-2">Daten</a> →
<a href='/Summon/Search?lookfor=%22Beispielsammlung%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4144384-6">Beispielsammlung</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a> →
<a href='/Summon/Search?lookfor=%22Datenanalyse%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4123037-1">Datenanalyse</a> →
<a href='/Summon/Search?lookfor=%22Aufsatzsammlung%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4143413-4">Aufsatzsammlung</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a> →
<a href='/Summon/Search?lookfor=%22Lehrbuch%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4123623-3">Lehrbuch</a><br>
<a href='/Summon/Search?lookfor=%22Statistik%22&type=SubjectTerms'
property="about" resource="http://d-nb.info/gnd/4056995-0">Statistik</a><br>
</td>
</tr>
...
Checkpoint: Your original HTML page should now look like ../1_book/step4/check_f.html.
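In this exercise, you learned how to mark up subject headings as simple text using the keywords property, to potentially improve the relevance of those keywords in search results in consuming applications, and how to link subject headings to external entities using the about property.
Next codelab: Book - external descriptions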
Dan Scott is a systems librarian at Laurentian University.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.