Library catalogues and HTTP status codes
Posted on Mon 29 December 2014 in Libraries
I noticed in Google's Webmaster Tools that our catalogue had been returning some Soft 404s. Curious, I looked into some of the affected URIs and realized that Evergreen returns an HTTP status code of 200 OK when it serves up a record details page for a record that has been deleted. The HTML itself has a nice big red alert box warning users that the record has been deleted, which helps humans realize that what was once there is no longer, but machines typically don't read English. However, at some point in the past few months, Google started parsing the HTML and recognizing when HTTP status codes are misleading.
That led me to wonder what happens when you request a record details page by ID for a record that doesn't exist in Evergreen. As it turns out, it currently returns HTTP status code 200 with a details page devoid of any details. Also not good! Being a good little Evergreen community member, I opened a bug and put together a fairly simple fix so that the catalogue will return a 404 Not Found for non-existent records and 410 Gone for deleted records. Huzzah for HTTP standards compliance. We build a better web one small step at a time.
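To make the intended behaviour concrete, here is a minimal sketch of the decision in Python. It is not the actual Evergreen patch; the `status_for_record` function and the `deleted` key on the record are hypothetical names I made up for illustration, and a real catalogue would look the record up in its own database first.

```python
from http import HTTPStatus
from typing import Optional


def status_for_record(record: Optional[dict]) -> HTTPStatus:
    """Pick the HTTP status for a record details page.

    `record` is a hypothetical lookup result: None if no record has
    that ID, otherwise a dict with an optional boolean "deleted" key.
    """
    if record is None:
        # No record was ever stored under this ID.
        return HTTPStatus.NOT_FOUND  # 404
    if record.get("deleted"):
        # The record existed once but has since been removed.
        return HTTPStatus.GONE  # 410
    return HTTPStatus.OK  # 200
```

The page body can still carry the big red alert box for humans; the status code is what tells the machines the same thing.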
That, in turn, led me to wonder what happens when you request record details for non-existent records in other library systems. Here's what I found:
- Bibliocommons: Status 302 Moved temporarily that then leads back to an empty search form. Not good.
- Blacklight: Status 404 Not Found. Good!
- Encore: N/A - appears to serve up session-based URLs for records. Really?
- III: Status 200 OK. Not good.
- Koha: Status 302 Found with a Location: header leading to a page with a status 404 Not Found. That redirect probably makes it harder for the machines to recognize that the resource does not exist at all than if it directly returned a 404.
- Polaris: N/A - it seems that the normal web interface doesn't link directly to titles; instead it serves up titles in the context of search results by position. The mobile web interface offers persistent URLs, but requests for non-existent records return a status 302 Found that redirects back to an empty search form. Not good.
- Primo (using a permalink): Status 302 Found that then leads to an empty record details page with a status 200 OK. Not good.
- Symphony: N/A - I tried a few systems (Houston Public Library, Oxnard Public Library) and it seems SirsiDynix still doesn't use persistent URLs or surface permalinks for records in the default interface.
- Voyager: Status 200 OK. Not good.
- VuFind: Status 404 Not Found. Good!
- WorldCat: Status 200 OK. Not good.
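If you want to run the same kind of check against your own catalogue, something like the following Python sketch should do. It only uses the standard library, and it deliberately does not follow redirects, so that a 302 shows up as a 302 rather than as whatever page it eventually lands on. The `probe` and `verdict` names, and the wording of the verdicts, are my own inventions for this example.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request `url` once, without following redirects.

    Returns the status code and the Location header (None if absent).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def verdict(status: int) -> str:
    """Judge the status returned for a record that should not exist."""
    if status in (404, 410):
        return "Good"
    if 300 <= status < 400:
        return "Not good: redirect hides the missing record"
    if status == 200:
        return "Not good: soft 404"
    return "Unexpected status"
```

To use it, call `probe()` with the URL of a record ID you know does not exist in your catalogue, then pass the returned status to `verdict()`.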
Overall, this is a pretty dismal picture of the state of some of the most commonly used library catalogue systems when it comes to compliance with basic web standards. Kudos to Blacklight and VuFind for getting it right. Assuming that my branch gets integrated, Evergreen should join them in the near future.