Belated Access 2006 notes: Saturday, Oct. 14th

Posted on Tue 31 October 2006 in Libraries

Final entry in publishing my own hastily jotted Access 2006 conference notes--primarily for my own purposes, but maybe it will help you indirectly find some real content relating to your field of interest at the official podcast/presentation Web site for Access 2006. Contents include:

Consortium update

ASIN, Slavko Manojlovich

ASIN Overview

  • 17 Atlantic academic libraries
  • 300 - 18,000 students
  • 2 unilingual francophone sites
  • Sirsi, Ex Libris, and Innovative

Why our users hate us

  • choose format over subject
  • learn multiple database interfaces
  • citations presented in confusing formats

Addressing the problems

  • a la carte user authentication
  • EZProxy servers
  • SingleSearch federated search tool over 400 resources (including 100+ open access)
  • 1Cate OpenURL resolvers
  • Relais ILL
  • RefWorks/RefShare

Principles

  • Click, don't type
  • when you have it, show it
  • when you don't have it, make it easy to get
  • focus on appropriate links rather than click counts
  • let the user determine the appropriate copy from the available formats

Stephen Sloan

  • Missing ingredient -- enabling subject choice for users, rather than format
  • working with SirsiDynix on a consortium version of EPS Rooms CMS
  • production version to be available in 1st quarter of 2007
  • Rooms is basically a portal environment, with different defaults/scoping for each subject (so that single search

Outstanding challenges

  • Federated search connectors based on screen scraping will break
  • Citations from certain resources cannot be linked to Resolver
  • Cookie pushing in a public environment
  • Implementation of the NISO Metasearch standard to improve federated searching

Recognizing our differences

  • Local customization of interfaces
  • Emulating local default search options--everyone uses EBSCO, but everyone has configured different behaviour
  • Relying on local expertise at each site

COPPUL, Carmen Kazakoff-Lane

COPPUL Overview

  • ANTS: Using Open Source, Social Software (in the COPPUL consortium)
  • sharing and updating animated tutorials that were believed to be a better option than long information literacy tutorials
  • make it easy to locate and use these tutorials (central location and explicit copyright / reuse statement)
  • Make sharing easy and desirable through quality standards, help, and the allowance for local customization

How does it work?

  • Project is hosted at http://brandonu.ca/Library/coppul
  • ask each institution to take responsibility for a certain set of databases so that they can be updated along with the user interface
  • wiki enables institutions to update database list with status of development, whenever they create a tutorial, or add a new database to the list
  • rss feeds enable you to track which tutorials have been updated or created
  • tutorials are housed within a single institutional repository, licensed under CC licenses with options to the creators
  • Other organizations (like LU) are welcome to participate!

Guy Teasdale, Laval Universite

Quebec Digital Infrastructure: The Year in Review

Main players

  • BAnQ - Bibliotheque et archives nationales du Quebec
  • CREPUQ - Conference of Rectors and Principals of Quebec Universities
  • Erudit
  • Museums
  • Quebec Gov.
  • SRC and other media

BAnQ

  • BNQ started in 1967
  • April 2005, opening of la Grande Bibliotheque of the BNQ
  • Jan 2006 - Merger of ANQ and BNQ; mandate to acquire and disseminate collections
  • October 2006 - Second meeting on digital national library
  • 1996 - beginning of digitization activities
  • 2003 - permanent digitization program
  • 3.2 million pages of digital materials (newspapers, etc) currently in the collection; 62000 images

Meanwhile in the World

  • Dec 2004 - Google print project: 15 M ebooks by 2010
  • Jan 2005 - CEO of BNF Jeanneney reacts in Le Monde with "Quand Google defie l'Europe", resulting in the proposal to create a European Digital Library
  • 2010 - European DL expected to hold 6M books
  • Feb 2006 - Francophone network of digital libraries formed (including France and Quebec)

Meanwhile in Canada

  • Quebec is participating in Alouette Canada, hoping that nobody is reinventing the wheel

Erudit

  • 18000 scholarly articles from 48 journals
  • 150000 backfiles projected
  • Erudit schema adopted by www.persee.fr and www.cens.cnrs.fr = francophone interoperability
  • $3,000 annually to join

OCLC, Thom Hickey

Virtual International Authority File (VIAF)

  • Link national authority records
  • Build on their authority work
  • Move towards universal bibliographic control, while allowing local variations to exist
  • Deutsche Nationalbibliothek, LoC, and OCLC -- hoping for the BNF (French) national file
  • OCLC is responsible for the actual coding for the project

Matching variations

  • In the LCNAF and PND authority files:
    • Same name, same person
    • Same name, different people
    • Different names, same person
    • Missing person in one file
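To make the four cases concrete, here is a minimal sketch of how a record from one file might be classified against candidates from the other. It is not the VIAF code: the `Authority` record, the `same_dates` rule (both life dates known and equal), and the names in the usage note are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Authority(NamedTuple):
    """Hypothetical, stripped-down authority record."""
    name: str                # heading as it appears in the file
    birth: Optional[int]     # birth year, if known
    death: Optional[int]     # death year, if known


def same_dates(a, b):
    """Assumed identity rule: both life dates are known and agree."""
    if a.birth is None or a.death is None:
        return False
    return (a.birth, a.death) == (b.birth, b.death)


def classify(record, candidates):
    """Classify one record against the records of the other file."""
    for other in candidates:
        if same_dates(record, other):
            if record.name == other.name:
                return "same name, same person"
            return "different names, same person"
    # No candidate shares the life dates; a shared heading is then
    # treated as a different person (ambiguous if dates are missing).
    if any(other.name == record.name for other in candidates):
        return "same name, different people"
    return "missing person in one file"
```

For example, `classify(Authority("Smith, John", 1900, 1980), [Authority("Smith, J. Q.", 1900, 1980)])` falls into the "different names, same person" case. A real matcher would lean on many more attributes than life dates.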

Enhancing the authorities

  • Bibliographic record -> Derived authority -> Enhanced authority
  • Authority record -> Enhanced authority
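A rough sketch of the two paths above, assuming bibliographic records are plain dictionaries. The field names (`author`, `title`, `language`, `publisher`, `heading`) are hypothetical, and the talk did not say which attributes are actually carried over.

```python
def derive_authorities(bib_records):
    """Bibliographic record -> derived authority.

    Groups bibliographic records by author heading and collects the
    attributes seen across that author's works.
    """
    derived = {}
    for bib in bib_records:
        entry = derived.setdefault(
            bib["author"],
            {"titles": set(), "languages": set(), "publishers": set()},
        )
        entry["titles"].add(bib["title"])
        entry["languages"].add(bib["language"])
        entry["publishers"].add(bib["publisher"])
    return derived


def enhance(authority, derived):
    """Authority record -> enhanced authority.

    Returns a copy of the authority record with any derived attributes
    for its heading attached; records with no works are left as they are.
    """
    extra = derived.get(authority["heading"], {})
    return {**authority, **extra}
```

The enhanced record then carries work-level evidence (titles, languages, publishers) that a bare authority heading lacks, which is what the matching step needs.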

Weaker attributes

  • Only one of birth/death dates
  • Subject area of works
  • Format
  • Language
  • Publisher
  • Partial title match

Even weaker attributes

  • Date of publication
  • Country
  • Role
  • Format
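One way to read the two tiers is as weights in a match score. The sketch below assumes exactly that; the numeric weights and attribute keys are invented for illustration, since the talk ranked the attributes but gave no numbers. "Format" shows up in both lists in my notes, so it is counted once here, in the stronger tier.

```python
# Hypothetical weights: the talk ranked these attributes, nothing more.
WEAK = 1.0
WEAKER = 0.5

WEIGHTS = {
    # "Weaker attributes"
    "life_date": WEAK,       # only one of birth/death dates
    "subject": WEAK,
    "format": WEAK,
    "language": WEAK,
    "publisher": WEAK,
    "title": WEAK,           # a real matcher would allow partial matches
    # "Even weaker attributes"
    "pub_date": WEAKER,
    "country": WEAKER,
    "role": WEAKER,
}


def match_score(a, b):
    """Sum the weights of attributes present in both records and equal."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        if attr in a and attr in b and a[attr] == b[attr]:
            score += weight
    return score
```

Comparison here is plain equality, which is a simplification. Two records agreeing on language and country but not publisher would score 1.5 under these made-up weights.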

Compute it

  • Standard approach:
    • Generate keys and data
    • Load information into a database
    • Index it
    • Extract fields needed
  • Map/reduce approach (adopted from Google)
    • Split the database up
    • Run parallel jobs against those pieces of the database
    • Bring information together through map/reduce

Map/Reduce

  • Map
    • Read in source file (e.g. MARC21)
    • Write out key + data
  • Reduce
    • Read in array of data for each unique key
    • Write out key + data
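As a minimal single-process sketch of those two steps: map turns each source record into a key plus data, and reduce receives all the data for one key. The record layout (`id`, `name`) and the crude name normalization used as the key are assumptions for illustration, not what OCLC's code does.

```python
from collections import defaultdict


def map_record(record):
    """Map: read one source record, write out key + data."""
    # Hypothetical key: lower-cased name with commas and extra spaces removed.
    key = " ".join(record["name"].lower().replace(",", " ").split())
    yield key, record["id"]


def reduce_key(key, values):
    """Reduce: read the array of data for one unique key, write out key + data."""
    return key, sorted(values)


def run(records):
    """Group the map output by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, data in map_record(record):
            groups[key].append(data)
    return dict(reduce_key(key, values) for key, values in groups.items())
```

With this key, records named "Smith, John" and "smith john" land in the same group, so the reduce step sees both identifiers together and can decide whether they describe the same person.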

Map/Reduce implementation

  • Written in Python
  • Uses ssh and XML-RPC for control and communication
  • Map/Reduce seems to add around 10% overhead
  • Earlier implementation ran on a 48 CPU cluster
  • Current VIAF cluster is a 12 CPU cluster on 4 nodes
  • Running Linux and 64-bit Python (no need to worry about 2GB memory limit)

VIAF matching code

  • 17 modules
  • 1,100 lines of code
  • 600 lines of configuration