Belated Access 2006 notes: Saturday, Oct. 14th
Posted on Tue 31 October 2006 in Libraries
Final entry in publishing my own hastily jotted Access 2006 conference notes. They are primarily for my own purposes, but maybe they will indirectly help you find some real content relating to your field of interest at the official podcast/presentation Web site for Access 2006. Contents include:
- consortial updates from ASIN, Quebec, COPPUL, and OCUL
- Thom Hickey's updates on OCLC's Virtual International Authority File (VIAF) and WorldCat Identities
- Clifford Lynch's keynote
Consortium update
ASIN, Slavko Manojlovich
ASIN Overview
- 17 Atlantic academic libraries
- member institutions range from 300 to 18,000 students
- 2 unilingual francophone sites
- mix of ILS vendors: Sirsi, Ex Libris, and Innovative
Why our users hate us
- must choose databases by format rather than by subject
- must learn multiple database interfaces
- citations presented in confusing formats
Addressing the problems
- a la carte user authentication
- EZProxy servers
- SingleSearch federated search tool covering over 400 resources (including 100+ open access resources)
- 1Cate OpenURL resolvers
- Relais ILL
- RefWorks/RefShare
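Linking a citation to an OpenURL resolver such as 1Cate comes down to encoding the citation as an OpenURL. The following is a minimal sketch of an OpenURL 1.0 (Z39.88-2004) key/encoded-value query for a journal article. The resolver base URL and the citation dictionary's field names are hypothetical; each site's resolver has its own address.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your own institution's resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(citation: dict, base: str = RESOLVER_BASE) -> str:
    """Encode a journal-article citation as an OpenURL 1.0 KEV query string."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Map our own (hypothetical) citation field names onto KEV journal keys.
    field_map = {
        "article_title": "rft.atitle",
        "journal_title": "rft.jtitle",
        "issn": "rft.issn",
        "volume": "rft.volume",
        "issue": "rft.issue",
        "start_page": "rft.spage",
        "year": "rft.date",
    }
    for field, key in field_map.items():
        value = citation.get(field)
        if value:  # omit missing fields instead of sending empty keys
            params[key] = value
    return base + "?" + urlencode(params)
```

For example, `build_openurl({"issn": "1234-5678", "volume": "12", "start_page": "45", "year": "2006"})` yields a link the resolver can use to decide between showing a local copy and offering an ILL request.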
Principles
- Click, don't type
- when you have it, show it
- when you don't have it, make it easy to get
- focus on appropriate links rather than click counts
- let the user determine the appropriate copy from the available formats
Stephen Sloan
- Missing ingredient: enabling users to choose by subject rather than by format
- working with SirsiDynix on a consortium version of EPS Rooms CMS
- production version to be available in 1st quarter of 2007
- Rooms is basically a portal environment, with different defaults/scoping for each subject (so that single search
Outstanding challenges
- Federated search connectors based on screen scraping will break
- Citations from certain resources cannot be linked to the OpenURL resolver
- Cookie pushing in a public environment
- Implementation of the NISO Metasearch standard to improve federated searching
Recognizing our differences
- Local customization of interfaces
- Emulating local default search options: everyone uses EBSCO, but each site has configured different default behaviour
- Relying on local expertise at each site
COPPUL, Carmen Kazakoff-Lane
COPPUL Overview
- ANTS: Using Open Source, Social Software (in the COPPUL consortium)
- sharing and updating animated tutorials, which were believed to be a better option than long information literacy tutorials
- make it easy to locate and use these tutorials (central location and explicit copyright / reuse statement)
- Make sharing easy and desirable through quality standards, help, and the allowance for local customization
How does it work?
- Project is hosted at http://brandonu.ca/Library/coppul
- ask each institution to take responsibility for a certain set of databases, so that the tutorials for those databases can be updated whenever their user interfaces change
- a wiki enables institutions to update the database list with each tutorial's development status whenever they create a tutorial or add a new database to the list
- RSS feeds enable you to track which tutorials have been updated or created
- tutorials are housed within a single institutional repository and licensed under Creative Commons licenses, with the choice of license options left to the creators
- Other organizations (like LU) are welcome to participate!
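Tracking tutorial updates through an RSS feed can be as simple as filtering items by publication date. This is a minimal sketch that assumes an RSS 2.0 feed with `title` and `pubDate` elements; the sample feed below is hypothetical, since the notes do not give the project's actual feed URL or item layout.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical sample feed standing in for the project's real RSS feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Tutorial updates (example)</title>
    <item>
      <title>Tutorial A</title>
      <pubDate>Tue, 10 Oct 2006 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Tutorial B</title>
      <pubDate>Mon, 16 Oct 2006 14:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def items_updated_since(rss_xml: str, since: datetime) -> list:
    """Return titles of RSS 2.0 items whose pubDate is on or after `since`."""
    root = ET.fromstring(rss_xml)
    titles = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if not pub_date:
            continue  # items without a date cannot be compared
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Assume UTC when the feed gives no usable offset.
            published = published.replace(tzinfo=timezone.utc)
        if published >= since:
            titles.append(item.findtext("title", default=""))
    return titles
```

Calling `items_updated_since(SAMPLE_FEED, datetime(2006, 10, 14, tzinfo=timezone.utc))` returns only the tutorial published after that date.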
Guy Teasdale, Université Laval
Quebec Digital Infrastructure: The Year in Review
Main players
- BAnQ - Bibliothèque et Archives nationales du Québec
- CREPUQ - Conference of Rectors and Principals of Quebec Universities
- Erudit
- Museums
- Quebec Gov.
- SRC and other media
BAnQ
- BNQ started in 1967
- April 2005 - opening of the Grande Bibliothèque of the BNQ
- Jan 2006 - Merger of ANQ and BNQ; mandate to acquire and disseminate collections
- October 2006 - Second meeting on digital national library
- 1996 - beginning of digitization activities
- 2003 - permanent digitization program
- 3.2 million pages of digital materials (newspapers, etc.) currently in the collection; 62,000 images
Meanwhile in the World
- Dec 2004 - Google Print project: 15 million e-books by 2010
- Jan 2005 - Jean-Noël Jeanneney, president of the BnF, responds in Le Monde with "Quand Google défie l'Europe" ("When Google Challenges Europe"), which results in the proposal to create a European Digital Library
- European Digital Library expects 6 million books by 2010
- Feb 2006 - a francophone network of digital libraries was formed (including France and Quebec)
Meanwhile in Canada
- Quebec is participating in Alouette Canada, hoping that nobody reinvents the wheel
Erudit
- 18,000 scholarly articles from 48 journals
- 150,000 backfiles projected
- Erudit schema adopted by www.persee.fr and www.cens.cnrs.fr, providing francophone interoperability
- $3,000 annually to join
OCLC, Thom Hickey
Virtual International Authority File (VIAF)
- Link national authority records
- Build on their authority work
- Move towards universal bibliographic control, while allowing local variations to exist
- Deutsche Nationalbibliothek, LoC, and OCLC; hoping to add the BnF (French) national file
- OCLC is responsible for the actual coding for the project
Matching variations
- In the LCNAF and PND authority files:
  - Same name, same person
  - Same name, different people
  - Different names, same person
  - Missing person in one file
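The four matching cases can be illustrated with a toy classifier. This is a minimal sketch, not OCLC's matching code: it assumes each record is a dict with a `name`, optional `dates`, and an optional set of `titles`, and it uses naive rules (normalized name equality, date agreement, shared titles) chosen purely for illustration.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Strip diacritics and punctuation so 'Müller, Hans' equals 'Muller, Hans.'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^\w\s]", " ", without_marks.lower()).split())


def dates_compatible(a, b) -> bool:
    # Assumption: a missing date never rules out a match.
    return a is None or b is None or a == b


def classify(lc_rec: dict, pnd_recs: list) -> str:
    """Place one LCNAF-style record into one of the four cases against PND-style records."""
    same_name = [p for p in pnd_recs if normalize(p["name"]) == normalize(lc_rec["name"])]
    if same_name:
        if any(dates_compatible(lc_rec.get("dates"), p.get("dates")) for p in same_name):
            return "same name, same person"
        return "same name, different people"
    # No name match: fall back to shared works as evidence of identity.
    titles = lc_rec.get("titles", set())
    if any(titles & p.get("titles", set()) for p in pnd_recs):
        return "different names, same person"
    return "missing in one file"
```

Real matching would compare many more attributes, but the sketch shows why name equality alone is not enough in either direction.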
Enhancing the authorities
- Bibliographic record -> Derived authority -> Enhanced authority
- Authority record -> Enhanced authority
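Both enhancement paths can be sketched in a few lines: bibliographic records are grouped into a derived authority per heading, and the derived attributes are then merged into an existing authority record. This is a hypothetical illustration; it assumes the bibliographic `author` field exactly matches the authority `heading`, which real data rarely guarantees.

```python
from collections import defaultdict


def derive_authorities(bib_records: list) -> dict:
    """Bibliographic record -> derived authority: collect attributes per author heading."""
    derived = defaultdict(lambda: {"titles": set(), "languages": set(), "publishers": set()})
    for bib in bib_records:
        entry = derived[bib["author"]]
        entry["titles"].add(bib["title"])
        if bib.get("language"):
            entry["languages"].add(bib["language"])
        if bib.get("publisher"):
            entry["publishers"].add(bib["publisher"])
    return dict(derived)


def enhance(authority: dict, derived: dict) -> dict:
    """Authority record (+ derived authority) -> enhanced authority, without mutating the input."""
    enhanced = dict(authority)
    for field, values in derived.get(authority["heading"], {}).items():
        enhanced[field] = set(authority.get(field, set())) | values
    return enhanced
```

The enhanced record then carries extra attributes (titles, languages, publishers) that later matching steps can compare.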
Weaker attributes
- Only one of birth/death dates
- Subject area of works
- Format
- Language
- Publisher
- Partial title match
Even weaker attributes
- Date of publication
- Country
- Role
- Format
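One way to use "weaker" and "even weaker" attributes is to give each a weight and sum the weights of the attributes two candidate records share. The sketch below is an assumption for illustration only; the notes do not say how VIAF weighted these attributes, and the weights, attribute names, and prefix-based partial title rule are all hypothetical.

```python
WEAK = 2.0    # hypothetical weight for "weaker attributes"
WEAKER = 1.0  # hypothetical weight for "even weaker attributes"

WEIGHTS = {
    "birth_or_death_date": WEAK,
    "subject": WEAK,
    "language": WEAK,
    "publisher": WEAK,
    "partial_title": WEAK,
    "pub_date": WEAKER,
    "country": WEAKER,
    "role": WEAKER,
    # Format appears under both headings in the notes; weighted as weaker here (assumption).
    "format": WEAKER,
}


def partial_title_match(titles_a: set, titles_b: set) -> bool:
    """Hypothetical rule: one lowercased title is a prefix of the other."""
    for a in titles_a:
        for b in titles_b:
            a_norm, b_norm = a.lower().strip(), b.lower().strip()
            if a_norm and b_norm and (a_norm.startswith(b_norm) or b_norm.startswith(a_norm)):
                return True
    return False


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum the weights of attributes on which two records agree; attributes are sets of values."""
    score = 0.0
    for attr, weight in WEIGHTS.items():
        if attr == "partial_title":
            if partial_title_match(rec_a.get("titles", set()), rec_b.get("titles", set())):
                score += weight
        elif rec_a.get(attr, set()) & rec_b.get(attr, set()):
            score += weight
    return score
```

A pair scoring above some tuned threshold would be treated as a candidate match; no single weak attribute decides on its own.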
Compute it
- Standard approach:
  - Generate keys and data
  - Load information into a database
  - Index it
  - Extract fields needed
- Map/reduce approach (adopted from Google):
  - Split the database up
  - Run parallel jobs against those pieces of the database
  - Bring information together through map/reduce
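To make the contrast concrete, here is a minimal sketch of the standard approach using Python's built-in `sqlite3` module with an in-memory database. The record layout (an `id` plus a `name`) and the lowercase-name key are assumptions for illustration; the actual VIAF keys were not described.

```python
import sqlite3


def standard_approach(records: list) -> dict:
    """Generate keys, load them into a database, index, then extract grouped results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keyed (key TEXT, data TEXT)")
    # 1. Generate keys and data (hypothetical key: the lowercased name).
    rows = [(r["name"].lower(), r["id"]) for r in records]
    # 2. Load information into a database.
    conn.executemany("INSERT INTO keyed (key, data) VALUES (?, ?)", rows)
    # 3. Index it.
    conn.execute("CREATE INDEX idx_key ON keyed (key)")
    # 4. Extract the fields needed: record ids grouped under each key.
    result = {}
    for key, data in conn.execute("SELECT key, data FROM keyed ORDER BY key, data"):
        result.setdefault(key, []).append(data)
    conn.close()
    return result
```

Everything here runs through one database, which is the bottleneck the map/reduce approach avoids by splitting the data into pieces.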
Map/Reduce
- Map
  - Read in source file (e.g. MARC 21)
  - Write out key + data
- Reduce
  - Read in array of data for each unique key
  - Write out key + data
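The map and reduce steps above can be sketched in plain Python. This is an in-process illustration under stated assumptions: source records are simple dicts rather than parsed MARC 21, the key is a hypothetical lowercased name, and the chunks are processed sequentially here even though in a cluster each chunk would be mapped by a separate job.

```python
from collections import defaultdict
from itertools import chain


def map_record(record: dict):
    """Map: read one source record, write out (key, data) pairs."""
    yield record["name"].lower(), record["id"]


def reduce_key(key: str, values: list):
    """Reduce: read the array of data for one unique key, write out key + data."""
    return key, sorted(values)


def map_reduce(records: list, n_chunks: int = 4) -> dict:
    # Split the input into pieces; each piece could go to a different node.
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    mapped = chain.from_iterable(map_record(r) for chunk in chunks for r in chunk)
    # Shuffle: bring together all data emitted under the same key.
    groups = defaultdict(list)
    for key, data in mapped:
        groups[key].append(data)
    return dict(reduce_key(k, v) for k, v in groups.items())
```

Because map only looks at one record and reduce only at one key's values, both steps parallelize without shared state.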
Map/Reduce implementation
- Written in Python
- Uses ssh and XML-RPC for control and communication
- Map/Reduce seems to add around 10% overhead
- Earlier implementation ran on a 48 CPU cluster
- Current VIAF cluster is a 12 CPU cluster on 4 nodes
- Running Linux and 64-bit Python (no need to worry about 2GB memory limit)
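Using XML-RPC for communication means each worker exposes functions that a controller calls over HTTP. The following is a minimal sketch with Python's standard `xmlrpc` modules, running the worker in a background thread on localhost. The `count_keys` task and tab-separated input are hypothetical, and the ssh step that would start workers on each node is not shown.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def count_keys(lines: list) -> dict:
    """Hypothetical worker task: count entries per key in 'key<TAB>data' lines."""
    counts = {}
    for line in lines:
        key, _, _ = line.partition("\t")
        counts[key] = counts.get(key, 0) + 1
    return counts


def start_worker(host: str = "127.0.0.1", port: int = 0) -> SimpleXMLRPCServer:
    """Start an XML-RPC worker in a daemon thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_function(count_keys, "count_keys")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_on_worker(server: SimpleXMLRPCServer, lines: list) -> dict:
    """Controller side: call the worker's function over XML-RPC."""
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.count_keys(lines)
```

A controller would hold one proxy per node and hand each a chunk; call `server.shutdown()` and `server.server_close()` when finished.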
VIAF matching code
- 17 modules
- 1,100 lines of code
- 600 lines of configuration