Super-alpha MARC package for PHP: comments requested

Posted on Mon 14 August 2006 in Libraries

Okay, I've been working on this project (let's call it PEAR_MARC, although it's not an official PEAR project yet) in my spare moments over the past month or two. It's a new PHP package for working with MARC records. The package tries to follow the PEAR project standards (coding, documentation, error handling, etc.) in the hope that, when I put a proposal forward, it will be accepted as a true PEAR package. For now, I'm most interested in getting feedback from coders who work in libraries on the usability of the API that I've designed -- is it easy enough to use, and does it offer the functionality that you require for your day-to-day work?

The core MARC decoding routine was taken from the php-marc package that Christoffer Landtman coded for the Emilda open source library management system. That decoding routine was, in turn, based on the algorithm contained in Perl's MARC::Record package. Christoffer generously relicensed php-marc under the LGPL so that I could use it as the basis of a (hopefully, eventually) official PEAR package. PEAR_MARC itself will therefore be licensed under the LGPL.

Some of the major differences that users will see between php-marc and PEAR_MARC are:

  • System requirements: php-marc requires PHP 4 or 5, while PEAR_MARC requires PHP 5 due to my desire to offer a relatively clean OO structure.
  • API: php-marc was based closely on Perl's MARC::Record API, while with PEAR_MARC I've created a new API from the ground up that is hopefully cleaner, more intuitive where possible, and more transparent where necessary.
  • Class hierarchy: php-marc offers one major class that contains all of the methods, with different subclasses used as constructors for different MARC sources (files, strings, YAZ). PEAR_MARC is designed with different classes representing subfields, data fields, control fields, and entire records, with just one class representing all MARC sources.
  • Functionality: php-marc fields and subfields are based on arrays, which results in some interesting limitations (for example, when you add a subfield in php-marc it is always added to the end of the existing subfields). PEAR_MARC is based on a linked list structure, which enables the user to add fields and subfields at any point in the list.
  • Error handling: php-marc implements warn() and croak() methods as a clone of Perl's API. PEAR_MARC relies on PEAR_ErrorStack as a standard that conforms to the PEAR project requirements.
  • Performance: php-marc offers a class, MARC_Index, that claims to be incredibly fast but which can only be used for read-only operations. I haven't benchmarked PEAR_MARC yet, but I have to assume that it will appear sluggish by comparison for read-only operations, and that it will probably lag php-marc for operations that add, modify, or delete subfields. It might be possible to create a class similar to MARC_Index that offers read-only operations with the same API as PEAR_MARC and simply make it an option passed in to the constructor. At this point, that's not a priority.
  • Tests: PEAR_MARC includes a number of unit tests that have already been helpful in ensuring that the classes operate correctly. php-marc currently offers no unit test suite; however, it is used in production by Emilda, so it is arguably well-tested in practice :-)
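
To make the array-versus-linked-list point from the Functionality item a bit more concrete, here is a minimal sketch of why a linked list makes "insert a subfield anywhere" cheap. To be clear, this is not the PEAR_MARC API: the class names SketchSubfield and SketchField and the methods append(), insertAfter(), and codes() are hypothetical names made up for this illustration, and the subfield values are placeholder data.

```php
<?php
// Hypothetical illustration only: these classes are NOT the PEAR_MARC API.

// One node in a singly linked list of subfields.
class SketchSubfield
{
    public $code;
    public $data;
    public $next = null;

    public function __construct($code, $data)
    {
        $this->code = $code;
        $this->data = $data;
    }
}

// A data field that keeps its subfields in a linked list.
class SketchField
{
    public $tag;
    private $head = null;

    public function __construct($tag)
    {
        $this->tag = $tag;
    }

    // Append a subfield at the end of the list.
    public function append(SketchSubfield $new)
    {
        if ($this->head === null) {
            $this->head = $new;
            return;
        }
        $node = $this->head;
        while ($node->next !== null) {
            $node = $node->next;
        }
        $node->next = $new;
    }

    // Insert $new directly after $existing by relinking two pointers.
    // (The sketch does not check that $existing belongs to this field.)
    public function insertAfter(SketchSubfield $existing, SketchSubfield $new)
    {
        $new->next = $existing->next;
        $existing->next = $new;
    }

    // Return the subfield codes in list order as one string.
    public function codes()
    {
        $out = '';
        for ($node = $this->head; $node !== null; $node = $node->next) {
            $out .= $node->code;
        }
        return $out;
    }
}

$title = new SketchField('245');
$a = new SketchSubfield('a', 'Main title :');
$c = new SketchSubfield('c', 'Author name.');
$title->append($a);
$title->append($c);

// Add subfield b between a and c instead of at the end of the field.
$title->insertAfter($a, new SketchSubfield('b', 'subtitle /'));

echo $title->codes(), "\n"; // prints "abc"
```

The insertion itself touches only two pointers, no matter how many subfields the field already holds. The trade-off is that finding a given subfield means walking the list, which is part of why I expect read-only access to be slower than an index-based approach.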

You can find the latest version of the PEAR_MARC package posted at http://marc.coffeecode.net. Please append any comments as replies to this post, or email me at dan@coffeecode.net.