Evergreen development

OpenSRF

OpenSRF, pronounced “Open Surf”, is the Open Service Request Framework. It was designed as an architecture on which one could easily build scalable applications.

Introduction to OpenSRF

OpenSRF is built on JSON over XMPP: requests and responses are JSON messages carried over XMPP connections. XML can be used, but JSON is much less verbose.

OpenSRF offers scalability via its clustering architecture: a service that is a bottleneck can be moved onto its own server, or multiple instances of the service can be run on many servers. Services can themselves be clients of other services.

OpenSRF uses memcached to cache requests and responses for some services. For example, the contents of the configuration files are cached by the opensrf.settings service when that service starts, and if you change a setting in one of those configuration files, you must restart the opensrf.settings service to update its data. You must then restart any of the services that make use of that setting to make the change take effect.

OpenSRF supports Perl, C, and Python for both services and clients, and Java for clients. JavaScript can access services via the HTTP translator and gateway. A JSON library converts messages to and from native data structures for ease of development.
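As a rough illustration of that conversion, the sketch below round-trips a Perl hash through JSON using the core JSON::PP module. OpenSRF ships its own JSON layer (which also handles class hints), so this shows the idea rather than the framework's code; the parameter names are hypothetical.

```perl
use strict;
use warnings;
use JSON::PP;

# A native Perl structure, shaped like the parameters a client might send
# (the field names are hypothetical).
my $params = { id_type => 'issn', id_value => '0022-362X', limit => 5 };

# canonical(1) sorts hash keys so the JSON text is stable and comparable.
my $json = JSON::PP->new->canonical(1);

my $wire = $json->encode($params);    # Perl hash -> JSON text
my $back = $json->decode($wire);      # JSON text -> Perl hash

print "$wire\n";    # prints {"id_type":"issn","id_value":"0022-362X","limit":5}
```

The decoded `$back` is an ordinary hash reference again, which is what a service method sees when OpenSRF hands it its arguments.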

Configuring OpenSRF

Walk through the configuration files, explaining why we put the values into the files that we do:

- opensrf_core.xml
  - Distinguish between public and private services for security of Web-based applications.
  - Deprecated HTTP gateway versus OpenSRF-over-HTTP.
- opensrf.xml

Tip	In a clustered OpenSRF instance, these files are normally hosted on a network share so that each member of the cluster can read them.

Starting OpenSRF services

Note	I won't go through this during a live session. Perhaps I can cut this out entirely…

Issue the following commands as the opensrf user. If you are running OpenSRF on a single-server machine, you can use the -l flag to force the hostname to be treated as localhost.

Start the OpenSRF router:
```
osrf_ctl.sh -a start_router
```
Important
The router must only run on a single machine in a given brick.
Start all OpenSRF Perl services defined for a given hostname:
```
osrf_ctl.sh -a start_perl
```
Tip
You can start an individual Perl service using:
```
opensrf-perl.pl -s <service-name> -a start -p <PID-directory>
```
Start all OpenSRF C services defined for a given hostname:
```
osrf_ctl.sh -a start_c
```

Stopping OpenSRF services

Issue the following commands as the opensrf user. If you are running OpenSRF on a single-server machine, you can use the -l flag to force the hostname to be treated as localhost.

Stop the OpenSRF router:
```
osrf_ctl.sh -a stop_router
```
Stop all OpenSRF Perl services defined for a given hostname:
```
osrf_ctl.sh -a stop_perl
```
Tip
You can stop an individual Perl service using:
```
opensrf-perl.pl -s <service-name> -a stop -p <PID-directory>
```
Stop all OpenSRF C services defined for a given hostname:
```
osrf_ctl.sh -a stop_c
```

Important

PID files for OpenSRF services are stored and looked up in /openils/var/run by default with osrf_ctl.sh, and in /tmp/ with opensrf-perl.pl. For a clustered Evergreen instance, you must store the PID files in a directory that is local to each server. Otherwise, one server may read PID files written by another server and try to kill local processes that merely happen to have the same PIDs.
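A PID only identifies a process on the machine that recorded it, which is why a shared PID directory is dangerous. A minimal sketch in core Perl of the kind of liveness check that goes wrong on a shared directory: it reads each *.pid file and asks the local kernel, via kill 0, whether that process exists. The one-PID-per-file layout and the helper name check_pid_dir are assumptions for illustration, not how osrf_ctl.sh or opensrf-perl.pl are implemented.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Map each *.pid file in $dir to 1 (process exists on THIS host) or 0.
# Assumes one numeric PID per file (an illustrative layout).
sub check_pid_dir {
    my ($dir) = @_;
    my %status;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    for my $file (grep { /\.pid\z/ } readdir $dh) {
        my $path = File::Spec->catfile($dir, $file);
        open(my $fh, '<', $path) or next;
        my $line = <$fh>;
        close $fh;
        next unless defined $line && $line =~ /^\s*(\d+)\s*$/;
        my $pid = $1;
        # kill 0 sends no signal; it only asks the local kernel whether
        # $pid exists. EPERM means it exists but belongs to another user.
        $status{$file} = (kill(0, $pid) || $!{EPERM}) ? 1 : 0;
    }
    closedir $dh;
    return \%status;
}

# Demonstration in a temporary directory: one PID file for this process,
# and one holding a PID that should not exist locally.
my $dir  = tempdir(CLEANUP => 1);
my %pids = ('self.pid' => $$, 'stale.pid' => 999999999);
for my $name (keys %pids) {
    open(my $out, '>', File::Spec->catfile($dir, $name)) or die "write $name: $!";
    print {$out} "$pids{$name}\n";
    close $out;
}
my $status = check_pid_dir($dir);
printf "%s => %d\n", $_, $status->{$_} for sort keys %$status;
```

If the directory were shared across a cluster, the same check would happily test another server's PIDs against the local process table, which is exactly the confusion described above.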

Examining sample code

Show internal documentation for methods. Do some stupid srfsh tricks (introspect for one) and show docgen.xsl in action.

Show how methods are registered, along with global initialization, child initialization, and associated destructors. OpenSRF::UnixServer is where we want to look to see how these optional methods are invoked in Perl.

Note that arguments are converted between native data structures and JSON for us automatically.

Perl

Services

See OpenSRF/src/perl/lib/OpenSRF/UnixServer.pm to understand how the optional methods for initializing and cleaning up OpenSRF services are invoked:

initialize()
child_init()
child_exit()

Service methods are implemented as Perl subroutines. Each method needs to be registered with:

```perl
__PACKAGE__->register_method(
  method    => 'method_name',   # name of the Perl subroutine that implements it
  api_name  => 'api.name',      # the name clients use to call the method
  api_level => 1,
  argc      => 1,               # number of arguments the method requires
  signature => {
    desc   => 'Description',
    params => [
      {
        name => 'parameter name',
        desc => 'parameter description',
        type => 'string'        # one of: array, hash, number, string
      }
    ],
    return => {
      desc => 'Description of return value',
      type => 'array'           # one of: array, hash, number, string
    }
  }
);
```
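To make the roles of method, api_name, and argc more concrete, here is a toy, self-contained model of a method registry and dispatcher. It is not OpenSRF's implementation (the real logic lives in OpenSRF::Application); the package ToyApp, its dispatch() helper, and the toy.echo.upper method are hypothetical. It only mimics the calling convention visible in the resolver example later in this document, where each method receives an object and a connection before its parameters.

```perl
use strict;
use warnings;

package ToyApp;    # hypothetical stand-in for an OpenSRF application class

my %registry;      # api_name => registration hash

sub register_method {
    my ($class, %args) = @_;
    $registry{ $args{api_name} } = { %args, package => $class };
}

# Look up the api_name, enforce argc, then call the named subroutine with
# ($self, $conn, @params). Here $self is just the package name (a simplification).
sub dispatch {
    my ($class, $api_name, $conn, @params) = @_;
    my $reg = $registry{$api_name} or die "Method not found: $api_name\n";
    die "Not enough parameters for $api_name\n" if @params < ($reg->{argc} || 0);
    my $code = $reg->{package}->can($reg->{method})
        or die "No subroutine $reg->{method}\n";
    return $code->($reg->{package}, $conn, @params);
}

sub echo_upper {
    my ($self, $conn, $text) = @_;
    return uc $text;
}

__PACKAGE__->register_method(
    method    => 'echo_upper',
    api_name  => 'toy.echo.upper',
    api_level => 1,
    argc      => 1,
);

package main;

print ToyApp->dispatch('toy.echo.upper', undef, 'hello'), "\n";    # prints HELLO
```

The design point it illustrates: clients only ever see api_name, while method names a plain Perl subroutine, so the public API name and the implementation can change independently.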

Client cheat sheet

This is the simplest possible OpenSRF client written in Perl:

```perl
use OpenSRF::System;                                              # 1
use OpenSRF::AppSession;
OpenSRF::System->bootstrap_client(config_file => $ARGV[0]);       # 2
my $session = OpenSRF::AppSession->create("open-ils.resolver");   # 3
my $result = $session->request("open-ils.resolver.resolve_holdings", "issn", "0022-362X")->gather();   # 4
$session->disconnect();                                           # 5
```

1. The OpenSRF::System module gives our program access to the core OpenSRF client functionality.
2. The bootstrap_client() method reads the opensrf_core.xml file (passed here as the first command-line argument) and sets up communication with the OpenSRF router.
3. The OpenSRF::AppSession->create() class method asks the router whether it can connect to the named service. If the service is accessible, the router returns an OpenSRF session with a connection to the named service. Credentials on the private domain can reach all public and private services; services on the public domain are accessible to both public and private credentials.
4. The request() method invokes a method of the associated service and returns a request object. Invoking the gather() method on that request object returns a single result.

   Note: If the service is expected to return multiple results, you should loop over them with recv() instead. But then, that wouldn't be the simplest possible client anymore, would it?
5. The disconnect() instance method disconnects from the service, enabling that child process to go on and handle other requests.

Python

See OpenSRF/src/python/osrf/apps/example.py

The optional methods for initializing and cleaning up are:

global_init()
child_init()
child_exit()

JavaScript

Exercise

Build a new OpenSRF service.

Camel folk (Perl)

The challenge: implement a service that caches responses from some other Web service. This can cut down on client-side latency for services such as OpenLibrary, Google Books, or xISBN, and avoid timeouts if the target service is not dependable. Our example will be to build an SFX lookup service. This has the additional advantage of letting JavaScript call the lookup with XMLHttpRequest: because the service is hosted on the same domain as the catalogue, the browser's same-origin policy does not block the request.

Let's start with the simplest possible implementation – a CGI script.

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use LWP::UserAgent;
use HTTP::Request;
use URI::Escape qw(uri_escape);
use XML::LibXML;
use JSON::XS;

my $q = CGI->new;

my $issn = $q->param("issn");
my $isbn = $q->param("isbn");

my $url_base = 'http://sfx.scholarsportal.info/laurentian';

my $url_args = '?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx' .
        '&ctx_enc=UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/conifer' .
        '&sfx.ignore_date_threshold=1&sfx.response_type=multi_obj_detailed_xml' .
        '&__service_type=getFullTxt';

if ($issn) {
        $url_args .= '&rft.issn=' . uri_escape($issn);
} elsif ($isbn) {
        $url_args .= '&rft.isbn=' . uri_escape($isbn);
}

my $ua = LWP::UserAgent->new;
$ua->agent("SameOrigin/1.0");

my $req = HTTP::Request->new(GET => "$url_base$url_args");
my $res = $ua->request($req);

print $q->header(-type => 'application/json', -charset => 'UTF-8');
my $xml = $res->content;
my $parser = XML::LibXML->new();
my $parsed_sfx = $parser->parse_string($xml);

my (@targets) = $parsed_sfx->findnodes('//target');

my @sfx_result;
foreach my $target (@targets) {
        my $public_name = $target->findvalue('./target_public_name');
        my $target_url = $target->findvalue('.//target_url');
        my $target_coverage = $target->findvalue('.//coverage_statement');
        my $target_embargo = $target->findvalue('.//embargo_statement');
        push @sfx_result, {
                public_name => $public_name,
                coverage => $target_coverage,
                embargo => $target_embargo,
                url => $target_url
        };
}

print encode_json(\@sfx_result);

Hopefully you can follow what this CGI script is doing. It works, but it has all the disadvantages of CGI: the Perl interpreter and every module have to be loaded again on each request, and the script remembers nothing between calls, so every lookup goes back to the SFX server.
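The query string in these scripts is assembled by concatenation. A minimal sketch of a more defensive builder in core Perl: it accepts only the two identifier types the service supports, compares them with eq, and percent-encodes the identifier value. The helper names build_openurl_args and escape_value are hypothetical; the fixed parameters are taken from the scripts, with url_ctx_fmt written as the standard info:ofi/fmt:kev:mtx:ctx identifier.

```perl
use strict;
use warnings;

# Fixed SFX parameters, as used in the scripts.
my $FIXED_ARGS = 'url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx'
    . '&ctx_enc=UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/conifer'
    . '&sfx.ignore_date_threshold=1&sfx.response_type=multi_obj_detailed_xml'
    . '&__service_type=getFullTxt';

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes $value is a character string; it is encoded to UTF-8 bytes first
# so that each byte becomes exactly one %XX sequence.
sub escape_value {
    my ($value) = @_;
    my $bytes = $value;
    utf8::encode($bytes);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Build the query string for an 'issn' or 'isbn' lookup.
sub build_openurl_args {
    my ($id_type, $id_value) = @_;
    die "Unsupported identifier type: $id_type\n"
        unless $id_type eq 'issn' || $id_type eq 'isbn';    # string comparison, not '='
    return "?$FIXED_ARGS&rft.$id_type=" . escape_value($id_value);
}

print build_openurl_args('issn', '0022-362X'), "\n";
```

Rejecting unknown identifier types up front also means a typo in a client request fails loudly instead of silently querying SFX without an identifier.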

Turning the CGI script into an OpenSRF service

So now we want to turn this into an OpenSRF service. Start by ripping out the CGI stuff, as we won't need that any more.

To turn this into an OpenSRF service, we create a new Perl module (OpenILS::Application::ResolverResolver). We no longer have to convert results between Perl and JSON values, as OpenSRF will handle that for us. We now have to register the method with OpenSRF.

Example: Basic OpenSRF Resolver service

#!/usr/bin/perl

# Copyright (C) 2009 Dan Scott 

# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

=head1 NAME

OpenILS::Application::ResolverResolver - retrieves holdings from OpenURL resolvers

=head1 SYNOPSIS

Via srfsh:
  request open-ils.resolver open-ils.resolver.resolve_holdings "issn", "0022-362X"

Via Perl:
  my $session = OpenSRF::AppSession->create("open-ils.resolver");
  my $result = $session->request("open-ils.resolver.resolve_holdings", "issn", "0022-362X")->gather();
  $session->disconnect();

  # $result is a reference to the list of hashes

=head1 DESCRIPTION

OpenILS::Application::ResolverResolver caches responses from OpenURL resolvers
to requests for full-text holdings. Currently integration with SFX is supported.

Each org_unit can specify a different base URL as the third argument to
resolve_holdings(). Eventually org_units will have org_unit settings to hold
their resolver type and base URL.

=head1 AUTHOR

Dan Scott, dscott@laurentian.ca

=cut

package OpenILS::Application::ResolverResolver;

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use XML::LibXML;

# All OpenSRF applications must be based on OpenSRF::Application or
# a subclass thereof.  Makes sense, eh?
use OpenILS::Application;
use base qw/OpenILS::Application/;

# This is the client class, used for connecting to open-ils.storage
use OpenSRF::AppSession;

# ... and here we have the built in logging helper ...
use OpenSRF::Utils::Logger qw($logger);

our ($ua, $parser);

sub child_init {
        # We need a User Agent to speak to the SFX beast
        $ua = LWP::UserAgent->new;
        $ua->agent('SameOrigin/1.0');

        # SFX returns XML to us; let us parse
        $parser = XML::LibXML->new;
}

sub resolve_holdings {
        my $self = shift;
        my $conn = shift;
        my $id_type = shift; # keep it simple for now, either 'issn' or 'isbn'
        my $id_value = shift; # the normalized ISSN or ISBN

        # For now we'll pass the argument with a hard-coded default
        # Should pull these specifics from the database as part of initialize()
        my $url_base = shift || 'http://sfx.scholarsportal.info/laurentian';

        # Big ugly SFX OpenURL request
        my $url_args = '?url_ver=Z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&'
                . 'ctx_enc=UTF-8&ctx_ver=Z39.88-2004&rfr_id=info:sid/conifer&'
                . 'sfx.ignore_date_threshold=1&'
                . 'sfx.response_type=multi_obj_detailed_xml&__service_type=getFullTxt';

        if ($id_type eq 'issn') {
                $url_args .= "&rft.issn=$id_value";
        } elsif ($id_type eq 'isbn') {
                $url_args .= "&rft.isbn=$id_value";
        }

        # Otherwise, let's go and grab the info from the SFX server
        my $req = HTTP::Request->new('GET', "$url_base$url_args");

        # Let's see what we're trying to request
        $logger->info("Resolving the following request: $url_base$url_args");

        my $res = $ua->request($req);

        my $xml = $res->content;
        my $parsed_sfx = $parser->parse_string($xml);

        my (@targets) = $parsed_sfx->findnodes('//target');

        my @sfx_result;
        foreach my $target (@targets) {
                my $public_name = $target->findvalue('./target_public_name');
                my $target_url = $target->findvalue('.//target_url');
                my $target_coverage = $target->findvalue('.//coverage_statement');
                my $target_embargo = $target->findvalue('.//embargo_statement');
                push @sfx_result, {public_name => $public_name, coverage => $target_coverage, embargo => $target_embargo, url => $target_url};
        }

        return \@sfx_result;
}

__PACKAGE__->register_method(
        method    => 'resolve_holdings',
        api_name  => 'open-ils.resolver.resolve_holdings',
        api_level => 1,
        argc      => 3,
        signature =>
                { desc     => 'Returns a list of the full-text holdings for a given ISBN or ISSN',
                  'params' =>
                        [
                                { name => 'id_type',
                                  desc => 'The type of identifier ("issn" or "isbn")',
                                  type => 'string' },
                                { name => 'id_value',
                                  desc => 'The identifier value',
                                  type => 'string' },
                                { name => 'url_base',
                                  desc => 'The base URL for the resolver and instance',
                                  type => 'string' },
                        ],
                  'return' =>
                        { desc => 'Returns a list of the full-text holdings for a given ISBN or ISSN',
                          type => 'array' }
                }
);

1;

Add the service to opensrf.xml so it gets started with the other Perl services on our host of choice:

```xml
...
<open-ils.resolver>
   <keepalive>3</keepalive>
   <stateless>1</stateless>
   <language>perl</language>
   <implementation>OpenILS::Application::ResolverResolver</implementation>
   <max_requests>17</max_requests>
   <unix_config>
     <unix_sock>open-ils.resolver_unix.sock</unix_sock>
     <unix_pid>open-ils.resolver_unix.pid</unix_pid>
     <max_requests>1000</max_requests>
     <unix_log>open-ils.resolver_unix.log</unix_log>
     <min_children>5</min_children>
     <max_children>15</max_children>
     <min_spare_children>3</min_spare_children>
     <max_spare_children>5</max_spare_children>
   </unix_config>
</open-ils.resolver>
...
<!-- In the <hosts> section -->
<localhost>
  ...
  <appname>open-ils.resolver</appname>
</localhost>
```

And add the service to opensrf_core.xml as a publicly exposed service via the HTTP gateway and translator:

```xml
...
<!-- In the public router section -->
<services>
  ...
  <service>open-ils.resolver</service>
</services>
...
<!-- In the public gateway section -->
<gateway>
  ...
  <services>
    <service>open-ils.resolver</service>
  </services>
</gateway>
```

Add caching

To really make this service useful, we can take advantage of OpenSRF's built-in support for caching via memcached. Keeping the values returned by the resolver for one week (604800 seconds) reportedly works well.

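We will also take advantage of the opensrf.settings service, which holds the values defined in the opensrf.xml configuration file, to supply some of our default arguments. In the diff below, the cache timeout is read from app_settings/cache_timeout in the open-ils.resolver section of opensrf.xml, falling back to 300 seconds if it is not set.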

Example: Caching OpenSRF Resolver Service

--- ResolverResolver.pm.basic   2009-10-22 16:52:55.000000000 -0400
+++ ResolverResolver.pm 2009-10-22 16:56:42.000000000 -0400
@@ -62,11 +62,32 @@
 # This is the client class, used for connecting to open-ils.storage
 use OpenSRF::AppSession;

+# This is an extension of Error.pm that supplies some error types to throw
+use OpenSRF::EX qw(:try);
+
+# This is a helper class for querying the OpenSRF Settings application ...
+use OpenSRF::Utils::SettingsClient;
+
 # ... and here we have the built in logging helper ...
 use OpenSRF::Utils::Logger qw($logger);

+# ... and this manages cached results for us ...
+use OpenSRF::Utils::Cache;
+
+my $prefix = "open-ils.resolver_"; # Prefix for caching values
+my $cache;
+my $cache_timeout;
+
 our ($ua, $parser);

+
+sub initialize {
+       $cache = OpenSRF::Utils::Cache->new('global');
+       my $sclient = OpenSRF::Utils::SettingsClient->new();
+       $cache_timeout = $sclient->config_value(
+                       "apps", "open-ils.resolver", "app_settings", "cache_timeout" ) || 300;
+}
+
 sub child_init {
        # We need a User Agent to speak to the SFX beast
        $ua = new LWP::UserAgent;
@@ -82,6 +103,9 @@
        my $id_type = shift; # keep it simple for now, either 'issn' or 'isbn'
        my $id_value = shift; # the normalized ISSN or ISBN

+       # We'll use this in our cache key
+       my $method = "open-ils.resolver.resolve_holdings";
+
        # For now we'll pass the argument with a hard-coded default
         # Should pull these specifics from the database as part of initialize()
        my $url_base = shift || 'http://sfx.scholarsportal.info/laurentian';
@@ -98,6 +122,16 @@
                $url_args .= "&rft.isbn=$id_value";
        }

+       my $ckey = $prefix . $method . $url_base . $id_type . $id_value;
+
+       # Check the cache to see if we've already looked this up
+       # If we have, shortcut our return value
+       my $result = $cache->get_cache($ckey) || undef;
+       if ($result) {
+               $logger->info("Resolver found a cache hit");
+               return $result;
+       }
+
        # Otherwise, let's go and grab the info from the SFX server
        my $req = HTTP::Request->new('GET', "$url_base$url_args");

@@ -120,6 +154,9 @@
                push @sfx_result, {public_name => $public_name, coverage => $target_coverage, embargo => $target_embargo, url => $target_url};
        }

+       # Stuff this into the cache
+       $cache->put_cache($ckey, \@sfx_result, $cache_timeout);
+
        return \@sfx_result;
 }

@@ -150,4 +187,6 @@
                }
 );

+# Add methods to clear cache for specific lookups?
+
 1;
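The diff above follows the cache-aside pattern: build a key from everything that affects the answer, return the cached value on a hit, and otherwise do the expensive lookup and store the result with a timeout. A self-contained sketch of that pattern in plain Perl, with an in-memory hash standing in for memcached and OpenSRF::Utils::Cache; the helper names are hypothetical. It also joins the key parts with a separator, because plain concatenation can in principle map two different argument combinations to the same key.

```perl
use strict;
use warnings;

my %cache;    # key => [ value, expiry time in epoch seconds ]; stands in for memcached

sub cache_get {
    my ($key) = @_;
    my $entry = $cache{$key} or return;
    if (time() >= $entry->[1]) {    # expired: drop it and report a miss
        delete $cache{$key};
        return;
    }
    return $entry->[0];
}

sub cache_put {
    my ($key, $value, $timeout) = @_;
    $cache{$key} = [ $value, time() + $timeout ];
}

my $lookups = 0;    # counts how often the "expensive" lookup actually runs

# Hypothetical stand-in for the SFX request made by resolve_holdings().
sub slow_lookup {
    my ($id_type, $id_value) = @_;
    $lookups++;
    return [ { public_name => "Result for $id_type $id_value" } ];
}

sub resolve_cached {
    my ($url_base, $id_type, $id_value, $timeout) = @_;
    # A separator keeps distinct argument combinations from producing the same key.
    my $key = join '|', 'open-ils.resolver', 'resolve_holdings',
        $url_base, $id_type, $id_value;
    my $hit = cache_get($key);
    return $hit if $hit;
    my $result = slow_lookup($id_type, $id_value);
    cache_put($key, $result, $timeout);
    return $result;
}

resolve_cached('http://sfx.example.org', 'issn', '0022-362X', 604800) for 1 .. 3;
print "SFX lookups performed: $lookups\n";    # prints 1: calls 2 and 3 hit the cache
```

With real memcached the expiry is enforced by the server, so the time check in cache_get() is only needed for this in-memory stand-in.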

Next step: add org_unit settings for the resolver type and URL on a per-org_unit basis. Org unit settings can be retrieved via OpenILS::Application::AppUtils->ou_ancestor_setting_value($org_id, $setting_name).

This is where we step beyond OpenSRF and start getting into the Evergreen database schema (the actor.org_unit_setting table).
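Conceptually, an ancestor setting lookup walks up the org_unit tree from the given library until it finds a unit that has the setting, so a consortium-wide default can be overridden by a system or branch. A self-contained sketch of that walk in plain Perl, with a hypothetical in-memory tree and a hypothetical setting name (resolver.base_url); in Evergreen you would call ou_ancestor_setting_value() rather than reimplement it, and its return value has its own format.

```perl
use strict;
use warnings;

# Hypothetical org_unit tree: id => parent id (undef for the root).
my %parent_of = (1 => undef, 2 => 1, 3 => 2, 4 => 2);

# Hypothetical settings: org id => { setting name => value }.
my %settings = (
    1 => { 'resolver.base_url' => 'http://sfx.example.org/consortium' },
    3 => { 'resolver.base_url' => 'http://sfx.example.org/branch3' },
);

# Return the value set on $org_id or on its nearest ancestor, or undef.
sub ancestor_setting_value {
    my ($org_id, $name) = @_;
    while (defined $org_id) {
        my $org_settings = $settings{$org_id};
        return $org_settings->{$name}
            if $org_settings && exists $org_settings->{$name};
        $org_id = $parent_of{$org_id};    # move one level up the tree
    }
    return;
}

print ancestor_setting_value(3, 'resolver.base_url'), "\n";    # branch 3's own override
print ancestor_setting_value(4, 'resolver.base_url'), "\n";    # inherited from org 1
```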

Pythonistas

We want to implement spelling suggestions based on data from our catalogue. Currently we throw terms at the aspell library to get spelling suggestions, but this has the limitation of not knowing whether the suggested term will actually result in a hit. If we were to instead build a corpus of terms from our database and then generate spelling suggestions based on that, using something like http://www.peterbe.com/plog/spellcorrector-0.2 we could provide more useful spelling suggestions.

This could then be extended to title-, author-, and subject-specific subsets, which would be kind of cool.

Should also differentiate term corpus by org_unit hierarchy, but let's not get crazy.

Note that there is (I believe) a C implementation of the same library, so we could deploy an optimized version of this service relatively easily.
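The core idea fits in a few lines. The exercise itself targets Python; this sketch uses Perl to match the other examples in this document. It follows the approach of Peter Norvig's well-known spelling corrector: generate every string one edit away from the query, keep only candidates that occur in a corpus built from the catalogue (so each suggestion should produce a hit), and rank them by frequency. The corpus and its counts are made up.

```perl
use strict;
use warnings;

# Hypothetical corpus: term => number of occurrences in the catalogue.
my %corpus = (history => 120, mystery => 45, chemistry => 30, histology => 4);

my @letters = ('a' .. 'z');

# All strings one deletion, transposition, replacement, or insertion away.
sub edits1 {
    my ($word) = @_;
    my %seen;
    for my $i (0 .. length $word) {
        my ($left, $right) = (substr($word, 0, $i), substr($word, $i));
        if (length $right) {
            $seen{ $left . substr($right, 1) } = 1;                          # deletion
            $seen{ $left . substr($right, 1, 1) . substr($right, 0, 1) . substr($right, 2) } = 1
                if length $right > 1;                                        # transposition
            $seen{ $left . $_ . substr($right, 1) } = 1 for @letters;        # replacement
        }
        $seen{ $left . $_ . $right } = 1 for @letters;                       # insertion
    }
    return keys %seen;
}

# Suggest corpus terms, most frequent first; a known term suggests itself.
sub suggest {
    my ($term) = @_;
    return ($term) if exists $corpus{$term};
    my @known = grep { exists $corpus{$_} } edits1($term);
    return sort { $corpus{$b} <=> $corpus{$a} || $a cmp $b } @known;
}

print join(', ', suggest('histroy')), "\n";    # prints: history
```

Because candidates are filtered against the corpus, a suggestion can never be a word that is absent from the catalogue, which is the limitation of the aspell approach described above.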

Database schema

The database schema is tied pretty tightly to PostgreSQL. Although PostgreSQL adheres closely to ANSI SQL standards, the use of schemas, SQL functions implemented in both plpgsql and plperl, and PostgreSQL's native full-text search would make it… challenging… to port to other database platforms.

A few common PostgreSQL interfaces for poking around the schema and manipulating data are:

psql (the command-line client)
pgAdmin III (a GUI client)

Or you can read through the source files in Open-ILS/src/sql/Pg.

Let's take a quick tour through the schemas, pointing out some highlights and some key interdependencies:

actor.org_unit → asset.copy_location
actor.usr → actor.card
biblio.record_entry → asset.call_number → asset.copy
config.metabib_field → metabib.*_field_entry

Database access methods

You could access the database directly via Perl DBI, JDBC, etc., but Evergreen offers several database CRUD services for creating, retrieving, updating, and deleting data. These avoid tying you too tightly to the current database schema, and they funnel database access through the same mechanism rather than tying up connections with other interfaces.

Evergreen Interface Definition Language (IDL)

Defines properties and required permissions for Evergreen classes. To reduce network overhead, a given object is identified via a class hint and serialized as a JSON array of property values; the property names themselves are not sent.

As of 1.6, fields will be serialized in the order in which they appear in the IDL definition file, and the is_new / is_changed / is_deleted properties are automatically added. This has greatly reduced the size of the fm_IDL.xml file and makes DRY people happier :)
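Concretely, an object travels as a class hint plus a positional array of values, and the receiver rebuilds named fields from the IDL's field order. A minimal sketch in core Perl, assuming OpenSRF's JSON wrapper layout of __c (class hint) and __p (payload); the class hint "example" and the field list are hypothetical, not taken from fm_IDL.xml.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical field order for a class, as it would be listed in the IDL.
my @fields = qw(id parent_ou shortname name);

my $json = JSON::PP->new->canonical(1);

# Object -> {"__c": hint, "__p": [values in IDL field order]}
sub serialize {
    my ($hint, $obj) = @_;
    return $json->encode({ __c => $hint, __p => [ map { $obj->{$_} } @fields ] });
}

# Wire format -> (hint, hash keyed by field name), using the same field order.
sub deserialize {
    my ($text) = @_;
    my $wrapper = $json->decode($text);
    my %obj;
    @obj{@fields} = @{ $wrapper->{__p} };
    return ($wrapper->{__c}, \%obj);
}

my $wire = serialize('example',
    { id => 3, parent_ou => 1, shortname => 'BR3', name => 'Branch 3' });
print "$wire\n";    # prints {"__c":"example","__p":[3,1,"BR3","Branch 3"]}
```

This is why both ends must agree on the IDL: the field names never cross the wire, only their positions do.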

- Each class element has a class_hint, 0 or more controllers, an optional
- Linked fields can be fleshed inline.
- The permacrud section defines any create / retrieve / update / delete permissions to apply to a given class, but open-ils.pcrud must be defined as a controller or the permissions will be ignored.
  - Further, the field(s) on that particular class that identify the library (actor.org_unit) where the user must have the defined permission are specified by a context_field attribute on the action element (such as create or retrieve), and/or by a child context element whose link attribute points to the linked class and whose field attribute names the field.
  - See atc for an example of both context_field and context in a single permission.
  - See acqpca for an example of a jump attribute (for a link to a link, it seems).
- reporter:label attributes are used in the reporter interface to give meaningful labels to classes and fields.
- The oils_persist:tablename attribute defines the schema_name.table_name in the database from which the values should be pulled … if it lives in the database.
  - oils_persist:virtual, if true, tells us that the data doesn't live in the database, but is served up via an OpenSRF method instead (e.g. mvr, mups).
  - oils_persist:readonly, if true, tells us that the data lives in the database, but is pulled from the SELECT statement defined in the <oils_persist:source_definition> child element.