Update: 2010-03-18 - I just realized that now that I have a separate keyword index for titles, I can assign a stronger boost in general to keywords that appear in the title for a keyword search; see update: index weight below.
One of my colleagues just asked me:
So, how does relevancy ranking work in Evergreen, anyway?
I've been poking around in this area recently, because one of our users complained about the relevance of some results for a basic keyword search, so I thought I would throw my thoughts out there. It might give other people a good jumping-off point, and it provides a bit more of an answer to questions like these on the Evergreen mailing lists.

There are a number of factors, but cover density plays a significant role: how often the terms you're looking for appear within the target index, where index = keyword, author, title, subject, or series (at least, those are the indexes that Evergreen supplies you with out of the box). Then there are a number of tweakable boosts that appear in the search.relevance_adjustment table:
- full_match: for an exact match of the terms you're looking for, from beginning to end, in the target index
- first_word: for a match of the first search term with the first term in the target index
- word_order: for a match between the order of the search terms and the order of the terms in the target index
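To see which of these boosts are configured on a given system, a query along the following lines should do. It is a minimal sketch: it assumes the table is search.relevance_adjustment (the name used in the INSERT statement later in this post) and that its field column holds a config.metabib_field id, which is how that INSERT uses it.

```sql
-- List each configured relevance adjustment next to the index it applies to.
-- Assumption: search.relevance_adjustment.field holds a config.metabib_field id.
SELECT cmf.field_class,
       cmf.name,
       sra.bump_type,
       sra.multiplier,
       sra.active
  FROM search.relevance_adjustment sra
  JOIN config.metabib_field cmf ON cmf.id = sra.field
 ORDER BY cmf.field_class, cmf.name, sra.bump_type;
```

The query is read-only, so it is safe to run against a production database.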
The problem with searching the out of the box "keyword" index is that there's no way of boosting the ranking for terms appearing in, say, the title or subject, because out of the box there's just one keyword|keyword index. For a keyword search, you can't tell Evergreen that terms that appear in the title should be more relevant than terms that appear in something like the content notes. In comparison, the title index is actually composed of a number of separate indexes: title|proper, title|uniform, title|alternative, title|translated, etc, that collectively form the title index. You can see this in the config.metabib_field table.
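A quick way to compare the two classes is to list their index definitions side by side. This sketch uses only columns that appear in the SQL later in this post; the ids and weights you get back depend on your installation.

```sql
-- Show the index definitions that make up the title and keyword classes.
-- Out of the box, expect several title|* rows but a single keyword|keyword row.
SELECT id, field_class, name, weight
  FROM config.metabib_field
 WHERE field_class IN ('title', 'keyword')
 ORDER BY field_class, name;
```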
A keyword search like "programming languages" gave us some relatively horrible results: it returned Regular expression recipes for Windows developers as the most relevant hit. (Are you kidding me? No, it's because "Programming languages" appears in the subjects about 10 times... sigh.) So, on our test server, I added a keyword|title index that is identical to the title|proper index, and then added some entries to the search.relevance_adjustment table to modify the relevancy ranking accordingly, as follows:
    -- Clone the title|proper index to create a keyword|title index
    -- 6 = the title|proper index
    INSERT INTO config.metabib_field (field_class, name, xpath, weight, format, search_field, facet_field)
        SELECT 'keyword', 'title', xpath, weight, format, search_field, facet_field
        FROM config.metabib_field
        WHERE id = 6;

    -- Populate the keyword|title index with a set of index entries cloned
    -- from the metabib.title_field_entry table;
    -- 6 = the title|proper index
    INSERT INTO metabib.keyword_field_entry (source, field, value)
        SELECT source, 17, value
        FROM metabib.title_field_entry
        WHERE field = 6;

    -- Bump the relevance when the first search term appears first in the title in a keyword search
    -- 17 = our new keyword|title index
    INSERT INTO search.relevance_adjustment (active, field, bump_type, multiplier)
        VALUES (true, 17, 'first_word', 5);
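The ids 6 and 17 are specific to our test server, so it is worth checking them before and after running the statements above. The following read-only sketch looks up the id that the new keyword|title row received, then compares row counts between the source and the clone; substitute your own ids if they differ.

```sql
-- Find the id assigned to the new keyword|title index (17 on our test server).
SELECT id
  FROM config.metabib_field
 WHERE field_class = 'keyword'
   AND name = 'title';

-- Sanity check: the cloned entries should match the title|proper entries in number.
-- Replace 6 and 17 with the ids from your own config.metabib_field table.
SELECT (SELECT count(*) FROM metabib.title_field_entry   WHERE field = 6)  AS title_proper_rows,
       (SELECT count(*) FROM metabib.keyword_field_entry WHERE field = 17) AS keyword_title_rows;
```

If the two counts differ right after the clone, the population step probably ran against the wrong field id.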
It feels dirty, because we're creating such a massively duplicated set of rows. But it works... at least the first_word relevance adjustment works. When I tried using a multiplier of 1000 for the word_order relevance adjustment, it did not affect the search results in the least. Perhaps there's a bug there?
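For anyone who wants to try reproducing the word_order behaviour, an adjustment of that shape would look something like the following. This is a reconstruction from the description above, not necessarily the exact statement that was run, and 17 is again the id of the keyword|title index on our test server.

```sql
-- Reconstruction of the word_order experiment described above.
-- 17 = the keyword|title index on our test server; adjust for your system.
INSERT INTO search.relevance_adjustment (active, field, bump_type, multiplier)
    VALUES (true, 17, 'word_order', 1000);
```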
In any case, by combining some of the findings of this post with my previous post on adding more granular indexes, perhaps this will help people get deeper into customizing the search experience for their Evergreen installations.
Update: adjusting the search weight of terms in the title in general

So, now that we have the keyword|title index, we can boost the relevancy ranking for records in which the search terms appear in the keyword|title index rather than the general keyword|keyword index. Here's how to shake things up:
    -- Boost the relevance for search terms appearing in the title in general
    -- 17 = our new keyword|title index
    UPDATE config.metabib_field SET weight = 10 WHERE id = 17;
Some quick testing suggests that a weight of 10 works reasonably well... but that is obviously going to be subject to further testing and tweaking. But hey: we have the ability to tweak now! Yay!
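While tweaking, it helps to check the current weights and to have a way back. The sketch below assumes the same ids as above (6 = title|proper, 17 = keyword|title). The commented-out reset copies the weight from title|proper again, since that is where the cloned row got its weight in the first place.

```sql
-- Compare the weights of the indexes in the keyword class.
SELECT id, field_class, name, weight
  FROM config.metabib_field
 WHERE field_class = 'keyword'
 ORDER BY id;

-- To back the change out, restore the weight that was cloned from title|proper.
-- UPDATE config.metabib_field
--    SET weight = (SELECT weight FROM config.metabib_field WHERE id = 6)
--  WHERE id = 17;
```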