I'm at PostgresOpen in Chicago, and just gave my talk on Implementing full-text search in PostgreSQL. The goal was to give novice users the understanding and examples they needed to build a workable search solution using PostgreSQL's full-text search. And it went (in my opinion) well - an almost full room, lots of audience interaction (thanks Bruce Momjian, Jonathan Scott, Jonathan Katz, et al) a lot of nodding heads, and nobody running out of the room screaming. So... yay!
A few takeaways from prepping for the presentation:
I suspect that some effort on making the full-text search parser extensible would go a long way towards resolving problems that you currently have to work around by manipulating the text before it hits the parser. For example, if you pass in a string like file/path, PostgreSQL classifies that as a file token and stores it as-is - but you might want to be able to search against either "file" or "path" as well as the concatenated form. Right now you have to preparse that string to break it up yourself (via regexp_replace() or the like), but it would be much nicer if you could teach the parser new tricks (without having to modify the source and recompile it).
ts_headline() might be a bottleneck for large documents - and (a) solution might be to just bust the document up. *Note to self*: dig into the underlying code to see if there's any chance of using indexes to enable improvement.
Ran into a bug with ts_rewrite() while building the tutorial, and have not yet worked out whether that was due to my local configuration or an actual bug... TO DO!
Also - PostgresOpen has had a great vibe so far; a relatively small but very high-quality conference with lots of knowledgeable, friendly participants. Selena (one of the organizers) had a goal of creating an environment similar to PgCon, and I would say from my limited experience attending one PgCon and one PostgresOpen that she and the rest of the conference team have done a great job!
Dan, let me know if you find bug in ts_rewrite. We are working on extending GIN, which allows us to greatly improve performance of FTS. Current prototype is more than 10x faster on 6 mln documents (median time is about 8 msec) !
Thanks Oleg. It's nice to hear about the next round of performance enhancements - that sounds like a great win!
The ts_rewrite() problem that I was having is on Fedora 17 with PostgreSQL 9.1.5 from the Fedora packages - when I follow the example from the docs using the aliases table, I get the following error:
ERROR: ts_rewrite query must return two tsquery columns
However, when I paste exactly the same SQL statements to create and run ts_rewrite() against the aliases table on Debian Squeeze with PostgreSQL 9.1.5 from the backports repository, it works as expected.
So - if I can replicate this problem on a clean Fedora instance, I'll file it with the Fedora team. Otherwise, I'll have to explore possible differences in my environment that might trigger the problem.
Figured out the source of the problem - the Evergreen schema still loads the TSearch2 extension, mostly as a matter of habit.
It seems that when the TSearch2 extension is loaded, it introduces a conflict with the operation of ts_rewrite(tsquery, text). Given that full-text search has been integrated in PostgreSQL since 8.3 and 9.2 is out now, it's probably not worth pursuing further.