Seek and ye shall find: full-text search in PostgreSQL

Posted on Tue 18 September 2012 in Databases

I'm at PostgresOpen in Chicago, and just gave my talk on Implementing full-text search in PostgreSQL. The goal was to give novice users the understanding and examples they needed to build a workable search solution using PostgreSQL's full-text search. And it went (in my opinion) well - an almost full room, lots of audience interaction (thanks Bruce Momjian, Jonathan Scott, Jonathan Katz, et al) a lot of nodding heads, and nobody running out of the room screaming. So... yay!

A few takeaways from prepping for the presentation:

  • I suspect that some effort on making the full-text search parser extensible would go a long way towards resolving problems that you currently have to work around by manipulating the text before it hits the parser. For example, if you pass in a string like file/path, PostgreSQL classifies that as a file token and stores it as-is - but you might want to be able to search against either "file" or "path" as well as the concatenated form. Right now you have to preparse that string to break it up yourself (via regexp_replace() or the like), but it would be much nicer if you could teach the parser new tricks (without having to modify the source and recompile it).
  • ts_headline() might be a bottleneck for large documents - and (a) solution might be to just bust the document up. *Note to self*: dig into the underlying code to see if there's any chance of using indexes to enable improvement.
  • Ran into a bug with ts_rewrite() while building the tutorial, and have not yet worked out whether that was due to my local configuration or an actual bug... TO DO!

Also - PostgresOpen has had a great vibe so far; a relatively small but very high-quality conference with lots of knowledgeable, friendly participants. Selena (one of the organizers) had a goal of creating an environment similar to PgCon, and I would say from my limited experience attending one PgCon and one PostgresOpen that she and the rest of the conference team have done a great job!