Re: tsearch Parser Hacking

From: Oleg Bartunov <oleg@sai.msu.su>
To: "David E. Wheeler" <david@kineticode.com>
Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>
Date: 2011-02-15T07:37:53Z
Lists: pgsql-hackers

David,

Sorry, it's not easy to hack the tsearch parser. You can preparse your input
before passing it to to_tsquery and to_tsvector.
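
A minimal sketch of such a preparsing step, done on the application side
before the text reaches the database. The helper name preparse and the
regular expression are assumptions made for illustration, not part of
tsearch. It rewrites a "/" that sits between two word characters as "-",
so that 'w/d' arrives as 'w-d' and is lexed as a hyphenated word rather
than as a "file" token. Note that this also rewrites the inner slashes of
real paths, so the pattern may need tightening for your data.

```python
import re

# Hypothetical helper, not part of tsearch: match a "/" only when it sits
# directly between two word characters. The lookbehind and lookahead are
# zero-width, so consecutive slashes in "a/b/c" are all matched.
_SLASH_BETWEEN_WORDS = re.compile(r"(?<=\w)/(?=\w)")


def preparse(text: str) -> str:
    """Replace word-internal slashes with hyphens before full-text parsing."""
    return _SLASH_BETWEEN_WORDS.sub("-", text)


if __name__ == "__main__":
    # The result would then be passed as a parameter to to_tsvector
    # or to_tsquery.
    print(preparse("w/d"))
```

The same function should be applied to both the documents being indexed
and the query strings, so that the two sides produce matching lexemes.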

Oleg

On Mon, 14 Feb 2011, David E. Wheeler wrote:

> Hackers,
>
> Is it possible to modify the default tsearch parser so that / doesn't get lexed as a "file" token? That is, instead of this:
>
> try=# select * from ts_debug('simple'::regconfig, 'w/d');
>  alias │    description    │ token │ dictionaries │ dictionary │ lexemes
> ───────┼───────────────────┼───────┼──────────────┼────────────┼─────────
>  file  │ File or path name │ w/d   │ {simple}     │ simple     │ {w/d}
>
> Ideally it'd think that / was the same as -:
>
> try=# select * from ts_debug('simple'::regconfig, 'w-d');
>       alias      │           description           │ token │ dictionaries │ dictionary │ lexemes
> ─────────────────┼─────────────────────────────────┼───────┼──────────────┼────────────┼─────────
>  asciihword      │ Hyphenated word, all ASCII      │ w-d   │ {simple}     │ simple     │ {w-d}
>  hword_asciipart │ Hyphenated word part, all ASCII │ w     │ {simple}     │ simple     │ {w}
>  blank           │ Space symbols                   │ -     │ {}           │ [null]     │ [null]
>  hword_asciipart │ Hyphenated word part, all ASCII │ d     │ {simple}     │ simple     │ {d}
> (4 rows)
>
> Possible? Or would I have to write a completely new parser just to change this bit?
>
> Thanks,
>
> David

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83