Thread

  1. Re: BUG #6327: Prefix full-text-search fails for hosts with complicated names

    Oleg Bartunov <oleg@sai.msu.su> — 2011-12-06T14:55:12Z

    On Mon, 5 Dec 2011, Tom Lane wrote:
    
    > Marcin.Kasperski@mekk.waw.pl writes:
    >> Synopsis
    >> =========
    >
    >> 'goog:*'  matches  google.com
    >> but
    >> 'e-goog:*' does not match e-google.com
    >
    > The reason for this seems to be that the pattern is treated as a
    > hyphenated word:
    >
    > regression=# select TO_TSQUERY('english', 'e-goog:*');
    >          to_tsquery
    > -------------------------------
    > 'e-goog':* & 'e':* & 'goog':*
    > (1 row)
    >
    > but the hostname isn't:
    >
    > regression=# select TO_TSVECTOR('english', 'See e-google.com');
    >       to_tsvector
    > --------------------------
    > 'e-google.com':2 'see':1
    > (1 row)
    >
    > If you change the text so it's not recognized as a hostname, you get
    > lexemes that would match the query:
    >
    > regression=# select TO_TSVECTOR('english', 'See e-google com');
    >                 to_tsvector
    > ---------------------------------------------
    > 'com':5 'e':3 'e-googl':2 'googl':4 'see':1
    > (1 row)
    >
    > Possibly we could fix this by hacking the ts parser so that it would
    > also apply the hyphenated-word rules to a hostname containing a dash.
    >
    > In general though, there are always going to be cases where prefix
    > match doesn't work because of dictionary transformations ...
    
    I'd index the 'after dictionary transformations' lexemes as well as the
    originals, so that prefix match always works.
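
    A minimal sketch of the idea above, using only functions that already
    exist. It approximates "index both" by concatenating a stemmed tsvector
    with an unstemmed one. The sample word 'happiness' and the choice of the
    'simple' configuration for the unstemmed copy are illustrative
    assumptions, not something from the report. This only covers dictionary
    transformations; it does not change how the parser tokenizes a hostname
    such as e-google.com.

    ```sql
    -- Sketch only: keep both the stemmed ('english') and the unstemmed
    -- ('simple') lexemes in one tsvector, so a prefix of the original
    -- word can still match after stemming has shortened the lexeme.
    SELECT (to_tsvector('english', 'happiness')    -- stemmed lexeme
            || to_tsvector('simple', 'happiness')) -- original lexeme
           @@ to_tsquery('simple', 'happin:*') AS prefix_match;
    ```

    The query side uses 'simple' here so that the prefix itself is not
    passed through the stemmer; whether that is appropriate depends on the
    application.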
    
    >
    > 			regards, tom lane
    >
    
     	Regards,
     		Oleg
    _____________________________________________________________
    Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    Sternberg Astronomical Institute, Moscow University, Russia
    Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    phone: +007(495)939-16-83, +007(495)939-23-83