Thread

  1. BUG #18956: Observing an issue in regexp_count()

    PG Bug reporting form <noreply@postgresql.org> — 2025-06-12T08:03:25Z

    The following bug has been logged on the website:
    
    Bug reference:      18956
    Logged by:          Anudeep Vattikonda
    Email address:      anudeepvattikonda0404@gmail.com
    PostgreSQL version: 17.5
    Operating system:   Mac
    Description:        
    
    Hi
    I am trying to run the below query
    select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
    I was expecting it to return 2 but I see Postgres is returning 0. I see that
    there are two matches, cat and flat. All it should do is to look for the
    word at whose left side shoudn't be a word boundary while the right side
    should be a word boundary
    Thank you
    
    
  2. Re: BUG #18956: Observing an issue in regexp_count()

    hubert depesz lubaczewski <depesz@depesz.com> — 2025-06-12T12:10:00Z

    On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:
    > The following bug has been logged on the website:
    > 
    > Bug reference:      18956
    > Logged by:          Anudeep Vattikonda
    > Email address:      anudeepvattikonda0404@gmail.com
    > PostgreSQL version: 17.5
    > Operating system:   Mac
    > Description:        
    > 
    > Hi
    > I am trying to run the below query
    > select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
    > I was expecting it to return 2 but I see Postgres is returning 0. I see that
    > there are two matches, cat and flat. All it should do is to look for the
    > word at whose left side shoudn't be a word boundary while the right side
    > should be a word boundary
    
    What makes you think that \B/\b has anything to do with word boundary?
    
    Docs
    (https://www.postgresql.org/docs/17/functions-matching.html#FUNCTIONS-POSIX-REGEXP)
    show:
    
    \b - backspace, as in C
    \B - synonym for backslash (\) to help reduce the need for backslash doubling
    
    As far as I can tell pg regexps have nothing related to word boundaries.
    
    You could get 2 by changing the regexp to something that actually works:
    
    $ select REGEXP_COUNT('cat at the flat', '[a-z]at(?![a-z])');
     regexp_count
    ──────────────
                2
    (1 row)
    
    Best regards,
    
    depesz
    
    
    
    
    
  3. Re: BUG #18956: Observing an issue in regexp_count()

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-06-12T13:54:46Z

    hubert depesz lubaczewski <depesz@depesz.com> writes:
    > On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:
    >> I am trying to run the below query
    >> select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
    >> I was expecting it to return 2 but I see Postgres is returning 0. I see that
    >> there are two matches, cat and flat. All it should do is to look for the
    >> word at whose left side shoudn't be a word boundary while the right side
    >> should be a word boundary
    
    > What makes you think that \B/\b has anything to do with word boundary?
    
    Indeed, they do not.
    
    > As far as I can tell pg regexps have nothing related to word boundaries.
    
    Sure we do, see "Regular Expression Constraint Escapes":
    
    https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
    
    Unfortunately, since these are all way outside the POSIX regexp
    standard, different systems have implemented these extensions
    differently.  I don't doubt that \B/\b mean word boundaries
    in some other system.
    
    			regards, tom lane
    
    
    
    
  4. Re: BUG #18956: Observing an issue in regexp_count()

    hubert depesz lubaczewski <depesz@depesz.com> — 2025-06-12T14:05:34Z

    On Thu, Jun 12, 2025 at 09:54:46AM -0400, Tom Lane wrote:
    > hubert depesz lubaczewski <depesz@depesz.com> writes:
    > > On Thu, Jun 12, 2025 at 08:03:25AM +0000, PG Bug reporting form wrote:
    > >> I am trying to run the below query
    > >> select REGEXP_COUNT('cat at the flat', '\Bat\b') ;
    > >> I was expecting it to return 2 but I see Postgres is returning 0. I see that
    > >> there are two matches, cat and flat. All it should do is to look for the
    > >> word at whose left side shoudn't be a word boundary while the right side
    > >> should be a word boundary
    > 
    > > What makes you think that \B/\b has anything to do with word boundary?
    > 
    > Indeed, they do not.
    > 
    > > As far as I can tell pg regexps have nothing related to word boundaries.
    > 
    > Sure we do, see "Regular Expression Constraint Escapes":
    > 
    > https://www.postgresql.org/docs/current/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
    > 
    > Unfortunately, since these are all way outside the POSIX regexp
    > standard, different systems have implemented these extensions
    > differently.  I don't doubt that \B/\b mean word boundaries
    > in some other system.
    
    Oh, Missed that. Thanks.
    
    So the regexp can be rewritten to:
    
    =$ select REGEXP_COUNT('cat at the flat', '\Yat\M');
     regexp_count
    ──────────────
                2
    (1 row)
    
    Best regards,
    
    depesz