Thread

  1. Re: Row pattern recognition

    Tatsuo Ishii <ishii@postgresql.org> — 2025-12-01T12:21:38Z

    >> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii@postgresql.org> wrote:
    >> 
    >> Hi Chao,
    >> 
    >> Any comment on this?
    >> 
    >>>> 13 - 0005 - nodeWindowAgg.c
    >>>> ```
    >>>> static
    >>>> int
    >>>> do_pattern_match(char *pattern, char *encoded_str, int len)
    >>>> {
    >>>> static regex_t *regcache = NULL;
    >>>> static regex_t preg;
    >>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
    >>>> ```
    >>>> 
    >>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
    >>> 
    >>> Yeah, such a usage maybe rare. But I hesitate to store the data
    >>> (regex_t) in WindowAggState because:
    >>> 
    >>> (1) it bloats WindowAggState which we want to avoid if possible
    >>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)
    > 
    > With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that even if I let this pass, other reviewers might have the same concern.
    > 
    > Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only allocate memory when a query needs to.
    > 
    >>> 
    >>>> 14 - 0005 - nodeWindowAgg.c
    >>>> ```
    >>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
    >>>> +
    >>>> + if (regcache != NULL)
    >>>> + pg_regfree(regcache); /* free previous re */
    >>>> +
    >>>> + /* we need to convert to char to pg_wchar */
    >>>> + plen = strlen(pattern);
    >>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
    >>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
    >>>> + /* compile re */
    >>>> + sts = pg_regcomp(&preg, /* compiled re */
    >>>> + data, /* target pattern */
    >>>> + data_len, /* length of pattern */
    >>>> + cflags, /* compile option */
    >>>> + C_COLLATION_OID /* collation */
    >>>> + );
    >>>> + pfree(data);
    >>>> +
    >>>> + MemoryContextSwitchTo(oldContext);
    >>>> ```
    >>>> 
    >>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
    >>>> ```
    >>>> /*
    >>>> * pg_regcomp - compile regular expression
    >>>> *
    >>>> * Note: on failure, no resources remain allocated, so pg_regfree()
    >>>> * need not be applied to re.
    >>>> */
    >>>> int
    >>>> pg_regcomp(regex_t *re,
    >>>> const chr *string,
    >>>> size_t len,
    >>>> int flags,
    >>>> Oid collation)
    >>>> ```
    >>>> 
    >>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
    >>> 
    >>> I admit current patch leaves the memory unfreed even after a query
    >>> finishes. What about adding something like:
    >>> 
    >>> static void do_pattern_match_end(void)
    >>> {
    >>> if (regcache != NULL)
    >>> pg_regfree(regcache);
    >>> }
    >>> 
    >>> and let ExecEndWindowAgg() call it?
    >> 
    > 
    > I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext here.
    
    I decided to add new fields to WindowAggState:
    
    	/* regular expression compiled result cache. Used for RPR. */
    	char	   *patbuf;			/* pattern to be compiled */
    	regex_t		preg;			/* compiled re pattern */
    
    Then allocate the memory for them using
    winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;
    
    Best regards,
    --
    Tatsuo Ishii
    SRA OSS K.K.
    English: http://www.sraoss.co.jp/index_en/
    Japanese:http://www.sraoss.co.jp