Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Adjust MemSet macro to use size_t rather than long

  2. Get rid of long datatype in CATCACHE_STATS enabled builds

  1. Some efforts to get rid of "long" in our codebase

    David Rowley <dgrowleyml@gmail.com> — 2025-11-06T11:46:56Z

    I was playing around with the following imperfect regex to try and get
    an idea of how many "long" usages we have.
    
    git grep -E "^\s*(static)?\s*(unsigned)?\s*long\b" -- '*.c' '*.h' | wc -l
    
    REL_13_STABLE -> 486
    REL_14_STABLE -> 482
    REL_15_STABLE -> 476
    REL_16_STABLE -> 485
    REL_17_STABLE -> 449
    REL_18_STABLE -> 439
    master -> 386
    
    (Generally trending downwards with quite a good reduction since v18)
    
    Many of the "long" usages are genuine, e.g for compatibility with
    library functions, or in snprintf.c for the %ld and %lu format
    specifiers; those we need. However, many of these we don't need. Many
    are a possible source of bugs due to not all platforms we support
    having the same idea about how wide longs are. We've had and fixed
    bugs around this before.
    
    I've attached a couple of patches to get the ball rolling.
    
    0001: CATCACHE_STATS is using signed longs as counters for cache
    hits/misses etc. I've changed this to uint64 rather than int64 as I
    didn't see the need to have a signed type to count the occurrences of
    an event. Maybe there's an anti-universe where negative occurrences
    are a thing, but they're not in this one. If something goes awry with
    the counters and that causes the subtraction that's being done to
    wrap, then we're more likely to notice the bug via the excessively
    large number the wrap would end up displaying.
    
    0002: MemSet / MemSetAligned macros. It's probably about time someone
    made these disappear, but that's likely for another thread with more
    research than I'd like to do here. I replaced "long" with "Size". I
    also considered "uintptr_t", but after some reading of the C standard,
    I settled on Size as it seems it's possible for platforms to exist
    where the pointer width is smaller than the processor's width. I
    suspect it might not matter for the platforms we support? Size could
    also be smaller than the processor's width, but I feel that's less
    likely.
    
    Which yields a drop in the Ocean fewer longs...
    
    0001+0002 -> 367
    
    David
    
  2. Re: Some efforts to get rid of "long" in our codebase

    Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-06T12:17:01Z

    On 06/11/2025 13:46, David Rowley wrote:
    > I've attached a couple of patches to get the ball rolling.
    
    Thanks!
    
    > 0001: CATCACHE_STATS is using signed longs as counters for cache
    > hits/misses etc. I've changed this to uint64 rather than int64 as I
    > didn't see the need to have a signed type to count the occurrences of
    > an event. Maybe there's an anti-universe where negative occurrences
    > are a thing, but they're not in this one. If something goes awry with
    > the counters and that causes the subtraction that's being done to
    > wrap, then we're more likely to notice the bug via the excessively
    > large number the wrap would end up displaying.
    
    Unsigned makes sense for these.
    
    > @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
    >  
    >  		if (cache->cc_ntup == 0 && cache->cc_searches == 0)
    >  			continue;			/* don't print unused caches */
    > -		elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
    > +		elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" PRIu64 " lhits",
    >  			 cache->cc_relname,
    >  			 cache->cc_indexoid,
    >  			 cache->cc_ntup,
    
    Unfortunately PRIu64 makes these much longer and less readable. I don't 
    think there's much we can do about that though. Perhaps split the format 
    string to multiple lines?
    
    > 0002: MemSet / MemSetAligned macros. It's probably about time someone
    > made these disappear, but that's likely for another thread with more
    > research than I'd like to do here. I replaced "long" with "Size". I
    > also considered "uintptr_t", but after some reading of the C standard,
    > I settled on Size as it seems it's possible for platforms to exist
    > where the pointer width is smaller than the processor's width. I
    > suspect it might not matter for the platforms we support? Size could
    > also be smaller than the processor's width, but I feel that's less
    > likely.
    
    Let's use size_t instead of Size in new code, per previous discussions 
    (https://www.postgresql.org/message-id/flat/CA%2BRLCQyVqb9Xxni6x2iJYTawMbJv5gZY2fzNaw39%3D_yOtu_QKw%40mail.gmail.com#74391f75a649df13920f87360e361183)
    
    - Heikki
    
    
    
    
    
  3. Re: Some efforts to get rid of "long" in our codebase

    Peter Eisentraut <peter@eisentraut.org> — 2025-11-06T18:26:38Z

    On 06.11.25 13:17, Heikki Linnakangas wrote:
    >> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
    >>
    >>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
    >>              continue;            /* don't print unused caches */
    >> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld 
    >> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
    >> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %" 
    >> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%" 
    >> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %" 
    >> PRIu64 " lhits",
    >>               cache->cc_relname,
    >>               cache->cc_indexoid,
    >>               cache->cc_ntup,
    > 
    > Unfortunately PRIu64 makes these much longer and less readable. I don't 
    > think there's much we can do about that though. Perhaps split the format 
    > string to multiple lines?
    
    You could also use unsigned long long int for these, to make the format 
    strings more readable.
    
    
    
    
    
  4. Re: Some efforts to get rid of "long" in our codebase

    Peter Eisentraut <peter@eisentraut.org> — 2025-11-06T18:33:12Z

    On 06.11.25 12:46, David Rowley wrote:
    > 0002: MemSet / MemSetAligned macros. It's probably about time someone
    > made these disappear, but that's likely for another thread with more
    > research than I'd like to do here. I replaced "long" with "Size". I
    > also considered "uintptr_t", but after some reading of the C standard,
    > I settled on Size as it seems it's possible for platforms to exist
    > where the pointer width is smaller than the processor's width. I
    > suspect it might not matter for the platforms we support? Size could
    > also be smaller than the processor's width, but I feel that's less
    > likely.
    
    I think size_t/Size could be misleading here.  You're not measuring any 
    size, you're just chunking up the bytes to zero into something that we 
    thing the compiler or CPU can handle very efficiently.
    
    So in a sense, using long isn't wrong here.  It might well be the best 
    for this.  If there is an aversion to using any long at all, why not 
    long long.
    
    
    
    
  5. Re: Some efforts to get rid of "long" in our codebase

    David Rowley <dgrowleyml@gmail.com> — 2025-11-06T20:06:42Z

    On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
    >
    > On 06.11.25 12:46, David Rowley wrote:
    > > 0002: MemSet / MemSetAligned macros. It's probably about time someone
    > > made these disappear, but that's likely for another thread with more
    > > research than I'd like to do here. I replaced "long" with "Size". I
    > > also considered "uintptr_t", but after some reading of the C standard,
    > > I settled on Size as it seems it's possible for platforms to exist
    > > where the pointer width is smaller than the processor's width. I
    > > suspect it might not matter for the platforms we support? Size could
    > > also be smaller than the processor's width, but I feel that's less
    > > likely.
    >
    > I think size_t/Size could be misleading here.  You're not measuring any
    > size, you're just chunking up the bytes to zero into something that we
    > thing the compiler or CPU can handle very efficiently.
    >
    > So in a sense, using long isn't wrong here.  It might well be the best
    > for this.  If there is an aversion to using any long at all, why not
    > long long.
    
    The point in changing this was that long isn't a good fit as it's not
    64-bit on 64-bit Windows. My aim is to find a type with a width the
    same as the processor's word size. long long tends to be 64-bit on
    32-bit platforms, so doesn't seem a very good fit. Someone might argue
    that doesn't matter now since we no longer have 4-byte Datums, but
    IMO, this isn't any reason to make things even worse for 32-bit
    builds.
    
    David
    
    
    
    
  6. Re: Some efforts to get rid of "long" in our codebase

    Peter Eisentraut <peter@eisentraut.org> — 2025-11-07T15:53:01Z

    On 06.11.25 21:06, David Rowley wrote:
    > On Fri, 7 Nov 2025 at 07:33, Peter Eisentraut <peter@eisentraut.org> wrote:
    >>
    >> On 06.11.25 12:46, David Rowley wrote:
    >>> 0002: MemSet / MemSetAligned macros. It's probably about time someone
    >>> made these disappear, but that's likely for another thread with more
    >>> research than I'd like to do here. I replaced "long" with "Size". I
    >>> also considered "uintptr_t", but after some reading of the C standard,
    >>> I settled on Size as it seems it's possible for platforms to exist
    >>> where the pointer width is smaller than the processor's width. I
    >>> suspect it might not matter for the platforms we support? Size could
    >>> also be smaller than the processor's width, but I feel that's less
    >>> likely.
    >>
    >> I think size_t/Size could be misleading here.  You're not measuring any
    >> size, you're just chunking up the bytes to zero into something that we
    >> thing the compiler or CPU can handle very efficiently.
    >>
    >> So in a sense, using long isn't wrong here.  It might well be the best
    >> for this.  If there is an aversion to using any long at all, why not
    >> long long.
    > 
    > The point in changing this was that long isn't a good fit as it's not
    > 64-bit on 64-bit Windows. My aim is to find a type with a width the
    > same as the processor's word size. long long tends to be 64-bit on
    > 32-bit platforms, so doesn't seem a very good fit. Someone might argue
    > that doesn't matter now since we no longer have 4-byte Datums, but
    > IMO, this isn't any reason to make things even worse for 32-bit
    > builds.
    
    Yeah you're right.  If we cared a lot about MemSet's longevity, I would 
    suggest making a new type for this, but that would leak into all users 
    of c.h, so maybe not.
    
    I do suggest some kind of comment, that we're using size_t as a 
    convenient proxy for the most suitable chunk/step size for the platform, 
    not to actually measure a size.
    
    
    
    
    
  7. Re: Some efforts to get rid of "long" in our codebase

    Tom Lane <tgl@sss.pgh.pa.us> — 2025-11-07T16:04:11Z

    Peter Eisentraut <peter@eisentraut.org> writes:
    > I do suggest some kind of comment, that we're using size_t as a 
    > convenient proxy for the most suitable chunk/step size for the platform, 
    > not to actually measure a size.
    
    Yeah.  There's not actually anything wrong with using "long" here,
    except that we believe that on Win64 it's probably not the processor's
    native word width.  Size/size_t is more likely to match that.
    
    ([u]intptr_t is even more likely to match, but I don't think that's a
    good choice because it invites confusion with the usage of uintptr_t
    for address arithmetic elsewhere in the macro.  That's quite unrelated
    to the choice of copy step size.)
    
    			regards, tom lane
    
    
    
    
  8. Re: Some efforts to get rid of "long" in our codebase

    David Rowley <dgrowleyml@gmail.com> — 2025-11-10T00:53:36Z

    On Fri, 7 Nov 2025 at 07:26, Peter Eisentraut <peter@eisentraut.org> wrote:
    >
    > On 06.11.25 13:17, Heikki Linnakangas wrote:
    > >> @@ -476,7 +476,7 @@ CatCachePrintStats(int code, Datum arg)
    > >>
    > >>          if (cache->cc_ntup == 0 && cache->cc_searches == 0)
    > >>              continue;            /* don't print unused caches */
    > >> -        elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld
    > >> hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
    > >> +        elog(DEBUG2, "catcache %s/%u: %d tup, %" PRIu64 " srch, %"
    > >> PRIu64 "+%" PRIu64 "=%" PRIu64 " hits, %" PRIu64 "+%" PRIu64 "=%"
    > >> PRIu64 " loads, %" PRIu64 " invals, %d lists, %" PRIu64 " lsrch, %"
    > >> PRIu64 " lhits",
    > >>               cache->cc_relname,
    > >>               cache->cc_indexoid,
    > >>               cache->cc_ntup,
    > >
    > > Unfortunately PRIu64 makes these much longer and less readable. I don't
    > > think there's much we can do about that though. Perhaps split the format
    > > string to multiple lines?
    >
    > You could also use unsigned long long int for these, to make the format
    > strings more readable.
    
    I couldn't really decide on what's best here. I'd personally rather be
    more explicit about the width of the type by using uint64, but on the
    other hand, I don't know of any realistic modern hardware/compiler
    combination where unsigned long long isn't 64-bit. Certainly, the
    format string is easier to read with %llu.
    
    v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
    unsigned long long as suggested by Peter.
    
    v2-0002 is updated to use size_t instead of Size, per Heikki
    
    Any further opinions or votes on v2-0001 vs v3-0001?
    
    David
    
  9. Re: Some efforts to get rid of "long" in our codebase

    Andreas Karlsson <andreas@proxel.se> — 2025-11-10T08:00:40Z

    On 11/6/25 12:46 PM, David Rowley wrote:
    > I've attached a couple of patches to get the ball rolling.
    
    This inspired me to write my own patch to chip away at the use of long.
    
    The target for the attached patch is TimestampDifference() and 
    feTimestampDifference() which return the timestamp difference in seconds 
    and microseconds as a long and an int and replace it with in64 and 
    int32. The return values of these functions are used a lot for printing 
    durations to the log and sometimes to populate a struct timeval (in that 
    case returning a timeval would make some sense but time_t is sadly 32 
    bits on some platforms).
    
    The patch also cleans up code a bit in check_log_duration() which was 
    unnecessarily complicated.
    
    But writing the mail made me wonder if a cleaner solution wouldn't be to 
    just make TimestampDifference() return a TimeOffset and then write 
    helper functions to e.g. convert a TimeOffset to a timeval or to 
    milliseconds.
    
    Andreas
    
  10. Re: Some efforts to get rid of "long" in our codebase

    David Rowley <dgrowleyml@gmail.com> — 2025-11-16T23:37:25Z

    On Mon, 10 Nov 2025 at 13:53, David Rowley <dgrowleyml@gmail.com> wrote:
    > v2-0001 wraps the format string as suggested by Heikki, v3-0001 uses
    > unsigned long long as suggested by Peter.
    >
    > v2-0002 is updated to use size_t instead of Size, per Heikki
    >
    > Any further opinions or votes on v2-0001 vs v3-0001?
    
    Nobody seems to feel strongly either way, so I looked again and
    thought that using uint64 is nicer as the size of the type is
    explicit. We do want a 64-bit type here, not something bigger, which
    in theory, long long could be. The less readable format, IMO seemed
    like an ok trade-off to be explicit about the type's size.
    
    With that, I pushed the CATCACHE_STATS patch. I also pushed the
    MemSet/MemSetAligned one too.
    
    Thanks for looking.
    
    David