Thread

  1. Re: Improve docs for n_distinct_inherited

    David G. Johnston <david.g.johnston@gmail.com> — 2025-10-13T01:27:48Z

    On Sun, Oct 12, 2025 at 7:42 PM David Rowley <dgrowleyml@gmail.com> wrote:
    
    > Just picking this one up again. I forgot to come back to this after
    > PGConf.dev.
    >
    > On Fri, 9 May 2025 at 02:50, David G. Johnston
    > <david.g.johnston@gmail.com> wrote:
    > > I was missing this key piece of knowledge which invalidated my entire
    > attempt.
    > >
    > > Here's an attempt at shortening this now that I understand the mechanics
    > better.
    > >
    > >       Separate options exist because an inheritance parent table has two
    > >       different sets of statistics: one considering only itself and one
    > which
    > >       also includes its children
    > (<literal>n_distinct_inherited</literal>).
    > >       Partitioned tables, which only have rows in the children, likewise
    > uses
    > >       the inherited option while everyone else uses
    > <literal>n_distinct</literal>.
    >
    > I wasn't quite happy with that as the text indicates that
    > n_distinct_inherited is the statistics.  But, it's not, it's just the
    > option that allows some modification of the gathered statistics.
    >
    > I came up with:
    >
    >       Ordinarily <literal>n_distinct</literal> is used.
    >       <literal>n_distinct_inherited</literal> exists to allow the distinct
    >       estimate to be overwritten for the statistics gathered for
    > inheritance
    >       parent tables and for partitioned tables.
    
    
    How about:
    
    "n_disinct is used for normal tables while n_distinct_inherited is used for
    partitioned tables.  Both are usable (selected via the ONLY modifier) for
    an inheritance parent table since it has both storage and children."
    
    The use of both "Ordinarily" and "overwritten" is bothering me here.  And
    it implies that n_distinct doesn't work for inheritance parent tables or,
    conversely, that n_distinct does work for partitioned tables.
    
    
    > I also fixed what I thought was some misleading text about ANALYZE
    > using this value to calculate things.
    
    
    > values in the column is linear with the estimated number
    >      of rows in the table; the exact count is to be computed by
    multiplying the estimated
    >      rows in the table by the absolute value of the given number.
    
    "...is proportional to the estimated number of rows in the table at
    planning time."
    
    (The final "at planning time" substitutes for the sentence you pondered
    removing.)
    
    (The rest, including the examples, seem a bit self-explanatory given the
    definition, though I do get reader inexperience with the terminology.  But
    proportional implies a linear relationship, and the positive/negative
    bifurcation seems straight-forward here.)
    
    I'm thinking everything else below is better incorporated into 14.2 which
    should be linking back to this section.  That way the crux of the usage is
    defined in syntax while the details about setting a specific value are
    located in the section covering the overall topic.
    
     the exact count is to be computed by multiplying the estimated
    +      rows in the table by the absolute value of the given number.  For
    example,
           a value of -1 implies that all values in the column are distinct,
    while
    -      a value of -0.5 implies that each value appears twice on the average.
    +      a value of -0.5 implies that each value appears twice on average.
           This can be useful when the size of the table changes over time,
    since
           the multiplication by the number of rows in the table is not
    performed
           until query planning time.
    
    (Leave: Specify a value of 0 to revert to estimating...)
    
    That said, this rework would be OK as-is.
    
    Also, looking at stadistinct, the multiplier stored there accounts for the
    presence of null.  This attribute-option does not.  Is that difference
    worth noting?
    
    
    David J.