Re: Changing the state of data checksums in a running cluster

From: Tomas Vondra <tomas@vondra.me>
To: Daniel Gustafsson <daniel@yesql.se>
Cc: Michael Banck <mbanck@gmx.net>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Date: 2024-11-26T22:07:12Z
Lists: pgsql-hackers

Hi,

I spent a bit more time testing the latest version of the patch from
[1], and I ran into this assert in PostmasterStateMachine() when
stopping the cluster:

  /* All types should be included in targetMask or remainMask */
  Assert((remainMask.mask | targetMask.mask) == BTYPE_MASK_ALL.mask);

At first I was puzzled as this happens on every shutdown, but that's
because these checks were introduced by a78af0427015 a week ago. So it's
more a matter of rebasing.
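
To illustrate why a rebase can trip this check, here is a minimal,
self-contained sketch of the invariant. It is not the postmaster code:
all DEMO_/demo_ names are hypothetical stand-ins, and it only assumes
that the patch adds a process type that the shutdown logic does not yet
place in either mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for backend types (not the real enum). */
typedef enum
{
    DEMO_B_BACKEND,
    DEMO_B_CHECKPOINTER,
    DEMO_B_WAL_WRITER,
    DEMO_B_DATACHECKSUMS_WORKER,    /* assumed: type added by the patch */
    DEMO_NUM_BACKEND_TYPES
} DemoBackendType;

/* One bit per backend type, wrapped in a struct like the asserted masks. */
typedef struct
{
    uint32_t    mask;
} DemoBackendTypeMask;

/* Mask with the bit of every known backend type set. */
static const DemoBackendTypeMask DEMO_MASK_ALL = {
    (1u << DEMO_NUM_BACKEND_TYPES) - 1
};

/* Return a copy of m with the bit for type t set. */
static DemoBackendTypeMask
demo_mask_add(DemoBackendTypeMask m, DemoBackendType t)
{
    m.mask |= 1u << t;
    return m;
}

/* The invariant: every type is in either the "remain" or "target" mask. */
static bool
demo_masks_cover_all(DemoBackendTypeMask remain, DemoBackendTypeMask target)
{
    return (remain.mask | target.mask) == DEMO_MASK_ALL.mask;
}
```

In this sketch, building the two masks without DEMO_B_DATACHECKSUMS_WORKER
makes demo_masks_cover_all() return false, which corresponds to the
assert firing on every shutdown. Adding the new type to one of the masks
restores the invariant.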

However, I also noticed the progress monitoring does not really work. I
get stuff like this:

    + psql -x test -c 'select * from pg_stat_progress_data_checksums'
    -[ RECORD 1 ]---------------------+---------
    pid                               | 56811
    datid                             | 0
    datname                           |
    phase                             | enabling
    databases_total                   | 4
    relations_total                   |
    databases_processed               | 0
    relations_processed               | 0
    databases_current                 | 16384
    relation_current                  | 0
    relation_current_blocks           | 0
    relation_current_blocks_processed | 0

But I've never seen any of the "processed" fields be non-zero (and
relations_total is even NULL), and the same applies to the relation_*
fields. Also, what are datid/datname about? They are not populated, not
mentioned in the sgml docs, and we already have databases_current ...

The message [2] from 10/08 says:

> I did remove parts of the progress reporting for now since it can't be
> used from the dynamic backgroundworker it seems.  I need to regroup
> and figure out a better way there, but I wanted to address your above
> find sooner rather than wait for that.

And I guess that would explain why some of the fields are not updated.
But then the later patch versions seem to imply there are no outstanding
issues / missing stuff.


regards


[1]
https://www.postgresql.org/message-id/CA226DE1-DC9A-4675-A83C-32270C473F0B%40yesql.se

[2]
https://www.postgresql.org/message-id/DD25705F-E75F-4DCA-B49A-5578F4F55D94%40yesql.se

-- 
Tomas Vondra