Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2022-08-19T11:11:35Z

Lists: pgsql-hackers

Commits


Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited

I've been thinking a bit more about the two optimizations mentioned at
the end, so let me share my thoughts before I forget them:

On 8/18/22 23:10, Tomas Vondra wrote:
>
> ...
>
> And maybe we could then use the LSN to read the increment from the WAL
> during decoding, instead of having to read it and WAL-log it during
> commit. Essentially, we'd run a local XLogReader. Of course, we'd have
> to be careful about checkpoints, not sure what to do about that.
> 

I think logging just the LSN is workable.

I was worried about dealing with checkpoints. Imagine you call nextval()
on a sequence that was last WAL-logged a couple of checkpoints back.
Then, when decoding, you wouldn't be able to read the WAL record at that
LSN, because the WAL might have been recycled already. But that can't
happen, because we always force WAL-logging the first time nextval() is
called after a checkpoint. So the WAL at the logged LSN is guaranteed to
be available.
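A minimal sketch of the rule the argument above relies on, modelled loosely on the check in nextval(): if the sequence's last WAL record precedes the redo pointer of the latest checkpoint, a new record is forced. `SeqModel`, `seq_must_log()` and `seq_nextval()` are hypothetical, simplified stand-ins (no locking, no buffers, no real WAL insertion); `SEQ_LOG_VALS` mirrors the prefetch count of 32 used in sequence.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */

#define SEQ_LOG_VALS 32         /* values "pre-logged" per WAL record */

/* Simplified per-sequence state (hypothetical field names). */
typedef struct SeqModel
{
    int64_t    last_value;      /* value most recently handed out */
    int64_t    log_cnt;         /* values left before a new record is needed */
    XLogRecPtr page_lsn;        /* LSN of the last WAL record for the sequence */
} SeqModel;

/*
 * Must this nextval() call emit a WAL record?  Either the pre-logged values
 * are used up, or this is the first call since the checkpoint whose redo
 * pointer is redo_ptr (the last record precedes it).
 */
static bool
seq_must_log(const SeqModel *seq, XLogRecPtr redo_ptr)
{
    assert(seq->log_cnt >= 0);
    if (seq->log_cnt == 0)
        return true;
    return seq->page_lsn <= redo_ptr;
}

/* Hand out the next value; next_lsn is where a new record would land. */
static int64_t
seq_nextval(SeqModel *seq, XLogRecPtr redo_ptr, XLogRecPtr next_lsn)
{
    if (seq_must_log(seq, redo_ptr))
    {
        seq->page_lsn = next_lsn;   /* pretend the record was written here */
        seq->log_cnt = SEQ_LOG_VALS;
    }
    seq->log_cnt--;
    return ++seq->last_value;
}
```

Once a checkpoint moves the redo pointer past the sequence's last record, the next call logs again, so the LSN a transaction would remember for the sequence is never older than the latest checkpoint's redo pointer.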

Of course, this would not reduce the number of WAL records, because
we'd still log all sequences touched by the transaction. We wouldn't
need to read the sequence state from disk, though, and decoding could
ignore "old" increments (those with an LSN lower than the last LSN we
decoded).

For frequently used sequences that seems like a win.
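A rough sketch of the decoding side, assuming the commit-time WAL entry lists (sequence, LSN) pairs and that "old" is judged per sequence (the email doesn't say whether the comparison is per sequence or global). `SeqRef`, `DecodedSeqs` and `decode_seq_ref()` are hypothetical names; actually reading the referenced record with a local XLogReader is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */
typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid type */

/* Hypothetical commit-time entry: "seqid was touched, state is at lsn". */
typedef struct SeqRef
{
    Oid        seqid;
    XLogRecPtr lsn;
} SeqRef;

#define MAX_SEQS 64

/* Decoder-side memory of the newest state already decoded per sequence. */
typedef struct DecodedSeqs
{
    size_t     n;
    Oid        seqid[MAX_SEQS];
    XLogRecPtr last_lsn[MAX_SEQS];
} DecodedSeqs;

/*
 * Returns true if the referenced state is newer than anything decoded so
 * far (so it should be read from WAL and replicated), false if it is "old".
 */
static bool
decode_seq_ref(DecodedSeqs *d, SeqRef ref)
{
    for (size_t i = 0; i < d->n; i++)
    {
        if (d->seqid[i] == ref.seqid)
        {
            if (ref.lsn <= d->last_lsn[i])
                return false;       /* same or newer state already decoded */
            d->last_lsn[i] = ref.lsn;
            return true;
        }
    }

    /* First time we see this sequence: remember it and decode. */
    assert(d->n < MAX_SEQS);
    d->seqid[d->n] = ref.seqid;
    d->last_lsn[d->n] = ref.lsn;
    d->n++;
    return true;
}
```

A frequently used sequence touched by many transactions would then be read from WAL only when its logged LSN actually advances.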

> Another idea that just occurred to me is that if we end up having to
> read the sequence state during commit, maybe we could at least optimize
> it somehow. For example we might track LSN of the last logged state for
> each sequence (in shared memory or something), and the other sessions
> could just skip the WAL-log if their "local" LSN is <= than this LSN.
> 

Tracking the last LSN for each sequence (in an SLRU or something) should
work too, I guess. In principle, this just moves the skipping of "old"
increments from decoding to writing, so that we don't even have to write
them to WAL.

We don't even need persistence, nor do we need to keep records for all
sequences, I think. If you don't find a record for a given sequence,
assume it wasn't logged yet and just log it. Of course, it requires a
bit of shared memory for each sequence, say ~32B. I'm not sure about the
overhead, but I'd bet that if you have many (thousands of) frequently
used sequences, there'll be a lot of other overhead making this
irrelevant.
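A minimal sketch of the non-persistent tracker described above, assuming a fixed-size, direct-mapped table where a collision simply evicts the previous sequence (a missing entry just means "log it"). `SeqLsnTracker`, `seq_maybe_log()` and `TRACKER_SLOTS` are hypothetical; this is single-process code, and a real shared-memory version would need locking (e.g. an LWLock or spinlock per slot).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's LSN type */
typedef uint32_t Oid;           /* stand-in; 0 plays the role of InvalidOid */

#define TRACKER_SLOTS 1024

/* One slot: which sequence it holds, and the newest state logged for it. */
typedef struct SeqLsnSlot
{
    Oid        seqid;           /* 0 = empty slot */
    XLogRecPtr last_logged;     /* "local" LSN of the last state WAL-logged */
} SeqLsnSlot;

typedef struct SeqLsnTracker
{
    SeqLsnSlot slots[TRACKER_SLOTS];
} SeqLsnTracker;

/*
 * Called at commit for each touched sequence.  Returns true if the session
 * must WAL-log the sequence state it saw (local_lsn), false if another
 * session already logged the same or a newer state.
 */
static bool
seq_maybe_log(SeqLsnTracker *t, Oid seqid, XLogRecPtr local_lsn)
{
    SeqLsnSlot *slot;

    assert(seqid != 0);
    slot = &t->slots[seqid % TRACKER_SLOTS];

    /* Known sequence and our state is not newer: skip the WAL record. */
    if (slot->seqid == seqid && local_lsn <= slot->last_logged)
        return false;

    /* Unknown (never seen or evicted) or newer: log and remember it. */
    slot->seqid = seqid;
    slot->last_logged = local_lsn;
    return true;
}
```

At the ~32B per entry estimated above, 1024 slots take 1024 × 32 B = 32 KiB; the bare entries here are smaller because they carry no lock. Eviction only costs an extra WAL record, never a missed one.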

Of course, if we're doing the skipping when writing the WAL, maybe we
should just read the sequence state: we'd do the I/O, but only in a
fraction of the transactions, and we wouldn't need to read old WAL
during logical decoding.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company