logical decoding and replication of sequences, take 2

Tomas Vondra <tomas.vondra@enterprisedb.com>

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

To: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>

Date: 2022-08-18T21:10:39Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited

Attachments

decoding-sequences-tracking-20220818.patch (text/x-patch) patch

Hi,

Here's a rebased version of the patch adding logical decoding of
sequences. The previous attempt [1] ended up getting reverted, due to
running into issues with non-transactional nature of sequences when
decoding the existing WAL records. See [2] for details.

This patch uses a different approach, proposed by Hannu Krosing [3],
based on tracking sequences actually modified in each transaction, and
then WAL-logging the state at the end.

This does work, but I'm not very happy about WAL-logging all sequences
at the end. The "problem" is we have to re-read the current state of the
sequence from disk, because it might be concurrently updated by another
transaction.

Imagine two transactions, T1 and T2:

T1: BEGIN

T1: SELECT nextval('s') FROM generate_series(1,1000)

T2: BEGIN

T2: SELECT nextval('s') FROM generate_series(1,1000)

T2: COMMIT

T1: COMMIT

The expected outcome is that the sequence value is ~2000. We must not
blindly apply the changes from T2 by the increments in T1. So the patch
simply reads "current" state of the transaction at commit time. Which is
annoying, because it involves I/O, increases the commit duration, etc.

On the other hand, this is likely cheaper than the other approach based
on WAL-logging every sequence increment (that would have to be careful
about obsoleted increments too, when applying them transactionally).


I wonder if we might deal with this by simply WAL-logging LSN of the
last change for each sequence (in the given xact), which would allow
discarding the "obsolete" changes quite easily I think. nextval() would
simply look at LSN in the page header.

And maybe we could then use the LSN to read the increment from the WAL
during decoding, instead of having to read it and WAL-log it during
commit. Essentially, we'd run a local XLogReader. Of course, we'd have
to be careful about checkpoints, not sure what to do about that.

Another idea that just occurred to me is that if we end up having to
read the sequence state during commit, maybe we could at least optimize
it somehow. For example we might track LSN of the last logged state for
each sequence (in shared memory or something), and the other sessions
could just skip the WAL-log if their "local" LSN is <= than this LSN.


regards


[1]
https://www.postgresql.org/message-id/flat/d045f3c2-6cfb-06d3-5540-e63c320df8bc@enterprisedb.com

[2]
https://www.postgresql.org/message-id/00708727-d856-1886-48e3-811296c7ba8c%40enterprisedb.com

[3]
https://www.postgresql.org/message-id/CAMT0RQQeDR51xs8zTa25YpfKB1B34nS-Q4hhsRPznVsjMB_P1w%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company