Re: logical decoding and replication of sequences, take 2
Robert Haas <robertmhaas@gmail.com>
From: Robert Haas <robertmhaas@gmail.com>
To: Tomas Vondra <tomas.vondra@enterprisedb.com>
Cc: Amit Kapila <amit.kapila16@gmail.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, Dilip Kumar <dilipbalaut@gmail.com>, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>,
"Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Date: 2024-01-26T14:39:16Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
-
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
-
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
-
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
-
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited
On Wed, Jan 24, 2024 at 12:46 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> I did try to explain how this works (and why) in a couple places:
>
> 1) the commit message
> 2) reorderbuffer header comment
> 3) ReorderBufferSequenceIsTransactional comment (and nearby)
>
> It's possible this does not meet your expectations, ofc. Maybe there
> should be a separate README for this - I haven't found anything like
> that for logical decoding in general, which is why I did (1)-(3).
I read over these and I do think they answer a bunch of questions, but
I don't think they answer all of the questions.
Suppose T1 creates a sequence and commits. Then T2 calls nextval().
Then T3 drops the sequence. According to the commit message, T2's
change will be "replayed immediately after decoding". But it's
essential to replay T2's change after we replay T1 and before we
replay T3, and the comments don't explain why that's guaranteed.
The answer might be "locks". If we always replay a transaction
immediately when we see it's commit record then in the example above
we're fine, because the commit record for the transaction that creates
the sequence must precede the nextval() call, since the sequence won't
be visible until the transaction commits, and also because T1 holds a
lock on it at that point sufficient to hedge out nextval. And the
nextval record must precede the point where T3 takes an exclusive lock
on the sequence.
Note, however, that this change of reasoning critically depends on us
never delaying application of a transaction. If we might reach T1's
commit record and say "hey, let's hold on to this for a bit and replay
it after we've decoded some more," everything immediately breaks,
unless we also delay application of T2's non-transactional update in
such a way that it's still guaranteed to happen after T1. I wonder if
this kind of situation would be a problem for a future parallel-apply
feature. It wouldn't work, for example, to hand T1 and T3 off (in that
order) to a separate apply process but handle T2's "non-transactional"
message directly, because it might handle that message before the
application of T1 got completed.
This also seems to depend on every transactional operation that might
affect a future non-transactional operation holding a lock that would
conflict with that non-transactional operation. For example, if ALTER
SEQUENCE .. RESTART WITH didn't take a strong lock on the sequence,
then you could have: T1 does nextval, T2 does ALTER SEQUENCE RESTART
WITH, T1 does nextval again, T1 commits, T2 commits. It's unclear what
the semantics of that would be -- would T1's second nextval() see the
sequence restart, or what? But if the effect of T1's second nextval
does depend in some way on the ALTER SEQUENCE operation which precedes
it in the WAL stream, then we might have some trouble here, because
both nextvals precede the commit of T2. Fortunately, this sequence of
events is foreclosed by locking.
But I did find one somewhat-similar case in which that's not so.
S1: create table withseq (a bigint generated always as identity);
S1: begin;
S2: select nextval('withseq_a_seq');
S1: alter table withseq set unlogged;
S2: select nextval('withseq_a_seq');
I think this is a bug in the code that supports owned sequences rather
than a problem that this patch should have to do something about. When
a sequence is flipped between logged and unlogged directly, we take a
stronger lock than we do here when it's done in this indirect way.
Also, I'm not quite sure if it would pose a problem for sequence
decoding anyway: it changes the relfilenode, but not the value. But
this is the *kind* of problem that could make the approach unsafe:
supposedly transactional changes being interleaved with supposedly
non-transctional changes, in such a way that the non-transactional
changes might get applied at the wrong time relative to the
transactional changes.
--
Robert Haas
EDB: http://www.enterprisedb.com