Re: logical decoding and replication of sequences, take 2

Tomas Vondra <tomas.vondra@enterprisedb.com>

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

To: Amit Kapila <amit.kapila16@gmail.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>

Cc: Dilip Kumar <dilipbalaut@gmail.com>, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>

Date: 2023-12-21T14:04:55Z

Lists: pgsql-hackers

Commits


Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited

On 12/15/23 03:33, Amit Kapila wrote:
> On Thu, Dec 14, 2023 at 9:14 PM Ashutosh Bapat
> <ashutosh.bapat.oss@gmail.com> wrote:
>>
>> On Thu, Dec 14, 2023 at 2:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> It can only be cleaned if we process it but xact_decode won't allow us
>>> to process it and I don't think it would be a good idea to add another
>>> hack for sequences here. See below code:
>>>
> >>> xact_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
> >>> {
> >>>     SnapBuild  *builder = ctx->snapshot_builder;
> >>>     ReorderBuffer *reorder = ctx->reorder;
> >>>     XLogReaderState *r = buf->record;
> >>>     uint8       info = XLogRecGetInfo(r) & XLOG_XACT_OPMASK;
> >>>
> >>>     /*
> >>>      * If the snapshot isn't yet fully built, we cannot decode anything, so
> >>>      * bail out.
> >>>      */
> >>>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
> >>>         return;
>>
>> That may be true for a transaction which is decoded, but I think all
>> the transactions which are added to ReorderBuffer should be cleaned up
>> once they have been processed irrespective of whether they are
>> decoded/sent downstream or not. In this case I see the sequence hash
>> being cleaned up for the sequence related transaction in Hayato's
>> reproducer.
>>
> 
> It was because the test you are using was not designed to show the
> problem I mentioned. In this case, the rollback was after a full
> snapshot state was reached.
> 

Right, I haven't tried to reproduce this, but it very much looks like
the entry would not be removed if the xact aborts/commits before the
snapshot reaches FULL state.
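To make the failure mode concrete, here is a minimal C model. It assumes relfilenode creation is tracked in every snapshot-builder state (as proposed), while abort handling keeps the same early return as xact_decode(). All names (SnapState, seq_hash, track_relfilenode, decode_abort, is_tracked) are hypothetical and only loosely mirror SnapBuildState and the sequence hash. They are not the actual PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in loosely mirroring SnapBuildState. */
typedef enum { SNAP_START, SNAP_BUILDING, SNAP_FULL, SNAP_CONSISTENT } SnapState;

#define MAX_ENTRIES 16
#define INVALID_XID 0u

/* Hypothetical relfilenode -> creating xid entry. */
typedef struct
{
    uint32_t relfilenode;
    uint32_t xid;
    bool     used;
} SeqEntry;

static SeqEntry seq_hash[MAX_ENTRIES];

/* Assumption: creation is tracked regardless of snapshot state. */
static bool
track_relfilenode(uint32_t relfilenode, uint32_t xid)
{
    assert(xid != INVALID_XID);
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        if (!seq_hash[i].used)
        {
            seq_hash[i] = (SeqEntry) {relfilenode, xid, true};
            return true;
        }
    }
    return false;               /* table full */
}

/*
 * Mimics the early return in xact_decode(): before FULL, the abort record
 * is skipped, so entries created by that xact are never removed.
 */
static void
decode_abort(SnapState state, uint32_t xid)
{
    if (state < SNAP_FULL)
        return;
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (seq_hash[i].used && seq_hash[i].xid == xid)
            seq_hash[i].used = false;
}

static bool
is_tracked(uint32_t relfilenode)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (seq_hash[i].used && seq_hash[i].relfilenode == relfilenode)
            return true;
    return false;
}
```

In this model, an abort decoded while in SNAP_BUILDING leaves the entry behind. The same abort decoded at SNAP_FULL or later removes it.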

I suppose one way to deal with this would be to first check if an entry
for the same relfilenode exists. If it does, the original transaction
must have terminated but we haven't cleaned it up yet, in which case
we can just "move" the relfilenode to the new transaction.
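A minimal C sketch of that "move" tweak follows. It uses a hypothetical fixed-size table (not the real sequence hash in reorderbuffer.c). When an entry for the relfilenode already exists, it is treated as stale and reassigned to the new xid instead of being inserted twice. The names track_relfilenode, owner_of and count_entries are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ENTRIES 16
#define INVALID_XID 0u

/* Hypothetical relfilenode -> owning xid entry. */
typedef struct
{
    uint32_t relfilenode;
    uint32_t xid;
    bool     used;
} SeqEntry;

static SeqEntry seq_hash[MAX_ENTRIES];

static bool
track_relfilenode(uint32_t relfilenode, uint32_t xid)
{
    int         free_slot = -1;

    assert(xid != INVALID_XID);
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        if (seq_hash[i].used && seq_hash[i].relfilenode == relfilenode)
        {
            /*
             * Existing entry: its owner must have terminated without us
             * seeing the commit/abort, so hand it over to the new xact.
             */
            seq_hash[i].xid = xid;
            return true;
        }
        if (!seq_hash[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return false;           /* table full */
    seq_hash[free_slot] = (SeqEntry) {relfilenode, xid, true};
    return true;
}

/* Returns the owning xid, or INVALID_XID if the relfilenode is unknown. */
static uint32_t
owner_of(uint32_t relfilenode)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (seq_hash[i].used && seq_hash[i].relfilenode == relfilenode)
            return seq_hash[i].xid;
    return INVALID_XID;
}

static int
count_entries(void)
{
    int         n = 0;

    for (int i = 0; i < MAX_ENTRIES; i++)
        if (seq_hash[i].used)
            n++;
    return n;
}
```

The design choice is that a duplicate key is never an error. It only ever means the previous owner is gone.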

However, can't that happen even with full snapshots? I mean, let's say a
transaction creates a relfilenode and terminates without writing an
abort record (surely that's possible, right?). And then another xact
comes and generates the same relfilenode (presumably that's unlikely,
but perhaps possible?). Aren't we in pretty much the same situation,
until the next RUNNING_XACTS cleans up the hash table?
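The RUNNING_XACTS cleanup mentioned above could look roughly like the following C sketch. Any entry whose xid precedes the record's oldest running xid must belong to a transaction that has finished, even one that ended without an abort record, so the entry can be purged. The table and the function names (xid_precedes, add_entry, purge_finished, is_tracked) are hypothetical. xid_precedes imitates the modulo-2^32 comparison PostgreSQL uses for normal xids, and it relies on the usual two's-complement conversion to int32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ENTRIES 16
#define INVALID_XID 0u

typedef uint32_t TxId;          /* hypothetical stand-in for TransactionId */

typedef struct
{
    uint32_t relfilenode;
    TxId     xid;
    bool     used;
} SeqEntry;

static SeqEntry seq_hash[MAX_ENTRIES];

/* Wraparound-aware "a is older than b" for normal xids. */
static bool
xid_precedes(TxId a, TxId b)
{
    return (int32_t) (a - b) < 0;
}

static bool
add_entry(uint32_t relfilenode, TxId xid)
{
    assert(xid != INVALID_XID);
    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        if (!seq_hash[i].used)
        {
            seq_hash[i] = (SeqEntry) {relfilenode, xid, true};
            return true;
        }
    }
    return false;               /* table full */
}

/*
 * On RUNNING_XACTS: xacts older than oldest_running have committed,
 * aborted or crashed, so their entries are stale. Returns count removed.
 */
static int
purge_finished(TxId oldest_running)
{
    int         removed = 0;

    for (int i = 0; i < MAX_ENTRIES; i++)
    {
        if (seq_hash[i].used && xid_precedes(seq_hash[i].xid, oldest_running))
        {
            seq_hash[i].used = false;
            removed++;
        }
    }
    return removed;
}

static bool
is_tracked(uint32_t relfilenode)
{
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (seq_hash[i].used && seq_hash[i].relfilenode == relfilenode)
            return true;
    return false;
}
```

Until that record arrives, a stale entry can still be hit by a new xact reusing the relfilenode. That window is exactly the one in question here.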

I think tracking all relfilenodes would fix the original issue (the one
about treating some changes as transactional), and the tweak that "moves"
the relfilenode to the new xact would fix this other issue too.

That being said, I feel a bit uneasy about it, for reasons similar to
Amit's. If we start processing records before reaching the full snapshot,
that seems like shifting the assumptions a bit. For example, it means we'd
create ReorderBufferTXN entries for cases we'd have skipped before. OTOH
this is (or should be) only a very brief period while starting the
replication, I believe.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company