Re: logical decoding and replication of sequences, take 2
Amit Kapila <amit.kapila16@gmail.com>
From: Amit Kapila <amit.kapila16@gmail.com>
To: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Cc: Dilip Kumar <dilipbalaut@gmail.com>,
Tomas Vondra <tomas.vondra@enterprisedb.com>, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>,
"Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Date: 2023-12-14T09:06:31Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
-
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
-
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
-
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
-
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited
On Thu, Dec 14, 2023 at 12:31 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Thu, Dec 14, 2023 at 10:53 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > > > > It is correct that we can make a wrong decision about whether a change > > > is transactional or non-transactional when sequence DDL happens before > > > the SNAPBUILD_FULL_SNAPSHOT state and the sequence operation happens > > > after that state. However, one thing to note here is that we won't try > > > to stream such a change because for non-transactional cases we don't > > > proceed unless the snapshot is in a consistent state. Now, if the > > > decision had been correct then we would probably have queued the > > > sequence change and discarded at commit. > > > > > > One thing that we deviate here is that for non-sequence transactional > > > cases (including logical messages), we immediately start queuing the > > > changes as soon as we reach SNAPBUILD_FULL_SNAPSHOT state (provided > > > SnapBuildProcessChange() returns true which is quite possible) and > > > take final decision at commit/prepare/abort time. However, that won't > > > be the case for sequences because of the dependency of determining > > > transactional cases on one of the prior records. Now, I am not > > > completely sure at this stage if such a deviation can cause any > > > problem and or whether we are okay to have such a deviation for > > > sequences. > > > > Okay, so this particular scenario that I raised is somehow saved, I > > mean although we are considering transactional sequence operation as > > non-transactional we also know that if some of the changes for a > > transaction are skipped because the snapshot was not FULL that means > > that transaction can not be streamed because that transaction has to > > be committed before snapshot become CONSISTENT (based on the snapshot > > state change machinery). Ideally based on the same logic that the > > snapshot is not consistent the non-transactional sequence changes are > > also skipped. But the only thing that makes me a bit uncomfortable is > > that even though the result is not wrong we have made some wrong > > intermediate decisions i.e. considered transactional change as > > non-transactions. > > > > One solution to this issue is that, even if the snapshot state does > > not reach FULL just add the sequence relids to the hash, I mean that > > hash is only maintained for deciding whether the sequence is changed > > in that transaction or not. So no adding such relids to hash seems > > like a root cause of the issue. Honestly, I haven't analyzed this > > idea in detail about how easy it would be to add only these changes to > > the hash and what are the other dependencies, but this seems like a > > worthwhile direction IMHO. > > ... > It looks like the solution works. But this is the only place where we > process a change before SNAPSHOT reaches FULL. But this is also the > only record which affects a decision to queue/not a following change. > So it should be ok. The sequence_hash'es as separate for each > transaction and they are cleaned when processing COMMIT record. > > It looks like the solution works. But this is the only place where we process a change before SNAPSHOT reaches FULL. But this is also the only record which affects a decision to queue/not a following change. So it should be ok. The sequence_hash'es as separate for each transaction and they are cleaned when processing COMMIT record. > But it is possible that even commit or abort also happens before the snapshot reaches full state in which case the hash table will have stale or invalid (for aborts) entries. That will probably be cleaned at a later point by running_xact records. Now, I think in theory, it is possible that the same RelFileLocator can again be allocated before we clean up the existing entry which can probably confuse the system. It might or might not be a problem in practice but I think the more assumptions we add for sequences, the more difficult it will become to ensure its correctness. -- With Regards, Amit Kapila.