Re: logical decoding and replication of sequences, take 2

Tomas Vondra <tomas.vondra@enterprisedb.com>

From: Tomas Vondra <tomas.vondra@enterprisedb.com>
To: Dilip Kumar <dilipbalaut@gmail.com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>, Amit Kapila <amit.kapila16@gmail.com>, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Date: 2023-12-06T13:39:42Z
Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Migrate logical slots to the new node during an upgrade.

  2. Make test_decoding ddl.out shorter

  3. Fix snapshot handling in logicalmsg_decode

  4. doc: Adjust a few more references to "postmaster"

  5. Revert "Logical decoding of sequences"

On 12/6/23 10:05, Dilip Kumar wrote:
> On Wed, Dec 6, 2023 at 11:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>
>> On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
> 
> I was also wondering what happens if the sequence changes are
> transactional but somehow the snap builder state changes to
> SNAPBUILD_FULL_SNAPSHOT in between processing of the smgr_decode() and
> the seq_decode() which means RelFileLocator will not be added to the
> hash table and during the seq_decode() we will consider the change as
> non-transactional.  I haven't fully analyzed that what is the real
> problem in this case but have we considered this case? what happens if
> the transaction having both ALTER SEQUENCE and nextval() gets aborted
> but the nextva() has been considered as non-transactional because
> smgr_decode() changes were not processed because snap builder state
> was not yet SNAPBUILD_FULL_SNAPSHOT.
> 

Yes, if something like this happens, that'd be a problem:

1) decoding starts, with

   SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT

2) transaction that creates a new refilenode gets decoded, but we skip
   it because we don't have the correct snapshot

3) snapshot changes to SNAPBUILD_FULL_SNAPSHOT

4) we decode sequence change from nextval() for the sequence

This would lead to us attempting to apply sequence change for a
relfilenode that's not visible yet (and may even get aborted).

But can this even happen? Can we start decoding in the middle of a
transaction? How come this wouldn't affect e.g. XLOG_HEAP2_NEW_CID,
which is also skipped until SNAPBUILD_FULL_SNAPSHOT. Or logical
messages, where we also call the output plugin in non-transactional cases.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company