Re: logical decoding and replication of sequences, take 2

Amit Kapila <amit.kapila16@gmail.com>

From: Amit Kapila <amit.kapila16@gmail.com>

To: Tomas Vondra <tomas.vondra@enterprisedb.com>

Cc: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>, Dilip Kumar <dilipbalaut@gmail.com>

Date: 2023-11-29T13:42:18Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited

On Wed, Nov 29, 2023 at 2:59 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> I have been hacking on improving the improvements outlined in my
> preceding e-mail, but I have some bad news - I ran into an issue that I
> don't know how to solve :-(
>
> Consider this transaction:
>
>   BEGIN;
>   ALTER SEQUENCE s RESTART 1000;
>
>   SAVEPOINT s1;
>   ALTER SEQUENCE s RESTART 2000;
>   ROLLBACK TO s1;
>
>   INSERT INTO seq_test SELECT nextval('s') FROM generate_series(1,40);
>   COMMIT;
>
> If you try this with the approach relying on rd_newRelfilelocatorSubid
> and rd_createSubid, it fails like this on the subscriber:
>
>   ERROR:  could not map filenode "base/5/16394" to relation OID
>
> This happens because ReorderBufferQueueSequence tries to do this in the
> non-transactional branch:
>
>   reloid = RelidByRelfilenumber(rlocator.spcOid, rlocator.relNumber);
>
> and the relfilenode is the one created by the first ALTER. But this is
> obviously wrong - the changes should have been treated as transactional,
> because they are tied to the first ALTER. So how did we get there?
>
> Well, the whole problem is that in case of abort, AtEOSubXact_cleanup
> resets the two fields to InvalidSubTransactionId. Which means the
> rollback in the above transaction also forgets about the first ALTER.
> Now that I look at the RelationData comments, it actually describes
> exactly this situation:
>
>   *
>   * rd_newRelfilelocatorSubid is the ID of the highest subtransaction
>   * the most-recent relfilenumber change has survived into or zero if
>   * not changed in the current transaction (or we have forgotten
>   * changing it).  This field is accurate when non-zero, but it can be
>   * zero when a relation has multiple new relfilenumbers within a
>   * single transaction, with one of them occurring in a subsequently
>   * aborted subtransaction, e.g.
>   *    BEGIN;
>   *    TRUNCATE t;
>   *    SAVEPOINT save;
>   *    TRUNCATE t;
>   *    ROLLBACK TO save;
>   *    -- rd_newRelfilelocatorSubid is now forgotten
>   *
>
> The root of this problem is that we'd need some sort of "history" for
> the field, so that when a subxact aborts, we can restore the previous
> value. But we obviously don't have that, and I doubt we want to add that
> to relcache - for example, it'd either need to impose some limit on the
> history (and thus a failure when we reach the limit), or it'd need to
> handle histories of arbitrary length.
>

Yeah, I think that would be really tricky and we may not want to go there.

> At this point I don't see a solution for this, which means the best way
> forward with the sequence decoding patch seems to be the original
> approach, on the decoding side.
>

One thing that worries me about that approach is that it can suck with
the workload that has a lot of DDLs that create XLOG_SMGR_CREATE
records. We have previously fixed some such workloads in logical
decoding where decoding a transaction containing truncation of a table
with a lot of partitions (1000 or more) used to take a very long time.
Don't we face performance issues in such scenarios?

How do we see this work w.r.t to some sort of global sequences? There
is some recent discussion where I have raised a similar point [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1JF%3D4_Eoq7FFjHSe98-_ooJ5QWd0s2_pj8gR%2B_dvwKxvA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.