Re: logical decoding and replication of sequences, take 2

Tomas Vondra <tomas.vondra@enterprisedb.com>

From: Tomas Vondra <tomas.vondra@enterprisedb.com>

To: "Jonathan S. Katz" <jkatz@postgresql.org>, vignesh C <vignesh21@gmail.com>

Cc: Andres Freund <andres@anarazel.de>, Robert Haas <robertmhaas@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Heikki Linnakangas <heikki.linnakangas@iki.fi>

Date: 2023-02-23T12:56:13Z

Lists: pgsql-hackers

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →

Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited

On 2/22/23 18:04, Jonathan S. Katz wrote:
> On 2/22/23 5:02 AM, Tomas Vondra wrote:
>>
>> On 2/22/23 03:28, Jonathan S. Katz wrote:
> 
>>> Thanks for continuing to work on this patch! I tested the latest version
>>> and have some feedback/clarifications.
>>>
>>
>> Thanks!
> 
> Also I should mention I've been testing with both async/sync logical
> replication. I didn't have any specific comments on either as it seemed
> to just work and behaviors aligned with existing expectations.
> 
> Generally it's been a good experience and it seems to be working. :) At
> this point I'm trying to understand the limitations and tripwires so we
> can guide users appropriately.
> 

Good to hear.

>> Yes, this is due to how we WAL-log sequences. We don't log individual
>> increments, but every 32nd increment and we log the "future" sequence
>> state so that after a crash/recovery we don't generate duplicates.
>>
>> So you do nextval() and it returns 1. But into WAL we record 32. And
>> there will be no WAL records until nextval reaches 32 and needs to
>> generate another batch.
>>
>> And because logical replication relies on these WAL records, it inherits
>> this batching behavior with a "jump" on recovery/failover. IMHO it's OK,
>> it works for the "logical failover" use case and if you need gapless
>> sequences then regular sequences are not an issue anyway.
>>
>> It's possible to reduce the jump a bit by reducing the batch size (from
>> 32 to 0) so that every increment is logged. But it doesn't eliminate it
>> because of rollbacks.
> 
> I generally agree. I think it's mainly something we should capture in
> the user docs that they can be a jump on the subscriber side, so people
> are not surprised.
> 
> Interestingly, in systems that tend to have higher rates of failover
> (I'm thinking of a few distributed systems), this may cause int4
> sequences to exhaust numbers slightly (marginally?) more quickly. Likely
> not too big of an issue, but something to keep in mind.
> 

IMHO the number of systems that would work fine with int4 sequences but
this change results in the sequences being "exhausted" too quickly is
indistinguishable from 0. I don't think this is an issue.

>>> 2. Using with origin=none with nonconflicting sequences.
>>>
>>> I modified the example in [1] to set up two schemas with non-conflicting
>>> sequences[2], e.g. on instance 1:
>>>
>>> CREATE TABLE public.room (
>>>      id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1)
>>> PRIMARY KEY,
>>>      name text NOT NULL
>>> );
>>>
>>> and instance 2:
>>>
>>> CREATE TABLE public.room (
>>>      id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2)
>>> PRIMARY KEY,
>>>      name text NOT NULL
>>> );
>>>
>>
>> Well, yeah. We don't support active-active logical replication (at least
>> not with the built-in). You can easily get into similar issues without
>> sequences.
> 
> The "origin=none" feature lets you replicate tables bidirectionally.
> While it's not full "active-active", this is a starting point and a
> feature for v16. We'll definitely have users replicating data
> bidirectionally with this.
> 

Well, then the users need to use some other way to generate IDs, not
local sequences. Either some sort of distributed/global sequence, UUIDs
or something like that.

>> Replicating a sequence overwrites the state of the sequence on the other
>> side, which may result in it generating duplicate values with the other
>> node, etc.
> 
> I understand that we don't currently support global sequences, but I am
> concerned there may be a tripwire here in the origin=none case given
> it's fairly common to use serial/GENERATED BY to set primary keys. And
> it's fairly trivial to set them to be nonconflicting, or at least give
> the user the appearance that they are nonconflicting.
> 
> From my high level understand of how sequences work, this sounds like it
> would be a lift to support the example in [1]. Or maybe the answer is
> that you can bidirectionally replicate the changes in the tables, but
> not sequences?
> 

Yes, I don't think local sequences don't and can't work in such setups.

> In any case, we should update the restrictions in [2] to state: while
> sequences can be replicated, there is additional work required if you
> are bidirectionally replicating tables that use sequences, esp. if used
> in a PK or a constraint. We can provide alternatives to how a user could
> set that up, i.e. not replicates the sequences or do something like in [3].
> 

I agree. I see this as mostly a documentation issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company