Re: logical decoding and replication of sequences, take 2
Tomas Vondra <tomas.vondra@enterprisedb.com>
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
To: "Jonathan S. Katz" <jkatz@postgresql.org>, vignesh C <vignesh21@gmail.com>
Cc: Andres Freund <andres@anarazel.de>, Robert Haas <robertmhaas@gmail.com>,
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: 2023-02-23T12:56:13Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
-
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
-
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
-
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
-
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited
On 2/22/23 18:04, Jonathan S. Katz wrote: > On 2/22/23 5:02 AM, Tomas Vondra wrote: >> >> On 2/22/23 03:28, Jonathan S. Katz wrote: > >>> Thanks for continuing to work on this patch! I tested the latest version >>> and have some feedback/clarifications. >>> >> >> Thanks! > > Also I should mention I've been testing with both async/sync logical > replication. I didn't have any specific comments on either as it seemed > to just work and behaviors aligned with existing expectations. > > Generally it's been a good experience and it seems to be working. :) At > this point I'm trying to understand the limitations and tripwires so we > can guide users appropriately. > Good to hear. >> Yes, this is due to how we WAL-log sequences. We don't log individual >> increments, but every 32nd increment and we log the "future" sequence >> state so that after a crash/recovery we don't generate duplicates. >> >> So you do nextval() and it returns 1. But into WAL we record 32. And >> there will be no WAL records until nextval reaches 32 and needs to >> generate another batch. >> >> And because logical replication relies on these WAL records, it inherits >> this batching behavior with a "jump" on recovery/failover. IMHO it's OK, >> it works for the "logical failover" use case and if you need gapless >> sequences then regular sequences are not an issue anyway. >> >> It's possible to reduce the jump a bit by reducing the batch size (from >> 32 to 0) so that every increment is logged. But it doesn't eliminate it >> because of rollbacks. > > I generally agree. I think it's mainly something we should capture in > the user docs that they can be a jump on the subscriber side, so people > are not surprised. > > Interestingly, in systems that tend to have higher rates of failover > (I'm thinking of a few distributed systems), this may cause int4 > sequences to exhaust numbers slightly (marginally?) more quickly. Likely > not too big of an issue, but something to keep in mind. > IMHO the number of systems that would work fine with int4 sequences but this change results in the sequences being "exhausted" too quickly is indistinguishable from 0. I don't think this is an issue. >>> 2. Using with origin=none with nonconflicting sequences. >>> >>> I modified the example in [1] to set up two schemas with non-conflicting >>> sequences[2], e.g. on instance 1: >>> >>> CREATE TABLE public.room ( >>> id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 1) >>> PRIMARY KEY, >>> name text NOT NULL >>> ); >>> >>> and instance 2: >>> >>> CREATE TABLE public.room ( >>> id int GENERATED BY DEFAULT AS IDENTITY (INCREMENT 2 START WITH 2) >>> PRIMARY KEY, >>> name text NOT NULL >>> ); >>> >> >> Well, yeah. We don't support active-active logical replication (at least >> not with the built-in). You can easily get into similar issues without >> sequences. > > The "origin=none" feature lets you replicate tables bidirectionally. > While it's not full "active-active", this is a starting point and a > feature for v16. We'll definitely have users replicating data > bidirectionally with this. > Well, then the users need to use some other way to generate IDs, not local sequences. Either some sort of distributed/global sequence, UUIDs or something like that. >> Replicating a sequence overwrites the state of the sequence on the other >> side, which may result in it generating duplicate values with the other >> node, etc. > > I understand that we don't currently support global sequences, but I am > concerned there may be a tripwire here in the origin=none case given > it's fairly common to use serial/GENERATED BY to set primary keys. And > it's fairly trivial to set them to be nonconflicting, or at least give > the user the appearance that they are nonconflicting. > > From my high level understand of how sequences work, this sounds like it > would be a lift to support the example in [1]. Or maybe the answer is > that you can bidirectionally replicate the changes in the tables, but > not sequences? > Yes, I don't think local sequences don't and can't work in such setups. > In any case, we should update the restrictions in [2] to state: while > sequences can be replicated, there is additional work required if you > are bidirectionally replicating tables that use sequences, esp. if used > in a PK or a constraint. We can provide alternatives to how a user could > set that up, i.e. not replicates the sequences or do something like in [3]. > I agree. I see this as mostly a documentation issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company