Re: logical decoding and replication of sequences, take 2
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>,
PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
Masahiko Sawada <sawada.mshk@gmail.com>,
Peter Eisentraut <peter.eisentraut@enterprisedb.com>,
Dilip Kumar <dilipbalaut@gmail.com>
Date: 2023-11-27T13:41:40Z
Lists: pgsql-hackers
Commits
- Migrate logical slots to the new node during an upgrade.
  - 29d0a77fa660 17.0 cited
- Make test_decoding ddl.out shorter
  - d6677b93c79b 17.0 landed
  - c5c5832600e9 14.9 landed
  - b1dc946eee3d 16.0 landed
  - 3bb8b9342f8a 15.4 landed
- Fix snapshot handling in logicalmsg_decode
  - 949ac32e1267 15.3 landed
  - 8b9cbd42b61f 14.8 landed
  - 4df581fa0f4b 13.11 landed
  - 497f863f0598 12.15 landed
  - 8de91ebf2ac1 11.20 landed
  - 7fe1aa991b62 16.0 landed
- doc: Adjust a few more references to "postmaster"
  - 17e72ec45d31 16.0 cited
- Revert "Logical decoding of sequences"
  - 2c7ea57e56ca 15.0 cited

On 11/27/23 12:11, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 4:17 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 11/27/23 11:13, Amit Kapila wrote:
>>> On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>>
>>>> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>>
>>>>> While going over 0001, I realized there might be an optimization for
>>>>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
>>>>> searches through all top-level transactions, and if there's many of them
>>>>> that might be expensive, even if very few of them have any relfilenodes
>>>>> in the hash table. It's still linear search, and it needs to happen for
>>>>> each sequence change.
>>>>>
>>>>> But can the relfilenode even be in some other top-level transaction? How
>>>>> could it be - our transaction would not see it, and wouldn't be able to
>>>>> generate the sequence change. So we should be able to simply check *our*
>>>>> transaction (or if it's a subxact, the top-level transaction). Either
>>>>> it's there (and it's transactional change), or not (and then it's
>>>>> non-transactional change).
>>>>>
>>>>
>>>> I also think the relfilenode should be part of either the current
>>>> top-level xact or one of its subxact, so looking at all the top-level
>>>> transactions for each change doesn't seem advisable.
>>>>
>>>>> The 0004 does this.
>>>>>
>>>>> This of course hinges on when exactly the transactions get created, and
>>>>> assignments processed. For example if this would fire before the txn
>>>>> gets assigned to the top-level one, this would break. I don't think this
>>>>> can happen thanks to the immediate logging of assignments, but I'm too
>>>>> tired to think about it now.
>>>>>
>>>>
>>>> This needs some thought because I think we can't guarantee the
>>>> association till we reach the point where we can actually decode the
>>>> xact. See comments in AssertTXNLsnOrder() [1].
>>>>
>>
>> I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
>> which says:
>>
>> /*
>>  * Skip the verification if we don't reach the LSN at which we start
>>  * decoding the contents of transactions yet because until we reach
>>  * the LSN, we could have transactions that don't have the association
>>  * between the top-level transaction and subtransaction yet and
>>  * consequently have the same LSN. We don't guarantee this
>>  * association until we try to decode the actual contents of
>>  * transaction. The ordering of the records prior to the
>>  * start_decoding_at LSN should have been checked before the restart.
>>  */
>>
>> But doesn't this say that after we actually start decoding / stop
>> skipping, we should have seen the assignment? We're already decoding
>> transaction contents (because sequence change *is* part of xact, even if
>> we decide to replay it in the non-transactional way).
>>
>
> It means to say that the assignment is decided after start_decoding_at
> point. We haven't decided that we are past start_decoding_at by the
> time the patch is computing the transactional flag.
>

Ah, I see. We're deciding if the change is transactional before calling
SnapBuildXactNeedsSkip. That's a bit unfortunate.

>>>
>>> I am wondering that instead of building the infrastructure to know
>>> whether a particular change is transactional on the decoding side,
>>> can't we have some flag in the WAL record to note whether the change
>>> is transactional or not? I have discussed this point with my colleague
>>> Kuroda-San and we thought that it may be worth exploring whether we
>>> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
>>> determine if the sequence is created/changed in the current
>>> subtransaction and then record that in WAL record. By this, we need to
>>> have additional information in the WAL record like XLOG_SEQ_LOG but we
>>> can probably do it only with wal_level as logical.
>>>
>>
>> I may not understand the proposal exactly, but it's not enough to know
>> if it was created in the same subxact. It might have been created in
>> some earlier subxact in the same top-level xact.
>>
>
> We should be able to detect even some earlier subxact or top-level
> xact based on rd_createSubid/rd_newRelfilelocatorSubid.
>

Interesting. I admit I haven't considered using these fields before, so
I need to familiarize with it a bit, and try if it'd work.

>> FWIW I think one of the earlier patch versions did something like this,
>> by adding a "created" flag in the xlog record. And we concluded doing
>> this on the decoding side is a better solution.
>>
>
> oh, I thought it would be much simpler than what we are doing on the
> decoding-side. Can you please point me to the email discussion where
> this is concluded or share the reason?
>

I think the discussion started around [1], and then in a bunch of
following messages (search for "relfilenode").

regards

[1] https://www.postgresql.org/message-id/CAExHW5v_vVqkhF4ehST9EzpX1L3bemD1S%2BkTk_-ZVu_ir-nKDw%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
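
Editorial illustration (not part of the original message, and not code from any of the patches): the two lookup strategies discussed in the thread can be sketched with a toy model. Everything here is an assumption made for illustration. `ToyTxn` and the `toy_*` functions are hypothetical stand-ins, a small array replaces the relfilenode hash table, and none of this mirrors the real ReorderBuffer structures. The first function scans every top-level transaction, as 0001 is described as doing; the second checks only the change's own top-level transaction, as 0004 is described as doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_RELFILENODES 8

/* Hypothetical, simplified stand-in for a decoded transaction. */
typedef struct ToyTxn
{
    uint32_t        xid;
    struct ToyTxn  *toplevel;   /* NULL if top-level, or if not yet assigned */
    uint32_t        relfilenodes[TOY_MAX_RELFILENODES]; /* created in this xact */
    size_t          nrelfilenodes;
} ToyTxn;

/* Does this transaction record the relfilenode as created by it? */
bool
toy_txn_has_relfilenode(const ToyTxn *txn, uint32_t relfilenode)
{
    assert(txn != NULL);
    for (size_t i = 0; i < txn->nrelfilenodes; i++)
    {
        if (txn->relfilenodes[i] == relfilenode)
            return true;
    }
    return false;
}

/* Scan all top-level transactions: linear in their number, per change. */
bool
toy_is_transactional_scan_all(ToyTxn **toplevel_txns, size_t ntxns,
                              uint32_t relfilenode)
{
    for (size_t i = 0; i < ntxns; i++)
    {
        if (toy_txn_has_relfilenode(toplevel_txns[i], relfilenode))
            return true;
    }
    return false;
}

/*
 * Check only the change's own top-level transaction. This is correct only
 * if the subxact-to-top-level assignment is already known at this point.
 */
bool
toy_is_transactional_own(const ToyTxn *txn, uint32_t relfilenode)
{
    assert(txn != NULL);
    const ToyTxn *top = (txn->toplevel != NULL) ? txn->toplevel : txn;

    return toy_txn_has_relfilenode(top, relfilenode);
}
```

In this toy model, a subtransaction whose `toplevel` pointer has not been set yet is treated as its own top-level transaction, so the second function would report a change as non-transactional even when the relfilenode was created by its real parent. That is the toy-model analogue of the concern raised in the thread about when the assignment is guaranteed to be known.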