Re: logical decoding and replication of sequences, take 2
Amit Kapila <amit.kapila16@gmail.com>
From: Amit Kapila <amit.kapila16@gmail.com>
To: Tomas Vondra <tomas.vondra@enterprisedb.com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>, "Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>,
Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>, PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>, Masahiko Sawada <sawada.mshk@gmail.com>, Peter Eisentraut <peter.eisentraut@enterprisedb.com>,
Dilip Kumar <dilipbalaut@gmail.com>
Date: 2023-12-05T12:17:57Z
Lists: pgsql-hackers
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Migrate logical slots to the new node during an upgrade.
- 29d0a77fa660 17.0 cited
-
Make test_decoding ddl.out shorter
- d6677b93c79b 17.0 landed
- c5c5832600e9 14.9 landed
- b1dc946eee3d 16.0 landed
- 3bb8b9342f8a 15.4 landed
-
Fix snapshot handling in logicalmsg_decode
- 949ac32e1267 15.3 landed
- 8b9cbd42b61f 14.8 landed
- 4df581fa0f4b 13.11 landed
- 497f863f0598 12.15 landed
- 8de91ebf2ac1 11.20 landed
- 7fe1aa991b62 16.0 landed
-
doc: Adjust a few more references to "postmaster"
- 17e72ec45d31 16.0 cited
-
Revert "Logical decoding of sequences"
- 2c7ea57e56ca 15.0 cited
On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > Thanks for the script. Are you also measuring the time it takes to > decode this using test_decoding? > > FWIW I did more comprehensive suite of tests over the weekend, with a > couple more variations. I'm attaching the updated scripts, running it > should be as simple as > > ./run.sh BRANCH TRANSACTIONS RUNS > > so perhaps > > ./run.sh master 1000 3 > > to do 3 runs with 1000 transactions per client. And it'll run a bunch of > combinations hard-coded in the script, and write the timings into a CSV > file (with "master" in each row). > > I did this on two machines (i5 with 4 cores, xeon with 16/32 cores). I > did this with current master, the basic patch (without the 0002 part), > and then with the optimized approach (single global hash table, see the > 0004 part). That's what master / patched / optimized in the results is. > > Interestingly enough, the i5 handled this much faster, it seems to be > better in single-core tasks. The xeon is still running, so the results > for "optimized" only have one run (out of 3), but shouldn't change much. > > Attached is also a table summarizing this, and visualizing the timing > change (vs. master) in the last couple columns. Green is "faster" than > master (but we don't really expect that), and "red" means slower than > master (the more red, the slower). > > There results are grouped by script (see the attached .tgz), with either > 32 or 96 clients (which does affect the timing, but not between master > and patch). Some executions have no pg_sleep() calls, some have 0.001 > wait (but that doesn't seem to make much difference). > > Overall, I'd group the results into about three groups: > > 1) good cases [nextval, nextval-40, nextval-abort] > > These are cases that slow down a bit, but the slowdown is mostly within > reasonable bounds (we're making the decoding to do more stuff, so it'd > be a bit silly to require that extra work to make no impact). And I do > think this is reasonable, because this is pretty much an extreme / worst > case behavior. People don't really do just nextval() calls, without > doing anything else. Not to mention doing aborts for 100% transactions. > > So in practice this is going to be within noise (and in those cases the > results even show speedup, which seems a bit surprising). It's somewhat > dependent on CPU too - on xeon there's hardly any regression. > > > 2) nextval-40-abort > > Here the slowdown is clear, but I'd argue it generally falls in the same > group as (1). Yes, I'd be happier if it didn't behave like this, but if > someone can show me a practical workload affected by this ... > > > 3) irrelevant cases [all the alters taking insane amounts of time] > > I absolutely refuse to care about these extreme cases where decoding > 100k transactions takes 5-10 minutes (on i5), or up to 30 minutes (on > xeon). If this was a problem for some practical workload, we'd have > already heard about it I guess. And even if there was such workload, it > wouldn't be up to this patch to fix that. There's clearly something > misbehaving in the snapshot builder. > > > I was hopeful the global hash table would be an improvement, but that > doesn't seem to be the case. I haven't done much profiling yet, but I'd > guess most of the overhead is due to ReorderBufferQueueSequence() > starting and aborting a transaction in the non-transactinal case. Which > is unfortunate, but I don't know if there's a way to optimize that. > Before discussing the alternative ideas you shared, let me try to clarify my understanding so that we are on the same page. I see two observations based on the testing and discussion we had (a) for non-transactional cases, the overhead observed is mainly due to starting/aborting a transaction for each change; (b) for transactional cases, we see overhead due to traversing all the top-level txns and check the hash table for each one to find whether change is transactional. Am, I missing something? -- With Regards, Amit Kapila.