Re: Add 64-bit XIDs into PostgreSQL 15
From: Pavel Borisov <pashkin.elfe@gmail.com>
To: Andres Freund <andres@anarazel.de>
Cc: Ilya Anfimov <ilan@tzirechnoy.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>, pgsql-hackers <pgsql-hackers@postgresql.org>
Date: 2022-02-02T15:10:23Z
Lists: pgsql-hackers
Commits
- Add SLRU tests for 64-bit page case (a60b8a58f435, 17.0, landed)
- Make use FullTransactionId in 2PC filenames (5a1dfde8334b, 17.0, landed)
- Use larger segment file names for pg_notify (2cdf131c46e6, 17.0, landed)
- Index SLRUs by 64-bit integers rather than by 32-bit integers (4ed8f0913bfd, 17.0, landed)
Attachments
- v8-0003-README.XID64.patch (application/octet-stream) patch v8-0003
- v8-0001-64-bit-GUCs.patch (application/octet-stream) patch v8-0001
- v8-0002-Add-64bit-xid.patch (application/octet-stream) patch v8-0002
Hi, Andres!

I've revised the README a little to address your corrections and questions. Thank you very much for them! A patchset with the changed README is attached here as v8 (the code is unchanged and identical to v7).

> > +The downside of this is that we can not use tuple's XMIN and XMAX right away.
> > +We often need to re-read t_xmin and t_xmax - which could actually be pointers
> > +into a page in shared buffers and therefore they could be updated by any other
> > +backend.
>
> Ugh, that's not great.

Agreed. This part is one of the candidates for revision as per the proposals above [1], i.e.: "2A. Probably refactor it to store precalculated XMIN/XMAX in memory tuple representation instead of t_xid_base/t_multi_base". We are working on this change.

> What happens if the first access happens on a replica?
>
> What is the approach for dealing with multixact files? They have xids
> embedded? And currently the SLRUs will break if you just let the offsets
> SLRU grow without bounds.
>
> Wait. So you just modify the page without WAL logging or marking it dirty
> on a standby? I fail to see how that can be correct.
>
> Imagine the cluster is promoted, the page is dirtied, and we write it
> out. You'll have written out a completely changed page, without any WAL
> logging. There's plenty other scenarios.

In this part, I suppose you've found a definite bug. Thanks! There are a couple of ways it could be fixed:

1. If we enforce a checkpoint at replica promotion, then we force full-page writes after each page modification afterward.

2. Maybe it's worth using a BufferDesc bit to mark the page as converted to 64xid but not yet written to disk? For example, one of the four bits of BUF_USAGECOUNT. BM_MAX_USAGE_COUNT = 5, so 3 bits are enough to store it. This will change the in-memory page representation but will not need WAL-logging, which is impossible on a replica.

What do you think about it?
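
For illustration only, the bit arithmetic behind option 2 could be sketched roughly as below. This is not code from the patch. The layout constants are written from memory of buf_internals.h (18 bits of refcount, 4 bits of usage count, 10 bits of flags in a 32-bit state word) and should be treated as assumptions; every `SKETCH_`/`sketch_` name, including the flag itself, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of the 32-bit buffer state word, modelled on
 * buf_internals.h: bits 0-17 refcount, bits 18-21 usage count,
 * bits 22-31 flags. Verify against the actual tree before relying on it.
 */
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_USAGECOUNT_MASK  0x003C0000U    /* the current 4-bit field */
#define BM_MAX_USAGE_COUNT   5

/* Hypothetical: shrink the usage count to 3 bits and reuse the freed bit. */
#define SKETCH_USAGECOUNT_MASK    0x001C0000U   /* bits 18-20 */
#define SKETCH_BM_XID64_CONVERTED (1U << 21)    /* hypothetical flag */

/* Three bits hold 0..7, which still covers BM_MAX_USAGE_COUNT. */
_Static_assert((SKETCH_USAGECOUNT_MASK >> BUF_USAGECOUNT_SHIFT) >= BM_MAX_USAGE_COUNT,
               "3-bit usage count must still fit BM_MAX_USAGE_COUNT");
/* The new field plus the new flag occupy exactly the old 4-bit field. */
_Static_assert((SKETCH_USAGECOUNT_MASK | SKETCH_BM_XID64_CONVERTED) == BUF_USAGECOUNT_MASK,
               "sketch must not touch refcount or existing flag bits");

/* Read the usage count from the narrowed 3-bit field. */
static inline uint32_t
sketch_get_usage_count(uint32_t state)
{
    return (state & SKETCH_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

/* Store a usage count without disturbing the converted flag. */
static inline uint32_t
sketch_set_usage_count(uint32_t state, uint32_t count)
{
    assert(count <= BM_MAX_USAGE_COUNT);
    return (state & ~SKETCH_USAGECOUNT_MASK) | (count << BUF_USAGECOUNT_SHIFT);
}

/* Mark the in-memory page as converted to 64xid but not yet written out. */
static inline uint32_t
sketch_mark_converted(uint32_t state)
{
    return state | SKETCH_BM_XID64_CONVERTED;
}

/* Test the hypothetical converted flag. */
static inline bool
sketch_is_converted(uint32_t state)
{
    return (state & SKETCH_BM_XID64_CONVERTED) != 0;
}
```

The sketch only shows that the usage count and the extra flag can coexist in the existing field. It says nothing about how the real state word is updated atomically or about when the flag would be cleared.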
[1] https://www.postgresql.org/message-id/CALT9ZEHy9yFQEwptCUznPLciqM9ZSs91yTnNSSiG22m%3DBgCpNA%40mail.gmail.com