Re: Add 64-bit XIDs into PostgreSQL 15

Pavel Borisov <pashkin.elfe@gmail.com>

From: Pavel Borisov <pashkin.elfe@gmail.com>

To: Andres Freund <andres@anarazel.de>

Cc: Ilya Anfimov <ilan@tzirechnoy.com>, Postgres hackers <pgsql-hackers@lists.postgresql.org>, pgsql-hackers <pgsql-hackers@postgresql.org>

Date: 2022-02-02T15:10:23Z

Lists: pgsql-hackers

Commits


Add SLRU tests for 64-bit page case
- a60b8a58f435 17.0 landed
Make use FullTransactionId in 2PC filenames
- 5a1dfde8334b 17.0 landed
Use larger segment file names for pg_notify
- 2cdf131c46e6 17.0 landed
Index SLRUs by 64-bit integers rather than by 32-bit integers
- 4ed8f0913bfd 17.0 landed

Attachments

v8-0003-README.XID64.patch (application/octet-stream) patch v8-0003
v8-0001-64-bit-GUCs.patch (application/octet-stream) patch v8-0001
v8-0002-Add-64bit-xid.patch (application/octet-stream) patch v8-0002

Hi, Andres!

I've revised the README a little to address your corrections and
questions. Thanks very much for this!
The patchset with the changed README is attached here as v8 (the code is
unchanged and identical to v7).


> > +The downside of this is that we can not use tuple's XMIN and XMAX right away.
> > +We often need to re-read t_xmin and t_xmax - which could actually be pointers
> > +into a page in shared buffers and therefore they could be updated by any other
> > +backend.
>
> Ugh, that's not great.
>
Agreed. This part is one of the candidates for revision, per the proposals
above [1], i.e.:
"2A. Probably refactor it to store precalculated XMIN/XMAX in memory
tuple representation instead of t_xid_base/t_multi_base".

We are working on this change.
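For illustration only, a minimal C sketch of what option 2A could look like. It assumes that a normal 32-bit XID stored in the tuple is an offset from the page's 64-bit base and that the special XIDs (0, 1, 2) pass through unchanged; the names `TupleXids`, `short_xid_to_full()` and `tuple_xids_snapshot()` are hypothetical. The idea is that both 64-bit values are computed once, while the buffer content lock is held, so later readers use the private copy instead of re-reading t_xmin/t_xmax from shared buffers. An xmax that is a MultiXactId would use the multi base instead; that case is left out.

```c
#include <stdint.h>

/* Special 32-bit XIDs: 0 = invalid, 1 = bootstrap, 2 = frozen; normal XIDs start at 3. */
#define FIRST_NORMAL_XID 3u

typedef uint64_t Xid64;

/* Hypothetical in-memory tuple data: 64-bit xmin/xmax computed once at fetch
 * time, instead of carrying t_xid_base/t_multi_base alongside the tuple. */
typedef struct TupleXids
{
    Xid64 xmin;
    Xid64 xmax;
} TupleXids;

/* Assumption: normal XIDs are offsets from the page base; special XIDs are kept as-is. */
static inline Xid64 short_xid_to_full(uint64_t base, uint32_t xid)
{
    return xid < FIRST_NORMAL_XID ? (Xid64) xid : base + (Xid64) xid;
}

/* Assumed to be called with the buffer content lock held, so the base and the
 * 32-bit fields are read consistently; afterwards only the copy is used. */
static inline TupleXids tuple_xids_snapshot(uint64_t page_xid_base,
                                            uint32_t t_xmin, uint32_t t_xmax)
{
    TupleXids x;

    x.xmin = short_xid_to_full(page_xid_base, t_xmin);
    x.xmax = short_xid_to_full(page_xid_base, t_xmax);
    return x;
}
```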


> What happens if the first access happens on a replica?
>
> What is the approach for dealing with multixact files? They have xids
> embedded?  And currently the SLRUs will break if you just let the offsets
> SLRU grow without bounds.
>
> Wait. So you just modify the page without WAL logging or marking it dirty
> on a standby? I fail to see how that can be correct.
>
> Imagine the cluster is promoted, the page is dirtied, and we write it
> out. You'll have written out a completely changed page, without any WAL
> logging. There's plenty other scenarios.
>
In this part, I suppose you've found a definite bug. Thanks! There are a
couple of ways it could be fixed:

1. If we enforce a checkpoint at replica promotion, then a full-page write
will be forced on the first modification of each page afterward.

2. Maybe it's worth using a BufferDesc bit to mark the page as converted to
64-bit XIDs but not yet written to disk? For example, one of the four bits
of BUF_USAGECOUNT: BM_MAX_USAGE_COUNT = 5, so 3 bits are enough to store the
usage count. This changes the in-memory page representation but does not
need WAL logging, which is impossible on a replica.
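Option 1 relies on the ordinary full-page-write rule. Simplified from the check in XLogRecordAssemble() (xloginsert.c), and ignoring per-buffer flags such as REGBUF_FORCE_IMAGE and REGBUF_NO_IMAGE, the decision looks roughly like this; `needs_full_page_image()` is a hypothetical helper name, not a PostgreSQL function:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Simplified full-page-image decision: with full_page_writes on, a page whose
 * LSN is not newer than the redo pointer of the last checkpoint gets a full
 * image attached to its next WAL-logged change. */
static inline bool needs_full_page_image(bool full_page_writes,
                                         XLogRecPtr page_lsn,
                                         XLogRecPtr redo_ptr)
{
    return full_page_writes && page_lsn <= redo_ptr;
}
```

A checkpoint at promotion moves the redo pointer past the LSN of every page touched during recovery, so the first WAL-logged change to such a page carries a full image, including any in-memory conversion done while it was a standby. This only helps once some WAL-logged change touches the page.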
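Option 2 is mostly bit arithmetic on the buffer state word. The sketch below assumes the PostgreSQL 15 layout from buf_internals.h (a 4-bit usage count starting at bit 18) and keeps the usage count in the low 3 bits of that field, which hold 0..7 and so cover BM_MAX_USAGE_COUNT = 5, leaving the top bit free for a hypothetical BM_XID64_CONVERTED flag. All macro and function names are local, illustrative stand-ins; real code would update BufferDesc->state atomically rather than through a plain uint32_t.

```c
#include <stdint.h>

/* Assumed layout of the buffer state word: bits 0..17 refcount,
 * bits 18..21 usage count, bits 22..31 flags. */
#define USAGECOUNT_SHIFT 18
#define USAGECOUNT_ONE   (1U << USAGECOUNT_SHIFT)
#define MAX_USAGE_COUNT  5                              /* same value as BM_MAX_USAGE_COUNT */

/* Hypothetical split of the 4-bit field: low 3 bits keep the usage count,
 * the top bit marks "converted to 64-bit XIDs, not yet written to disk". */
#define USAGECOUNT_MASK    (0x7U << USAGECOUNT_SHIFT)   /* 0x001C0000 */
#define BM_XID64_CONVERTED (0x8U << USAGECOUNT_SHIFT)   /* 0x00200000 */

_Static_assert(MAX_USAGE_COUNT <= 7, "usage count must fit in 3 bits");

static inline uint32_t buf_usage_count(uint32_t state)
{
    return (state & USAGECOUNT_MASK) >> USAGECOUNT_SHIFT;
}

/* Increment the usage count, saturating at MAX_USAGE_COUNT so it never
 * carries into the flag bit. */
static inline uint32_t buf_bump_usage(uint32_t state)
{
    if (buf_usage_count(state) < MAX_USAGE_COUNT)
        state += USAGECOUNT_ONE;
    return state;
}

static inline uint32_t buf_mark_xid64_converted(uint32_t state)
{
    return state | BM_XID64_CONVERTED;
}

/* Would be cleared once the converted page has been written out. */
static inline uint32_t buf_clear_xid64_converted(uint32_t state)
{
    return state & ~BM_XID64_CONVERTED;
}

static inline int buf_is_xid64_converted(uint32_t state)
{
    return (state & BM_XID64_CONVERTED) != 0;
}
```

Keeping the flag inside the existing 32-bit state word means it could be set with the same atomic operations as the other buffer flags, without growing BufferDesc.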

What do you think about it?

[1]
https://www.postgresql.org/message-id/CALT9ZEHy9yFQEwptCUznPLciqM9ZSs91yTnNSSiG22m%3DBgCpNA%40mail.gmail.com