Thread
Commits
Same data as JSON:
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Reset btpo_cycleid in nbtree VACUUM's REDO routine.
- da9517fb3a09 18.0 landed
-
nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Peter Geoghegan <pg@bowt.ie> — 2024-11-15T16:33:37Z
Attached patch teaches btree_xlog_vacuum, nbtree VACUUM's REDO routine, to reset the target page's opaque->btpo_cycleid to 0. This makes the REDO routine match original execution, which seems like a good idea on consistency grounds. I propose this for the master branch only. -- Peter Geoghegan
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Michael Paquier <michael@paquier.xyz> — 2024-11-20T06:32:11Z
On Fri, Nov 15, 2024 at 11:33:37AM -0500, Peter Geoghegan wrote: > Attached patch teaches btree_xlog_vacuum, nbtree VACUUM's REDO > routine, to reset the target page's opaque->btpo_cycleid to 0. This > makes the REDO routine match original execution, which seems like a > good idea on consistency grounds. > > I propose this for the master branch only. Perhaps this patch should also explain why this is better this way? This path does not manipulate btpo_level, for one. -- Michael
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Andrey Borodin <x4mmm@yandex-team.ru> — 2024-11-20T09:40:09Z
> On 15 Nov 2024, at 21:33, Peter Geoghegan <pg@bowt.ie> wrote: > > Attached patch teaches btree_xlog_vacuum, nbtree VACUUM's REDO > routine, to reset the target page's opaque->btpo_cycleid to 0. This > makes the REDO routine match original execution, which seems like a > good idea on consistency grounds. > > I propose this for the master branch only. The change seems correct to me: anyway cycle must be less than cycle of any future vacuum after promotion. I cannot say anything about beauty of resetting or not resetting the field. I'd suggest renaming the field into something like "btpo_split_vaccycleid". I was aware of index vacuum backtracking, but it took me a while to build context again. Best regards, Andrey Borodin.
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Peter Geoghegan <pg@bowt.ie> — 2024-12-23T20:46:49Z
On Wed, Nov 20, 2024 at 4:41 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: > > On 15 Nov 2024, at 21:33, Peter Geoghegan <pg@bowt.ie> wrote: > > I propose this for the master branch only. > > The change seems correct to me: anyway cycle must be less than cycle of any future vacuum after promotion. The cycles set in the page special area during page splits that happen to run while a VACUUM also runs must use that same VACUUM's cycle ID (which is stored in shared memory for the currently running VACUUM). That way the VACUUM will know when it must backtrack later on, to avoid missing index tuples that it is expected to remove. It doesn't matter if the cycle_id that VACUUM sees is less than or greater than its own one -- only that it matches its own one when it needs to match to get correct behavior from VACUUM. (Though it's also possible to get a false positive, in rare cases where we get unlucky and there's a collision. This might waste cycles within VACUUM, but it shouldn't lead to truly incorrect behavior.) > I cannot say anything about beauty of resetting or not resetting the field. The main reason to do this is simple: we should make REDO routines match original execution, unless there is a very good reason not to do so -- original execution is "authoritative". There's no good reason not to do so here. There is a secondary reason to do things this way, though: resetting the cycle ID within the REDO routine can save future work from within btvacuumpage, after a standby gets promoted, when nbtree VACUUM runs against an affected index for the first time. Note that nbtree VACUUM goes out of its way to clear an old cycle ID in passing, even when it has nothing else to do on the page. And so by taking care of that on the REDO side, right from the start, the newly promoted standby won't need to dirty so many pages during the first nbtree VACUUM after it is promoted. (Actually, it's worse than that -- we're not just dirtying pages these days. It's quite likely that we'll need to write a WAL record to clear the cycle ID, since page-level checksums are enabled by default.) > I'd suggest renaming the field into something like "btpo_split_vaccycleid". I was aware of index vacuum backtracking, but it took me a while to build context again. I'm not inclined to change it now. It's been like this for many years now. Pushed this patch just now. Thanks for the review! -- Peter Geoghegan
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Peter Geoghegan <pg@bowt.ie> — 2024-12-23T20:47:35Z
On Wed, Nov 20, 2024 at 1:32 AM Michael Paquier <michael@paquier.xyz> wrote: > Perhaps this patch should also explain why this is better this way? See the commit message, and my remarks to Andrey just now. > This path does not manipulate btpo_level, for one. Why would it do so? Changing the btree level would corrupt the index? -- Peter Geoghegan
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Andrey Borodin <x4mmm@yandex-team.ru> — 2025-11-18T06:32:19Z
> On 24 Dec 2024, at 01:46, Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Nov 20, 2024 at 4:41 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote: >>> On 15 Nov 2024, at 21:33, Peter Geoghegan <pg@bowt.ie> wrote: >>> I propose this for the master branch only. >> >> The change seems correct to me: anyway cycle must be less than cycle of any future vacuum after promotion. > > The cycles set in the page special area during page splits that happen > to run while a VACUUM also runs must use that same VACUUM's cycle ID > (which is stored in shared memory for the currently running VACUUM). > That way the VACUUM will know when it must backtrack later on, to > avoid missing index tuples that it is expected to remove. > > It doesn't matter if the cycle_id that VACUUM sees is less than or > greater than its own one -- only that it matches its own one when it > needs to match to get correct behavior from VACUUM. (Though it's also > possible to get a false positive, in rare cases where we get unlucky > and there's a collision. This might waste cycles within VACUUM, but > it shouldn't lead to truly incorrect behavior.) I'm thinking more about it. We always reset btpo_cycleid even in redo of a split. This "btpo_cycleid = 0;" reset can break two scenarios that are not currently supported by us, but might be supported in future. This reset is based on the idea that crash recovery will interrupt vacuum. It is not true in these cases. 1. We are dealing with compute-storage separation system. We do not have filesystem and when we need to read a page we get it from some storage service, that rebuild pages from WAL. (e.g. Aurora and Neon) If we split a page during vacuum, evict it and read it from service - we will miss needed backtrack to the left page... 2. There's a tool for repairing pages with checksum violations - page repair. AFAIK it can request page from Standby, and if it does amidst vacuum, vacuum can get false negative for backtracking logic. Thanks! Best regards, Andrey Borodin.
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Peter Geoghegan <pg@bowt.ie> — 2025-11-18T18:54:11Z
On Tue, Nov 18, 2025 at 1:32 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > I'm thinking more about it. We always reset btpo_cycleid even in redo of a split. > This "btpo_cycleid = 0;" reset can break two scenarios that are not currently supported by us, but might be supported in future. I don't follow. > This reset is based on the idea that crash recovery will interrupt vacuum. It is not true in these cases. It's also based on the idea that only one VACUUM operation can be running at a time. > 1. We are dealing with compute-storage separation system. We do not have filesystem and when we need to read a page we get it from some storage service, that rebuild pages from WAL. (e.g. Aurora and Neon) If we split a page during vacuum, evict it and read it from service - we will miss needed backtrack to the left page... Are you arguing that the xl_btree_split record should include the cycleid? I see that systems that are built on this architecture do something along these lines: https://github.com/neondatabase/postgres/commit/a9b92820c5d14dbff8f59ab65ffdaae92ab9c3c8 However, that seems well out of scope for core Postgres. At least for the foreseeable future. -- Peter Geoghegan
-
Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Andrey Borodin <x4mmm@yandex-team.ru> — 2025-11-19T04:36:19Z
> On 18 Nov 2025, at 23:54, Peter Geoghegan <pg@bowt.ie> wrote: > > Are you arguing that the xl_btree_split record should include the cycleid? Yes. > I see that systems that are built on this architecture do something > along these lines: > https://github.com/neondatabase/postgres/commit/a9b92820c5d14dbff8f59ab65ffdaae92ab9c3c8 Thanks for the link. This solution seems surprisingly complicated. > However, that seems well out of scope for core Postgres. At least for > the foreseeable future. Yes, I agree. It's just a case when redo routines do not match original execution, and it seemed interesting to me. I understand that for core Postgres it's extra overhead in WAL record only for a beautification. Thanks! Best regards, Andrey Borodin.