Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Fix amcheck's handling of half-dead B-tree pages
- fbb4b607832d 14.21 landed
- 7792bdc45860 15.16 landed
- 1829016268c3 16.12 landed
- e8ae594458a3 17.8 landed
- 19e786727c4f 18.2 landed
- cbe04e5d729f 19 (unreleased) landed
-
Fix amcheck's handling of incomplete root splits in B-tree
- a4cb21ea9cd7 14.21 landed
- 721b58fbec5f 15.16 landed
- f2a6df9fd56d 16.12 landed
- 5a2d1df00718 17.8 landed
- 50c63ebb05fc 18.2 landed
- 6c05ef5729c0 19 (unreleased) landed
-
Add a test for half-dead pages in B-tree indexes
- c085aab27819 19 (unreleased) landed
-
Add a test for incomplete splits in B-tree indexes
- 1e4e5783e7d7 19 (unreleased) landed
-
Improve checking of child pages in contrib/amcheck.
- d114cc538715 13.0 cited
-
Bug in amcheck?
Konstantin Knizhnik <knizhnik@garret.ru> — 2025-10-22T16:29:51Z
Hi hackers. We see the following error reported by amcheck (I have added dump of opaque) when it interleaves with autovacuum and cancel pt: ERROR: mismatch between parent key and child high key in index "pg_attribute_relid_attnam_index" DETAIL: Target block=274, target opaque->flags=0, child block=427, child opaque=11, target page lsn=1/484A8FC8. CONTEXT: SQL statement "SELECT bt_index_parent_check(indexrelid, true, true) from pg_index" So child has BTP_HALF_DEAD bit set. Autovacuum is interrupted in this place in _bt_pagedel: /* * Check here, as calling loops will have locks held, preventing * interrupts from being processed. */ CHECK_FOR_INTERRUPTS(); Reproducing it is not so easy. First of all I added sleep here: /* * Check here, as calling loops will have locks held, preventing * interrupts from being processed. */ pg_usleep(10000); CHECK_FOR_INTERRUPTS(); Then I create two procedures: create or replace procedure create_tables(tables integer, partitions integer) as $$ declare i integer; j integer; begin for i in 1..tables loop execute 'DROP TABLE IF EXISTS t_' || i; execute 'CREATE TABLE t_' || i || '(pk integer) partition by range (pk)'; for j in 1..partitions loop execute 'create table p_'||i||'_'||j||' partition of t_'||i||' for values from ('||j||') to ('||(j + 1)||')'; end loop; execute 'insert into t_'||i||' values (generate_series(1,'||partitions||'))'; end loop; end; $$ language plpgsql; and create or replace procedure run_amcheck() as $$ begin loop if (select count(*) from pg_stat_activity where backend_type='autovacuum worker') > 0 then raise notice 'Run amcheck!'; perform bt_index_parent_check(indexrelid, true, true) from pg_index; end if; perform pg_sleep(1); end loop; end; $$ language plpgsql; Then I run concurrently run_amcheck() and the following script for pgbench: call create_tables(2,1000); select pg_sleep(2); If the problem is not reproduced, then cancel run_amcheck() and restart it once again. Backtrace (pg16) is the following: * frame #0: 0x00000001017b6aac amcheck.dylib`bt_child_highkey_check(state=0x000000010c846318, target_downlinkoffnum=37, loaded_child="\U00000001", target_level=1) at verify_nbtree.c:2146:23 frame #1: 0x00000001017b7fd8 amcheck.dylib`bt_child_check(state=0x000000010c846318, targetkey=0x000000013c01c448, downlinkoffnum=37) at verify_nbtree.c:2262:2 frame #2: 0x00000001017b5f4c amcheck.dylib`bt_target_page_check(state=0x000000010c846318) at verify_nbtree.c:1623:4 frame #3: 0x00000001017b3908 amcheck.dylib`bt_check_level_from_leftmost(state=0x000000010c846318, level=(level = 1, leftmost = 3, istruerootlevel = false)) at verify_nbtree.c:859:3 frame #4: 0x00000001017b24e8 amcheck.dylib`bt_check_every_level(rel=0x0000000140074f18, heaprel=0x0000000130070148, heapkeyspace=true, readonly=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:603:13 frame #5: 0x00000001017b198c amcheck.dylib`bt_index_check_internal(indrelid=2674, parentcheck=true, heapallindexed=true, rootdescend=true) at verify_nbtree.c:362:3 frame #6: 0x00000001017b1a78 amcheck.dylib`bt_index_parent_check(fcinfo=0x000000010c83b040) at verify_nbtree.c:242:2 I wonder if we should add P_ISHALFDEAD(opaque) for child page? -
Re: Bug in amcheck?
Mikhail Nikalayeu <mihailnikalayeu@gmail.com> — 2025-11-02T12:27:00Z
Hello! > I wonder if we should add P_ISHALFDEAD(opaque) for child page? I am not a btree expert, but things I was able to find so far: In commit d114cc538715e14d29d6de8b6ea1a1d5d3e0edb4 next check is added: > bt_child_highkey_check(state, downlinkoffnum, > child, topaque->btpo_level); At the same time there is a comment below: > * We go ahead with our checks if the child page is half-dead. It's safe > * to do so because we do not test the child's high key, so it does not > * matter that the original high key will have been replaced by a dummy > * truncated high key within _bt_mark_page_halfdead(). All other page > * items are left intact on a half-dead page, so there is still something > * to test. So, yes, it looks like we need to skip the child's high key test for half-dead pages. BWT, have you tried to create an injection_point-based reproducer? Best regards, Mikhail.
-
Re: Bug in amcheck?
Konstantin Knizhnik <knizhnik@garret.ru> — 2025-11-18T13:59:36Z
On 02/11/2025 2:27 PM, Mihail Nikalayeu wrote: > Hello! > >> I wonder if we should add P_ISHALFDEAD(opaque) for child page? > I am not a btree expert, but things I was able to find so far: > > In commit d114cc538715e14d29d6de8b6ea1a1d5d3e0edb4 next check is added: > >> bt_child_highkey_check(state, downlinkoffnum, >> child, topaque->btpo_level); > At the same time there is a comment below: > >> * We go ahead with our checks if the child page is half-dead. It's safe >> * to do so because we do not test the child's high key, so it does not >> * matter that the original high key will have been replaced by a dummy >> * truncated high key within _bt_mark_page_halfdead(). All other page >> * items are left intact on a half-dead page, so there is still something >> * to test. > So, yes, it looks like we need to skip the child's high key test for > half-dead pages. > > BWT, have you tried to create an injection_point-based reproducer? > > Best regards, > Mikhail. Hello Mikhail, Thank you very much for looking at this issue. And I am very sorry for delay with answer. Unfortunately I was not able to reproduce the problem for the latest Postgres: neither with injection points, neither with my original approach with sleeps. Originally I investigated the customer's problem with PG16. And have reproduced it for pg16,. I checked that relevant amcheck code was not changed since pg16, so I thought that the problem takes place for all Postgres versions. But looks like it is not true.
-
Re: Bug in amcheck?
Mikhail Nikalayeu <mihailnikalayeu@gmail.com> — 2025-11-18T22:19:00Z
Hello! > Originally I investigated the customer's problem with PG16. And have > reproduced it for pg16,. I checked that relevant amcheck code was not > changed since pg16, so I thought that the problem takes place for all > Postgres versions. But looks like it is not true. I think it is still broken, but with less probability. Have you tried injection points on v16? Such a test case will make things much more clear. Also, I added Alexander to CC (he is author of bt_child_highkey_check) - maybe the issue is easily understandable for him. Best regards, Mikhail.
-
Re: Bug in amcheck?
Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-25T20:03:17Z
On 19/11/2025 00:19, Mihail Nikalayeu wrote: > Hello! > >> Originally I investigated the customer's problem with PG16. And have >> reproduced it for pg16,. I checked that relevant amcheck code was not >> changed since pg16, so I thought that the problem takes place for all >> Postgres versions. But looks like it is not true. > > I think it is still broken, but with less probability. > Have you tried injection points on v16? Such a test case will make > things much more clear. Konstantin's original repro involved autovacuum and concurrent sessions. I was confused by that, because bt_index_parent_check() holds a ShareLock, which prevents it from running concurrently with vacuum. But this isn't a race condition as such, the issue arises whenever there is a half-dead page in the index. To end up with a half-dead page, you need to gracefully cancel/interrupt autovacuum while it's deleting a page. The repro's way of canceling autovacuum was very complicated. I didn't fully understand it, but I think the concurrent dropping/creating of tables would sometimes cause autovauum to be canceled. Here's a much more straightforward repro. Apply this little patch: diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c index 30b43a4dd18..c132fc90277 100644 --- a/src/backend/access/nbtree/nbtpage.c +++ b/src/backend/access/nbtree/nbtpage.c @@ -2353,6 +2353,12 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno, * Check here, as calling loops will have locks held, preventing * interrupts from being processed. */ + if (random() < INT32_MAX / 2) + { + elog(ERROR, "aborting page deletion"); + } + else + elog(NOTICE, "unlinking halfdead page %u %u", BufferGetBlockNumber(leafbuf), scanblkno); CHECK_FOR_INTERRUPTS(); /* Unlink the current top parent of the subtree */ Then run this: postgres=# CREATE TABLE amchecktest (id int4); CREATE TABLE postgres=# INSERT INTO amchecktest SELECT g from generate_series(1, 1000000) g; INSERT 0 1000000 postgres=# CREATE INDEX on amchecktest (id); CREATE INDEX postgres=# DELETE FROM amchecktest WHERE id > 100000 AND id < 120000; DELETE 19999 postgres=# -- this will hit the error added by the patch VACUUM amchecktest; ERROR: aborting page deletion CONTEXT: while vacuuming index "amchecktest_id_idx" of relation "public.amchecktest" postgres=# select bt_index_parent_check('amchecktest_id_idx'); ERROR: mismatch between parent key and child high key in index "amchecktest_id_idx" DETAIL: Target block=3 child block=276 target page lsn=0/6ED0DB68. To fix this, I guess we need to teach bt_index_parent_check() about half-dead pages. Anyone volunteer to write that patch? - Heikki -
Re: Bug in amcheck?
Peter Geoghegan <pg@bowt.ie> — 2025-11-25T20:51:06Z
On Tue, Nov 25, 2025 at 3:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > To fix this, I guess we need to teach bt_index_parent_check() about > half-dead pages. Anyone volunteer to write that patch? It's not like bt_index_parent_check doesn't generally know about them. For example, bt_downlink_missing_check goes to great lengths to distinguish between legitimate "missing" downlinks caused by an interrupted page deletion, and real missing downlinks caused by corruption. The problem we're seeing here seems likely limited to code added by commit d114cc53, which enhanced bt_index_parent_check by adding the new bt_child_highkey_check check. bt_child_highkey_check actually reuses bt_downlink_missing_check (which deals with half-dead pages correctly), but still isn't careful enough about half-dead pages. This is kind of surprising, since it *does* account for incomplete splits, which are similar. In short, I think that we need to track something like BtreeCheckState.previncompletesplit, but for half-dead pages. And then actually use that within bt_child_highkey_check, to avoid spurious false-positive reports of corruption. -- Peter Geoghegan
-
Re: Bug in amcheck?
Heikki Linnakangas <hlinnaka@iki.fi> — 2025-11-26T08:20:40Z
On 25/11/2025 22:51, Peter Geoghegan wrote: > On Tue, Nov 25, 2025 at 3:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> To fix this, I guess we need to teach bt_index_parent_check() about >> half-dead pages. Anyone volunteer to write that patch? > > It's not like bt_index_parent_check doesn't generally know about them. > For example, bt_downlink_missing_check goes to great lengths to > distinguish between legitimate "missing" downlinks caused by an > interrupted page deletion, and real missing downlinks caused by > corruption. > > The problem we're seeing here seems likely limited to code added by > commit d114cc53, which enhanced bt_index_parent_check by adding the > new bt_child_highkey_check check. bt_child_highkey_check actually > reuses bt_downlink_missing_check (which deals with half-dead pages > correctly), but still isn't careful enough about half-dead pages. This > is kind of surprising, since it *does* account for incomplete splits, > which are similar. +1. It took me a moment to understand bt_downlink_missing_check(). The comments there talk about interrupted page deletions and incomplete splits, but it's actually never called for half-dead pages. The caller checks !P_IGNORE(opaque), and P_IGNORE means BTP_DELETED | BTP_HALF_DEAD. I don't think BTP_DELETED pages should be reachable here at all. So perhaps we should specifically check BTP_DELETED and give an ERROR on those. And for clarity, perhaps move the check for BTP_HALF_DEAD into bt_downlink_missing_check(), alongside the incomplete-split check, so that both cases would be checked at the same place. Just for clarity, it's not wrong as it is. > In short, I think that we need to track something like > BtreeCheckState.previncompletesplit, but for half-dead pages. And then > actually use that within bt_child_highkey_check, to avoid spurious > false-positive reports of corruption. I think it's even simpler than that, and we can just do this: --- a/contrib/amcheck/verify_nbtree.c +++ b/contrib/amcheck/verify_nbtree.c @@ -2268,7 +2268,7 @@ bt_child_highkey_check(BtreeCheckState *state, * If we visit page with high key, check that it is equal to the * target key next to corresponding downlink. */ - if (!rightsplit && !P_RIGHTMOST(opaque)) + if (!rightsplit && !P_RIGHTMOST(opaque) && !P_ISHALFDEAD(opaque)) { BTPageOpaque topaque; IndexTuple highkey; Both BTP_HALF_DEAD and BTP_INCOMPLETE_SPLIT indicate that a downlink is missing, but they work slightly differently. If a page is marked as BTP_INCOMPLETE_SPLIT, it means that its right sibling has no downlink, but if a page is marked as BTP_HALF_DEAD, it means that the page itself has no downlink. - Heikki -
Re: Bug in amcheck?
Heikki Linnakangas <hlinnaka@iki.fi> — 2025-12-01T21:20:16Z
On 26/11/2025 10:20, Heikki Linnakangas wrote: > On 25/11/2025 22:51, Peter Geoghegan wrote: >> In short, I think that we need to track something like >> BtreeCheckState.previncompletesplit, but for half-dead pages. And then >> actually use that within bt_child_highkey_check, to avoid spurious >> false-positive reports of corruption. > > I think it's even simpler than that, and we can just do this: > > --- a/contrib/amcheck/verify_nbtree.c > +++ b/contrib/amcheck/verify_nbtree.c > @@ -2268,7 +2268,7 @@ bt_child_highkey_check(BtreeCheckState *state, > * If we visit page with high key, check that it is equal to the > * target key next to corresponding downlink. > */ > - if (!rightsplit && !P_RIGHTMOST(opaque)) > + if (!rightsplit && !P_RIGHTMOST(opaque) && !P_ISHALFDEAD(opaque)) > { > BTPageOpaque topaque; > IndexTuple highkey; > > > Both BTP_HALF_DEAD and BTP_INCOMPLETE_SPLIT indicate that a downlink is > missing, but they work slightly differently. If a page is marked as > BTP_INCOMPLETE_SPLIT, it means that its right sibling has no downlink, > but if a page is marked as BTP_HALF_DEAD, it means that the page itself > has no downlink. Ok, here's a proper patch with tests. The patch itself is the above one-liner. It's in patch 0004. While testing this, I bumped into another similar amcheck bug: if the root page split is interrupted, verify_btree() complains: ERROR: block 3 is not true root in index "nbtree_incomplete_splits_i_idx" Attached patch 0002 contains a fix and a test for that. The fix for that is also one-liner. Summary of the patches: Patch 0001 adds an injection point test for incomplete splits. We already had such a test for GIN, which handles incomplete splits the same way as B-tree. I copy-pasted and adapted the GIN test for B-tree. This was an easy way to increase our test coverage. Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies the test added in patch 0001 to cover the bug fix. Patch 0003 adds a test for half-dead pages, similar to what 0001 did for incomplete splits. Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the issue that started this thread. It modifies the test introduced in patch 0003 to add amcheck calls, to cover the bug fix. - Heikki -
Re: Bug in amcheck?
Peter Geoghegan <pg@bowt.ie> — 2025-12-02T17:59:19Z
On Mon, Dec 1, 2025 at 4:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Ok, here's a proper patch with tests. The patch itself is the above > one-liner. It's in patch 0004. > > While testing this, I bumped into another similar amcheck bug: if the > root page split is interrupted, verify_btree() complains: > > ERROR: block 3 is not true root in index "nbtree_incomplete_splits_i_idx" > > Attached patch 0002 contains a fix and a test for that. The fix for that > is also one-liner. Good catch. > Summary of the patches: > > Patch 0001 adds an injection point test for incomplete splits. We > already had such a test for GIN, which handles incomplete splits the > same way as B-tree. I copy-pasted and adapted the GIN test for B-tree. > This was an easy way to increase our test coverage. > > Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies > the test added in patch 0001 to cover the bug fix. > > Patch 0003 adds a test for half-dead pages, similar to what 0001 did for > incomplete splits. > > Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the > issue that started this thread. It modifies the test introduced in patch > 0003 to add amcheck calls, to cover the bug fix. All seem reasonable. These tests will increase nbtree code coverage quite a bit, which is a nice bonus. -- Peter Geoghegan
-
Re: Bug in amcheck?
Heikki Linnakangas <hlinnaka@iki.fi> — 2025-12-02T19:27:09Z
On 02/12/2025 19:59, Peter Geoghegan wrote: > On Mon, Dec 1, 2025 at 4:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Summary of the patches: >> >> Patch 0001 adds an injection point test for incomplete splits. We >> already had such a test for GIN, which handles incomplete splits the >> same way as B-tree. I copy-pasted and adapted the GIN test for B-tree. >> This was an easy way to increase our test coverage. >> >> Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies >> the test added in patch 0001 to cover the bug fix. >> >> Patch 0003 adds a test for half-dead pages, similar to what 0001 did for >> incomplete splits. >> >> Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the >> issue that started this thread. It modifies the test introduced in patch >> 0003 to add amcheck calls, to cover the bug fix. > > All seem reasonable. > > These tests will increase nbtree code coverage quite a bit, which is a > nice bonus. Committed, thanks for the review! - Heikki