Re: Recovering from detoast-related catcache invalidations
From: Andres Freund <andres@anarazel.de>
To: Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>
Cc: Heikki Linnakangas <hlinnaka@iki.fi>, Noah Misch <noah@leadboat.com>, Tom Lane <tgl@sss.pgh.pa.us>, Xiaoran Wang <fanfuxiaoran@gmail.com>, pgsql-hackers@lists.postgresql.org
Date: 2025-12-18T19:07:00Z
Lists: pgsql-hackers
Hi,
On 2025-09-29 19:34:47 +0300, Arseniy Mukhin wrote:
> On Fri, Sep 19, 2025 at 11:11 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2025-03-26 07:21:43 -0400, Andres Freund wrote:
> > > On 2025-01-14 15:13:21 +0200, Heikki Linnakangas wrote:
> > > > Committed with those fixes. Thanks for the review!
> > >
> > > The test doesn't seem entirely stable. E.g.
> > > https://cirrus-ci.com/task/6166374147424256
> > > failed spuriously:
> > >
> > > [08:52:06.822](0.002s) # issuing query 1 via background psql:
> > > # SELECT injection_points_set_local();
> > > # SELECT injection_points_attach('catcache-list-miss-systable-scan-started', 'wait');
> > > [08:52:06.851](0.029s) # results query 1:
> > > # {
> > > # 'stderr' => 'background_psql: QUERY_SEPARATOR 1:
> > > # ',
> > > # 'stdout' => '
> > > #
> > > # background_psql: QUERY_SEPARATOR 1:
> > > # '
> > > # }
> > > [08:52:06.893](0.042s) # issuing query 1 via background psql:
> > > # SELECT injection_points_wakeup('catcache-list-miss-systable-scan-started');
> > > # SELECT injection_points_detach('catcache-list-miss-systable-scan-started');
> > > [08:52:06.897](0.004s) # pump_until: process terminated unexpectedly when searching for "(?^:(^|\n)background_psql: QUERY_SEPARATOR 1:\r?\n)" with stream: ""
> > > process ended prematurely at /tmp/cirrus-ci-build/src/test/perl/PostgreSQL/Test/Utils.pm line 440.
> > >
> > >
> > > 2025-03-25 08:52:06.896 UTC [34240][client backend] [007_catcache_inval.pl][4/2:0] ERROR: could not find injection point catcache-list-miss-systable-scan-started to wake up
> > > 2025-03-25 08:52:06.896 UTC [34240][client backend] [007_catcache_inval.pl][4/2:0] STATEMENT: SELECT injection_points_wakeup('catcache-list-miss-systable-scan-started');
> >
> > And again: https://cirrus-ci.com/task/6082321633247232
> >
> > Ping?
> >
>
> The wait_for_event call, which is typically used with a wait injection
> point, is missing. Could this be the cause of instability? If this
> makes sense, please find the attached fix.
I was just reminded of this thread because I saw the failure again:
https://cirrus-ci.com/task/5859971612540928
(it's unrelated to the patch)
I think you might be right - the wait point might not yet have been reached,
because the query_until() just waits for "starting_bg_psql" being printed by
\echo starting_bg_psql
SELECT foofunc(1);
while the wait point is only hit during the "SELECT foofunc(1)". There's no
guarantee that we will have reached the wait point by the time query_until()
returns.
I found that I can reproduce the issue with
--- i/src/test/modules/test_misc/t/007_catcache_inval.pl
+++ w/src/test/modules/test_misc/t/007_catcache_inval.pl
@@ -53,6 +53,7 @@ my $psql_session2 = $node->background_psql('postgres');
# catcache list
$psql_session->query_safe(
qq[
+ SELECT pg_sleep(0.1);
SELECT injection_points_set_local();
SELECT injection_points_attach('catcache-list-miss-systable-scan-started', 'wait');
]);
@@ -62,6 +63,7 @@ $psql_session->query_safe(
$psql_session->query_until(
qr/starting_bg_psql/, q(
\echo starting_bg_psql
+ SELECT pg_sleep(3);
SELECT foofunc(1);
));
(the first SELECT is just there so that the later pg_sleep(3) doesn't hit the
injection point itself, by already having loaded the cache entry for pg_sleep).
And indeed your patch fixes that.
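For the archives, a rough sketch of what waiting for the injection point
looks like in the test. This is not the attached patch verbatim; it assumes
the existing $node and $psql_session from 007_catcache_inval.pl, and that the
wait shows up in pg_stat_activity under the injection point's name:

```perl
# Sketch only: a fragment of 007_catcache_inval.pl, not a standalone script.
# $node and $psql_session are the objects the test already creates.
$psql_session->query_until(
	qr/starting_bg_psql/, q(
   \echo starting_bg_psql
   SELECT foofunc(1);
));

# query_until() returns as soon as the \echo output is seen, which can be
# before foofunc(1) has reached the wait point.  Poll until a client
# backend is actually waiting there before waking it up from the other
# session.
$node->wait_for_event('client backend',
	'catcache-list-miss-systable-scan-started');
```

With that in place, the later injection_points_wakeup() should only run once
there is something to wake up.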
Greetings,
Andres Freund