Thread
-
Re: Flaky 003_start_stop.pl test
Andres Freund <andres@anarazel.de> — 2025-12-17T15:00:01Z
Hi,

On 2025-12-17 16:46:25 +0200, Heikki Linnakangas wrote:
> > The pattern is similar in other failed tests. There has been a spate of errors
> > on postgres/postgres' CI, which made me look at this again.
>
> I see what's going on here. Before the loop that opens all the connections
> with the SSLRequests, the test does this:
>
> > if (!$node->raw_connect_works())
> > {
> >     plan skip_all => "this test requires working raw_connect()";
> > }
>
> That's the connection you see in the log without the SSLRequest.
>
> When the test fails, what happens is that the backend handling that
> connection is slow. It lingers around, even though the perl client has
> already closed the socket. The backend finally exits _just_ when the test
> has opened all the "dead end" connections, and checks that the final
> connection fails with the "FATAL: sorry, too many clients already" error.
> Because the backend exits at just the right moment, it frees up a connection
> slot which is then used by that final connection. The test fails, because it
> expects all the connection slots to be in use and for that final connection
> to launch a dead-end backend.
>
> The 002_connection_limits.pl test has potential for a similar issue. I added
> a "sleep(1)" to proc_exit to simulate slow backend exit, and the test
> started failing, because the "safe_psql" calls used in the initialization
> was still consuming a connection slot.
>
> I've pushed a fix to both tests, by restarting the server after the
> initialization steps. That's a little heavy-weight, but it's a simple way to
> ensure that all the backends are gone.

Thanks!

Let's hope that this makes cfbot run a bit more quietly aga<oh no>. The test
run for this commit failed with an independent flakiness... [4]

Greetings,

Andres Freund

[4] https://cirrus-ci.com/task/6281872222715904
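
The race described in the quoted explanation can be illustrated with a toy model of connection-slot accounting. This is only a sketch, not PostgreSQL code: `ToyServer`, `final_connection_result`, and the slot count of 3 are hypothetical names and values chosen for illustration, and the "worst-case timing" is simply hard-coded rather than produced by real concurrency.

```python
class ToyServer:
    """Hypothetical stand-in for a server with a fixed number of connection slots."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.in_use = 0

    def connect(self):
        # A connection either takes a free slot or is rejected.
        if self.in_use >= self.max_connections:
            return "too many clients"
        self.in_use += 1
        return "ok"

    def backend_exit(self):
        # A backend exits and releases its slot.
        self.in_use -= 1

    def restart(self):
        # After a restart, no backend from before can still hold a slot.
        self.in_use = 0


def final_connection_result(restart_after_init):
    server = ToyServer(max_connections=3)

    # Initialization: a probe connection. The client closes its socket
    # right away, but the backend is slow to exit and keeps its slot.
    server.connect()
    lingering = 1

    if restart_after_init:
        # The fix from the thread: restart so nothing lingers.
        server.restart()
        lingering = 0

    # The test opens enough connections to fill every slot; attempts
    # beyond the free slots are rejected.
    for _ in range(server.max_connections):
        server.connect()

    # Worst-case timing: the slow backend exits right before the check.
    for _ in range(lingering):
        server.backend_exit()

    # The test expects this last attempt to be rejected.
    return server.connect()


print(final_connection_result(restart_after_init=False))
print(final_connection_result(restart_after_init=True))
```

Without the restart, the lingering backend frees a slot at the wrong moment and the final connection is accepted, which is the failure mode described above. With the restart, every slot is accounted for before the test begins, so the final connection is rejected as the test expects.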