Thread

Re: [BUG] CRASH: ECPGprepared_statement() and ECPGdeallocate_all() when connection is NULL

Andrew Dunstan <andrew@dunslane.net> — 2026-05-06T11:55:10Z
On 2026-05-05 Tu 5:36 PM, Andrew Dunstan wrote:
>
> On 2026-05-05 Tu 4:32 PM, Tom Lane wrote:
>> Alexander Lakhin <exclusion@gmail.com> writes:
>>> There is no other useful information in the log, so it's not clear what's
>>> wrong with that animal (no other gives us such failures), but I could
>>> produce something similar (on FreeBSD and Linux) with:
>>> echo "max_connections = 10" >/tmp/temp.config; 
>>> TEMP_CONFIG=/tmp/temp.config gmake -s check -C src/interfaces/ecpg/test
>> Yes, I can also reproduce problems with the ecpg tests at
>> max_connections = 10.  For me, thread/prep segfaults but thread/alloc
>> just seems to hang indefinitely.  (thread/prep sometimes does too.)
>> These issues are not new; v18 does the same.  The reporting is a
>> bit different but I think that's from pg_regress changes not ecpg.
>>
>> Looking at the postmaster log, I see
>>
>> 2026-05-05 16:11:06.509 EDT [682116] FATAL:  sorry, too many clients already
>>
>> which is unsurprising in this situation, but apparently these tests
>> don't handle a connection failure well at all.
>>
>> There's no such message in dikkop's log, so that may be an unrelated
>> problem.
>>
>> BTW, reducing max_connections to 5 causes several other tests to fail,
>> but in unsurprising ways, like
>>
>> # +SQL error: could not connect to database "ecpg1_regression" on line 107
>> # +SQL error: could not connect to database "ecpg1_regression" on line 107
>> # +SQL error: could not connect to database "ecpg1_regression" on line 107
>> # +SQL error: could not connect to database "ecpg1_regression" on line 107
>>
>>
>>
>
>
> Ugh. I will do some digging.
>
>
>

OK, first, this is orthogonal to the issue fixed earlier in this thread.

It's a 22-year-old bug where a connection failure results in a thread
falling back to a sibling's connection. The fix is to keep track of
which thread opened which connection and only fall back to the global
actual_connection if it was opened by our thread.

There was an unresolved report of these symptoms in 2006[1]. 
Essentially, the user was holding the lock the docs told him to hold, 
and ecpglib still corrupted state because the corruption window was 
inside ecpglib, not inside the libpq calls he was trying to serialize. 
It's not an easy problem to diagnose, however, so there could well have 
been more cases.

Attached is a patch with the fix, courtesy of Claude. It's a slight
behaviour change:

After the patch, programs in this category fail loudly with ECPG_NO_CONN:

- Pattern that breaks: main thread calls EXEC SQL CONNECT, spawns 
workers, workers issue EXEC SQL ... with no connection name and no 
per-thread SET CONNECTION, relying on their own mutex to serialize the 
libpq calls.
- Migration: one of two existing supported patterns. Either name the
connection explicitly per statement (EXEC SQL AT con1 SELECT ...) or set
the per-thread default once at thread start (EXEC SQL SET CONNECTION
con1;). The latter still works under the patch because it explicitly
populates the per-thread slot, and the patch only owner-checks the
global fallback, not the per-thread slot.
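
For illustration, the lookup rule described above can be modelled in a
few lines of standalone C. This is only a sketch, not the patch and not
ecpglib code: struct conn, model_connect(), model_set_connection(),
model_get_default() and model_worker_lookup() are made-up names, and
the model assumes a successful connect records the opening thread and
becomes the global fallback.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection record; not ecpglib's struct. */
struct conn
{
    const char *name;
    pthread_t   owner;          /* thread that opened the connection */
};

static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct conn *global_default = NULL;                /* global fallback */
static _Thread_local struct conn *thread_default = NULL;  /* per-thread slot */

/* Model of a successful CONNECT: remember who opened the connection. */
void
model_connect(struct conn *c)
{
    assert(c != NULL);
    c->owner = pthread_self();
    thread_default = c;
    pthread_mutex_lock(&conn_mutex);
    global_default = c;
    pthread_mutex_unlock(&conn_mutex);
}

/* Model of SET CONNECTION: explicitly populate the per-thread slot. */
void
model_set_connection(struct conn *c)
{
    thread_default = c;
}

/*
 * Model of the lookup used when a statement names no connection.
 * The per-thread slot is trusted as-is; the global fallback is used
 * only if the calling thread opened it.  NULL stands for "no connection".
 */
struct conn *
model_get_default(void)
{
    struct conn *ret = NULL;

    if (thread_default != NULL)
        return thread_default;

    pthread_mutex_lock(&conn_mutex);
    if (global_default != NULL &&
        pthread_equal(global_default->owner, pthread_self()))
        ret = global_default;
    pthread_mutex_unlock(&conn_mutex);
    return ret;
}

/*
 * Thread start routine for trying the model: optionally do the
 * SET CONNECTION step first, then report what the lookup finds.
 */
void *
model_worker_lookup(void *arg)
{
    if (arg != NULL)
        model_set_connection(arg);
    return model_get_default();
}
```

In this model, the thread that called model_connect() gets its
connection back from model_get_default(). A sibling thread gets NULL
(the analogue of ECPG_NO_CONN) unless it first calls
model_set_connection(), which corresponds to the second migration
pattern.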

So I guess the question is whether or not we backpatch it (or some other 
fix)?


cheers


andrew



[1] 
https://www.postgresql.org/message-id/52940eef0611100514p14c85e34l528da656662decc9@mail.gmail.com


--
Andrew Dunstan
EDB: https://www.enterprisedb.com