Thread

  1. Re: [BUG] CRASH: ECPGprepared_statement() and ECPGdeallocate_all() when connection is NULL

    Andrew Dunstan <andrew@dunslane.net> — 2026-05-06T11:55:10Z

    On 2026-05-05 Tu 5:36 PM, Andrew Dunstan wrote:
    >
    > On 2026-05-05 Tu 4:32 PM, Tom Lane wrote:
    >> Alexander Lakhin <exclusion@gmail.com> writes:
    >>> There is no other useful information in the log, so it's not clear what's
    >>> wrong with that animal (no other gives us such failures), but I could
    >>> produce something similar (on FreeBSD and Linux) with:
    >>> echo "max_connections = 10" >/tmp/temp.config; 
    >>> TEMP_CONFIG=/tmp/temp.config gmake -s check -C src/interfaces/ecpg/test
    >> Yes, I can also reproduce problems with the ecpg tests at
    >> max_connections = 10.  For me, thread/prep segfaults but thread/alloc
    >> just seems to hang indefinitely.  (thread/prep sometimes does too.)
    >> These issues are not new; v18 does the same.  The reporting is a
    >> bit different but I think that's from pg_regress changes not ecpg.
    >>
    >> Looking at the postmaster log, I see
    >>
    >> 2026-05-05 16:11:06.509 EDT [682116] FATAL:  sorry, too many clients already
    >>
    >> which is unsurprising in this situation, but apparently these tests
    >> don't handle a connection failure well at all.
    >>
    >> There's no such message in dikkop's log, so that may be an unrelated
    >> problem.
    >>
    >> BTW, reducing max_connections to 5 causes several other tests to fail,
    >> but in unsurprising ways, like
    >>
    >> # +SQL error: could not connect to database "ecpg1_regression" on line 107
    >> # +SQL error: could not connect to database "ecpg1_regression" on line 107
    >> # +SQL error: could not connect to database "ecpg1_regression" on line 107
    >> # +SQL error: could not connect to database "ecpg1_regression" on line 107
    >>
    >>
    >>
    >
    >
    > Ugh. I will do some digging.
    >
    >
    >
    
    OK, first, this is orthogonal to the issue fixed earlier in this thread.
    
    It's a 22-year-old bug where, after a connection failure, a thread
    falls back to a sibling thread's connection. The fix is to keep track of
    which thread opened which connection and only fall back to the global
    actual_connection if it was opened by the current thread.
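
    To make the idea concrete, here is a minimal standalone model of the
    owner check in plain C with pthreads. It is not the patch and not
    ecpglib code: struct conn, global_default, thread_default and all the
    model_* functions are hypothetical names invented for this sketch, and
    it assumes that a successful connect fills both the per-thread slot and
    the global default.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection; not ecpglib's real struct. */
struct conn
{
    const char *name;
    pthread_t   owner;          /* thread that opened the connection */
};

/* Plays the role of the global actual_connection in this model. */
static struct conn *global_default;
/* Plays the role of the per-thread default connection slot. */
static _Thread_local struct conn *thread_default;
static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Model of a successful CONNECT: remember the opener, become the default. */
void model_connect(struct conn *c, const char *name)
{
    c->name = name;
    c->owner = pthread_self();
    thread_default = c;
    pthread_mutex_lock(&conn_mutex);
    global_default = c;
    pthread_mutex_unlock(&conn_mutex);
}

/* Model of SET CONNECTION: fills only the calling thread's slot. */
void model_set_connection(struct conn *c)
{
    thread_default = c;
}

/* Connection lookup for a statement that names no connection. */
struct conn *model_default_connection(void)
{
    struct conn *c = thread_default;

    if (c != NULL)
        return c;               /* per-thread slot is trusted as-is */

    pthread_mutex_lock(&conn_mutex);
    c = global_default;
    /* Owner check: never hand out a connection a sibling thread opened. */
    if (c != NULL && !pthread_equal(c->owner, pthread_self()))
        c = NULL;
    pthread_mutex_unlock(&conn_mutex);
    return c;                   /* NULL models the "no connection" error */
}

/* Test helper: run the lookup in a fresh thread, optionally after SET. */
struct probe
{
    struct conn *set_first;
    struct conn *seen;
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;

    if (p->set_first != NULL)
        model_set_connection(p->set_first);
    p->seen = model_default_connection();
    return NULL;
}

struct conn *lookup_in_new_thread(struct conn *set_first)
{
    struct probe p = {set_first, NULL};
    pthread_t   t;
    int         rc = pthread_create(&t, NULL, probe_thread, &p);

    assert(rc == 0);
    pthread_join(t, NULL);
    return p.seen;
}
```

    In this model, a worker whose own connect failed has an empty
    per-thread slot, so lookup_in_new_thread(NULL) returns NULL instead of
    the main thread's connection, while a worker that first calls
    model_set_connection() still gets the connection it asked for.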
    
    There was an unresolved report of these symptoms in 2006[1]. 
    Essentially, the user was holding the lock the docs told him to hold, 
    and ecpglib still corrupted state because the corruption window was 
    inside ecpglib, not inside the libpq calls he was trying to serialize. 
    It's not an easy problem to diagnose, however, so there could well have 
    been more cases.
    
    Attached is a patch with the fix, courtesy of Claude. It's a slight
    behaviour change: after the patch, programs that rely on the following
    pattern fail loudly with ECPG_NO_CONN:
    
    - Pattern that breaks: main thread calls EXEC SQL CONNECT, spawns
      workers, workers issue EXEC SQL ... with no connection name and no
      per-thread SET CONNECTION, relying on their own mutex to serialize the
      libpq calls.
    - Migration: one of two existing supported patterns. Either name the
      connection explicitly per statement (EXEC SQL AT con1 SELECT ...) or set
      the per-thread default once at thread start (EXEC SQL SET CONNECTION
      con1;). The latter still works under the patch because it explicitly
      populates the per-thread slot, and the patch only owner-checks the
      global fallback, not the per-thread slot.
    
    So I guess the question is whether or not we backpatch it (or some other 
    fix)?
    
    
    cheers
    
    
    andrew
    
    
    
    [1] 
    https://www.postgresql.org/message-id/52940eef0611100514p14c85e34l528da656662decc9@mail.gmail.com
    
    
    --
    Andrew Dunstan
    EDB: https://www.enterprisedb.com