Thread

  1. Re: Accept recovery conflict interrupt on blocked writing

    Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> — 2025-01-16T13:24:41Z

    Bonjour Thomas,
    
    On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
    > Right.  Before commit 0da096d7 in v17, the recovery conflict code
    > running in a signal handler would have set ProcDiePending, so this
    > looks like an unintended regression due to that commit.
    
    The issue is happening on instances with 14.13. Though I haven't tried
    to reproduce it yet on a version older than the current HEAD yet.
    
    > In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
    > it could also service other interrupts that we probably don't want to
    > introduce without more study, no?  For example if
    > ProcessRecoveryConflictInterrupts() returns without throwing, and
    > another pending interrupt flag is also set.
    >
    > Thinking of other proc signals/interrupts, of course it's not really
    > acceptable that we don't process at least also global barriers here,
    > ie for a time controllable by a client, but let's call that an
    > independent problem.  I think the minimal thing for this specific
    > problem might be to use ProcessRecoveryConflictInterrupts() directly.
    
    I think that's definitely happening, my first version used a
    pg_unreachable after the CHECK_FOR_INTERRUPTS() which was triggered.
    I've replaced it by a direct call to
    ProcessRecoveryConflictInterrupts().
    
    > The read from the socket also seems to have a variant of the problem,
    > unless I'm missing something, except in that case I'm not sure it's
    > new.  For sending you're proposing that our only choice is to kill the
    > session, which makes sense, but the equivalent read side stay-in-sync
    > strategy is to keep deferring until the client gets around to sending
    > a complete message, if I read that right.
    
    I'm not sure the read path has the same issue. That would require a
    replica to do a non command read but I don't think it's possible? A
    recovering instance can't run COPY, and is there another situation
    where a replica would do a blocking read without the DoingCommandRead
    flag? Also, the read side is also killing the session since
    DoingCommandRead will be true most of the time, falling through to the
    FATAL case.
    
    > Hmm, I wonder if we could write tests for both directions in
    > 031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
    > as seen in some other recent proposals...  I might have a crack at
    > that soon unless you want to.
    
    I've tried adding a test for this. I've implemented some utility
    functions to start a session, read a payload and send a simple query
    using raw TCP. The implementation is definitely a bit rough as I'm far
    from being fluent in perl. The socket utility functions should
    probably be moved to a dedicated file and the Windows build is
    currently failing. However, the test does trigger the issue with the
    recovery blocked by the session blocked in ClientWrite.