Thread

  1. RE: archive status ".ready" files may be created too early

    Ryo Matsumura (Fujitsu) <matsumura.ryo@fujitsu.com> — 2020-05-29T06:41:40Z

    2020-03-26 18:50:24 Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
    > The v3 patch is a proof-of-concept patch that moves the ready-for-
    > archive logic to the WAL writer process.  We mark files as ready-for-
    > archive when the WAL flush pointer has advanced beyond a known WAL
    > record boundary.
    
    
    I like such a simple solution, but I cannot agree with it.
    
    1.
    This patch gives wal_writer_delay two meanings. For example,
    a user who sets the parameter to a larger value gets archived files
    later.
    
    2.
    Even if we create a new parameter, neither we nor users can determine
    the best value.
    
    3.
    PostgreSQL guarantees that if a database cluster is stopped smartly,
    the cluster has flushed and archived all WAL records, as follows.
    
     [xlog.c]
      * If archiving is enabled, rotate the last XLOG file so that all the
      * remaining records are archived (postmaster wakes up the archiver
      * process one more time at the end of shutdown). The checkpoint
      * record will go to the next XLOG file and won't be archived (yet).
    
    Therefore, the idea may require end-of-shutdown synchronization between
    WalWriter and the archiver (pgarch).  I cannot agree with that, because
    shutdown processing is inherently complex and the synchronization would
    make it more complicated.  Your idea trades timeliness of the notification
    for simplicity, but I think the synchronization may ruin that merit.
    
    4.
    I found that the patch can miss a chance to notify.  We have to be more careful.
    In the following case, WalWriter notifies only after a little less than 3 times
    wal_writer_delay in the worst case.  That may not be acceptable, depending on the
    value of wal_writer_delay.  If we create a new parameter, we cannot explain this to users.
    
    Premise:
    - Seg1 has already been notified.
    - FlushedPtr is 0/2D00000 (= all WAL records are flushed).
    
    -----
    Step 1.
    Backend-A updates InsertPtr to 0/2E00000, but has not yet
    copied its WAL record to the buffer.
    
    Step 2. (sleep)
    WalWriter memorizes InsertPtr 0/2E00000 in a local variable
    (LocalInsertPtr) and sleeps because FlushedPtr has not passed
    InsertPtr.
    
    Step 3.
    Backend-A copies its WAL record to the buffer.
    
    Step 4.
    Backend-B updates InsertPtr to 0/3100000,
    copies its record to the buffer, commits (flushing it by itself),
    and updates FlushedPtr to 0/3100000.
    
    Step 5.
    WalWriter detects that FlushedPtr (0/3100000) has passed
    LocalInsertPtr (0/2E00000), but WalWriter cannot notify Seg2
    even though it should be notified.
    
    This happens because WalWriter does not know which record
    crosses the segment boundary.
    
    Then, in the worst case, Seg2 is notified only after two more
    sleeps, while WalWriter waits for InsertPtr to pass FlushedPtr again.
    
    Step 6. (sleep)
    WalWriter sleeps.
    
    Step 7.
    Backend-C inserts a WAL record, flushes it, and updates the pointers as follows:
    InsertPtr --> 0/3200000
    FlushedPtr --> 0/3200000
    
    Step 8.
    Backend-D updates InsertPtr to 0/3300000, but has not yet copied
    its record to the buffer.
    
    Step 9. (sleep)
    WalWriter memorizes InsertPtr 0/3300000 in LocalInsertPtr
    and sleeps because FlushedPtr is still 0/3200000.
    
    Step 10.
    Backend-D copies its record.
    
    Step 11.
    Someone (Backend-X or WalWriter) flushes and updates FlushedPtr
    to 0/3300000.
    
    Step 12.
    WalWriter detects that FlushedPtr (0/3300000) has passed
    LocalInsertPtr (0/3300000) and notifies Seg2.
    -----
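    
    The steps above can be replayed with a small toy model. This is only a
    sketch of the behavior as the steps describe it, not code from the v3
    patch: the class WalWriterModel, its wake() method, and the 16 MB
    segment size are assumptions made for illustration. Each wake() call
    stands for one WalWriter wake-up at Steps 2, 5, 9 and 12.

```python
SEG_SIZE = 0x1000000  # assumed 16 MB WAL segment size


class WalWriterModel:
    """Hypothetical model of the wake-up logic described in Steps 1-12."""

    def __init__(self, last_notified_seg):
        self.last_notified_seg = last_notified_seg
        self.local_insert_ptr = None  # remembered record boundary, if any

    def wake(self, insert_ptr, flushed_ptr):
        """Simulate one wake-up; return the segments marked ready."""
        notified = []
        if self.local_insert_ptr is not None and flushed_ptr >= self.local_insert_ptr:
            # Only segments that end at or before the remembered boundary
            # are known to be complete.
            first_unsafe = self.local_insert_ptr // SEG_SIZE
            notified = list(range(self.last_notified_seg + 1, first_unsafe))
            if notified:
                self.last_notified_seg = notified[-1]
            self.local_insert_ptr = None
        if self.local_insert_ptr is None and insert_ptr > flushed_ptr:
            # Remember a new boundary and go back to sleep.
            self.local_insert_ptr = insert_ptr
        return notified


def replay():
    """Replay the four wake-ups of the scenario; Seg1 is already notified."""
    w = WalWriterModel(last_notified_seg=1)
    return [
        w.wake(insert_ptr=0x2E00000, flushed_ptr=0x2D00000),  # Step 2
        w.wake(insert_ptr=0x3100000, flushed_ptr=0x3100000),  # Step 5
        w.wake(insert_ptr=0x3300000, flushed_ptr=0x3200000),  # Step 9
        w.wake(insert_ptr=0x3300000, flushed_ptr=0x3300000),  # Step 12
    ]
```

    In this model replay() reports Seg2 only on the fourth wake-up, although
    the record crossing into the next segment was already flushed before the
    second one. Three sleeps separate the start of the scenario from the
    notification.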
    
    
    I'm preparing a patch in which a backend that inserts a WAL record
    crossing a segment boundary leaves its EndRecPtr, and whoever flushes
    that record checks the EndRecPtr and notifies.
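    
    A minimal sketch of that idea, under stated assumptions, might look like
    the following. It is not the patch being prepared: CrossingTracker,
    on_insert(), on_flush() and the 16 MB segment size are hypothetical names
    and values chosen for illustration, and the real shared-memory layout and
    locking are not modeled.

```python
SEG_SIZE = 0x1000000  # assumed 16 MB WAL segment size


class CrossingTracker:
    """Hypothetical record of EndRecPtr values for boundary-crossing records."""

    def __init__(self, last_notified_seg):
        self.last_notified_seg = last_notified_seg
        self.crossing_end = {}  # segment number -> EndRecPtr of the record leaving it

    def on_insert(self, start_ptr, end_rec_ptr):
        """Called by a backend: remember EndRecPtr if the record crosses a boundary."""
        first_seg = start_ptr // SEG_SIZE
        last_seg = (end_rec_ptr - 1) // SEG_SIZE
        for seg in range(first_seg, last_seg):
            self.crossing_end[seg] = end_rec_ptr

    def on_flush(self, flushed_ptr):
        """Called by whoever flushed: return the segments that may be notified."""
        notified = []
        seg = self.last_notified_seg + 1
        while (seg + 1) * SEG_SIZE <= flushed_ptr:
            if self.crossing_end.get(seg, 0) > flushed_ptr:
                break  # the crossing record is not fully flushed yet
            self.crossing_end.pop(seg, None)
            notified.append(seg)
            seg += 1
        self.last_notified_seg = seg - 1
        return notified
```

    With the pointers from the scenario, on_insert(0x2E00000, 0x3100000)
    followed by on_flush(0x3100000) reports Seg2 at the flush in Step 4,
    while a flush that only reaches 0/3000000 reports nothing because the
    crossing record is still incomplete.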
    
    
    Regards
    Ryo Matsumura