Thread

Commits

Same data as JSON: GET /api/v1/messages/:b64id/commits the thread's linked commits as JSON, with link sources. API reference →
  1. Don't auto-restart per-database autoprewarm workers.

  2. Fix race in dsm_attach() when handles are reused.

  1. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Thomas Munro <thomas.munro@gmail.com> — 2019-02-20T02:21:05Z

    On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
    <noreply@postgresql.org> wrote:
    >
    > The following bug has been logged on the website:
    >
    > Bug reference:      15641
    > Logged by:          Hans Buschmann
    > Email address:      buschmann@nidsa.net
    > PostgreSQL version: 11.2
    > Operating system:   Windows Server 2019 Standard
    > Description:
    >
    > I recently moved a production system from PG 10.7 to 11.2 on a different
    > Server.
    >
    > The configuration settings where mostly taken from the old system and
    > enhanced by new features of PG 11.
    >
    > pg_prewarm was used for a long time (with no specific configuration).
    >
    > Now I have added Huge page support for Windows in the OS and verified it
    > with vmmap tool from Sysinternals to be active.
    > (the shared buffers are locked in memory: Lock_WS is set).
    >
    > When pg_prewarm.autoprewarm is set to on (using the default after initial
    > database import via pg_restore), the autoprewarm worker process
    > terminates immediately and generates a huge number of logfile entries
    > like:
    >
    > CPS PRD 2019-02-17 16:11:53 CET  00000 11:> LOG:  background worker
    > "autoprewarm worker" (PID 3996) exited with exit code 1
    > CPS PRD 2019-02-17 16:11:53 CET  55000  1:> ERROR:  could not map dynamic
    > shared memory segment
    
    Hmm.  It's not clear to me how using large pages for the main
    PostgreSQL shared memory region could have any impact on autoprewarm's
    entirely separate DSM segment.  I wonder if other DSM use cases are
    impacted.  Does parallel query work?  For example, the following
    produces a parallel query that uses a few DSM segments:
    
    create table foo as select generate_series(1, 1000000)::int i;
    analyze foo;
    explain analyze select count(*) from foo f1 join foo f2 using (i);
    
    Looking at the place where that error occurs, it seems like it simply
    failed to find the handle, as if it didn't exist at all at the time
    dsm_attach() was called.  I'm not entirely sure how that could happen
    just because you turned on huge pages.  Is it possible that there is a
    race where apw_load_buffers() manages to detach before the worker
    attached, and the timing changes?  At a glance, that shouldn't happen
    because apw_start_database_worker() waits for the work to exit before
    returning.
    
    I think we'll need one of our Windows-enabled hackers to take a look.
    
    PS Sorry for breaking the thread.  I wish our archives app had a
    "[re]send me this email" button, for people who subscribed after the
    message was sent...
    
    -- 
    Thomas Munro
    https://enterprisedb.com
    
    
    
  2. AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Hans Buschmann <buschmann@nidsa.net> — 2019-02-20T16:17:08Z

    Thank you for taking a look.
    
    I encountered this problem after switching the production system and then found it also on the new created replica. 
    
    I have no knowledge of the shared memory areas involved.
    
    I did some further investigation and tried to reproduce it on the old System (WS2016, PG 11.2) but there it worked fine (without and with huge pages activated!).
    
    Even on a developer machine under WS2019, PG 11.2 the error did not occur (both cases running on different generation of intel machines, Haswell and Nehalem, under different Hypervisors, WS2012R2 and WS2019).
    
    I am really confused to not being able to reproduce the error outside of production and replica instances...
    
    The error caused a massive flood of the logs (about 800 MB in about 1 day, on SSD)
    
    I'll try to investigate further by configuring a second replica tomorrow, using the configuration of the production system as done per pg_basebackup.
    
    I looked at the non-default configuration settings but could not identify anything special.
    
    Here is a current list of the production System having 4GB of memory allocated to the VM.
    (all values with XXX are a little obfuscated).
    
    Here, to avoid the error, pg_prewarm.autoprewarm is off!
    
                 name              |              current_setting               |
    -------------------------------+--------------------------------------------+
     application_name              | psql                                       |
     archive_command               | copy "xxxxxx"                              |
     archive_mode                  | on                                         |
     auto_explain.log_analyze      | off                                        |
     auto_explain.log_min_duration | -1                                         |
     client_encoding               | WIN1252                                    |
     cluster_name                  | XXX_PROD                                   |
     data_checksums                | on                                         |
     DateStyle                     | ISO, DMY                                   |
     default_text_search_config    | pg_catalog.german                          |
     dynamic_shared_memory_type    | windows                                    |
     effective_cache_size          | 8GB                                        |
     lc_collate                    | C                                          |
     lc_ctype                      | German_Germany.1252                        |
     lc_messages                   | C                                          |
     lc_monetary                   | German_Germany.1252                        |
     lc_numeric                    | German_Germany.1252                        |
     lc_time                       | German_Germany.1252                        |
     listen_addresses              | *                                          |
     log_destination               | stderr                                     |
     log_directory                 | <XXX_PATH_TO_LOG)                          |
     log_file_mode                 | 0640                                       |
     log_line_prefix               | XXX PRD %t %i %e %2l:>                     |
     log_statement                 | mod                                        |
     log_temp_files                | 0                                          |
     log_timezone                  | CET                                        |
     logging_collector             | on                                         |
     maintenance_work_mem          | 128MB                                      |
     max_connections               | 200                                        |
     max_stack_depth               | 2MB                                        |
     max_wal_size                  | 1GB                                        |
     min_wal_size                  | 80MB                                       |
     pg_prewarm.autoprewarm        | off                                        |
     pg_stat_statements.max        | 8000                                       |
     pg_stat_statements.track      | all                                        |
     random_page_cost              | 1                                          |
     search_path                   | public, archiv, ablage, admin              |
     server_encoding               | UTF8                                       |
     server_version                | 11.2                                       |
     shared_buffers                | 768MB                                      |
     shared_preload_libraries      | auto_explain,pg_stat_statements,pg_prewarm |
     temp_buffers                  | 32MB                                       |
     TimeZone                      | CET                                        |
     transaction_deferrable        | off                                        |
     transaction_isolation         | read committed                             |
     transaction_read_only         | off                                        |
     update_process_title          | off                                        |
     wal_buffers                   | 16MB                                       |
     wal_compression               | on                                         |
     wal_segment_size              | 16MB                                       |
     work_mem                      | 64MB                                       |
    (51 Zeilen)
    
    
    Thanks
    
    Hans Buschmann
    
  3. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Thomas Munro <thomas.munro@gmail.com> — 2019-02-20T21:41:11Z

    On Thu, Feb 21, 2019 at 4:36 AM Hans Buschmann <buschmann@nidsa.net> wrote:
    > I encountered this problem after switching the production system and then found it also on the new created replica.
    >
    > I have no knowledge of the shared memory areas involved.
    >
    > I did some further investigation and tried to reproduce it on the old System (WS2016, PG 11.2) but there it worked fine (without and with huge pages activated!).
    >
    > Even on a developer machine under WS2019, PG 11.2 the error did not occur (both cases running on different generation of intel machines, Haswell and Nehalem, under different Hypervisors, WS2012R2 and WS2019).
    >
    > I am really confused to not being able to reproduce the error outside of production and replica instances...
    >
    > The error caused a massive flood of the logs (about 800 MB in about 1 day, on SSD)
    >
    > I'll try to investigate further by configuring a second replica tomorrow, using the configuration of the production system as done per pg_basebackup.
    
    Just to confirm:  on the machines where it happens, does it happen on
    every restart, and does it never happen if you set huge_pages = off?
    
    CC'ing the authors of the auto-prewarm feature to see if they have ideas.
    
    There is a known bug (fixed in commit 6c0fb941 for the next release)
    that would cause spurious dsm_attach() failure that would look just
    this this (dsm_attach() returns NULL), but that should be very rare
    and couldn't cause the behaviour described here, because here the
    background worker is repeatedly failing to attach in a loop (hence the
    800MB of logs).
    
    -- 
    Thomas Munro
    https://enterprisedb.com
    
    
    
  4. AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Hans Buschmann <buschmann@nidsa.net> — 2019-02-21T09:21:55Z

    hello
    
    Since these are production systems, I did'nt set huge_pages=off.
    (huge pages give performance, autoprewarm is not so necessary)
    
    I think it occured on every start, but the systems were only started 1 to 2 times in this error mode.
    
    In the other cases I tried yesterday the results where very confusing (error not reproducable with or without huge pages).
    
    Hans Buschmann
    
  5. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Mithun Cy <mithun.cy@gmail.com> — 2019-02-21T12:53:29Z

    Hi Thomas, Hans,
    On Thu, Feb 21, 2019 at 2:16 PM Hans Buschmann <buschmann@nidsa.net> wrote:
    >
    > hello
    >
    > Since these are production systems, I did'nt set huge_pages=off.
    > (huge pages give performance, autoprewarm is not so necessary)
    
    I did turn autoprewarm on, windows server 2019 and postgresql 11.2 it
    runs fine even with huge_pages=on (Thanks to neha sharma). As Thomas
    said error is coming from per database worker and main worker waits
    till per data database worker exists so from code review I do see an
    issue of having an invalid handle in per database worker. A
    reproducible testcase will really help. I shall see to recheck the
    code again but I am not much hopeful without a proper testcase.
    
    --
    Thanks and Regards
    Mithun Chicklore Yogendra
    EnterpriseDB: http://www.enterprisedb.com
    
    
    
  6. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Mithun Cy <mithun.cy@gmail.com> — 2019-02-21T12:58:28Z

    On Thu, Feb 21, 2019 at 6:23 PM Mithun Cy <mithun.cy@gmail.com> wrote:
    
    > said error is coming from per database worker and main worker waits
    > till per data database worker exists so from code review I do see an
    > issue of having an invalid handle in per database worker.
    
    Sorry a typo error, I meant I do not see an issue from the code.
    
    
    
  7. AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Hans Buschmann <buschmann@nidsa.net> — 2019-02-24T14:04:09Z

    
    On the weekend, I did some more investigations:
    
    It seems that Huge pages are NOT the cause of this problem.
    
    The problem is only reproducable ONCE, after a database restart it disappears.
    
    By reinstalling the original pg_pasebackup on another test VM the problem reappeared once.
    
    Here is the start of  the error log:
    
    CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was interrupted; last known up at 2019-02-17 16:14:05 CET
    CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
    CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at 0/23000028
    CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery state reached at 0/23000168
    CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length at 0/24000060: wanted 24, got 0
    CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is ready to accept read only connections
    CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does not exist
    CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker "autoprewarm worker" (PID 3968) exited with exit code 1
    CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
    CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
    CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
    CPS PRD 2019-02-24 12:12:17 CET  00000 11:> LOG:  background worker "autoprewarm worker" (PID 3296) exited with exit code 1
    CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
    CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
    CPS PRD 2019-02-24 12:12:17 CET  00000 12:> LOG:  background worker "autoprewarm worker" (PID 2756) exited with exit code 1
    CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
    ...
    (PS: the correct replication function was not set, so causing the errors concerning replication)
    
    It seems that an outdated autoprewarm.blocks causes the problem.
    
    After a restart the autoprewarm.blocks file seems to be rewritten, so that the next start gives no error.
    
    For a test, I copied the erroneus autoprewarm.blocks files over to the data section and the problem reappeared.
    
    
    The autoprewarm.blocks file is not corrupted or moved around manually but rather a leftover from the preceding test installation.
    
    On this instance I had installed a copy of the production database under 11.2.
    By doing the production switch, I dropped the test database and pg_restored the current one.
    
    This left the previous autoprewarm.blocks file in the data directory.
    
    On the first start the autoprewarm files does not match the newly restored database (perhpas the cause of the fatal error: database 16384 does not exist)
    
    So the problem lies in the initial detection of the autoprewarm.blocks file.
    
    This seems easy to reproduce:
    
    - Install/create a database with autoprewarm on and pg_prewarm loaded.
    - Fill the autoprewarm cache with some data
    - pg_dump the database
    - drop the database
    - create the database and pg_restore it from the dump
    - start the instance and logs are flooded
    
    I have taken no further investigation in the sourcecode due to limited skills so far...
    
    
    Thanks
    
    Hans Buschmann
    
    
  8. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Mithun Cy <mithun.cy@gmail.com> — 2019-02-24T18:40:49Z

    Thanks Hans, for a simple reproducible tests.
    
    On Sun, Feb 24, 2019 at 6:54 PM Hans Buschmann <buschmann@nidsa.net> wrote:
    > Here is the start of  the error log:
    >
    > CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was
    interrupted; last known up at 2019-02-17 16:14:05 CET
    > CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
    > CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at
    0/23000028
    > CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery
    state reached at 0/23000168
    > CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length
    at 0/24000060: wanted 24, got 0
    > CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is
    ready to accept read only connections
    > CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does
    not exist
    > CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker
    "autoprewarm worker" (PID 3968) exited with exit code 1
    > CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm
    successfully prewarmed 0 of 12402 previously-loaded blocks
    > CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to
    the primary server: FATAL:  no pg_hba.conf entry for replication connection
    from host "192.168.27.155", user "replicator", SSL off
    > CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic
    shared memory segment
    
    As per the log Auto prewarm master did exit ("autoprewarm successfully
    prewarmed 0 of 12402 previously-loaded blocks") first. Then only we started
    getting "could not map dynamic shared memory segment".
    That is, master has done dsm_detach and then workers started throwing error
    after that.
    
    > This seems easy to reproduce:
    >
    > - Install/create a database with autoprewarm on and pg_prewarm loaded.
    > - Fill the autoprewarm cache with some data
    > - pg_dump the database
    > - drop the database
    > - create the database and pg_restore it from the dump
    > - start the instance and logs are flooded
    >
    > I have taken no further investigation in the sourcecode due to limited
    skills so far...
    
    I was able to reproduce same.
    
    The  "worker.bgw_restart_time" is never set for autoprewarm workers so on
    error it get restarted after some period of time (default behavior). Since
    database itself is dropped our attempt to connect to that database failed
    and then worker exited. But again got restated by postmaster then we start
    seeing above DSM segment error.
    
    I think every autoprewarm worker should be set with
    "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be
    repeated prewarm attempt of a dropped database. I will try to think further
    and submit a patch for same.
    
    -- 
    Thanks and Regards
    Mithun Chicklore Yogendra
    EnterpriseDB: http://www.enterprisedb.com
    
  9. AW: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Hans Buschmann <buschmann@nidsa.net> — 2019-02-25T10:59:48Z

    
    Glad to hear you could reproduce the case easily.
    
    I wanted to add that the problem as it seems now should'nt be restricted to Windows only.
    
    Another thing is the semantic scope of pg_prewarm:
    
    Prewarming affects the whole cluster, so at instance start we can meet some active and/or some dropped databases.
    
    To not affect the other databases the prewarming should occur on all non dropped databases and omit only the dropped ones.
    
    Hope your thinking gives a good patch... ;)
    
    Hans Buschmann
    
  10. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Mithun Cy <mithun.cy@gmail.com> — 2019-03-18T07:04:18Z

    t tOn Mon, Feb 25, 2019 at 12:10 AM Mithun Cy <mithun.cy@gmail.com> wrote:
    
    > Thanks Hans, for a simple reproducible tests.
    >
    > The  "worker.bgw_restart_time" is never set for autoprewarm workers so on
    > error it get restarted after some period of time (default behavior). Since
    > database itself is dropped our attempt to connect to that database failed
    > and then worker exited. But again got restated by postmaster then we start
    > seeing above DSM segment error.
    >
    > I think every autoprewarm worker should be set with
    > "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there shall not be
    > repeated prewarm attempt of a dropped database. I will try to think further
    > and submit a patch for same.
    >
    
    Here is the patch for same,
    
    autoprewarm waorker should not be restarted. As per the code
    @apw_start_database_worker@
    master starts a worker per database and wait until it exit by calling
    WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown
    cannot handle the case if the worker was restarted. The
    WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call
    GetBackgroundWorkerPid() if worker got restarted. So master will next
    detach the shared memory and next restarted worker keep failing going in a
    unending loop.
    
    I think there is no need to restart at all. Following are the normal error
    we might encounter.
    1. Connecting database is droped -- So we need to skip to next database
    which master will do by starting a new wroker. So not needed.
    2. Relation is droped -- try_relation_open(reloid, AccessShareLock) is used
    so error due to dropped relation is handled also avoids concurrent
    truncation.
    3. smgrexists is used before reading from a fork file. Again error is
    handled.
    4. before reading the block we have check as below. So previously truncated
    pages will not be read again.
            /* Check whether blocknum is valid and within fork file size. */
            if (blk->blocknum >= nblocks)
    
    I think if any other unexpected errors occurs that should be fatal so
    restarting will not be correcting same. Hence there is no need to restart
    the per database worker process.
    
    I tried to dig why we did not set it earlier. It used to be never restart,
    but it changed after fixing comments [1]. At that time we did not make
    explicit database connection per worker and did not handle many error cases
    as now. So it appeared fair. But, when code changed to make database
    connection per worker, we should have set every worker with
    BGW_NEVER_RESTART. Which I think was a mistake.
    
    NOTE : On zero exit status we will not restart the bgworker (see
    @CleanupBackgroundWorker@
    and @maybe_start_bgworkers@)
    [1]
    https://www.postgresql.org/message-id/CA%2BTgmoYNF_wfdwQ3z3713zKy2j0Z9C32WJdtKjvRWzeY7JOL4g%40mail.gmail.com
    -- 
    Thanks and Regards
    Mithun Chicklore Yogendra
    EnterpriseDB: http://www.enterprisedb.com
    
  11. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Robert Haas <robertmhaas@gmail.com> — 2019-03-18T15:31:26Z

    On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
    > autoprewarm waorker should not be restarted. As per the code @apw_start_database_worker@ master starts a worker per database and wait until it exit by calling WaitForBackgroundWorkerShutdown.  The call WaitForBackgroundWorkerShutdown cannot handle the case if the worker was restarted. The WaitForBackgroundWorkerShutdown() get the status BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got restarted. So master will next detach the shared memory and next restarted worker keep failing going in a unending loop.
    
    Ugh, that seems like a silly oversight.  Does it fix the reported problem?
    
    If I understand correctly, the commit message would be something like this:
    
    ==
    Don't auto-restart per-database autoprewarm workers.
    
    We should try to prewarm each database only once.  Otherwise, if
    prewarming fails for some reason, it will just keep retrying in an
    infnite loop.  The existing code was intended to implement this
    behavior, but because it neglected to set worker.bgw_restart_time, the
    per-database workers keep restarting, contrary to what was intended.
    
    Mithun Cy, per a report from Hans Buschmann
    ==
    
    Does that sound right?
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company
    
    
    
  12. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Mithun Cy <mithun.cy@gmail.com> — 2019-03-18T17:42:24Z

    Thanks Robert,
    On Mon, Mar 18, 2019 at 9:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
    
    > On Mon, Mar 18, 2019 at 3:04 AM Mithun Cy <mithun.cy@gmail.com> wrote:
    > > autoprewarm waorker should not be restarted. As per the code
    > @apw_start_database_worker@ master starts a worker per database and wait
    > until it exit by calling WaitForBackgroundWorkerShutdown.  The call
    > WaitForBackgroundWorkerShutdown cannot handle the case if the worker was
    > restarted. The WaitForBackgroundWorkerShutdown() get the status
    > BGWH_STOPPED from the call GetBackgroundWorkerPid() if worker got
    > restarted. So master will next detach the shared memory and next restarted
    > worker keep failing going in a unending loop.
    >
    > Ugh, that seems like a silly oversight.  Does it fix the reported problem?
    >
    
    -- Yes this fixes the reported issue, Hans Buschmann has given below steps
    to reproduce.
    
    > This seems easy to reproduce:
    >
    > - Install/create a database with autoprewarm on and pg_prewarm loaded.
    > - Fill the autoprewarm cache with some data
    > - pg_dump the database
    > - drop the database
    > - create the database and pg_restore it from the dump
    > - start the instance and logs are flooded
    
    -- It is explained earlier [1] that they used older autoprewarm.blocks
    which was generated before drop database. So on restrart autoprewarm worker
    failed to connect to droped database and then lead to retry loop. This
    patch should fix same.
    
    NOTE : Also, another kind of error user might see because of same bug is,
    restarted worker getting connected to next database in autoprewarm.blocks
    because autoprewarm master updated shared data "apw_state->database =
    current_db;" to start new worker for next database. Both restarted worker
    and newly created worker will connect to same database(next one) and try to
    load same pages. Hence end up with spurious log messages like  "LOG:
    autoprewarm successfully prewarmed 13 of 11 previously-loaded blocks"
    
    If I understand correctly, the commit message would be something like this:
    >
    > ==
    > Don't auto-restart per-database autoprewarm workers.
    >
    > We should try to prewarm each database only once.  Otherwise, if
    > prewarming fails for some reason, it will just keep retrying in an
    > infnite loop.  The existing code was intended to implement this
    > behavior, but because it neglected to set worker.bgw_restart_time, the
    > per-database workers keep restarting, contrary to what was intended.
    >
    > Mithun Cy, per a report from Hans Buschmann
    > ==
    >
    > Does that sound right?
    >
    
    -- Yes I Agree.
    
    [1]
    https://www.postgresql.org/message-id/D2B9F2A20670C84685EF7D183F2949E202569F21%40gigant.nidsa.net
    
    -- 
    Thanks and Regards
    Mithun Chicklore Yogendra
    EnterpriseDB: http://www.enterprisedb.com
    
  13. Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use Old PostgreSQL community/pgsql-bugs x

    Robert Haas <robertmhaas@gmail.com> — 2019-03-18T19:35:39Z

    On Mon, Mar 18, 2019 at 1:43 PM Mithun Cy <mithun.cy@gmail.com> wrote:
    >> Does that sound right?
    >
    > -- Yes I Agree.
    
    Committed with a little more tweaking of the commit message, and
    back-patched to v11.
    
    -- 
    Robert Haas
    EnterpriseDB: http://www.enterprisedb.com
    The Enterprise PostgreSQL Company