Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
From: Thomas Munro <thomas.munro@gmail.com>
To: Hans Buschmann <buschmann@nidsa.net>
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>,
Robert Haas <robertmhaas@gmail.com>, mithun.cy@enterprisedb.com
Date: 2019-02-20T21:41:11Z
Lists: pgsql-bugs, pgsql-hackers
Commits
- Don't auto-restart per-database autoprewarm workers.
  - fc8b39a46eb7 (11.3, landed)
  - 1459e84cb2e5 (12.0, landed)
- Fix race in dsm_attach() when handles are reused.
  - 6c0fb9418925 (12.0, cited)

On Thu, Feb 21, 2019 at 4:36 AM Hans Buschmann <buschmann@nidsa.net> wrote:
> I encountered this problem after switching the production system and then found it also on the new created replica.
>
> I have no knowledge of the shared memory areas involved.
>
> I did some further investigation and tried to reproduce it on the old System (WS2016, PG 11.2) but there it worked fine (without and with huge pages activated!).
>
> Even on a developer machine under WS2019, PG 11.2 the error did not occur (both cases running on different generation of intel machines, Haswell and Nehalem, under different Hypervisors, WS2012R2 and WS2019).
>
> I am really confused to not being able to reproduce the error outside of production and replica instances...
>
> The error caused a massive flood of the logs (about 800 MB in about 1 day, on SSD)
>
> I'll try to investigate further by configuring a second replica tomorrow, using the configuration of the production system as done per pg_basebackup.

Just to confirm: on the machines where it happens, does it happen on every restart, and does it never happen if you set huge_pages = off?

CC'ing the authors of the auto-prewarm feature to see if they have ideas.

There is a known bug (fixed in commit 6c0fb941 for the next release) that would cause a spurious dsm_attach() failure that would look just like this (dsm_attach() returns NULL), but that should be very rare and couldn't cause the behaviour described here, because here the background worker is repeatedly failing to attach in a loop (hence the 800MB of logs).

--
Thomas Munro
https://enterprisedb.com