RE: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

Sisson, David <david.sisson@dell.com>

From: "Sisson, David" <David.Sisson@dell.com>
To: Andres Freund <andres@anarazel.de>, Tomas Vondra <tomas.vondra@enterprisedb.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, "pgsql-bugs@lists.postgresql.org" <pgsql-bugs@lists.postgresql.org>, "Sisson, David" <David.Sisson@dell.com>
Date: 2023-01-23T19:26:09Z
Lists: pgsql-bugs

I believe something should be done in PostgreSQL, because we are setting huge_pages = off in the standard "postgresql.conf" file.
huge_pages can be turned on through outside manipulation, but it can't be turned off, at least not without altering the sample config file.

Thanks,
David Angel   😊




-----Original Message-----
From: Andres Freund <andres@anarazel.de> 
Sent: Saturday, January 21, 2023 8:08 PM
To: Tomas Vondra
Cc: Tom Lane; Sisson, David; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes



Hi,

On 2023-01-22 01:55:01 +0100, Tomas Vondra wrote:
> I'm not sure we'd be keen to backpatch a change of the default, but 
> maybe we would ...

After figuring out that it's clearly a configuration issue *somewhere* outside of postgres's remit, I'm not sure it's worth doing anything concrete to avoid the SIGBUS issue.


But if we end up doing something, I think a parameter triggering use of MAP_POPULATE would be a good idea. It's actually useful outside of the SIGBUS issue, because benchmarks reach a steady state noticeably more quickly when using it.

OTOH, in a production scenario with large shared_buffers I'd probably not want to use it, because getting up more quickly and distributing the memory initialization across cores is more important.


I think it'd be ok to explicitly specify such an option in initdb - after all, initdb does do work to determine the correct shared buffers size etc, and MAP_POPULATE will lead to a more reliable determination.  Not just with huge pages, but also with "small" pages and system-level memory overcommit.
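
For illustration only, here is a minimal sketch of what "a parameter triggering use of MAP_POPULATE" could look like at the mmap() level on Linux. This is not PostgreSQL's actual shared memory code: the helper name map_shared_region and its populate argument are hypothetical. The sketch only shows where the flag goes and makes no claim about how the kernel reports a prefault that cannot be satisfied.

```c
#define _GNU_SOURCE             /* exposes MAP_ANONYMOUS and MAP_POPULATE on Linux/glibc */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not PostgreSQL code): create an anonymous shared
 * mapping of "size" bytes.  When "populate" is nonzero, MAP_POPULATE asks
 * the kernel to prefault the pages during mmap() instead of on first touch.
 * Returns NULL if mmap() itself fails.
 */
void *
map_shared_region(size_t size, int populate)
{
    int   flags = MAP_SHARED | MAP_ANONYMOUS;
    void *ptr;

    if (populate)
        flags |= MAP_POPULATE;  /* Linux-specific prefault request */

    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return (ptr == MAP_FAILED) ? NULL : ptr;
}
```

With populate set, the page-fault cost is paid up front by the single process calling mmap(), which is the startup-time trade-off mentioned above for large shared_buffers; with it unset, the pages are faulted in lazily by whichever process touches them first.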

Greetings,

Andres Freund