Thread
Commits
GET /api/v1/messages/:b64id/commits
the thread's linked commits as JSON, with link sources.
API reference →
-
Allow "make check"-style testing to work with musl C library.
- d82605bcd666 14.12 landed
- 8a92b70c11ba 17.0 landed
- 7651fd387697 16.3 landed
- 7124e7d528a8 12.19 landed
- 3c3f4fd741d0 15.7 landed
- 243e9953281f 13.15 landed
-
Fix compiler warnings on MSYS2
- 8c6d30f21139 13.0 cited
-
Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-16T12:38:48Z
Running the regression tests when building with musl libc fails, with errors like the following: ERROR: could not load library "<builddir>/tmp_install/usr/lib/postgresql/libpqwalreceiver.so": Error loading shared library libpq.so.5: No such file or directory (needed by <builddir>/tmp_install/usr/lib/postgresql/libpqwalreceiver.so) This was observed in Alpine Linux [1] and nixpkgs [2] a few years ago. I now looked at this a bit and this is what happens: - The temporary install location is set via LD_LIBRARY_PATH in the regression tests, so that postgres can find those libs. - All tests which load an extension / shared module via dlopen() fail, when the loaded library in turn depends on another library in tmp_install - I think in practice it's libpq.so all the time. - LD_LIBRARY_PATH is used correctly to look for the direct dependency loaded in dlopen(...), but is not taken into account anymore when trying to locate libpq.so. This step only fails with musl, but works fine with glibc. I can reproduce this with a simple Dockerfile (attached), which uses the library/postgres-alpine image, moves libpq.so to a different folder and points LD_LIBRARY_PATH at it. Build and run the dockerfile like this: docker build . -t pg-musl && docker run --rm pg-musl This Dockerfile can easily be adjusted to work with the debian image - which shows that doing the same with glibc works just fine. Even though this originated in "just" the regression tests, I'm filing this as a bug, because: - The docs explicitly mention LD_LIBRARY_PATH support to point at a different /lib folder in [3]. - This can clearly break outside the test-suite as shown with the Dockerfile. I tried a few more things: - When I add an /etc/ld-musl-$(ARCH).path file and add the path to libpq.so's libdir to it, libpq.so can be found. - When I add the path to libpq.so as an rpath to the postgres binary, libpq.so can be found. Both is not surprising, but just confirms musl-ld actually works as expected. It's just LD_LIBRARY_PATH that seems to not be passed on. To rule out a musl bug, I also put together a very simple test-case of an executable loading liba with dlopen(), which depends on libb and then constructing the same scenario with LD_LIBRARY_PATH. This works fine when compiled with glibc and musl, too. Thus, I believe the problem to be somewhere in how postgres loads those libraries. Best, Wolfgang [1]: https://github.com/alpinelinux/aports/commit/d67ceb66a1ca9e1899071c9ef09fffba29fa0417#diff-2bd25b5172fc52319de1b09086ac0db6314d2e9fa73497979f5198f8caaec1b9 [2]: https://github.com/NixOS/nixpkgs/commit/09ffd722072291f00f2a54d7404eb568a15e562a [3]: https://www.postgresql.org/docs/current/install-post.html
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-16T14:24:43Z
Wolfgang Walther <walther@technowledgy.de> writes: > - LD_LIBRARY_PATH is used correctly to look for the direct dependency > loaded in dlopen(...), but is not taken into account anymore when trying > to locate libpq.so. This step only fails with musl, but works fine with > glibc. Why do you think this is our bug and not musl's? We do not even have any code that knows anything about indirect library dependencies. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-16T15:55:49Z
Tom Lane: > Why do you think this is our bug and not musl's? Because I tried to reproduce with musl directly with a very simple example as mentioned in: > To rule out a musl bug, I also put together a very simple test-case of an executable loading liba with dlopen(), which depends on libb and then constructing the same scenario with LD_LIBRARY_PATH. This works fine when compiled with glibc and musl, too. Thus, I believe the problem to be somewhere in how postgres loads those libraries. My test case looked like the attached. To compile it with musl via Dockerfile: docker build . -t musl-dlopen && docker run --rm musl-dlopen a.c/a.h is equivalent to libpqwalreceiver and b.c/b.h to libpq. This works fine with both musl and glibc. (Note: I also tried putting liba.so and libb.so in different folders, adding both to LD_LIBRARY_PATH - but that worked fine as well.) Now my very simple example probably does something different than postgres, so that the problem doesn't appear there. But since it seems possible to do this with musl in principle, it should be possible to do it differently in postgres to make it work, too. Any ideas? Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-16T19:49:27Z
On Sat, Mar 16, 2024 at 11:55 AM Wolfgang Walther <walther@technowledgy.de> wrote: > Tom Lane: > > Why do you think this is our bug and not musl's? > > Because I tried to reproduce with musl directly with a very simple > example as mentioned in: > > > To rule out a musl bug, I also put together a very simple test-case of > an executable loading liba with dlopen(), which depends on libb and then > constructing the same scenario with LD_LIBRARY_PATH. This works fine when > compiled with glibc and musl, too. Thus, I believe the problem to be > somewhere in how postgres loads those libraries. > > My test case looked like the attached. To compile it with musl via > Dockerfile: > > docker build . -t musl-dlopen && docker run --rm musl-dlopen > > a.c/a.h is equivalent to libpqwalreceiver and b.c/b.h to libpq. > > This works fine with both musl and glibc. > > (Note: I also tried putting liba.so and libb.so in different folders, > adding both to LD_LIBRARY_PATH - but that worked fine as well.) > > Now my very simple example probably does something different than > postgres, so that the problem doesn't appear there. But since it seems > possible to do this with musl in principle, it should be possible to do > it differently in postgres to make it work, too. > > Any ideas? > > > On Alpine Linux, which uses musl libc, you have to run `make install` before you can run `make check`. Have you tried that? (Note to self: need a new Alpine buildfarm member) cheers andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-16T20:00:28Z
Andrew Dunstan: > On Alpine Linux, which uses musl libc, you have to run `make install` > before you can run `make check`. Have you tried that? I can see how that could work around the problem, because the library would already be available in the default library path / rpath and LD_LIBRARY_PATH would not be needed. However, this would only be a workaround for the specific case of running the regression tests, not a solution. Using LD_LIBRARY_PATH, as documented, would still not be possible. In my case, I am just using docker images with Alpine to easily reproduce the problem. I am building with NixOS / nixpkgs' pkgsMusl. The order of check and install phases can't be changed there, AFAICT. The workaround I use right now is to temporarily patch rpath of the postgres binary - this will be reset during the install phase anyway. This works, but again is not a real solution. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-16T20:10:56Z
Andrew Dunstan <andrew@dunslane.net> writes: > On Alpine Linux, which uses musl libc, you have to run `make install` > before you can run `make check`. Have you tried that? We have the same situation on macOS. There, it seems to be the result of a "security feature" that strips DYLD_LIBRARY_PATH from the process environment when make executes a shell. There's not much we can do about that, and I suspect there is not much we can do about musl's behavior either. (I am not a fan of proposals to modify the binaries under test, because then you are not testing what you intend to install.) regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-16T20:19:34Z
On Sun, Mar 17, 2024 at 4:56 AM Wolfgang Walther <walther@technowledgy.de> wrote: > Any ideas? I'd look into whether there is a difference in the rules it uses for deciding not to trust LD_LIBRARY_PATH, which seems to be around here somewhere: https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1812 I wonder if you can break into an affected program and check out the magic there. FWIW on MacOS something equivalent happens at the moment we execute a shell, because the system shell is 'code signed' and that OS treats signed stuff similar to setuid binaries for this purpose (IIRC setting SHELL to point to a suitable unsigned shell could work around the problem there?) Another interesting thing that came up when I googled musl/glibc differences -- old but looks plausibly still true (not that I expect our code to be modifying that stuff in place, just something to check): https://www.openwall.com/lists/musl/2014/08/31/14
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-16T20:21:55Z
Tom Lane: > We have the same situation on macOS. There, it seems to be the result > of a "security feature" that strips DYLD_LIBRARY_PATH from the process > environment when make executes a shell. I'm not sure whether this explanation is sufficient for the musl case, because LD_LIBRARY_PATH does make a difference: The direct dependency (libpqwalreceiver.so) can still be found if it's moved elsewhere and LD_LIBRARY_PATH points at it. So clearly the LD_LIBRARY_PATH variable is still set after make executed the shell - it's just not in effect on the *indirect* dependency (libpq.so) anymore. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-16T20:31:10Z
Thomas Munro: > I'd look into whether there is a difference in the rules it uses for > deciding not to trust LD_LIBRARY_PATH, which seems to be around here > somewhere: > > https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1812 Yeah, I have been looking at that, too. I had also experimented a bit with setuid/setgid for that matter, but that didn't lead anywhere, yet. I'm not 100% sure, but I think this would also not match my other observation, that LD_LIBRARY_PATH does work for libpqwalreceiver (direct dep), but not libpq (indirect dep). > Another interesting thing that came up when I googled musl/glibc > differences -- old but looks plausibly still true (not that I expect > our code to be modifying that stuff in place, just something to > check): > > https://www.openwall.com/lists/musl/2014/08/31/14 To me, this seems very much like what could happen - it matches all my observations, so far. But I can't tell how likely that is, not knowing much of the postgres code. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-16T20:54:52Z
On Sun, Mar 17, 2024 at 9:19 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Another interesting thing that came up when I googled musl/glibc > differences -- old but looks plausibly still true (not that I expect > our code to be modifying that stuff in place, just something to > check): > > https://www.openwall.com/lists/musl/2014/08/31/14 Hmm, that does mention setproctitle, and our ps_status.c does indeed clobber some stuff in that region (in fact our ps_status.c is likely derived from the setproctitle() function from sendmail AFAICT). But that's in our "backend" server processes, unlike the problems we have on Macs... oh but you're failing to load libpqwalreceiver.so which makes some sense for the backend hypothesis. What happens if you hack ps_status.c to use PS_USE_NONE?
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-17T09:44:03Z
Thomas Munro: > Hmm, that does mention setproctitle, and our ps_status.c does indeed > clobber some stuff in that region (in fact our ps_status.c is likely > derived from the setproctitle() function from sendmail AFAICT). But > that's in our "backend" server processes, unlike the problems we have > on Macs... oh but you're failing to load libpqwalreceiver.so which > makes some sense for the backend hypothesis. What happens if you hack > ps_status.c to use PS_USE_NONE? Nailed it. PS_USE_NONE fixes it. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Christophe Pettus <xof@thebuild.com> — 2024-03-17T10:33:52Z
> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> wrote: > > Nailed it. PS_USE_NONE fixes it. Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.c to catch this case? It seems wrong that the current test passes a case that doesn't actually work.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-17T13:11:19Z
Christophe Pettus: > Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.c to catch this case? It seems wrong that the current test passes a case that doesn't actually work. The missing macro is on purpose and unlikely to change: https://openwall.com/lists/musl/2013/03/29/13 I also found this thread, which discusses exactly our case: https://www.openwall.com/lists/musl/2022/08/17/1 Some quotes from that thread: > I understand that what Postgres et al are doing is a nasty hack. And: > Applications that *really* want setproctitle type functionality can > presumably do something like re-exec themselves with a suitably large > argv[0] to give them safe space to overwrite with their preferred > message, rather than UB trying to relocate the environment (and auxv? > how? they can't tell libc they moved it) to some other location. Could that be a more portable way of doing this? Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Christophe Pettus <xof@thebuild.com> — 2024-03-17T15:44:45Z
> On Mar 17, 2024, at 06:11, Wolfgang Walther <walther@technowledgy.de> wrote: > The missing macro is on purpose and unlikely to change: https://openwall.com/lists/musl/2013/03/29/13 Indeed. > I also found this thread, which discusses exactly our case: https://www.openwall.com/lists/musl/2022/08/17/1 While getting proper setproctitle functionality on musl would be great, my goal was more modest: Have it correctly set PS_USE_NONE when compiling against musl.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-17T20:33:40Z
On Sun, Mar 17, 2024 at 11:45 AM Christophe Pettus <xof@thebuild.com> wrote: > > > > On Mar 17, 2024, at 06:11, Wolfgang Walther <walther@technowledgy.de> > wrote: > > The missing macro is on purpose and unlikely to change: > https://openwall.com/lists/musl/2013/03/29/13 > > Indeed. > That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why should they be different? > > I also found this thread, which discusses exactly our case: > https://www.openwall.com/lists/musl/2022/08/17/1 > > While getting proper setproctitle functionality on musl would be great, my > goal was more modest: Have it correctly set PS_USE_NONE when compiling > against musl. > One simple thing might be for us to enclose the block in ps_status.c at lines 49-59 in #ifndef PS_USE_NONE/#endif. Then you could compile with -DPS_USE_NONE in your CPPFLAGS. cheers andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Christophe Pettus <xof@thebuild.com> — 2024-03-17T21:05:54Z
> On Mar 17, 2024, at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote: > > That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why should they be different? It's a philosophical argument against checking for particular libc implementations instead of particular features. I'm not unsympathetic to that argument, but AFAICT there's no clean way of checking for this by examining feature #defines.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-17T23:20:26Z
On Mon, Mar 18, 2024 at 10:06 AM Christophe Pettus <xof@thebuild.com> wrote: > > On Mar 17, 2024, at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote: > > That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why should they be different? > > It's a philosophical argument against checking for particular libc implementations instead of particular features. I'm not unsympathetic to that argument, but AFAICT there's no clean way of checking for this by examining feature #defines. I like their philosophy, and I like their FAQ. Down with software monocultures, up with standards and cooperation. But anyway... I wondered for a moment if there could be a runtime way to test if we'd broken stuff, but it seems hard without a way to ask the runtime linker for its search path to see if it has any pointers into the environment. We can't, that "env_path" variable in dynlink.c is not accessible to us by any reasonable means. And yeah, this whole thing is a nasty invasive hack that harks back to the 80s I assume, before many systems provided a clean way to do this (and some never did)... Hmm, I can think of one dirty hack on top of our existing dirty hack that might work. I feel bad typing this out, but here goes nothing: In save_ps_display_args(), we compute end_of_area, stepping past contiguous arguments and environment variables. But what if we terminated that if we saw an environment entry beginning "LD_"? We'd still feel free to clobber the memory up to that point (rather than limiting ourselves to the argv space, another more conservative choice that might truncate a few PS display messages, or maybe not given the typical postmaster arguments, maye that'd work out OK), and we'd still copy the environment to somewhere new, but anything like "LD_XXX" that the runtime linker might have stashed a pointer to would remain valid. /me runs away and hides
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Christophe Pettus <xof@thebuild.com> — 2024-03-18T09:33:45Z
> On Mar 17, 2024, at 16:20, Thomas Munro <thomas.munro@gmail.com> wrote: > > We'd > still feel free to clobber the memory up to that point (rather than > limiting ourselves to the argv space, another more conservative choice > that might truncate a few PS display messages, or maybe not given the > typical postmaster arguments, maye that'd work out OK), and we'd still > copy the environment to somewhere new, but anything like "LD_XXX" that > the runtime linker might have stashed a pointer to would remain valid. > /me runs away and hides It doesn't lack for bravery! (And I have to just comment that the linker storing pointers into that space as a way of finding libraries... well, that doesn't get them the moral high ground for nasty hacks.) I'm comfortable with "if you are using musl, you don't get the ps messages" as a first solution, if we can find a way of detecting a libc that passes the other tests but doesn't support any of the existing hacks.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-18T13:25:33Z
On Mon, Mar 18, 2024 at 10:34 PM Christophe Pettus <xof@thebuild.com> wrote: > > On Mar 17, 2024, at 16:20, Thomas Munro <thomas.munro@gmail.com> wrote: > > We'd > > still feel free to clobber the memory up to that point (rather than > > limiting ourselves to the argv space, another more conservative choice > > that might truncate a few PS display messages, or maybe not given the > > typical postmaster arguments, maye that'd work out OK), and we'd still > > copy the environment to somewhere new, but anything like "LD_XXX" that > > the runtime linker might have stashed a pointer to would remain valid. > > /me runs away and hides > > It doesn't lack for bravery! (And I have to just comment that the linker storing pointers into that space as a way of finding libraries... well, that doesn't get them the moral high ground for nasty hacks.) FWIW here is a blind patch if someone wants to try it out... no musl here. (Hmm, I think it's not that unreasonable on their part to assume the initial environment is immutable if their implementation doesn't mutate it, and our doing so is undeniably UB; surprising, maybe, given that the technique works on that other popular brand of C library on that kind of kernel, not to mention dozens of old Unixen of yore... The real solution may turn out to be the prctl() described in that thread, where you can tell the kernel where you're planning to move your argv and it can find it to show ps/top, but I checked and you still can't call that without special privileges, so maybe someone should get onto complaining to kernel hackers about that? That thread is wrong about us clobbering auxv BTW, we're not animals!)
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-18T14:23:32Z
Thomas Munro <thomas.munro@gmail.com> writes: > (Hmm, I think it's not that unreasonable on their part to assume the > initial environment is immutable if their implementation doesn't > mutate it, and our doing so is undeniably UB; surprising, maybe, given > that the technique works on that other popular brand of C library on > that kind of kernel, not to mention dozens of old Unixen of yore... Does their implementation also ignore the effects of putenv() or setenv() on LD_LIBRARY_PATH? They have no moral high ground whatsoever if that's the case. But if it doesn't, an alternative route to a solution could be to scan the original environment, strdup and putenv each entry to move it to freshly malloc'd space, and then reclaim the old environment area. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-18T21:17:36Z
On Tue, Mar 19, 2024 at 3:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > (Hmm, I think it's not that unreasonable on their part to assume the > > initial environment is immutable if their implementation doesn't > > mutate it, and our doing so is undeniably UB; surprising, maybe, given > > that the technique works on that other popular brand of C library on > > that kind of kernel, not to mention dozens of old Unixen of yore... > > Does their implementation also ignore the effects of putenv() or > setenv() on LD_LIBRARY_PATH? They have no moral high ground > whatsoever if that's the case. But if it doesn't, an alternative > route to a solution could be to scan the original environment, strdup > and putenv each entry to move it to freshly malloc'd space, and > then reclaim the old environment area. Yes, the musl linker/loader ignores putenv()/setenv() changes to LD_LIBRARY_PATH after process start (that is, changes only effect the search path when injected into a new program with exec*()). As does glibc, it's just that it captures by copy instead of reference (according to one of the links above, I didn't check the source). So setenv() has no effect on dlopen() in *this* program, and using putenv in that way won't help. We simply can't move the value of LD_LIBRARY_PATH (though my patch could be a little sneakier and steal all the bytes right up to the = sign to get more space for our message!). One way to tell if a copy has been made is to trace a program that does: getenv("LD_LIBRARY_PATH")[2] = 'X'; dlopen("foo.so", RTLD_NOW | RTLD_GLOBAL); ... when run with LD_LIBRARY_PATH set to /asdf. On FreeBSD I see it tries to open "/aXdf...", so now I know that FreeBSD also captures it by reference like musl. But we don't use the clobber trick on FreeBSD, it has a proper setproctitle() function that knows how to negotiate with the kernel, so it doesn't matter. It also ignores changes made with setent()/putenv(), because those create fresh entries but leave the initial environment strings untouched. Solaris also ignores changes made after startup (it's in the dlopen man page), and from a very quick look at its ld_lib_setup() I think it achieved that with a copy. I believe its ancestor SunOS 4 invented all of these conventions (and the mmap/virtual memory concepts they rode in on), later nailed down to some degree in the System V ABI and very widely adopted, but I don't see anything in the latter that specifically addresses this point, eg LD_LIBRARY copy vs reference and interaction with dlopen() (perhaps I didn't look hard enough). I'm not sure what else you can point to to make strong claims about this stuff, but I bet every system ignores changes after startup, it's just that they found two ways to achieve that. POSIX says of dlopen that the "file [argument] is used in an implementation-defined manner", and of environ that we're welcome to swap a whole new environ, but doesn't seem to tell us anything about the one that is replaced (who owns it? is the initial one set up at execution time special? etc). The line banning manipulation of the pointers environ refers to doesn't exactly describe what we're doing (we're manipulating the strings pointed to by the *previous* environ). UB. -
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-18T22:48:50Z
On Tue, Mar 19, 2024 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote: > ... (though my patch could be a little sneakier and steal > all the bytes right up to the = sign to get more space for our > message!). Here's one like that. No musl here -- does this work Wolfgang? Do we think it's generous enough with space in practice that we could just always do this for __linux__ systems without anyone noticing (ie including glibc users)? Should we be more specific about which LD_* variables? Do people not doing hacking/testing ever really set those, eg on production servers? This code path was once used by up to a dozen or so OSes but they're all dead, only Linux, Solaris and macOS left, and I don't have any reason to think they suffer from this problem and Macs don't even follow the SysV LD_ naming convention, hence gating on Linux.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-19T23:54:49Z
On Tue, Mar 19, 2024 at 11:48:50AM +1300, Thomas Munro wrote: > On Tue, Mar 19, 2024 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > ... (though my patch could be a little sneakier and steal > > all the bytes right up to the = sign to get more space for our > > message!). > > Here's one like that. No musl here -- does this work Wolfgang? Do we > think it's generous enough with space in practice that we could just > always do this for __linux__ systems without anyone noticing (ie > including glibc users)? Should we be more specific about which LD_* > variables? Do people not doing hacking/testing ever really set those, > eg on production servers? This code path was once used by up to a > dozen or so OSes but they're all dead, only Linux, Solaris and macOS > left, and I don't have any reason to think they suffer from this > problem and Macs don't even follow the SysV LD_ naming convention, > hence gating on Linux. So this would truncate the process title on all Linux that have an LD_ environment entry, even those without musl? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-20T01:12:54Z
On Wed, Mar 20, 2024 at 12:54 PM Bruce Momjian <bruce@momjian.us> wrote: > So this would truncate the process title on all Linux that have an LD_ > environment entry, even those without musl? Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a postmaster near you? You'd always get that much, plus as much of /proc/XXX/environ as we can find before you reach LD_XXX=, which on a typical system would, I guess, usually be never. If it's a problem you could try to arrange for LD_ XXX to come later in environ[]. What I observe is that they seem to get copied in backwards, wrt the environment exported by the parent, so if you set DUMMY=XXXXXXXX just before starting the process it'll make sacrificial space in the right place (but I'm not sure where that effect is coming from so don't quote me).
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-20T01:27:23Z
On Wed, Mar 20, 2024 at 02:12:54PM +1300, Thomas Munro wrote: > On Wed, Mar 20, 2024 at 12:54 PM Bruce Momjian <bruce@momjian.us> wrote: > > So this would truncate the process title on all Linux that have an LD_ > > environment entry, even those without musl? > > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > postmaster near you? You'd always get that much, plus as much of $ cat /proc/2000/cmdline |wc -c 30 > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > typical system would, I guess, usually be never. If it's a problem > you could try to arrange for LD_ XXX to come later in environ[]. What > I observe is that they seem to get copied in backwards, wrt the > environment exported by the parent, so if you set DUMMY=XXXXXXXX just > before starting the process it'll make sacrificial space in the right > place (but I'm not sure where that effect is coming from so don't > quote me). I am just cautious about changing behavior on our most common platform for a libc library I have never heard of. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-20T01:53:08Z
Thomas Munro <thomas.munro@gmail.com> writes: > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > postmaster near you? You'd always get that much, plus as much of > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > typical system would, I guess, usually be never. I'd be happier about this if the target pattern were more restrictive. Is there reason to think that musl keeps a pointer to anything besides LD_LIBRARY_PATH? regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-20T02:01:09Z
On Wed, Mar 20, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > > postmaster near you? You'd always get that much, plus as much of > > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > > typical system would, I guess, usually be never. > > I'd be happier about this if the target pattern were more restrictive. > Is there reason to think that musl keeps a pointer to anything besides > LD_LIBRARY_PATH? Also LD_PRELOAD: https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1824 Yeah we could do just those two.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-20T02:03:36Z
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Mar 20, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd be happier about this if the target pattern were more restrictive. >> Is there reason to think that musl keeps a pointer to anything besides >> LD_LIBRARY_PATH? > Also LD_PRELOAD: > https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1824 > Yeah we could do just those two. +1 for stopping only at one of those two names. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-20T02:12:49Z
On Wed, Mar 20, 2024 at 2:27 PM Bruce Momjian <bruce@momjian.us> wrote: > I am just cautious about changing behavior on our most common platform > for a libc library I have never heard of. Yeah, I hear you. I don't have a dog in this race, I just like retro-computing mysteries and arguments about the meaning of standards... That said I'm pretty sure no one should be running production PostgreSQL systems held together by LD_LIBRARY_PATH, and if they do, it looks like systemd/rc.d scripts set a bunch of PG_OOM_BLAH stuff just before the start the cluster that would provide extra chaff in front of LD_XXX stuff defined earlier, and then pg_ctl inserts even more, and even if they don't use any of that stuff and just run the postmaster directly with some other homegrown tooling, now we're down to very niche/expert scenarios where it should be acceptable to point to this thread that says "try setting an extra dummy variable after you set LD_XXX!".
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-20T04:39:34Z
On Wed, Mar 20, 2024 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for stopping only at one of those two names. Here's one like that for Wolfgang to test on musl.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Peter Eisentraut <peter@eisentraut.org> — 2024-03-20T06:02:48Z
On 17.03.24 11:33, Christophe Pettus wrote: >> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> wrote: >> >> Nailed it. PS_USE_NONE fixes it. > > Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.c to catch this case? It seems wrong that the current test passes a case that doesn't actually work. > We could turn it around and do #if defined(__linux__) #if defined(__GLIBC__) || defined(__UCLIBC__ ) #define PS_USE_CLOBBER_ARGV #else #define PS_USE_NONE #endif #endif
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-20T07:16:14Z
On Wed, Mar 20, 2024 at 2:03 AM Peter Eisentraut <peter@eisentraut.org> wrote: > On 17.03.24 11:33, Christophe Pettus wrote: > >> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> > wrote: > >> > >> Nailed it. PS_USE_NONE fixes it. > > > > Given the musl (still?) does not define a preprocessor macro specific to > it, is there a way of improving the test in pg_status.c to catch this > case? It seems wrong that the current test passes a case that doesn't > actually work. > > > > We could turn it around and do > > #if defined(__linux__) > #if defined(__GLIBC__) || defined(__UCLIBC__ ) > #define PS_USE_CLOBBER_ARGV > #else > #define PS_USE_NONE > #endif > #endif > > > > I like it. Neat and minimal. cheers andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T09:39:20Z
Thomas Munro: > On Wed, Mar 20, 2024 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> +1 for stopping only at one of those two names. > > Here's one like that for Wolfgang to test on musl. Works fine. Peter Eisentraut: > We could turn it around and do > > #if defined(__linux__) > #if defined(__GLIBC__) || defined(__UCLIBC__ ) > #define PS_USE_CLOBBER_ARGV > #else > #define PS_USE_NONE > #endif > #endif This works as well. I also put together a PoC of what was mentioned in musl's mailing list: Instead of clobbering environ at all, exec yourself again with padded argv0. This works, too. Attached. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-20T13:35:51Z
On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: > Peter Eisentraut: > > We could turn it around and do > > > > #if defined(__linux__) > > #if defined(__GLIBC__) || defined(__UCLIBC__ ) > > #define PS_USE_CLOBBER_ARGV > > #else > > #define PS_USE_NONE > > #endif > > #endif > > This works as well. Yes, I prefer this. I am worried the environ hackery will bite us someday and the cause will be hard to find. > I also put together a PoC of what was mentioned in musl's mailing list: > Instead of clobbering environ at all, exec yourself again with padded argv0. > This works, too. Attached. It is hard to imagine why we would add an extra exec on every Linux server start for this. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-20T13:37:30Z
On Wed, Mar 20, 2024 at 09:35:51AM -0400, Bruce Momjian wrote: > On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: > > I also put together a PoC of what was mentioned in musl's mailing list: > > Instead of clobbering environ at all, exec yourself again with padded argv0. > > This works, too. Attached. > > It is hard to imagine why we would add an extra exec on every Linux > server start for this. I guess we could conditionally exec only if we find we must, but then such exec cases would be rare and rarely tested. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T14:24:58Z
Bruce Momjian: > I guess we could conditionally exec only if we find we must, but then > such exec cases would be rare and rarely tested. I think you might be seriously underestimating how often musl is used. Alpine Linux uses musl and is very widespread in the container world because of smaller image size. The library/postgres docker image has been pulled about 8 billion times since 2014 [1]. While we can't really tell how many of those pulled the alpine variant of the image, comparing the alpine [2] and ubuntu/debian [3,4] base images gives a rough estimate of >50% using alpine in general. This is certainly not rare. But yeah, buildfarm coverage for musl would be good, I agree. Maybe even directly in CI? Best, Wolfgang [1]: https://hub.docker.com/v2/repositories/library/postgres [2]: https://hub.docker.com/v2/repositories/library/alpine [3]: https://hub.docker.com/v2/repositories/library/ubuntu [4]: https://hub.docker.com/v2/repositories/library/debian
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Laurenz Albe <laurenz.albe@cybertec.at> — 2024-03-20T14:28:53Z
On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: > I think you might be seriously underestimating how often musl is used. > Alpine Linux uses musl and is very widespread in the container world > because of smaller image size The last time I looked, its collation support didn't work at all... Yours, Laurenz Albe
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-20T14:36:38Z
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: >> I think you might be seriously underestimating how often musl is used. >> Alpine Linux uses musl and is very widespread in the container world >> because of smaller image size > The last time I looked, its collation support didn't work at all... I think the same is true of some of the BSDen, so that's not a large impediment to us. But in any case, if somebody wants Alpine or musl to be considered a supported platform, they'd best step up and run a buildfarm animal. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T14:37:07Z
Bruce Momjian: > On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: >> Peter Eisentraut: >>> We could turn it around and do >>> >>> #if defined(__linux__) >>> #if defined(__GLIBC__) || defined(__UCLIBC__ ) >>> #define PS_USE_CLOBBER_ARGV >>> #else >>> #define PS_USE_NONE >>> #endif >>> #endif >> >> This works as well. > > Yes, I prefer this. I am worried the environ hackery will bite us > someday and the cause will be hard to find. Well, the environ hackery already bit and it sure was hard to find. But this approach would still clobber environ happily... which is undefined behavior. But certainly the opt-in to known-to-be-good libc variants is a better approach than before. Between this and "stop clobbering at LD_LIBRARY_PATH", I prefer the latter, though. >> I also put together a PoC of what was mentioned in musl's mailing list: >> Instead of clobbering environ at all, exec yourself again with padded argv0. >> This works, too. Attached. > > It is hard to imagine why we would add an extra exec on every Linux > server start for this. Would this be a problem? For a running server this would happen only once when the postmaster starts up, AFAICT. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T14:43:05Z
Laurenz Albe: > On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: >> I think you might be seriously underestimating how often musl is used. >> Alpine Linux uses musl and is very widespread in the container world >> because of smaller image size > > The last time I looked, its collation support didn't work at all... IIUC, using icu collations should work. I didn't extensively try, though. But yeah - musl itself doesn't do it, knowingly so. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T14:46:39Z
Tom Lane: > But in any case, if somebody wants > Alpine or musl to be considered a supported platform, they'd > best step up and run a buildfarm animal. Yeah, I was already thinking about that. But I guess we'd need to first make the test suite pass on musl. i.e. $subject, but there are also some smaller issues after that, before the full test suite will pass. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Peter Eisentraut <peter@eisentraut.org> — 2024-03-20T18:37:08Z
On 20.03.24 15:37, Wolfgang Walther wrote: >> It is hard to imagine why we would add an extra exec on every Linux >> server start for this. > > Would this be a problem? For a running server this would happen only > once when the postmaster starts up, AFAICT. I wonder if it would cause issues with systemd or similar, if the PID of the running process is not the one that systemd started. If so, there is probably a workaround, but it would have to be analyzed.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T19:10:01Z
Peter Eisentraut: >> Would this be a problem? For a running server this would happen only >> once when the postmaster starts up, AFAICT. > > I wonder if it would cause issues with systemd or similar, if the PID of > the running process is not the one that systemd started. If so, there > is probably a workaround, but it would have to be analyzed. I don't think that exec even creates a new PID. The current process is replaced, so the PID stays the same. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-20T19:18:39Z
On Wed, Mar 20, 2024 at 03:24:58PM +0100, Wolfgang Walther wrote: > Bruce Momjian: > > I guess we could conditionally exec only if we find we must, but then > > such exec cases would be rare and rarely tested. > > I think you might be seriously underestimating how often musl is used. > Alpine Linux uses musl and is very widespread in the container world because > of smaller image size. > > The library/postgres docker image has been pulled about 8 billion times > since 2014 [1]. While we can't really tell how many of those pulled the > alpine variant of the image, comparing the alpine [2] and ubuntu/debian > [3,4] base images gives a rough estimate of >50% using alpine in general. Uh, what is the current behavior of Postgres on musl? It just fails if the process title is longer than argv[0] plus the environment space to the LD_ environment variable, and then linking fails for certain extensions? If there are many downloads, why would we only be getting this report now? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-20T19:29:21Z
Bruce Momjian: >> The library/postgres docker image has been pulled about 8 billion times >> since 2014 [1]. While we can't really tell how many of those pulled the >> alpine variant of the image, comparing the alpine [2] and ubuntu/debian >> [3,4] base images gives a rough estimate of >50% using alpine in general. > > Uh, what is the current behavior of Postgres on musl? It just fails if > the process title is longer than argv[0] plus the environment space to > the LD_ environment variable, and then linking fails for certain > extensions? If there are many downloads, why would we only be getting > this report now? The process title works fine. It's just the way how space is cleared for the process title, that is causing problems elsewhere. The thing that is broken when running postgres on alpine/musl is, to put libpq in a custom path and use LD_LIBRARY_PATH to find it when loading libpqwalreceiver (+ some contrib modules). Nobody does that, especially not in a container environment where postgres is likely the only thing running in that container, so there is no point in using any custom library paths or anything - the image is built once and made to work, and everybody else is just using that working image. The much more practical problem is that the test suite doesn't run, because it makes use of LD_LIBRARY_PATH for that purpose. In the past, the packagers for alpine only disabled the failing tests, but IIRC they have given up on that and just disabled the whole test suite by now. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-20T19:42:16Z
On Wed, Mar 20, 2024 at 08:29:21PM +0100, Wolfgang Walther wrote: > Bruce Momjian: > > > The library/postgres docker image has been pulled about 8 billion times > > > since 2014 [1]. While we can't really tell how many of those pulled the > > > alpine variant of the image, comparing the alpine [2] and ubuntu/debian > > > [3,4] base images gives a rough estimate of >50% using alpine in general. > > > > Uh, what is the current behavior of Postgres on musl? It just fails if > > the process title is longer than argv[0] plus the environment space to > > the LD_ environment variable, and then linking fails for certain > > extensions? If there are many downloads, why would we only be getting > > this report now? > > The process title works fine. It's just the way how space is cleared for the > process title, that is causing problems elsewhere. > > The thing that is broken when running postgres on alpine/musl is, to put > libpq in a custom path and use LD_LIBRARY_PATH to find it when loading > libpqwalreceiver (+ some contrib modules). Nobody does that, especially not > in a container environment where postgres is likely the only thing running > in that container, so there is no point in using any custom library paths or > anything - the image is built once and made to work, and everybody else is > just using that working image. > > The much more practical problem is that the test suite doesn't run, because > it makes use of LD_LIBRARY_PATH for that purpose. In the past, the packagers > for alpine only disabled the failing tests, but IIRC they have given up on > that and just disabled the whole test suite by now. Thanks, that is very clear. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-20T22:26:26Z
On Thu, Mar 21, 2024 at 2:35 AM Bruce Momjian <bruce@momjian.us> wrote: > Yes, I prefer this. I am worried the environ hackery will bite us > someday and the cause will be hard to find. Some speculation on how widespread this trick is: as mentioned, it seems to come from sendmail (I didn't spend the time to find a repo that has ancient versions, so this is someone's random snapshot repo, but it's referring to pretty old dead systems): https://github.com/Distrotech/sendmail/blob/master/sendmail/conf.c#L2436 That was once ubiquitous, back in the day. One of the most widespread envon-clobberers these days must be openssh: https://github.com/openssh/openssh-portable/blob/86bdd3853f4d32c85e295e6216a2fe0953ad93f0/openbsd-compat/setproctitle.c#L69 And funnily enough, googling LD_LIBRARY_PATH and openssh brings up a few unsolved/unanswered questions about mysterious breakage (though I didn't see any that mentioned musl by name and there could be other explanations, *shrug*). There is also Chromium/Chrome: https://github.com/chromium/chromium/blob/main/base/process/set_process_title_linux.cc#L136 That code has some interesting commentary and points to a commit in Linux which mentions setproctitle() and making sure it still works (funny because setproctitle() isn't a function in any standard userspace library AFAIK so I guess it just means the trick in general), and also mentions the failure of attempts to get an official way to do this negotiated between the relevant projects. Of course we have to distinguish between the basic argv[] clobbering trick which is barely even a trick, and the more advanced environ stealing trick which confuses musl. A very widespread user of the basic trick would be systemd, which tries to use the prctl() if it can to get a much bigger window of memory to write on, but otherwise falls back to accepting a small one. I guess we'd do the same if we could, ie if a future Linux version didn't require CAP_SYS_RESOURCES to do it: https://github.com/systemd/systemd/blob/8810b782a17050d7f7a870b975f09e8a690b7bea/src/basic/argv-util.c Anyway, it looks like there is plenty of will out there to keep this working, albeit in a weird semi-supported state whose cruftiness is undeniable.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-21T20:16:46Z
Thomas Munro: > Of course we have to distinguish between the basic argv[] clobbering > trick which is barely even a trick, and the more advanced environ > stealing trick which confuses musl. Right. The latter not only confuses musl, but also makes /proc/<pid>/environ return garbage. This is also mentioned at the bottom of main.c, which has a workaround for the specific case of UBSan depending on that. This is kind of funny: Because we are relying on undefined behavior regarding the modification of environ, we need a workaround for the "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, it wanted to tell us something.. This happens on glibc, too. So summarizing: 1. The simple approach is to use PS_USE_CLOBBER_ARGV on Linux only for glibc and other known-to-be-good-and-identifiable libc variants, otherwise default to PS_USE_NONE. This will not only keep the problem for /proc/../environ for glibc users, but also disable ps status for musl entirely. Considering that probably the biggest use-case for musl is to run postgres in containers, it's quite likely to actually run more than just one cluster on a single machine. In this case... ps status would be especially handy to identify which cluster a process belongs to. 2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH / LD_PRELOAD is found to keep those intact. This will keep ps status support on musl, which is good. But the /proc/.../environ problem will still be there, unchanged. Both of those approaches rely on the undefined behavior of clobbering environ. 3. The logical consequence of this is, to stop clobbering environ and use only the available argv space. However, this will quickly leave us with a very small ps status buffer to work with, making the feature less useful. Note, that this could happen theoretically by starting postgres with the fewest arguments and environment possible, too. Not sure what the minimal buffer size is that could be achieved that way. The point is: The buffer size is not guaranteed at all. 4. The upstream (musl) suggestion of which I sent a PoC was to "exec yourself with a bigger argv". This works. I chose to pad argv0 with trailing slashes. Those can safely be stripped away again, because any argv0 which would come with a trailing slash to start with, would not be the current executable, but a directory - so would fail exec immediately anyway. This keeps /proc/.../environ intact and does not rely on undefined behavior. Additionally, we get a guaranteed ps buffer size of 256, which is what we use on BSDs and Windows, too. I wonder why we actually fall back to PS_USE_NONE by default.. and how much of that is related to the environment clobbering to start with? Could we even use the exec-approach as the fallback in all other cases except BSDs and Windows and get rid of PS_USE_NONE? Clobbering only argv sure seems way safer to do than what we do right now. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-21T20:30:00Z
Here is what we could with this: > 2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH > / LD_PRELOAD is found to keep those intact. We could backpatch this down to v12. This would be one step to make the test suite pass on Alpine Linux with musl and ultimately allow setting up a buildfarm animal for that. It does not solve the /proc/.../environ problem, but at least keeps ps status working on musl as it did before, so not a regression. > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > yourself with a bigger argv". We could do this in HEAD now ... > Could we even use the exec-approach as the fallback in all other cases > except BSDs and Windows and get rid of PS_USE_NONE? ... and then remove PS_USE_NONE at the beginning of the v18 cycle. This would give a bit more time for those "other systems", which were previously falling back PS_USE_NONE and would then clobber argv, too. Opinions? Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-21T22:42:52Z
On Fri, Mar 22, 2024 at 9:30 AM <walther@technowledgy.de> wrote: > > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > > yourself with a bigger argv". > > We could do this in HEAD now ... Just a thought: if we want to go this way, do we need a new exec call? We already control the initial exec in pg_ctl.c. > > Could we even use the exec-approach as the fallback in all other cases > > except BSDs and Windows and get rid of PS_USE_NONE? > > ... and then remove PS_USE_NONE at the beginning of the v18 cycle. > > This would give a bit more time for those "other systems", which were > previously falling back PS_USE_NONE and would then clobber argv, too. RIght. It's unspecified by POSIX whether ps shows changes to those strings (and there are systems that don't), but it can't hurt to do so anyway, and it'd be better than having a PS_USE_NONE code path that is untested. I dimly recall that it turned out that PS_USE_NONE was actually broken for a while without anyone noticing.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-21T23:02:47Z
Thomas Munro <thomas.munro@gmail.com> writes: > Just a thought: if we want to go this way, do we need a new exec call? > We already control the initial exec in pg_ctl.c. I'm resistant to assuming the postmaster is launched through pg_ctl. systemd, for example, might well prefer not to do that, not to mention all the troglodytes still using 1990s launch scripts. A question that seems worth debating in this thread is how much updating the process title is even worth nowadays. It feels like a hangover from before we had pg_stat_activity and other monitoring support. So I don't feel a huge need to support it on musl. The previously-suggested patch to whitelist glibc and variants, and otherwise fall back to PS_USE_NONE, seems like it might be the appropriate amount of effort. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-21T23:20:11Z
On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The previously-suggested patch to whitelist glibc and variants, > and otherwise fall back to PS_USE_NONE, seems like it might be > the appropriate amount of effort. What about meeting musl halfway: clobber argv, but only clobber environ for the libcs known to tolerate that? Then musl might see truncation at 30-60 characters or whatever it is, but that's probably enough to see your cluster_name and backend type/user name which is pretty useful information.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-21T23:23:45Z
On Thu, Mar 21, 2024 at 7:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Just a thought: if we want to go this way, do we need a new exec call? > > We already control the initial exec in pg_ctl.c. > > I'm resistant to assuming the postmaster is launched through pg_ctl. > systemd, for example, might well prefer not to do that, not to > mention all the troglodytes still using 1990s launch scripts. > > A question that seems worth debating in this thread is how much > updating the process title is even worth nowadays. It feels like > a hangover from before we had pg_stat_activity and other monitoring > support. So I don't feel a huge need to support it on musl. > The previously-suggested patch to whitelist glibc and variants, > and otherwise fall back to PS_USE_NONE, seems like it might be > the appropriate amount of effort. > > > +1 cheers andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-21T23:30:52Z
Thomas Munro <thomas.munro@gmail.com> writes: > On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The previously-suggested patch to whitelist glibc and variants, >> and otherwise fall back to PS_USE_NONE, seems like it might be >> the appropriate amount of effort. > What about meeting musl halfway: clobber argv, but only clobber > environ for the libcs known to tolerate that? Then musl might see > truncation at 30-60 characters or whatever it is, but that's probably > enough to see your cluster_name and backend type/user name which is > pretty useful information. No real objection here. I do wonder about the point you (or somebody) made upthread that we don't have any testing of the PS_USE_NONE case; but that could be addressed some other way. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-22T00:07:14Z
On Fri, Mar 22, 2024 at 11:42:52AM +1300, Thomas Munro wrote: > On Fri, Mar 22, 2024 at 9:30 AM <walther@technowledgy.de> wrote: > > > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > > > yourself with a bigger argv". > > > > We could do this in HEAD now ... > > Just a thought: if we want to go this way, do we need a new exec call? > We already control the initial exec in pg_ctl.c. > > > > Could we even use the exec-approach as the fallback in all other cases > > > except BSDs and Windows and get rid of PS_USE_NONE? > > > > ... and then remove PS_USE_NONE at the beginning of the v18 cycle. > > > > This would give a bit more time for those "other systems", which were > > previously falling back PS_USE_NONE and would then clobber argv, too. > > RIght. It's unspecified by POSIX whether ps shows changes to those > strings (and there are systems that don't), but it can't hurt to do so > anyway, and it'd be better than having a PS_USE_NONE code path that is > untested. I dimly recall that it turned out that PS_USE_NONE was > actually broken for a while without anyone noticing. Actually, I was thinking the opposite. Since the musl libc is widely used, it will be tested, and I don't want to disable process display updates for such a common platform. I suggest we use the #ifdef test to continue our existing behavior for the libraries we know about, like glibc, and use the LD_* process title truncation hack for libc's we don't recognize. Attached is a prototype patch which implements this based on previous patches. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-22T00:08:49Z
On Fri, Mar 22, 2024 at 12:20:11PM +1300, Thomas Munro wrote: > On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The previously-suggested patch to whitelist glibc and variants, > > and otherwise fall back to PS_USE_NONE, seems like it might be > > the appropriate amount of effort. > > What about meeting musl halfway: clobber argv, but only clobber > environ for the libcs known to tolerate that? Then musl might see > truncation at 30-60 characters or whatever it is, but that's probably > enough to see your cluster_name and backend type/user name which is > pretty useful information. I just posted a prototype patch to implement this. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andres Freund <andres@anarazel.de> — 2024-03-22T00:19:48Z
Hi, On 2024-03-21 21:16:46 +0100, Wolfgang Walther wrote: > Right. The latter not only confuses musl, but also makes /proc/<pid>/environ > return garbage. This is also mentioned at the bottom of main.c, which has a > workaround for the specific case of UBSan depending on that. This is kind of > funny: Because we are relying on undefined behavior regarding the > modification of environ, we need a workaround for the > "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, > it wanted to tell us something.. I don't think that's quite a fair description. Ubsan is basically doing undefined things itself, so it's turtles all the way down. > So summarizing: FWIW, independent of which fix we go with, I think we need a buildfarm animal using musl. Even better if one of the CI tasks can be made to use musl as well. Greetings, Andres Freund
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-22T06:12:26Z
Sent from my iPad > On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > >> On 2024-03-21 21:16:46 +0100, Wolfgang Walther wrote: >> Right. The latter not only confuses musl, but also makes /proc/<pid>/environ >> return garbage. This is also mentioned at the bottom of main.c, which has a >> workaround for the specific case of UBSan depending on that. This is kind of >> funny: Because we are relying on undefined behavior regarding the >> modification of environ, we need a workaround for the >> "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, >> it wanted to tell us something.. > > I don't think that's quite a fair description. Ubsan is basically doing > undefined things itself, so it's turtles all the way down. > > >> So summarizing: > > FWIW, independent of which fix we go with, I think we need a buildfarm animal > using musl. Even better if one of the CI tasks can be made to use musl as > well. We had one till 3 months ago. It’s on my list to recreate. Cheers Andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-22T06:15:27Z
Andrew Dunstan <andrew@dunslane.net> writes: > On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: >> FWIW, independent of which fix we go with, I think we need a buildfarm animal >> using musl. Even better if one of the CI tasks can be made to use musl as >> well. > We had one till 3 months ago. It’s on my list to recreate. How was it passing? The issue discussed in this thread has surely been there for a long time, and Wolfgang mentioned that he sees others. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-22T07:02:19Z
> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Dunstan <andrew@dunslane.net> writes:age >>> On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: >>> FWIW, independent of which fix we go with, I think we need a buildfarm animal >>> using musl. Even better if one of the CI tasks can be made to use musl as >>> well. > >> We had one till 3 months ago. It’s on my list to recreate. > > How was it passing? The issue discussed in this thread has surely > been there for a long time, and Wolfgang mentioned that he sees > others. > > The buildfarm client has a switch that delays running regression tests until after the install stages. Cheers Andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-22T07:55:52Z
Andres Freund: > FWIW, independent of which fix we go with, I think we need a buildfarm animal > using musl. Even better if one of the CI tasks can be made to use musl as > well. I am already working with Andrew to set up a buildfarm animal to run Alpine Linux/musl. I can look into the CI task as well. Are you suggesting to change an existing task to run with Alpine/musl or to add a new task for it? It would be docker image based for sure. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-22T08:02:04Z
Andrew Dunstan: >> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How was it passing? The issue discussed in this thread has surely >> been there for a long time, and Wolfgang mentioned that he sees >> others. > > The buildfarm client has a switch that delays running regression tests until after the install stages. Hm. So while that switch makes the animal pass the build, it did hide exactly this problem. Not sure whether this switch should be used at all, then. Was this switch only implemented for the specific case of Alpine/musl or is there a different reason for it, as well? The other issues I had been seeing were during make check-world, but not make check. Those were things around setlocale() / /bin/locale, IIRC. Not sure whether all of the tests are run by the buildfarm? Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-22T08:12:47Z
Tom Lane: > Thomas Munro <thomas.munro@gmail.com> writes: >> Just a thought: if we want to go this way, do we need a new exec call? >> We already control the initial exec in pg_ctl.c. > > I'm resistant to assuming the postmaster is launched through pg_ctl. > systemd, for example, might well prefer not to do that, not to > mention all the troglodytes still using 1990s launch scripts. Right, the systemd example in the docs is not using pg_ctl. But, it should be easily possible to have: - pg_ctl call postgres with a padded argv0 - postgres call itself with padding, if it wasn't called with that already This way, there would be no additional exec call when started through pg_ctl, but one more call when started directly. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-22T08:33:38Z
Bruce Momjian: > I suggest we use the #ifdef test to continue our existing behavior for > the libraries we know about, like glibc, and use the LD_* process title > truncation hack for libc's we don't recognize. > > Attached is a prototype patch which implements this based on previous > patches. The condition to check for linux/glibc in your patch is slightly off: #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) should be #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) With the latter, it passes tests with musl. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-22T09:41:39Z
Wolfgang Walther: > The other issues I had been seeing were during make check-world, but not > make check. Those were things around setlocale() / /bin/locale, IIRC. > Not sure whether all of the tests are run by the buildfarm? Ah, those other tests only fail when building --with-icu, but Andrew's animal didn't do that. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andrew Dunstan <andrew@dunslane.net> — 2024-03-22T12:10:27Z
On Fri, Mar 22, 2024 at 4:02 AM Wolfgang Walther <walther@technowledgy.de> wrote: > Andrew Dunstan: > >> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> How was it passing? The issue discussed in this thread has surely > >> been there for a long time, and Wolfgang mentioned that he sees > >> others. > > > > The buildfarm client has a switch that delays running regression tests > until after the install stages. > > Hm. So while that switch makes the animal pass the build, it did hide > exactly this problem. Not sure whether this switch should be used at > all, then. Was this switch only implemented for the specific case of > Alpine/musl or is there a different reason for it, as well? > Alpine was the main motivation, but it's also probably useful on Macs with SIP enabled. ISTR raising the Alpine issue back then (2018) but I can't find a reference now. cheers andrew
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-22T13:36:19Z
On Fri, Mar 22, 2024 at 09:33:38AM +0100, walther@technowledgy.de wrote: > Bruce Momjian: > > I suggest we use the #ifdef test to continue our existing behavior for > > the libraries we know about, like glibc, and use the LD_* process title > > truncation hack for libc's we don't recognize. > > > > Attached is a prototype patch which implements this based on previous > > patches. > > The condition to check for linux/glibc in your patch is slightly off: > > #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) > > should be > > #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) > > With the latter, it passes tests with musl. Yes, my logic was wrong. Not sure what I was thinking, frankly. I am not a big fan of negating a complex conditional, but would rather pass the negation into the conditional, new patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-22T19:44:28Z
On Fri, Mar 22, 2024 at 09:36:19AM -0400, Bruce Momjian wrote: > On Fri, Mar 22, 2024 at 09:33:38AM +0100, walther@technowledgy.de wrote: > > Bruce Momjian: > > > I suggest we use the #ifdef test to continue our existing behavior for > > > the libraries we know about, like glibc, and use the LD_* process title > > > truncation hack for libc's we don't recognize. > > > > > > Attached is a prototype patch which implements this based on previous > > > patches. > > > > The condition to check for linux/glibc in your patch is slightly off: > > > > #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) > > > > should be > > > > #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) > > > > With the latter, it passes tests with musl. > > Yes, my logic was wrong. Not sure what I was thinking, frankly. > > I am not a big fan of negating a complex conditional, but would rather > pass the negation into the conditional, new patch attached. With no one "hoping this patch dies in a fire"*, I have updated it with more details, which I now think is committable to master. Is this something to backpatch? Seems too rare a bug to me. * Robert Haas, https://www.postgresql.org/message-id/CA%2BTgmoYsyrCNmg%2BYh6rgP7K8r-bYPjCeF1tPxENRFwD4VZAZvw%40mail.gmail.com -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-25T19:20:47Z
On Thu, Mar 21, 2024 at 11:26 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Some speculation on how widespread this trick is: as mentioned, it Just one more, which I just ran into by accident while looking for something else, which I'm posting just in case this thread ever gets used to try to convince musl hackers to change their end to support it: https://github.com/openzfs/zfs/blob/master/lib/libzutil/os/linux/zutil_setproctitle.c#L158
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-25T21:58:30Z
Bruce Momjian: > With no one "hoping this patch dies in a fire"*, I have updated it with > more details, which I now think is committable to master. Is this > something to backpatch? Seems too rare a bug to me. I would like to see this backpatched to be able to run the regression tests easily on all stable branches. I have taken your's/Thomas' patch and extended it with a few more taking in many of the ideas in this thread: 0001 Don't clobber LD_* This is the patch you posted. This applies cleanly all the way down to v12. This fixes the bug and allows running most of the tests with musl - yeah! I also confirmed, that this will not create practical problems with library/postgres docker image, where this is likely used the most. While "postgres" is called by default without any arguments here, plenty of environment variables are passed. The docker image does use LD_PRELOAD to trick initdb, but that's not set when running the postmaster, so not a problem here. This use-case also shows why the proposed patch to still partially clobber environ at this stage is better than to not clobber environ at all - in this case, the docker image would essentially have no ps status at all by default.- 0002 Allow setting PS_USE_NONE via CPPFLAGS This was proposed by Andrew and applies cleanly down to v12. Thus, it could be backpatched, too. First and foremost this would allow setting a buildfarm animal to use this flag to make sure this code path is actually build/tested at all. This is something that Thomas and Tom hinted at. 0003 Don't ever clobber environ again This is the approach I had previously posted as a PoC. This would not be backpatched, but I suggested this could go into v17 now. This avoids the undefined behavior and sets the table to eventually set ps status via argv by default and remove PS_USE_NONE later. Compared to the PoC patch, I decided not to pad argv[0], because this will break the ps status display for the postmaster itself. Instead exec is called with an additional argument consisting of exactly 255 spaces. I also tried avoiding the additional exec-call if postgres was called via pg_ctl, as suggested by Peter. This quickly turned out to be more invasive than I would have liked this to be, though. The current approach works very well, the environment doesn't need to be copied anymore and the workaround for /proc/<pid>/environ in main.c can go away, too. 0004 Default to PS_USE_CLOBBER_ARGV This changes the default to display ps status on all other systems, too. This could potentially go in now as well, or be delayed to the beginning of the v18 cycle. In the unlikely event that this breaks something on a platform not considered here and we get a bug report, we can easily advise to compile with CPPFLAGS=-DPS_USE_NONE, which is still there at this stage. 0005 Remove PS_USE_NONE However, if no reports come in and no problems are detected with 0004, then this can be entirely removed. This for "later", whenever that is. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-25T22:35:06Z
On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: > Bruce Momjian: > > With no one "hoping this patch dies in a fire"*, I have updated it with > > more details, which I now think is committable to master. Is this > > something to backpatch? Seems too rare a bug to me. > > I would like to see this backpatched to be able to run the regression tests > easily on all stable branches. You want to risk destabilizing Postgres by backpatching something this complex so the regression tests can be run on all stable branches? I think you are overestimating our desire to take on risk. Also, in my patch, the parentheses here: #if defined(__linux__) && (! defined(__GLIBC__) && ! defined(__UCLIBC__)) are unnecessary so they should be removed: #if defined(__linux__) && ! defined(__GLIBC__) && ! defined(__UCLIBC__) I am only willing to apply my patch, and only to master. Other committers might be more willing. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Peter Eisentraut <peter@eisentraut.org> — 2024-03-25T22:46:09Z
On 22.03.24 20:44, Bruce Momjian wrote: > + * linking (dlopen) might fail. Here, we truncate the update > + * of the process title when either of two important dynamic > + * linking environment variables are set. Musl does not > + * define any compiler symbols, so we have to do this for > + * any Linux libc we don't know is safe. > + */ > + if (strstr(environ[i], "LD_LIBRARY_PATH=") == environ[i] || > + strstr(environ[i], "LD_PRELOAD=") == environ[i]) What determines which variables require this treatment?
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-25T22:51:09Z
On Mon, Mar 25, 2024 at 11:46:09PM +0100, Peter Eisentraut wrote: > On 22.03.24 20:44, Bruce Momjian wrote: > > + * linking (dlopen) might fail. Here, we truncate the update > > + * of the process title when either of two important dynamic > > + * linking environment variables are set. Musl does not > > + * define any compiler symbols, so we have to do this for > > + * any Linux libc we don't know is safe. > > + */ > > + if (strstr(environ[i], "LD_LIBRARY_PATH=") == environ[i] || > > + strstr(environ[i], "LD_PRELOAD=") == environ[i]) > > What determines which variables require this treatment? Thomas Munro came up with that part of the patch. I just combined his patch with the macro test. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-25T22:54:59Z
On Tue, Mar 26, 2024 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: > What determines which variables require this treatment? That came from me peeking at their code: https://www.postgresql.org/message-id/CA%2BhUKGKNK5V8XwwJJZm36s3EUy8V51xu4XiE8%3D26n%3DWq3OGd4A%40mail.gmail.com I had originally proposed to avoid anything beginning "LD_" but Tom suggested being more specific. I doubt LD_PRELOAD can really hurt you though (the linker probably only needs the value at the start by definition, not at later dlopen() time (?)). I dunno. If you're asking if there is any standard or whatever supplying these names, the System V or at least ELF standards talk about LD_LIBRARY_PATH (though those standards don't know/care what happens after userspace takes over control of the environment concept, they just talk about how the world is created when you exec a process, so they AFAICS they don't address this clobbering stuff, and AFAIK other LD_XXX stuff is probably implementation specific).
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-25T23:14:25Z
Bruce Momjian <bruce@momjian.us> writes: > On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: >> I would like to see this backpatched to be able to run the regression tests >> easily on all stable branches. > You want to risk destabilizing Postgres by backpatching something this > complex so the regression tests can be run on all stable branches? I > think you are overestimating our desire to take on risk. If we want a buildfarm animal testing this platform, we kind of need to support it on all branches. Having said that, I agree with you that we are looking for a minimalist fix not a maximalist one. I think the 0001 patch is about right, but the rest seem to be solving problems we don't have. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-25T23:15:52Z
Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Mar 26, 2024 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> What determines which variables require this treatment? > I had originally proposed to avoid anything beginning "LD_" but Tom > suggested being more specific. I doubt LD_PRELOAD can really hurt you > though (the linker probably only needs the value at the start by > definition, not at later dlopen() time (?)). Oh, good point. So we could simplify the patch by only looking for LD_LIBRARY_PATH. regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-25T23:21:48Z
On Mon, Mar 25, 2024 at 07:14:25PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: > >> I would like to see this backpatched to be able to run the regression tests > >> easily on all stable branches. > > > You want to risk destabilizing Postgres by backpatching something this > > complex so the regression tests can be run on all stable branches? I > > think you are overestimating our desire to take on risk. > > If we want a buildfarm animal testing this platform, we kind of need > to support it on all branches. Having said that, I agree with you > that we are looking for a minimalist fix not a maximalist one. > I think the 0001 patch is about right, but the rest seem to be solving > problems we don't have. I could support the minimalist patch applied to all branches. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-25T23:43:08Z
I wrote: > Thomas Munro <thomas.munro@gmail.com> writes: >> I had originally proposed to avoid anything beginning "LD_" but Tom >> suggested being more specific. I doubt LD_PRELOAD can really hurt you >> though (the linker probably only needs the value at the start by >> definition, not at later dlopen() time (?)). > Oh, good point. So we could simplify the patch by only looking for > LD_LIBRARY_PATH. I looked at the musl source code you identified and confirmed that only the LD_LIBRARY_PATH string is remembered in a static variable; LD_PRELOAD is only accessed locally in that initialization function. So we only need to do the attached. (I failed to resist the temptation to rewrite the comments.) regards, tom lane
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Thomas Munro <thomas.munro@gmail.com> — 2024-03-25T23:49:55Z
On Tue, Mar 26, 2024 at 12:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked at the musl source code you identified and confirmed that > only the LD_LIBRARY_PATH string is remembered in a static variable; > LD_PRELOAD is only accessed locally in that initialization function. > So we only need to do the attached. (I failed to resist the > temptation to rewrite the comments.) LGTM.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Andres Freund <andres@anarazel.de> — 2024-03-26T00:14:47Z
Hi, On 2024-03-22 08:55:52 +0100, Wolfgang Walther wrote: > Andres Freund: > > FWIW, independent of which fix we go with, I think we need a buildfarm animal > > using musl. Even better if one of the CI tasks can be made to use musl as > > well. > > I am already working with Andrew to set up a buildfarm animal to run Alpine > Linux/musl. I can look into the CI task as well. Are you suggesting to > change an existing task to run with Alpine/musl or to add a new task for it? > It would be docker image based for sure. I'd rather adapt one of the existing tasks, to avoid increasing CI costs unduly. The way we currently run CI for testing of not-yet-merged patches runs all tasks other than macos as full VMs, that turned out to be faster & cheaper. FWIW, except for one small issue, building postgres against musl works on debian and the tests pass if I install first. The small problem mentioned above is that on debian linux/fs.h isn't available when building with musl, which in turn causes src/bin/pg_upgrade/file.c to fail to compile. I assume that's not the case on "fully musl" distro? Greetings, Andres Freund
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Bruce Momjian <bruce@momjian.us> — 2024-03-26T00:53:05Z
On Tue, Mar 26, 2024 at 12:49:55PM +1300, Thomas Munro wrote: > On Tue, Mar 26, 2024 at 12:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I looked at the musl source code you identified and confirmed that > > only the LD_LIBRARY_PATH string is remembered in a static variable; > > LD_PRELOAD is only accessed locally in that initialization function. > > So we only need to do the attached. (I failed to resist the > > temptation to rewrite the comments.) > > LGTM. +1 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Peter Eisentraut <peter@eisentraut.org> — 2024-03-26T07:19:09Z
On 26.03.24 00:43, Tom Lane wrote: > I wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> I had originally proposed to avoid anything beginning "LD_" but Tom >>> suggested being more specific. I doubt LD_PRELOAD can really hurt you >>> though (the linker probably only needs the value at the start by >>> definition, not at later dlopen() time (?)). > >> Oh, good point. So we could simplify the patch by only looking for >> LD_LIBRARY_PATH. > > I looked at the musl source code you identified and confirmed that > only the LD_LIBRARY_PATH string is remembered in a static variable; > LD_PRELOAD is only accessed locally in that initialization function. > So we only need to do the attached. (I failed to resist the > temptation to rewrite the comments.) Yeah, I was more looking for a comment for posterity for *why* we need to preserve this variable in particular. The updated comment looks reasonable.
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-26T07:36:40Z
Bruce Momjian: > You want to risk destabilizing Postgres by backpatching something this > complex so the regression tests can be run on all stable branches? I > think you are overestimating our desire to take on risk. I specifically wrote about backpatching the first (and maybe second) patch. None of that is risky. Patches 3-5 were not meant for backpatching at all. Best, Wolfgang
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Wolfgang Walther <walther@technowledgy.de> — 2024-03-26T07:43:54Z
Tom Lane: > If we want a buildfarm animal testing this platform, we kind of need > to support it on all branches. Having said that, I agree with you > that we are looking for a minimalist fix not a maximalist one. > I think the 0001 patch is about right, but the rest seem to be solving > problems we don't have. The second patch potentially solves the problem of PS_USE_NONE not being tested. Of course you could also set up a buildfarm animal on some other platform, which is sure to fall through to PS_USE_NONE, but that seems to have not worked in the past: Thomas Munro: > I dimly recall that it turned out that PS_USE_NONE was > actually broken for a while without anyone noticing Best, Wolfgang
-
Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-03-26T08:22:28Z
The need to do $subject came up in [1]. Moving this to a separate discussion on -hackers, because there are more issues to solve than just the LD_LIBRARY_PATH problem. Andres Freund: > FWIW, except for one small issue, building postgres against musl works on > debian and the tests pass if I install first. > > > The small problem mentioned above is that on debian linux/fs.h isn't available > when building with musl, which in turn causes src/bin/pg_upgrade/file.c to > fail to compile. I assume that's not the case on "fully musl" distro? Correct, I have not seen this before on Alpine. Here is my progress setting up a buildfarm animal to run on Alpine Linux and the issues I found, so far: The animal runs in a docker container via GitHub Actions in [2]. Right now it's still running with --test, until I get the credentials to activate it. I tried to enable everything (except systemd, because Alpine doesn't have it) and run all tests. The LDAP tests are failing right now, but that is likely something that I need to fix in the Dockerfile - it's failing to start the slapd, IIRC. There are other issues, though - all of them have open pull requests in that repo [3]. I also had to skip the recovery check. Andrew mentioned that he had to do that, too, when he was still running his animal on Alpine. Not sure what this is about, yet. Building --with-icu fails two tests. One of them (001_initdb) is fixed by having the "locale" command in your PATH, which is not the case on Alpine by default. I assume this will not break on your debian/musl build, Andres - but it will also probably not return any sane values, because it will run glibc's locale command. I haven't looked into that in detail, yet, but I think the other test (icu/010_database) fails because it expects that setlocale(LC_COLLATE, <illegal_value>) throws an error. I think it doesn't do that on musl, because LC_COLLATE is not implemented. Those failing tests are not "just failing", but probably mean that we need to do something about how we deal with locale/setlocale on musl. The last failure is about building --with-nls. This fails with something like: ld: src/port/strerror.c:72:(.text+0x2d8): undefined reference to `libintl_gettext' Of course, gettext-dev is installed, libintl.so is available in /usr/lib and it also contains the symbol. So not sure what's happening here. Andres, did you build --with-icu and/or --with-nls on debian/musl? Did you run the recovery tests? Best, Wolfgang [1]: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488%40technowledgy.de [2]: https://github.com/technowledgy/postgresql-buildfarm-alpine/actions/workflows/run.yaml [3]: https://github.com/technowledgy/postgresql-buildfarm-alpine/pulls
-
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-26T14:35:43Z
walther@technowledgy.de writes: > The second patch potentially solves the problem of PS_USE_NONE not being > tested. Of course you could also set up a buildfarm animal on some other > platform, which is sure to fall through to PS_USE_NONE, but that seems > to have not worked in the past: > Thomas Munro: >> I dimly recall that it turned out that PS_USE_NONE was >> actually broken for a while without anyone noticing I think what Thomas is recollecting is this: Author: Tom Lane <tgl@sss.pgh.pa.us> Branch: master Release: REL_15_BR [0fb6954aa] 2022-03-27 12:57:46 -0400 Branch: REL_14_STABLE Release: REL_14_3 [3f7a59c59] 2022-03-27 12:57:52 -0400 Branch: REL_13_STABLE Release: REL_13_7 [9016a2a3d] 2022-03-27 12:57:57 -0400 Fix breakage of get_ps_display() in the PS_USE_NONE case. Commit 8c6d30f21 caused this function to fail to set *displen in the PS_USE_NONE code path. If the variable's previous value had been negative, that'd lead to a memory clobber at some call sites. We'd managed not to notice due to very thin test coverage of such configurations, but this appears to explain buildfarm member lorikeet's recent struggles. Credit to Andrew Dunstan for spotting the problem. Back-patch to v13 where the bug was introduced. Discussion: https://postgr.es/m/136102.1648320427@sss.pgh.pa.us The problem wasn't lack of coverage, it was that the failure was intermittent and erratic enough to be very hard to diagnose. I think that's more bad luck than anything else. regards, tom lane -
Re: Regression tests fail with musl libc because libpq.so can't be loaded
Tom Lane <tgl@sss.pgh.pa.us> — 2024-03-26T15:45:55Z
Peter Eisentraut <peter@eisentraut.org> writes: > On 26.03.24 00:43, Tom Lane wrote: >> I looked at the musl source code you identified and confirmed that >> only the LD_LIBRARY_PATH string is remembered in a static variable; >> LD_PRELOAD is only accessed locally in that initialization function. >> So we only need to do the attached. (I failed to resist the >> temptation to rewrite the comments.) > Yeah, I was more looking for a comment for posterity for *why* we need > to preserve this variable in particular. The updated comment looks > reasonable. OK, pushed after a bit more comment-fiddling. regards, tom lane
-
Re: Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-03-30T14:05:19Z
Here's an update on the progress to run musl (Alpine Linux) in the buildfarm. Wolfgang Walther: > The animal runs in a docker container via GitHub Actions in [2]. Right > now it's still running with --test, until I get the credentials to > activate it. The animals have been activated and are reporting now. Thanks, Andrew! > I tried to enable everything (except systemd, because Alpine doesn't > have it) and run all tests. The LDAP tests are failing right now, but > that is likely something that I need to fix in the Dockerfile - it's > failing to start the slapd, IIRC. There are other issues, though - all > of them have open pull requests in that repo [3]. ldap tests are enabled, just a missing package. > I also had to skip the recovery check. Andrew mentioned that he had to > do that, too, when he was still running his animal on Alpine. Not sure > what this is about, yet. This was about a missing init process in the docker image. Without an init process reaping zombie processes, the recovery tests end up with some supposed-to-be-terminated backends still running and can't start them up again. Fixed by adding a minimal init process with "tinit". > Building --with-icu fails two tests. One of them (001_initdb) is fixed > by having the "locale" command in your PATH, which is not the case on > Alpine by default. I assume this will not break on your debian/musl > build, Andres - but it will also probably not return any sane values, > because it will run glibc's locale command. > I haven't looked into that in detail, yet, but I think the other test > (icu/010_database) fails because it expects that setlocale(LC_COLLATE, > <illegal_value>) throws an error. I think it doesn't do that on musl, > because LC_COLLATE is not implemented. > Those failing tests are not "just failing", but probably mean that we > need to do something about how we deal with locale/setlocale on musl. I still need to look into this in depth. > The last failure is about building --with-nls. This fails with something > like: > > ld: src/port/strerror.c:72:(.text+0x2d8): undefined reference to > `libintl_gettext' > > Of course, gettext-dev is installed, libintl.so is available in /usr/lib > and it also contains the symbol. So not sure what's happening here. This is an Alpine Linux packaging issue. Theoretically, it could be made to work by introducing some configure/meson flag like "--with-gettext" or so, to prefer gettext's libintl over the libc-builtin. However, NixOS / nixpkgs with its pkgsMusl overlay manages to solve this issue just fine, builds with --enable-nls and gettext work. Thus, I conclude this is best solved upstream in Alpine Linux. TLDR: The only real issue which is still open from PostgreSQL's side is around locales and ICU - certainly the pain point in musl. Will look into it further. Best, Wolfgang
-
Re: Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-03-31T13:34:23Z
About building one of the CI tasks with musl: Andres Freund: > I'd rather adapt one of the existing tasks, to avoid increasing CI costs unduly. I looked into this and I think the only task that could be changed is the SanityCheck. This is because this builds without any additional features enabled. I guess that makes sense, because otherwise those dependencies would first have to be built with musl-gcc as well. > FWIW, except for one small issue, building postgres against musl works on debian and the tests pass if I install first. After the fix for LD_LIBRARY_PATH this now works as expected without installing first. I confirmed it works on debian with CC=musl-gcc. > The small problem mentioned above is that on debian linux/fs.h isn't available > when building with musl, which in turn causes src/bin/pg_upgrade/file.c to > fail to compile. According to [1], this can be worked around by linking some folders: ln -s /usr/include/linux /usr/include/x86_64-linux-musl/ ln -s /usr/include/asm-generic /usr/include/x86_64-linux-musl/ ln -s /usr/include/x86_64-linux-gnu/asm /usr/include/x86_64-linux-musl/ Please find a patch to use musl-gcc in SanityCheck attached. Logs from the CI run are in [2]. It has this in the configure phase: [13:19:52.712] Using 'CC' from environment with value: 'ccache musl-gcc' [13:19:52.712] C compiler for the host machine: ccache musl-gcc (gcc 10.2.1 "cc (Debian 10.2.1-6) 10.2.1 20210110") [13:19:52.712] C linker for the host machine: musl-gcc ld.bfd 2.35.2 [13:19:52.712] Using 'CC' from environment with value: 'ccache musl-gcc' So meson picks up musl-gcc properly. I also double checked that without the links above, the build does indeed fail with the linux/fs.h error. I assume the installation of musl-tools should be done in the pg-vm-images repo instead of the additional script here? Best, Wolfgang [1]: https://debian-bugs-dist.debian.narkive.com/VlFkLigg/bug-789789-musl-fails-to-compile-stuff-that-depends-on-kernel-headers [2]: https://cirrus-ci.com/task/5741892590108672
-
Re: Building with musl in CI and the build farm
Thomas Munro <thomas.munro@gmail.com> — 2024-04-03T00:00:01Z
On Wed, Mar 27, 2024 at 11:27 AM Wolfgang Walther <walther@technowledgy.de> wrote: > The animal runs in a docker container via GitHub Actions in [2]. Great idea :-)
-
Re: Building with musl in CI and the build farm
Peter Eisentraut <peter@eisentraut.org> — 2024-04-04T13:56:25Z
On 31.03.24 15:34, walther@technowledgy.de wrote: >> I'd rather adapt one of the existing tasks, to avoid increasing CI >> costs unduly. > > I looked into this and I think the only task that could be changed is > the SanityCheck. I think SanityCheck should run a simple, "average" environment, like the current Debian one. Otherwise, niche problems with musl or multi-arch or whatever will throw off the entire build pipeline.
-
Re: Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-04-04T14:11:56Z
Peter Eisentraut: > On 31.03.24 15:34, walther@technowledgy.de wrote: >>> I'd rather adapt one of the existing tasks, to avoid increasing CI >>> costs unduly. >> >> I looked into this and I think the only task that could be changed is >> the SanityCheck. > > I think SanityCheck should run a simple, "average" environment, like the > current Debian one. Otherwise, niche problems with musl or multi-arch > or whatever will throw off the entire build pipeline. All the errors/problems I have seen so far, while setting up the buildfarm animal on Alpine Linux, have been way beyond what SanityCheck does. Problems only appeared in the tests suites, of which sanity check only runs *very* basic ones. I don't have much experience with the "cross" setup, that "musl on debian" essentially is, though. All those things are certainly out of scope for CI - they are tested in the build farm instead. I do agree: SanityCheck doesn't feel like the right place to put this. But on the other side.. if it really fails to *build* with musl, then it shouldn't make a difference whether you will be notified about that immediately or later in the CI pipeline. It certainly needs the fewest additional resources to put it there. I'm not sure what Andres meant with "adopting one of the existing tasks". It could fit as another step into the "Linux - Debian Bullseye - Autoconf" task, too. A bit similar to how the meson task build for 32 and 64bit. This would still not be an entirely new task like I proposed initially (to run in Docker). Best, Wolfgang
-
Re: Building with musl in CI and the build farm
Tom Lane <tgl@sss.pgh.pa.us> — 2024-04-04T14:36:41Z
Wolfgang Walther <walther@technowledgy.de> writes: > Peter Eisentraut: >> I think SanityCheck should run a simple, "average" environment, like the >> current Debian one. Otherwise, niche problems with musl or multi-arch >> or whatever will throw off the entire build pipeline. > I do agree: SanityCheck doesn't feel like the right place to put this. > But on the other side.. if it really fails to *build* with musl, then it > shouldn't make a difference whether you will be notified about that > immediately or later in the CI pipeline. It certainly needs the fewest > additional resources to put it there. That is not the concern here. What I think Peter is worried about, and certainly what I'm worried about, is that a breakage in SanityCheck comprehensively breaks all CI testing for all Postgres developers. One buildfarm member that's failing does not halt progress altogether, so it's not even in the same ballpark of being as critical. So I agree with Peter that SanityCheck had better use a very common, vanilla environment. To be blunt, I do not think we need to test musl in the CI pipeline. I see it as one of the niche platforms that the buildfarm exists to test. regards, tom lane
-
Re: Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-04-04T15:01:32Z
Tom Lane: > That is not the concern here. What I think Peter is worried about, > and certainly what I'm worried about, is that a breakage in > SanityCheck comprehensively breaks all CI testing for all Postgres > developers. You'd have to commit a failing patch first to break CI for all other developers. If you're only going to commit patches that pass those CI tasks, then this is not going to happen. Then it only becomes a question of how much feedback *you* get from a single CI run of your own patch. > To be blunt, I do not think we need to test musl in the CI pipeline. > I see it as one of the niche platforms that the buildfarm exists > to test. I don't really have an opinion on this. I'm fine with having musl in the buildfarm only. I don't expect the core build itself to fail with musl anyway, this has been working fine for years. Andres asked for it to be added to CI, so maybe he sees more value on top of just "building with musl"? Best, Wolfgang
-
Re: Building with musl in CI and the build farm
Tom Lane <tgl@sss.pgh.pa.us> — 2024-04-04T15:19:11Z
Wolfgang Walther <walther@technowledgy.de> writes: >> That is not the concern here. What I think Peter is worried about, >> and certainly what I'm worried about, is that a breakage in >> SanityCheck comprehensively breaks all CI testing for all Postgres >> developers. > You'd have to commit a failing patch first to break CI for all other > developers. No, what I'm more worried about is some change in the environment causing the build to start failing. When that happens, it'd better be an environment that many of us are familiar with and can test/fix. regards, tom lane
-
Re: Building with musl in CI and the build farm
Wolfgang Walther <walther@technowledgy.de> — 2024-04-04T19:09:30Z
Tom Lane: >> You'd have to commit a failing patch first to break CI for all other >> developers. > > No, what I'm more worried about is some change in the environment > causing the build to start failing. When that happens, it'd better > be an environment that many of us are familiar with and can test/fix. The way I understand how this work is, that the images for the VMs in which those CI tasks run, are not just dynamically updated - but are actually tested before they are used in CI. So the environment doesn't just change suddenly. See e.g. [1] for a pull request to the repo containing those images to update the linux debian image from bullseye to bookworm. This is exactly the image we're talking about. Before this image is used in postgres CI, it's tested and confirmed that it actually works there. If one of the jobs was using musl - that would be tested as well. So this job would not just suddenly start failing for everybody. I do see the "familiarity" argument for the SanityCheck task, but for a different reason: Even though it's unlikely for this job to fail for musl specific reasons - if you're not familiar with musl and can't easily test it locally, you might not be able to tell immediately whether it's musl specific or not. If musl was run in one of the later jobs, it's much different: You see all tests failing - alright, not musl specific. You see only the musl test failing - yeah, musl problem. This should give developers much more confidence looking at the results. Best, Wolfgang [1]: https://github.com/anarazel/pg-vm-images/pull/91