Обсуждение: Regression tests fail with musl libc because libpq.so can't be loaded
Running the regression tests when building with musl libc fails, with errors like the following: ERROR: could not load library "<builddir>/tmp_install/usr/lib/postgresql/libpqwalreceiver.so": Error loading shared library libpq.so.5: No such file or directory (needed by <builddir>/tmp_install/usr/lib/postgresql/libpqwalreceiver.so) This was observed in Alpine Linux [1] and nixpkgs [2] a few years ago. I now looked at this a bit and this is what happens: - The temporary install location is set via LD_LIBRARY_PATH in the regression tests, so that postgres can find those libs. - All tests which load an extension / shared module via dlopen() fail, when the loaded library in turn depends on another library in tmp_install - I think in practice it's libpq.so all the time. - LD_LIBRARY_PATH is used correctly to look for the direct dependency loaded in dlopen(...), but is not taken into account anymore when trying to locate libpq.so. This step only fails with musl, but works fine with glibc. I can reproduce this with a simple Dockerfile (attached), which uses the library/postgres-alpine image, moves libpq.so to a different folder and points LD_LIBRARY_PATH at it. Build and run the dockerfile like this: docker build . -t pg-musl && docker run --rm pg-musl This Dockerfile can easily be adjusted to work with the debian image - which shows that doing the same with glibc works just fine. Even though this originated in "just" the regression tests, I'm filing this as a bug, because: - The docs explicitly mention LD_LIBRARY_PATH support to point at a different /lib folder in [3]. - This can clearly break outside the test-suite as shown with the Dockerfile. I tried a few more things: - When I add an /etc/ld-musl-$(ARCH).path file and add the path to libpq.so's libdir to it, libpq.so can be found. - When I add the path to libpq.so as an rpath to the postgres binary, libpq.so can be found. Both is not surprising, but just confirms musl-ld actually works as expected. It's just LD_LIBRARY_PATH that seems to not be passed on. To rule out a musl bug, I also put together a very simple test-case of an executable loading liba with dlopen(), which depends on libb and then constructing the same scenario with LD_LIBRARY_PATH. This works fine when compiled with glibc and musl, too. Thus, I believe the problem to be somewhere in how postgres loads those libraries. Best, Wolfgang [1]: https://github.com/alpinelinux/aports/commit/d67ceb66a1ca9e1899071c9ef09fffba29fa0417#diff-2bd25b5172fc52319de1b09086ac0db6314d2e9fa73497979f5198f8caaec1b9 [2]: https://github.com/NixOS/nixpkgs/commit/09ffd722072291f00f2a54d7404eb568a15e562a [3]: https://www.postgresql.org/docs/current/install-post.html
Вложения
Wolfgang Walther <walther@technowledgy.de> writes: > - LD_LIBRARY_PATH is used correctly to look for the direct dependency > loaded in dlopen(...), but is not taken into account anymore when trying > to locate libpq.so. This step only fails with musl, but works fine with > glibc. Why do you think this is our bug and not musl's? We do not even have any code that knows anything about indirect library dependencies. regards, tom lane
Tom Lane: > Why do you think this is our bug and not musl's? Because I tried to reproduce with musl directly with a very simple example as mentioned in: > To rule out a musl bug, I also put together a very simple test-case of an executable loading liba with dlopen(), whichdepends on libb and then constructing the same scenario with LD_LIBRARY_PATH. This works fine when compiled with glibcand musl, too. Thus, I believe the problem to be somewhere in how postgres loads those libraries. My test case looked like the attached. To compile it with musl via Dockerfile: docker build . -t musl-dlopen && docker run --rm musl-dlopen a.c/a.h is equivalent to libpqwalreceiver and b.c/b.h to libpq. This works fine with both musl and glibc. (Note: I also tried putting liba.so and libb.so in different folders, adding both to LD_LIBRARY_PATH - but that worked fine as well.) Now my very simple example probably does something different than postgres, so that the problem doesn't appear there. But since it seems possible to do this with musl in principle, it should be possible to do it differently in postgres to make it work, too. Any ideas? Best, Wolfgang
Вложения
On Sat, Mar 16, 2024 at 11:55 AM Wolfgang Walther <walther@technowledgy.de> wrote:
Tom Lane:
> Why do you think this is our bug and not musl's?
Because I tried to reproduce with musl directly with a very simple
example as mentioned in:
> To rule out a musl bug, I also put together a very simple test-case of an executable loading liba with dlopen(), which depends on libb and then constructing the same scenario with LD_LIBRARY_PATH. This works fine when compiled with glibc and musl, too. Thus, I believe the problem to be somewhere in how postgres loads those libraries.
My test case looked like the attached. To compile it with musl via
Dockerfile:
docker build . -t musl-dlopen && docker run --rm musl-dlopen
a.c/a.h is equivalent to libpqwalreceiver and b.c/b.h to libpq.
This works fine with both musl and glibc.
(Note: I also tried putting liba.so and libb.so in different folders,
adding both to LD_LIBRARY_PATH - but that worked fine as well.)
Now my very simple example probably does something different than
postgres, so that the problem doesn't appear there. But since it seems
possible to do this with musl in principle, it should be possible to do
it differently in postgres to make it work, too.
Any ideas?
On Alpine Linux, which uses musl libc, you have to run `make install` before you can run `make check`. Have you tried that?
(Note to self: need a new Alpine buildfarm member)
cheers
andrew
Andrew Dunstan: > On Alpine Linux, which uses musl libc, you have to run `make install` > before you can run `make check`. Have you tried that? I can see how that could work around the problem, because the library would already be available in the default library path / rpath and LD_LIBRARY_PATH would not be needed. However, this would only be a workaround for the specific case of running the regression tests, not a solution. Using LD_LIBRARY_PATH, as documented, would still not be possible. In my case, I am just using docker images with Alpine to easily reproduce the problem. I am building with NixOS / nixpkgs' pkgsMusl. The order of check and install phases can't be changed there, AFAICT. The workaround I use right now is to temporarily patch rpath of the postgres binary - this will be reset during the install phase anyway. This works, but again is not a real solution. Best, Wolfgang
Andrew Dunstan <andrew@dunslane.net> writes: > On Alpine Linux, which uses musl libc, you have to run `make install` > before you can run `make check`. Have you tried that? We have the same situation on macOS. There, it seems to be the result of a "security feature" that strips DYLD_LIBRARY_PATH from the process environment when make executes a shell. There's not much we can do about that, and I suspect there is not much we can do about musl's behavior either. (I am not a fan of proposals to modify the binaries under test, because then you are not testing what you intend to install.) regards, tom lane
On Sun, Mar 17, 2024 at 4:56 AM Wolfgang Walther <walther@technowledgy.de> wrote: > Any ideas? I'd look into whether there is a difference in the rules it uses for deciding not to trust LD_LIBRARY_PATH, which seems to be around here somewhere: https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1812 I wonder if you can break into an affected program and check out the magic there. FWIW on MacOS something equivalent happens at the moment we execute a shell, because the system shell is 'code signed' and that OS treats signed stuff similar to setuid binaries for this purpose (IIRC setting SHELL to point to a suitable unsigned shell could work around the problem there?) Another interesting thing that came up when I googled musl/glibc differences -- old but looks plausibly still true (not that I expect our code to be modifying that stuff in place, just something to check): https://www.openwall.com/lists/musl/2014/08/31/14
Tom Lane: > We have the same situation on macOS. There, it seems to be the result > of a "security feature" that strips DYLD_LIBRARY_PATH from the process > environment when make executes a shell. I'm not sure whether this explanation is sufficient for the musl case, because LD_LIBRARY_PATH does make a difference: The direct dependency (libpqwalreceiver.so) can still be found if it's moved elsewhere and LD_LIBRARY_PATH points at it. So clearly the LD_LIBRARY_PATH variable is still set after make executed the shell - it's just not in effect on the *indirect* dependency (libpq.so) anymore. Best, Wolfgang
Thomas Munro: > I'd look into whether there is a difference in the rules it uses for > deciding not to trust LD_LIBRARY_PATH, which seems to be around here > somewhere: > > https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1812 Yeah, I have been looking at that, too. I had also experimented a bit with setuid/setgid for that matter, but that didn't lead anywhere, yet. I'm not 100% sure, but I think this would also not match my other observation, that LD_LIBRARY_PATH does work for libpqwalreceiver (direct dep), but not libpq (indirect dep). > Another interesting thing that came up when I googled musl/glibc > differences -- old but looks plausibly still true (not that I expect > our code to be modifying that stuff in place, just something to > check): > > https://www.openwall.com/lists/musl/2014/08/31/14 To me, this seems very much like what could happen - it matches all my observations, so far. But I can't tell how likely that is, not knowing much of the postgres code. Best, Wolfgang
On Sun, Mar 17, 2024 at 9:19 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Another interesting thing that came up when I googled musl/glibc > differences -- old but looks plausibly still true (not that I expect > our code to be modifying that stuff in place, just something to > check): > > https://www.openwall.com/lists/musl/2014/08/31/14 Hmm, that does mention setproctitle, and our ps_status.c does indeed clobber some stuff in that region (in fact our ps_status.c is likely derived from the setproctitle() function from sendmail AFAICT). But that's in our "backend" server processes, unlike the problems we have on Macs... oh but you're failing to load libpqwalreceiver.so which makes some sense for the backend hypothesis. What happens if you hack ps_status.c to use PS_USE_NONE?
Thomas Munro: > Hmm, that does mention setproctitle, and our ps_status.c does indeed > clobber some stuff in that region (in fact our ps_status.c is likely > derived from the setproctitle() function from sendmail AFAICT). But > that's in our "backend" server processes, unlike the problems we have > on Macs... oh but you're failing to load libpqwalreceiver.so which > makes some sense for the backend hypothesis. What happens if you hack > ps_status.c to use PS_USE_NONE? Nailed it. PS_USE_NONE fixes it. Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
Christophe Pettus
Дата:
> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> wrote: > > Nailed it. PS_USE_NONE fixes it. Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.cto catch this case? It seems wrong that the current test passes a case that doesn't actually work.
Christophe Pettus: > Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.cto catch this case? It seems wrong that the current test passes a case that doesn't actually work. The missing macro is on purpose and unlikely to change: https://openwall.com/lists/musl/2013/03/29/13 I also found this thread, which discusses exactly our case: https://www.openwall.com/lists/musl/2022/08/17/1 Some quotes from that thread: > I understand that what Postgres et al are doing is a nasty hack. And: > Applications that *really* want setproctitle type functionality can > presumably do something like re-exec themselves with a suitably large > argv[0] to give them safe space to overwrite with their preferred > message, rather than UB trying to relocate the environment (and auxv? > how? they can't tell libc they moved it) to some other location. Could that be a more portable way of doing this? Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
Christophe Pettus
Дата:
> On Mar 17, 2024, at 06:11, Wolfgang Walther <walther@technowledgy.de> wrote: > The missing macro is on purpose and unlikely to change: https://openwall.com/lists/musl/2013/03/29/13 Indeed. > I also found this thread, which discusses exactly our case: https://www.openwall.com/lists/musl/2022/08/17/1 While getting proper setproctitle functionality on musl would be great, my goal was more modest: Have it correctly set PS_USE_NONEwhen compiling against musl.
On Sun, Mar 17, 2024 at 11:45 AM Christophe Pettus <xof@thebuild.com> wrote:
> On Mar 17, 2024, at 06:11, Wolfgang Walther <walther@technowledgy.de> wrote:
> The missing macro is on purpose and unlikely to change: https://openwall.com/lists/musl/2013/03/29/13
Indeed.
That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why should they be different?
> I also found this thread, which discusses exactly our case: https://www.openwall.com/lists/musl/2022/08/17/1
While getting proper setproctitle functionality on musl would be great, my goal was more modest: Have it correctly set PS_USE_NONE when compiling against musl.
One simple thing might be for us to enclose the block in ps_status.c at lines 49-59 in #ifndef PS_USE_NONE/#endif. Then you could compile with -DPS_USE_NONE in your CPPFLAGS.
cheers
andrew
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
Christophe Pettus
Дата:
> On Mar 17, 2024, at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote: > > That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why should theybe different? It's a philosophical argument against checking for particular libc implementations instead of particular features. I'm notunsympathetic to that argument, but AFAICT there's no clean way of checking for this by examining feature #defines.
On Mon, Mar 18, 2024 at 10:06 AM Christophe Pettus <xof@thebuild.com> wrote: > > On Mar 17, 2024, at 13:33, Andrew Dunstan <andrew@dunslane.net> wrote: > > That seems a little shortsighted. If other libc implementations find it appropriate to have similar macros why shouldthey be different? > > It's a philosophical argument against checking for particular libc implementations instead of particular features. I'mnot unsympathetic to that argument, but AFAICT there's no clean way of checking for this by examining feature #defines. I like their philosophy, and I like their FAQ. Down with software monocultures, up with standards and cooperation. But anyway... I wondered for a moment if there could be a runtime way to test if we'd broken stuff, but it seems hard without a way to ask the runtime linker for its search path to see if it has any pointers into the environment. We can't, that "env_path" variable in dynlink.c is not accessible to us by any reasonable means. And yeah, this whole thing is a nasty invasive hack that harks back to the 80s I assume, before many systems provided a clean way to do this (and some never did)... Hmm, I can think of one dirty hack on top of our existing dirty hack that might work. I feel bad typing this out, but here goes nothing: In save_ps_display_args(), we compute end_of_area, stepping past contiguous arguments and environment variables. But what if we terminated that if we saw an environment entry beginning "LD_"? We'd still feel free to clobber the memory up to that point (rather than limiting ourselves to the argv space, another more conservative choice that might truncate a few PS display messages, or maybe not given the typical postmaster arguments, maye that'd work out OK), and we'd still copy the environment to somewhere new, but anything like "LD_XXX" that the runtime linker might have stashed a pointer to would remain valid. /me runs away and hides
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
Christophe Pettus
Дата:
> On Mar 17, 2024, at 16:20, Thomas Munro <thomas.munro@gmail.com> wrote: > > We'd > still feel free to clobber the memory up to that point (rather than > limiting ourselves to the argv space, another more conservative choice > that might truncate a few PS display messages, or maybe not given the > typical postmaster arguments, maye that'd work out OK), and we'd still > copy the environment to somewhere new, but anything like "LD_XXX" that > the runtime linker might have stashed a pointer to would remain valid. > /me runs away and hides It doesn't lack for bravery! (And I have to just comment that the linker storing pointers into that space as a way of findinglibraries... well, that doesn't get them the moral high ground for nasty hacks.) I'm comfortable with "if you are using musl, you don't get the ps messages" as a first solution, if we can find a way ofdetecting a libc that passes the other tests but doesn't support any of the existing hacks.
On Mon, Mar 18, 2024 at 10:34 PM Christophe Pettus <xof@thebuild.com> wrote: > > On Mar 17, 2024, at 16:20, Thomas Munro <thomas.munro@gmail.com> wrote: > > We'd > > still feel free to clobber the memory up to that point (rather than > > limiting ourselves to the argv space, another more conservative choice > > that might truncate a few PS display messages, or maybe not given the > > typical postmaster arguments, maye that'd work out OK), and we'd still > > copy the environment to somewhere new, but anything like "LD_XXX" that > > the runtime linker might have stashed a pointer to would remain valid. > > /me runs away and hides > > It doesn't lack for bravery! (And I have to just comment that the linker storing pointers into that space as a way offinding libraries... well, that doesn't get them the moral high ground for nasty hacks.) FWIW here is a blind patch if someone wants to try it out... no musl here. (Hmm, I think it's not that unreasonable on their part to assume the initial environment is immutable if their implementation doesn't mutate it, and our doing so is undeniably UB; surprising, maybe, given that the technique works on that other popular brand of C library on that kind of kernel, not to mention dozens of old Unixen of yore... The real solution may turn out to be the prctl() described in that thread, where you can tell the kernel where you're planning to move your argv and it can find it to show ps/top, but I checked and you still can't call that without special privileges, so maybe someone should get onto complaining to kernel hackers about that? That thread is wrong about us clobbering auxv BTW, we're not animals!)
Вложения
Thomas Munro <thomas.munro@gmail.com> writes: > (Hmm, I think it's not that unreasonable on their part to assume the > initial environment is immutable if their implementation doesn't > mutate it, and our doing so is undeniably UB; surprising, maybe, given > that the technique works on that other popular brand of C library on > that kind of kernel, not to mention dozens of old Unixen of yore... Does their implementation also ignore the effects of putenv() or setenv() on LD_LIBRARY_PATH? They have no moral high ground whatsoever if that's the case. But if it doesn't, an alternative route to a solution could be to scan the original environment, strdup and putenv each entry to move it to freshly malloc'd space, and then reclaim the old environment area. regards, tom lane
On Tue, Mar 19, 2024 at 3:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > (Hmm, I think it's not that unreasonable on their part to assume the > > initial environment is immutable if their implementation doesn't > > mutate it, and our doing so is undeniably UB; surprising, maybe, given > > that the technique works on that other popular brand of C library on > > that kind of kernel, not to mention dozens of old Unixen of yore... > > Does their implementation also ignore the effects of putenv() or > setenv() on LD_LIBRARY_PATH? They have no moral high ground > whatsoever if that's the case. But if it doesn't, an alternative > route to a solution could be to scan the original environment, strdup > and putenv each entry to move it to freshly malloc'd space, and > then reclaim the old environment area. Yes, the musl linker/loader ignores putenv()/setenv() changes to LD_LIBRARY_PATH after process start (that is, changes only effect the search path when injected into a new program with exec*()). As does glibc, it's just that it captures by copy instead of reference (according to one of the links above, I didn't check the source). So setenv() has no effect on dlopen() in *this* program, and using putenv in that way won't help. We simply can't move the value of LD_LIBRARY_PATH (though my patch could be a little sneakier and steal all the bytes right up to the = sign to get more space for our message!). One way to tell if a copy has been made is to trace a program that does: getenv("LD_LIBRARY_PATH")[2] = 'X'; dlopen("foo.so", RTLD_NOW | RTLD_GLOBAL); ... when run with LD_LIBRARY_PATH set to /asdf. On FreeBSD I see it tries to open "/aXdf...", so now I know that FreeBSD also captures it by reference like musl. But we don't use the clobber trick on FreeBSD, it has a proper setproctitle() function that knows how to negotiate with the kernel, so it doesn't matter. It also ignores changes made with setent()/putenv(), because those create fresh entries but leave the initial environment strings untouched. Solaris also ignores changes made after startup (it's in the dlopen man page), and from a very quick look at its ld_lib_setup() I think it achieved that with a copy. I believe its ancestor SunOS 4 invented all of these conventions (and the mmap/virtual memory concepts they rode in on), later nailed down to some degree in the System V ABI and very widely adopted, but I don't see anything in the latter that specifically addresses this point, eg LD_LIBRARY copy vs reference and interaction with dlopen() (perhaps I didn't look hard enough). I'm not sure what else you can point to to make strong claims about this stuff, but I bet every system ignores changes after startup, it's just that they found two ways to achieve that. POSIX says of dlopen that the "file [argument] is used in an implementation-defined manner", and of environ that we're welcome to swap a whole new environ, but doesn't seem to tell us anything about the one that is replaced (who owns it? is the initial one set up at execution time special? etc). The line banning manipulation of the pointers environ refers to doesn't exactly describe what we're doing (we're manipulating the strings pointed to by the *previous* environ). UB.
On Tue, Mar 19, 2024 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote: > ... (though my patch could be a little sneakier and steal > all the bytes right up to the = sign to get more space for our > message!). Here's one like that. No musl here -- does this work Wolfgang? Do we think it's generous enough with space in practice that we could just always do this for __linux__ systems without anyone noticing (ie including glibc users)? Should we be more specific about which LD_* variables? Do people not doing hacking/testing ever really set those, eg on production servers? This code path was once used by up to a dozen or so OSes but they're all dead, only Linux, Solaris and macOS left, and I don't have any reason to think they suffer from this problem and Macs don't even follow the SysV LD_ naming convention, hence gating on Linux.
Вложения
On Tue, Mar 19, 2024 at 11:48:50AM +1300, Thomas Munro wrote: > On Tue, Mar 19, 2024 at 10:17 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > ... (though my patch could be a little sneakier and steal > > all the bytes right up to the = sign to get more space for our > > message!). > > Here's one like that. No musl here -- does this work Wolfgang? Do we > think it's generous enough with space in practice that we could just > always do this for __linux__ systems without anyone noticing (ie > including glibc users)? Should we be more specific about which LD_* > variables? Do people not doing hacking/testing ever really set those, > eg on production servers? This code path was once used by up to a > dozen or so OSes but they're all dead, only Linux, Solaris and macOS > left, and I don't have any reason to think they suffer from this > problem and Macs don't even follow the SysV LD_ naming convention, > hence gating on Linux. So this would truncate the process title on all Linux that have an LD_ environment entry, even those without musl? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Wed, Mar 20, 2024 at 12:54 PM Bruce Momjian <bruce@momjian.us> wrote: > So this would truncate the process title on all Linux that have an LD_ > environment entry, even those without musl? Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a postmaster near you? You'd always get that much, plus as much of /proc/XXX/environ as we can find before you reach LD_XXX=, which on a typical system would, I guess, usually be never. If it's a problem you could try to arrange for LD_ XXX to come later in environ[]. What I observe is that they seem to get copied in backwards, wrt the environment exported by the parent, so if you set DUMMY=XXXXXXXX just before starting the process it'll make sacrificial space in the right place (but I'm not sure where that effect is coming from so don't quote me).
On Wed, Mar 20, 2024 at 02:12:54PM +1300, Thomas Munro wrote: > On Wed, Mar 20, 2024 at 12:54 PM Bruce Momjian <bruce@momjian.us> wrote: > > So this would truncate the process title on all Linux that have an LD_ > > environment entry, even those without musl? > > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > postmaster near you? You'd always get that much, plus as much of $ cat /proc/2000/cmdline |wc -c 30 > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > typical system would, I guess, usually be never. If it's a problem > you could try to arrange for LD_ XXX to come later in environ[]. What > I observe is that they seem to get copied in backwards, wrt the > environment exported by the parent, so if you set DUMMY=XXXXXXXX just > before starting the process it'll make sacrificial space in the right > place (but I'm not sure where that effect is coming from so don't > quote me). I am just cautious about changing behavior on our most common platform for a libc library I have never heard of. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Thomas Munro <thomas.munro@gmail.com> writes: > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > postmaster near you? You'd always get that much, plus as much of > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > typical system would, I guess, usually be never. I'd be happier about this if the target pattern were more restrictive. Is there reason to think that musl keeps a pointer to anything besides LD_LIBRARY_PATH? regards, tom lane
On Wed, Mar 20, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Yep. How long is /proc/XXX/cmdline (check with wc -c /proc/...) in a > > postmaster near you? You'd always get that much, plus as much of > > /proc/XXX/environ as we can find before you reach LD_XXX=, which on a > > typical system would, I guess, usually be never. > > I'd be happier about this if the target pattern were more restrictive. > Is there reason to think that musl keeps a pointer to anything besides > LD_LIBRARY_PATH? Also LD_PRELOAD: https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1824 Yeah we could do just those two.
Thomas Munro <thomas.munro@gmail.com> writes: > On Wed, Mar 20, 2024 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd be happier about this if the target pattern were more restrictive. >> Is there reason to think that musl keeps a pointer to anything besides >> LD_LIBRARY_PATH? > Also LD_PRELOAD: > https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/ldso/dynlink.c#L1824 > Yeah we could do just those two. +1 for stopping only at one of those two names. regards, tom lane
On Wed, Mar 20, 2024 at 2:27 PM Bruce Momjian <bruce@momjian.us> wrote: > I am just cautious about changing behavior on our most common platform > for a libc library I have never heard of. Yeah, I hear you. I don't have a dog in this race, I just like retro-computing mysteries and arguments about the meaning of standards... That said I'm pretty sure no one should be running production PostgreSQL systems held together by LD_LIBRARY_PATH, and if they do, it looks like systemd/rc.d scripts set a bunch of PG_OOM_BLAH stuff just before the start the cluster that would provide extra chaff in front of LD_XXX stuff defined earlier, and then pg_ctl inserts even more, and even if they don't use any of that stuff and just run the postmaster directly with some other homegrown tooling, now we're down to very niche/expert scenarios where it should be acceptable to point to this thread that says "try setting an extra dummy variable after you set LD_XXX!".
On Wed, Mar 20, 2024 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for stopping only at one of those two names. Here's one like that for Wolfgang to test on musl.
Вложения
On 17.03.24 11:33, Christophe Pettus wrote: >> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> wrote: >> >> Nailed it. PS_USE_NONE fixes it. > > Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.cto catch this case? It seems wrong that the current test passes a case that doesn't actually work. > We could turn it around and do #if defined(__linux__) #if defined(__GLIBC__) || defined(__UCLIBC__ ) #define PS_USE_CLOBBER_ARGV #else #define PS_USE_NONE #endif #endif
On Wed, Mar 20, 2024 at 2:03 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 17.03.24 11:33, Christophe Pettus wrote:
>> On Mar 17, 2024, at 02:44, Wolfgang Walther <walther@technowledgy.de> wrote:
>>
>> Nailed it. PS_USE_NONE fixes it.
>
> Given the musl (still?) does not define a preprocessor macro specific to it, is there a way of improving the test in pg_status.c to catch this case? It seems wrong that the current test passes a case that doesn't actually work.
>
We could turn it around and do
#if defined(__linux__)
#if defined(__GLIBC__) || defined(__UCLIBC__ )
#define PS_USE_CLOBBER_ARGV
#else
#define PS_USE_NONE
#endif
#endif
I like it. Neat and minimal.
cheers
andrew
Thomas Munro: > On Wed, Mar 20, 2024 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> +1 for stopping only at one of those two names. > > Here's one like that for Wolfgang to test on musl. Works fine. Peter Eisentraut: > We could turn it around and do > > #if defined(__linux__) > #if defined(__GLIBC__) || defined(__UCLIBC__ ) > #define PS_USE_CLOBBER_ARGV > #else > #define PS_USE_NONE > #endif > #endif This works as well. I also put together a PoC of what was mentioned in musl's mailing list: Instead of clobbering environ at all, exec yourself again with padded argv0. This works, too. Attached. Best, Wolfgang
Вложения
On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: > Peter Eisentraut: > > We could turn it around and do > > > > #if defined(__linux__) > > #if defined(__GLIBC__) || defined(__UCLIBC__ ) > > #define PS_USE_CLOBBER_ARGV > > #else > > #define PS_USE_NONE > > #endif > > #endif > > This works as well. Yes, I prefer this. I am worried the environ hackery will bite us someday and the cause will be hard to find. > I also put together a PoC of what was mentioned in musl's mailing list: > Instead of clobbering environ at all, exec yourself again with padded argv0. > This works, too. Attached. It is hard to imagine why we would add an extra exec on every Linux server start for this. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Wed, Mar 20, 2024 at 09:35:51AM -0400, Bruce Momjian wrote: > On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: > > I also put together a PoC of what was mentioned in musl's mailing list: > > Instead of clobbering environ at all, exec yourself again with padded argv0. > > This works, too. Attached. > > It is hard to imagine why we would add an extra exec on every Linux > server start for this. I guess we could conditionally exec only if we find we must, but then such exec cases would be rare and rarely tested. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Bruce Momjian: > I guess we could conditionally exec only if we find we must, but then > such exec cases would be rare and rarely tested. I think you might be seriously underestimating how often musl is used. Alpine Linux uses musl and is very widespread in the container world because of smaller image size. The library/postgres docker image has been pulled about 8 billion times since 2014 [1]. While we can't really tell how many of those pulled the alpine variant of the image, comparing the alpine [2] and ubuntu/debian [3,4] base images gives a rough estimate of >50% using alpine in general. This is certainly not rare. But yeah, buildfarm coverage for musl would be good, I agree. Maybe even directly in CI? Best, Wolfgang [1]: https://hub.docker.com/v2/repositories/library/postgres [2]: https://hub.docker.com/v2/repositories/library/alpine [3]: https://hub.docker.com/v2/repositories/library/ubuntu [4]: https://hub.docker.com/v2/repositories/library/debian
On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: > I think you might be seriously underestimating how often musl is used. > Alpine Linux uses musl and is very widespread in the container world > because of smaller image size The last time I looked, its collation support didn't work at all... Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: >> I think you might be seriously underestimating how often musl is used. >> Alpine Linux uses musl and is very widespread in the container world >> because of smaller image size > The last time I looked, its collation support didn't work at all... I think the same is true of some of the BSDen, so that's not a large impediment to us. But in any case, if somebody wants Alpine or musl to be considered a supported platform, they'd best step up and run a buildfarm animal. regards, tom lane
Bruce Momjian: > On Wed, Mar 20, 2024 at 10:39:20AM +0100, Wolfgang Walther wrote: >> Peter Eisentraut: >>> We could turn it around and do >>> >>> #if defined(__linux__) >>> #if defined(__GLIBC__) || defined(__UCLIBC__ ) >>> #define PS_USE_CLOBBER_ARGV >>> #else >>> #define PS_USE_NONE >>> #endif >>> #endif >> >> This works as well. > > Yes, I prefer this. I am worried the environ hackery will bite us > someday and the cause will be hard to find. Well, the environ hackery already bit and it sure was hard to find. But this approach would still clobber environ happily... which is undefined behavior. But certainly the opt-in to known-to-be-good libc variants is a better approach than before. Between this and "stop clobbering at LD_LIBRARY_PATH", I prefer the latter, though. >> I also put together a PoC of what was mentioned in musl's mailing list: >> Instead of clobbering environ at all, exec yourself again with padded argv0. >> This works, too. Attached. > > It is hard to imagine why we would add an extra exec on every Linux > server start for this. Would this be a problem? For a running server this would happen only once when the postmaster starts up, AFAICT. Best, Wolfgang
Laurenz Albe: > On Wed, 2024-03-20 at 15:24 +0100, Wolfgang Walther wrote: >> I think you might be seriously underestimating how often musl is used. >> Alpine Linux uses musl and is very widespread in the container world >> because of smaller image size > > The last time I looked, its collation support didn't work at all... IIUC, using icu collations should work. I didn't extensively try, though. But yeah - musl itself doesn't do it, knowingly so. Best, Wolfgang
Tom Lane: > But in any case, if somebody wants > Alpine or musl to be considered a supported platform, they'd > best step up and run a buildfarm animal. Yeah, I was already thinking about that. But I guess we'd need to first make the test suite pass on musl. i.e. $subject, but there are also some smaller issues after that, before the full test suite will pass. Best, Wolfgang
On 20.03.24 15:37, Wolfgang Walther wrote: >> It is hard to imagine why we would add an extra exec on every Linux >> server start for this. > > Would this be a problem? For a running server this would happen only > once when the postmaster starts up, AFAICT. I wonder if it would cause issues with systemd or similar, if the PID of the running process is not the one that systemd started. If so, there is probably a workaround, but it would have to be analyzed.
Peter Eisentraut: >> Would this be a problem? For a running server this would happen only >> once when the postmaster starts up, AFAICT. > > I wonder if it would cause issues with systemd or similar, if the PID of > the running process is not the one that systemd started. If so, there > is probably a workaround, but it would have to be analyzed. I don't think that exec even creates a new PID. The current process is replaced, so the PID stays the same. Best, Wolfgang
On Wed, Mar 20, 2024 at 03:24:58PM +0100, Wolfgang Walther wrote: > Bruce Momjian: > > I guess we could conditionally exec only if we find we must, but then > > such exec cases would be rare and rarely tested. > > I think you might be seriously underestimating how often musl is used. > Alpine Linux uses musl and is very widespread in the container world because > of smaller image size. > > The library/postgres docker image has been pulled about 8 billion times > since 2014 [1]. While we can't really tell how many of those pulled the > alpine variant of the image, comparing the alpine [2] and ubuntu/debian > [3,4] base images gives a rough estimate of >50% using alpine in general. Uh, what is the current behavior of Postgres on musl? It just fails if the process title is longer than argv[0] plus the environment space to the LD_ environment variable, and then linking fails for certain extensions? If there are many downloads, why would we only be getting this report now? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Bruce Momjian: >> The library/postgres docker image has been pulled about 8 billion times >> since 2014 [1]. While we can't really tell how many of those pulled the >> alpine variant of the image, comparing the alpine [2] and ubuntu/debian >> [3,4] base images gives a rough estimate of >50% using alpine in general. > > Uh, what is the current behavior of Postgres on musl? It just fails if > the process title is longer than argv[0] plus the environment space to > the LD_ environment variable, and then linking fails for certain > extensions? If there are many downloads, why would we only be getting > this report now? The process title works fine. It's just the way how space is cleared for the process title, that is causing problems elsewhere. The thing that is broken when running postgres on alpine/musl is, to put libpq in a custom path and use LD_LIBRARY_PATH to find it when loading libpqwalreceiver (+ some contrib modules). Nobody does that, especially not in a container environment where postgres is likely the only thing running in that container, so there is no point in using any custom library paths or anything - the image is built once and made to work, and everybody else is just using that working image. The much more practical problem is that the test suite doesn't run, because it makes use of LD_LIBRARY_PATH for that purpose. In the past, the packagers for alpine only disabled the failing tests, but IIRC they have given up on that and just disabled the whole test suite by now. Best, Wolfgang
On Wed, Mar 20, 2024 at 08:29:21PM +0100, Wolfgang Walther wrote: > Bruce Momjian: > > > The library/postgres docker image has been pulled about 8 billion times > > > since 2014 [1]. While we can't really tell how many of those pulled the > > > alpine variant of the image, comparing the alpine [2] and ubuntu/debian > > > [3,4] base images gives a rough estimate of >50% using alpine in general. > > > > Uh, what is the current behavior of Postgres on musl? It just fails if > > the process title is longer than argv[0] plus the environment space to > > the LD_ environment variable, and then linking fails for certain > > extensions? If there are many downloads, why would we only be getting > > this report now? > > The process title works fine. It's just the way how space is cleared for the > process title, that is causing problems elsewhere. > > The thing that is broken when running postgres on alpine/musl is, to put > libpq in a custom path and use LD_LIBRARY_PATH to find it when loading > libpqwalreceiver (+ some contrib modules). Nobody does that, especially not > in a container environment where postgres is likely the only thing running > in that container, so there is no point in using any custom library paths or > anything - the image is built once and made to work, and everybody else is > just using that working image. > > The much more practical problem is that the test suite doesn't run, because > it makes use of LD_LIBRARY_PATH for that purpose. In the past, the packagers > for alpine only disabled the failing tests, but IIRC they have given up on > that and just disabled the whole test suite by now. Thanks, that is very clear. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Thu, Mar 21, 2024 at 2:35 AM Bruce Momjian <bruce@momjian.us> wrote: > Yes, I prefer this. I am worried the environ hackery will bite us > someday and the cause will be hard to find. Some speculation on how widespread this trick is: as mentioned, it seems to come from sendmail (I didn't spend the time to find a repo that has ancient versions, so this is someone's random snapshot repo, but it's referring to pretty old dead systems): https://github.com/Distrotech/sendmail/blob/master/sendmail/conf.c#L2436 That was once ubiquitous, back in the day. One of the most widespread envon-clobberers these days must be openssh: https://github.com/openssh/openssh-portable/blob/86bdd3853f4d32c85e295e6216a2fe0953ad93f0/openbsd-compat/setproctitle.c#L69 And funnily enough, googling LD_LIBRARY_PATH and openssh brings up a few unsolved/unanswered questions about mysterious breakage (though I didn't see any that mentioned musl by name and there could be other explanations, *shrug*). There is also Chromium/Chrome: https://github.com/chromium/chromium/blob/main/base/process/set_process_title_linux.cc#L136 That code has some interesting commentary and points to a commit in Linux which mentions setproctitle() and making sure it still works (funny because setproctitle() isn't a function in any standard userspace library AFAIK so I guess it just means the trick in general), and also mentions the failure of attempts to get an official way to do this negotiated between the relevant projects. Of course we have to distinguish between the basic argv[] clobbering trick which is barely even a trick, and the more advanced environ stealing trick which confuses musl. A very widespread user of the basic trick would be systemd, which tries to use the prctl() if it can to get a much bigger window of memory to write on, but otherwise falls back to accepting a small one. I guess we'd do the same if we could, ie if a future Linux version didn't require CAP_SYS_RESOURCES to do it: https://github.com/systemd/systemd/blob/8810b782a17050d7f7a870b975f09e8a690b7bea/src/basic/argv-util.c Anyway, it looks like there is plenty of will out there to keep this working, albeit in a weird semi-supported state whose cruftiness is undeniable.
Thomas Munro: > Of course we have to distinguish between the basic argv[] clobbering > trick which is barely even a trick, and the more advanced environ > stealing trick which confuses musl. Right. The latter not only confuses musl, but also makes /proc/<pid>/environ return garbage. This is also mentioned at the bottom of main.c, which has a workaround for the specific case of UBSan depending on that. This is kind of funny: Because we are relying on undefined behavior regarding the modification of environ, we need a workaround for the "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, it wanted to tell us something.. This happens on glibc, too. So summarizing: 1. The simple approach is to use PS_USE_CLOBBER_ARGV on Linux only for glibc and other known-to-be-good-and-identifiable libc variants, otherwise default to PS_USE_NONE. This will not only keep the problem for /proc/../environ for glibc users, but also disable ps status for musl entirely. Considering that probably the biggest use-case for musl is to run postgres in containers, it's quite likely to actually run more than just one cluster on a single machine. In this case... ps status would be especially handy to identify which cluster a process belongs to. 2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH / LD_PRELOAD is found to keep those intact. This will keep ps status support on musl, which is good. But the /proc/.../environ problem will still be there, unchanged. Both of those approaches rely on the undefined behavior of clobbering environ. 3. The logical consequence of this is, to stop clobbering environ and use only the available argv space. However, this will quickly leave us with a very small ps status buffer to work with, making the feature less useful. Note, that this could happen theoretically by starting postgres with the fewest arguments and environment possible, too. Not sure what the minimal buffer size is that could be achieved that way. The point is: The buffer size is not guaranteed at all. 4. The upstream (musl) suggestion of which I sent a PoC was to "exec yourself with a bigger argv". This works. I chose to pad argv0 with trailing slashes. Those can safely be stripped away again, because any argv0 which would come with a trailing slash to start with, would not be the current executable, but a directory - so would fail exec immediately anyway. This keeps /proc/.../environ intact and does not rely on undefined behavior. Additionally, we get a guaranteed ps buffer size of 256, which is what we use on BSDs and Windows, too. I wonder why we actually fall back to PS_USE_NONE by default.. and how much of that is related to the environment clobbering to start with? Could we even use the exec-approach as the fallback in all other cases except BSDs and Windows and get rid of PS_USE_NONE? Clobbering only argv sure seems way safer to do than what we do right now. Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Here is what we could with this: > 2. The next proposal was to stop clobbering environ once LD_LIBRARY_PATH > / LD_PRELOAD is found to keep those intact. We could backpatch this down to v12. This would be one step to make the test suite pass on Alpine Linux with musl and ultimately allow setting up a buildfarm animal for that. It does not solve the /proc/.../environ problem, but at least keeps ps status working on musl as it did before, so not a regression. > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > yourself with a bigger argv". We could do this in HEAD now ... > Could we even use the exec-approach as the fallback in all other cases > except BSDs and Windows and get rid of PS_USE_NONE? ... and then remove PS_USE_NONE at the beginning of the v18 cycle. This would give a bit more time for those "other systems", which were previously falling back PS_USE_NONE and would then clobber argv, too. Opinions? Best, Wolfgang
On Fri, Mar 22, 2024 at 9:30 AM <walther@technowledgy.de> wrote: > > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > > yourself with a bigger argv". > > We could do this in HEAD now ... Just a thought: if we want to go this way, do we need a new exec call? We already control the initial exec in pg_ctl.c. > > Could we even use the exec-approach as the fallback in all other cases > > except BSDs and Windows and get rid of PS_USE_NONE? > > ... and then remove PS_USE_NONE at the beginning of the v18 cycle. > > This would give a bit more time for those "other systems", which were > previously falling back PS_USE_NONE and would then clobber argv, too. RIght. It's unspecified by POSIX whether ps shows changes to those strings (and there are systems that don't), but it can't hurt to do so anyway, and it'd be better than having a PS_USE_NONE code path that is untested. I dimly recall that it turned out that PS_USE_NONE was actually broken for a while without anyone noticing.
Thomas Munro <thomas.munro@gmail.com> writes: > Just a thought: if we want to go this way, do we need a new exec call? > We already control the initial exec in pg_ctl.c. I'm resistant to assuming the postmaster is launched through pg_ctl. systemd, for example, might well prefer not to do that, not to mention all the troglodytes still using 1990s launch scripts. A question that seems worth debating in this thread is how much updating the process title is even worth nowadays. It feels like a hangover from before we had pg_stat_activity and other monitoring support. So I don't feel a huge need to support it on musl. The previously-suggested patch to whitelist glibc and variants, and otherwise fall back to PS_USE_NONE, seems like it might be the appropriate amount of effort. regards, tom lane
On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The previously-suggested patch to whitelist glibc and variants, > and otherwise fall back to PS_USE_NONE, seems like it might be > the appropriate amount of effort. What about meeting musl halfway: clobber argv, but only clobber environ for the libcs known to tolerate that? Then musl might see truncation at 30-60 characters or whatever it is, but that's probably enough to see your cluster_name and backend type/user name which is pretty useful information.
On Thu, Mar 21, 2024 at 7:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
> Just a thought: if we want to go this way, do we need a new exec call?
> We already control the initial exec in pg_ctl.c.
I'm resistant to assuming the postmaster is launched through pg_ctl.
systemd, for example, might well prefer not to do that, not to
mention all the troglodytes still using 1990s launch scripts.
A question that seems worth debating in this thread is how much
updating the process title is even worth nowadays. It feels like
a hangover from before we had pg_stat_activity and other monitoring
support. So I don't feel a huge need to support it on musl.
The previously-suggested patch to whitelist glibc and variants,
and otherwise fall back to PS_USE_NONE, seems like it might be
the appropriate amount of effort.
+1
cheers
andrew
Thomas Munro <thomas.munro@gmail.com> writes: > On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The previously-suggested patch to whitelist glibc and variants, >> and otherwise fall back to PS_USE_NONE, seems like it might be >> the appropriate amount of effort. > What about meeting musl halfway: clobber argv, but only clobber > environ for the libcs known to tolerate that? Then musl might see > truncation at 30-60 characters or whatever it is, but that's probably > enough to see your cluster_name and backend type/user name which is > pretty useful information. No real objection here. I do wonder about the point you (or somebody) made upthread that we don't have any testing of the PS_USE_NONE case; but that could be addressed some other way. regards, tom lane
On Fri, Mar 22, 2024 at 11:42:52AM +1300, Thomas Munro wrote: > On Fri, Mar 22, 2024 at 9:30 AM <walther@technowledgy.de> wrote: > > > 4. The upstream (musl) suggestion of which I sent a PoC was to "exec > > > yourself with a bigger argv". > > > > We could do this in HEAD now ... > > Just a thought: if we want to go this way, do we need a new exec call? > We already control the initial exec in pg_ctl.c. > > > > Could we even use the exec-approach as the fallback in all other cases > > > except BSDs and Windows and get rid of PS_USE_NONE? > > > > ... and then remove PS_USE_NONE at the beginning of the v18 cycle. > > > > This would give a bit more time for those "other systems", which were > > previously falling back PS_USE_NONE and would then clobber argv, too. > > RIght. It's unspecified by POSIX whether ps shows changes to those > strings (and there are systems that don't), but it can't hurt to do so > anyway, and it'd be better than having a PS_USE_NONE code path that is > untested. I dimly recall that it turned out that PS_USE_NONE was > actually broken for a while without anyone noticing. Actually, I was thinking the opposite. Since the musl libc is widely used, it will be tested, and I don't want to disable process display updates for such a common platform. I suggest we use the #ifdef test to continue our existing behavior for the libraries we know about, like glibc, and use the LD_* process title truncation hack for libc's we don't recognize. Attached is a prototype patch which implements this based on previous patches. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Вложения
On Fri, Mar 22, 2024 at 12:20:11PM +1300, Thomas Munro wrote: > On Fri, Mar 22, 2024 at 12:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The previously-suggested patch to whitelist glibc and variants, > > and otherwise fall back to PS_USE_NONE, seems like it might be > > the appropriate amount of effort. > > What about meeting musl halfway: clobber argv, but only clobber > environ for the libcs known to tolerate that? Then musl might see > truncation at 30-60 characters or whatever it is, but that's probably > enough to see your cluster_name and backend type/user name which is > pretty useful information. I just posted a prototype patch to implement this. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Hi, On 2024-03-21 21:16:46 +0100, Wolfgang Walther wrote: > Right. The latter not only confuses musl, but also makes /proc/<pid>/environ > return garbage. This is also mentioned at the bottom of main.c, which has a > workaround for the specific case of UBSan depending on that. This is kind of > funny: Because we are relying on undefined behavior regarding the > modification of environ, we need a workaround for the > "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, > it wanted to tell us something.. I don't think that's quite a fair description. Ubsan is basically doing undefined things itself, so it's turtles all the way down. > So summarizing: FWIW, independent of which fix we go with, I think we need a buildfarm animal using musl. Even better if one of the CI tasks can be made to use musl as well. Greetings, Andres Freund
Sent from my iPad > On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: > > Hi, > >> On 2024-03-21 21:16:46 +0100, Wolfgang Walther wrote: >> Right. The latter not only confuses musl, but also makes /proc/<pid>/environ >> return garbage. This is also mentioned at the bottom of main.c, which has a >> workaround for the specific case of UBSan depending on that. This is kind of >> funny: Because we are relying on undefined behavior regarding the >> modification of environ, we need a workaround for the >> "UndefinedBehaviorSanitizer" - I guess by failing without this workaround, >> it wanted to tell us something.. > > I don't think that's quite a fair description. Ubsan is basically doing > undefined things itself, so it's turtles all the way down. > > >> So summarizing: > > FWIW, independent of which fix we go with, I think we need a buildfarm animal > using musl. Even better if one of the CI tasks can be made to use musl as > well. We had one till 3 months ago. It’s on my list to recreate. Cheers Andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: >> FWIW, independent of which fix we go with, I think we need a buildfarm animal >> using musl. Even better if one of the CI tasks can be made to use musl as >> well. > We had one till 3 months ago. It’s on my list to recreate. How was it passing? The issue discussed in this thread has surely been there for a long time, and Wolfgang mentioned that he sees others. regards, tom lane
> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Andrew Dunstan <andrew@dunslane.net> writes:age >>> On Mar 22, 2024, at 10:49 AM, Andres Freund <andres@anarazel.de> wrote: >>> FWIW, independent of which fix we go with, I think we need a buildfarm animal >>> using musl. Even better if one of the CI tasks can be made to use musl as >>> well. > >> We had one till 3 months ago. It’s on my list to recreate. > > How was it passing? The issue discussed in this thread has surely > been there for a long time, and Wolfgang mentioned that he sees > others. > > The buildfarm client has a switch that delays running regression tests until after the install stages. Cheers Andrew
Andres Freund: > FWIW, independent of which fix we go with, I think we need a buildfarm animal > using musl. Even better if one of the CI tasks can be made to use musl as > well. I am already working with Andrew to set up a buildfarm animal to run Alpine Linux/musl. I can look into the CI task as well. Are you suggesting to change an existing task to run with Alpine/musl or to add a new task for it? It would be docker image based for sure. Best, Wolfgang
Andrew Dunstan: >> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How was it passing? The issue discussed in this thread has surely >> been there for a long time, and Wolfgang mentioned that he sees >> others. > > The buildfarm client has a switch that delays running regression tests until after the install stages. Hm. So while that switch makes the animal pass the build, it did hide exactly this problem. Not sure whether this switch should be used at all, then. Was this switch only implemented for the specific case of Alpine/musl or is there a different reason for it, as well? The other issues I had been seeing were during make check-world, but not make check. Those were things around setlocale() / /bin/locale, IIRC. Not sure whether all of the tests are run by the buildfarm? Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Tom Lane: > Thomas Munro <thomas.munro@gmail.com> writes: >> Just a thought: if we want to go this way, do we need a new exec call? >> We already control the initial exec in pg_ctl.c. > > I'm resistant to assuming the postmaster is launched through pg_ctl. > systemd, for example, might well prefer not to do that, not to > mention all the troglodytes still using 1990s launch scripts. Right, the systemd example in the docs is not using pg_ctl. But, it should be easily possible to have: - pg_ctl call postgres with a padded argv0 - postgres call itself with padding, if it wasn't called with that already This way, there would be no additional exec call when started through pg_ctl, but one more call when started directly. Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Bruce Momjian: > I suggest we use the #ifdef test to continue our existing behavior for > the libraries we know about, like glibc, and use the LD_* process title > truncation hack for libc's we don't recognize. > > Attached is a prototype patch which implements this based on previous > patches. The condition to check for linux/glibc in your patch is slightly off: #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) should be #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) With the latter, it passes tests with musl. Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Wolfgang Walther: > The other issues I had been seeing were during make check-world, but not > make check. Those were things around setlocale() / /bin/locale, IIRC. > Not sure whether all of the tests are run by the buildfarm? Ah, those other tests only fail when building --with-icu, but Andrew's animal didn't do that. Best, Wolfgang
On Fri, Mar 22, 2024 at 4:02 AM Wolfgang Walther <walther@technowledgy.de> wrote:
Andrew Dunstan:
>> On Mar 22, 2024, at 4:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How was it passing? The issue discussed in this thread has surely
>> been there for a long time, and Wolfgang mentioned that he sees
>> others.
>
> The buildfarm client has a switch that delays running regression tests until after the install stages.
Hm. So while that switch makes the animal pass the build, it did hide
exactly this problem. Not sure whether this switch should be used at
all, then. Was this switch only implemented for the specific case of
Alpine/musl or is there a different reason for it, as well?
Alpine was the main motivation, but it's also probably useful on Macs with SIP enabled.
ISTR raising the Alpine issue back then (2018) but I can't find a reference now.
cheers
andrew
On Fri, Mar 22, 2024 at 09:33:38AM +0100, walther@technowledgy.de wrote: > Bruce Momjian: > > I suggest we use the #ifdef test to continue our existing behavior for > > the libraries we know about, like glibc, and use the LD_* process title > > truncation hack for libc's we don't recognize. > > > > Attached is a prototype patch which implements this based on previous > > patches. > > The condition to check for linux/glibc in your patch is slightly off: > > #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) > > should be > > #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) > > With the latter, it passes tests with musl. Yes, my logic was wrong. Not sure what I was thinking, frankly. I am not a big fan of negating a complex conditional, but would rather pass the negation into the conditional, new patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Вложения
On Fri, Mar 22, 2024 at 09:36:19AM -0400, Bruce Momjian wrote: > On Fri, Mar 22, 2024 at 09:33:38AM +0100, walther@technowledgy.de wrote: > > Bruce Momjian: > > > I suggest we use the #ifdef test to continue our existing behavior for > > > the libraries we know about, like glibc, and use the LD_* process title > > > truncation hack for libc's we don't recognize. > > > > > > Attached is a prototype patch which implements this based on previous > > > patches. > > > > The condition to check for linux/glibc in your patch is slightly off: > > > > #if ! defined(__linux__) || (! defined(__GLIBC__) && defined(__UCLIBC__ )) > > > > should be > > > > #if defined(__linux__) && ! (defined(__GLIBC__) || defined(__UCLIBC__ )) > > > > With the latter, it passes tests with musl. > > Yes, my logic was wrong. Not sure what I was thinking, frankly. > > I am not a big fan of negating a complex conditional, but would rather > pass the negation into the conditional, new patch attached. With no one "hoping this patch dies in a fire"*, I have updated it with more details, which I now think is committable to master. Is this something to backpatch? Seems too rare a bug to me. * Robert Haas, https://www.postgresql.org/message-id/CA%2BTgmoYsyrCNmg%2BYh6rgP7K8r-bYPjCeF1tPxENRFwD4VZAZvw%40mail.gmail.com -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Вложения
On Thu, Mar 21, 2024 at 11:26 AM Thomas Munro <thomas.munro@gmail.com> wrote: > Some speculation on how widespread this trick is: as mentioned, it Just one more, which I just ran into by accident while looking for something else, which I'm posting just in case this thread ever gets used to try to convince musl hackers to change their end to support it: https://github.com/openzfs/zfs/blob/master/lib/libzutil/os/linux/zutil_setproctitle.c#L158
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Bruce Momjian: > With no one "hoping this patch dies in a fire"*, I have updated it with > more details, which I now think is committable to master. Is this > something to backpatch? Seems too rare a bug to me. I would like to see this backpatched to be able to run the regression tests easily on all stable branches. I have taken your's/Thomas' patch and extended it with a few more taking in many of the ideas in this thread: 0001 Don't clobber LD_* This is the patch you posted. This applies cleanly all the way down to v12. This fixes the bug and allows running most of the tests with musl - yeah! I also confirmed, that this will not create practical problems with library/postgres docker image, where this is likely used the most. While "postgres" is called by default without any arguments here, plenty of environment variables are passed. The docker image does use LD_PRELOAD to trick initdb, but that's not set when running the postmaster, so not a problem here. This use-case also shows why the proposed patch to still partially clobber environ at this stage is better than to not clobber environ at all - in this case, the docker image would essentially have no ps status at all by default.- 0002 Allow setting PS_USE_NONE via CPPFLAGS This was proposed by Andrew and applies cleanly down to v12. Thus, it could be backpatched, too. First and foremost this would allow setting a buildfarm animal to use this flag to make sure this code path is actually build/tested at all. This is something that Thomas and Tom hinted at. 0003 Don't ever clobber environ again This is the approach I had previously posted as a PoC. This would not be backpatched, but I suggested this could go into v17 now. This avoids the undefined behavior and sets the table to eventually set ps status via argv by default and remove PS_USE_NONE later. Compared to the PoC patch, I decided not to pad argv[0], because this will break the ps status display for the postmaster itself. Instead exec is called with an additional argument consisting of exactly 255 spaces. I also tried avoiding the additional exec-call if postgres was called via pg_ctl, as suggested by Peter. This quickly turned out to be more invasive than I would have liked this to be, though. The current approach works very well, the environment doesn't need to be copied anymore and the workaround for /proc/<pid>/environ in main.c can go away, too. 0004 Default to PS_USE_CLOBBER_ARGV This changes the default to display ps status on all other systems, too. This could potentially go in now as well, or be delayed to the beginning of the v18 cycle. In the unlikely event that this breaks something on a platform not considered here and we get a bug report, we can easily advise to compile with CPPFLAGS=-DPS_USE_NONE, which is still there at this stage. 0005 Remove PS_USE_NONE However, if no reports come in and no problems are detected with 0004, then this can be entirely removed. This for "later", whenever that is. Best, Wolfgang
Вложения
On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: > Bruce Momjian: > > With no one "hoping this patch dies in a fire"*, I have updated it with > > more details, which I now think is committable to master. Is this > > something to backpatch? Seems too rare a bug to me. > > I would like to see this backpatched to be able to run the regression tests > easily on all stable branches. You want to risk destabilizing Postgres by backpatching something this complex so the regression tests can be run on all stable branches? I think you are overestimating our desire to take on risk. Also, in my patch, the parentheses here: #if defined(__linux__) && (! defined(__GLIBC__) && ! defined(__UCLIBC__)) are unnecessary so they should be removed: #if defined(__linux__) && ! defined(__GLIBC__) && ! defined(__UCLIBC__) I am only willing to apply my patch, and only to master. Other committers might be more willing. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On 22.03.24 20:44, Bruce Momjian wrote: > + * linking (dlopen) might fail. Here, we truncate the update > + * of the process title when either of two important dynamic > + * linking environment variables are set. Musl does not > + * define any compiler symbols, so we have to do this for > + * any Linux libc we don't know is safe. > + */ > + if (strstr(environ[i], "LD_LIBRARY_PATH=") == environ[i] || > + strstr(environ[i], "LD_PRELOAD=") == environ[i]) What determines which variables require this treatment?
On Mon, Mar 25, 2024 at 11:46:09PM +0100, Peter Eisentraut wrote: > On 22.03.24 20:44, Bruce Momjian wrote: > > + * linking (dlopen) might fail. Here, we truncate the update > > + * of the process title when either of two important dynamic > > + * linking environment variables are set. Musl does not > > + * define any compiler symbols, so we have to do this for > > + * any Linux libc we don't know is safe. > > + */ > > + if (strstr(environ[i], "LD_LIBRARY_PATH=") == environ[i] || > > + strstr(environ[i], "LD_PRELOAD=") == environ[i]) > > What determines which variables require this treatment? Thomas Munro came up with that part of the patch. I just combined his patch with the macro test. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Tue, Mar 26, 2024 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: > What determines which variables require this treatment? That came from me peeking at their code: https://www.postgresql.org/message-id/CA%2BhUKGKNK5V8XwwJJZm36s3EUy8V51xu4XiE8%3D26n%3DWq3OGd4A%40mail.gmail.com I had originally proposed to avoid anything beginning "LD_" but Tom suggested being more specific. I doubt LD_PRELOAD can really hurt you though (the linker probably only needs the value at the start by definition, not at later dlopen() time (?)). I dunno. If you're asking if there is any standard or whatever supplying these names, the System V or at least ELF standards talk about LD_LIBRARY_PATH (though those standards don't know/care what happens after userspace takes over control of the environment concept, they just talk about how the world is created when you exec a process, so they AFAICS they don't address this clobbering stuff, and AFAIK other LD_XXX stuff is probably implementation specific).
Bruce Momjian <bruce@momjian.us> writes: > On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: >> I would like to see this backpatched to be able to run the regression tests >> easily on all stable branches. > You want to risk destabilizing Postgres by backpatching something this > complex so the regression tests can be run on all stable branches? I > think you are overestimating our desire to take on risk. If we want a buildfarm animal testing this platform, we kind of need to support it on all branches. Having said that, I agree with you that we are looking for a minimalist fix not a maximalist one. I think the 0001 patch is about right, but the rest seem to be solving problems we don't have. regards, tom lane
Thomas Munro <thomas.munro@gmail.com> writes: > On Tue, Mar 26, 2024 at 11:46 AM Peter Eisentraut <peter@eisentraut.org> wrote: >> What determines which variables require this treatment? > I had originally proposed to avoid anything beginning "LD_" but Tom > suggested being more specific. I doubt LD_PRELOAD can really hurt you > though (the linker probably only needs the value at the start by > definition, not at later dlopen() time (?)). Oh, good point. So we could simplify the patch by only looking for LD_LIBRARY_PATH. regards, tom lane
On Mon, Mar 25, 2024 at 07:14:25PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Mon, Mar 25, 2024 at 10:58:30PM +0100, walther@technowledgy.de wrote: > >> I would like to see this backpatched to be able to run the regression tests > >> easily on all stable branches. > > > You want to risk destabilizing Postgres by backpatching something this > > complex so the regression tests can be run on all stable branches? I > > think you are overestimating our desire to take on risk. > > If we want a buildfarm animal testing this platform, we kind of need > to support it on all branches. Having said that, I agree with you > that we are looking for a minimalist fix not a maximalist one. > I think the 0001 patch is about right, but the rest seem to be solving > problems we don't have. I could support the minimalist patch applied to all branches. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
I wrote: > Thomas Munro <thomas.munro@gmail.com> writes: >> I had originally proposed to avoid anything beginning "LD_" but Tom >> suggested being more specific. I doubt LD_PRELOAD can really hurt you >> though (the linker probably only needs the value at the start by >> definition, not at later dlopen() time (?)). > Oh, good point. So we could simplify the patch by only looking for > LD_LIBRARY_PATH. I looked at the musl source code you identified and confirmed that only the LD_LIBRARY_PATH string is remembered in a static variable; LD_PRELOAD is only accessed locally in that initialization function. So we only need to do the attached. (I failed to resist the temptation to rewrite the comments.) regards, tom lane diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c index 5d829e6e48..92cd2c7899 100644 --- a/src/backend/utils/misc/ps_status.c +++ b/src/backend/utils/misc/ps_status.c @@ -151,7 +151,33 @@ save_ps_display_args(int argc, char **argv) for (i = 0; environ[i] != NULL; i++) { if (end_of_area + 1 == environ[i]) - end_of_area = environ[i] + strlen(environ[i]); + { + /* + * The musl runtime linker keeps a static pointer to the + * initial value of LD_LIBRARY_PATH, if that is defined in the + * process's environment. Therefore, we must not overwrite the + * value of that setting and thus cannot advance end_of_area + * beyond it. Musl does not define any identifying compiler + * symbol, so we have to do this for any Linux libc we don't + * know is safe. + */ +#if defined(__linux__) && (!defined(__GLIBC__) && !defined(__UCLIBC__ )) + if (strncmp(environ[i], "LD_LIBRARY_PATH=", 16) == 0) + { + /* + * We can overwrite the name, but stop at the equals sign. + * Future loop iterations will not find any more + * contiguous space, but we don't break early because we + * need to count the total number of environ[] entries. + */ + end_of_area = environ[i] + 15; + } + else +#endif + { + end_of_area = environ[i] + strlen(environ[i]); + } + } } ps_buffer = argv[0];
On Tue, Mar 26, 2024 at 12:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I looked at the musl source code you identified and confirmed that > only the LD_LIBRARY_PATH string is remembered in a static variable; > LD_PRELOAD is only accessed locally in that initialization function. > So we only need to do the attached. (I failed to resist the > temptation to rewrite the comments.) LGTM.
Hi, On 2024-03-22 08:55:52 +0100, Wolfgang Walther wrote: > Andres Freund: > > FWIW, independent of which fix we go with, I think we need a buildfarm animal > > using musl. Even better if one of the CI tasks can be made to use musl as > > well. > > I am already working with Andrew to set up a buildfarm animal to run Alpine > Linux/musl. I can look into the CI task as well. Are you suggesting to > change an existing task to run with Alpine/musl or to add a new task for it? > It would be docker image based for sure. I'd rather adapt one of the existing tasks, to avoid increasing CI costs unduly. The way we currently run CI for testing of not-yet-merged patches runs all tasks other than macos as full VMs, that turned out to be faster & cheaper. FWIW, except for one small issue, building postgres against musl works on debian and the tests pass if I install first. The small problem mentioned above is that on debian linux/fs.h isn't available when building with musl, which in turn causes src/bin/pg_upgrade/file.c to fail to compile. I assume that's not the case on "fully musl" distro? Greetings, Andres Freund
On Tue, Mar 26, 2024 at 12:49:55PM +1300, Thomas Munro wrote: > On Tue, Mar 26, 2024 at 12:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I looked at the musl source code you identified and confirmed that > > only the LD_LIBRARY_PATH string is remembered in a static variable; > > LD_PRELOAD is only accessed locally in that initialization function. > > So we only need to do the attached. (I failed to resist the > > temptation to rewrite the comments.) > > LGTM. +1 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On 26.03.24 00:43, Tom Lane wrote: > I wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> I had originally proposed to avoid anything beginning "LD_" but Tom >>> suggested being more specific. I doubt LD_PRELOAD can really hurt you >>> though (the linker probably only needs the value at the start by >>> definition, not at later dlopen() time (?)). > >> Oh, good point. So we could simplify the patch by only looking for >> LD_LIBRARY_PATH. > > I looked at the musl source code you identified and confirmed that > only the LD_LIBRARY_PATH string is remembered in a static variable; > LD_PRELOAD is only accessed locally in that initialization function. > So we only need to do the attached. (I failed to resist the > temptation to rewrite the comments.) Yeah, I was more looking for a comment for posterity for *why* we need to preserve this variable in particular. The updated comment looks reasonable.
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Bruce Momjian: > You want to risk destabilizing Postgres by backpatching something this > complex so the regression tests can be run on all stable branches? I > think you are overestimating our desire to take on risk. I specifically wrote about backpatching the first (and maybe second) patch. None of that is risky. Patches 3-5 were not meant for backpatching at all. Best, Wolfgang
Re: Regression tests fail with musl libc because libpq.so can't be loaded
От
walther@technowledgy.de
Дата:
Tom Lane: > If we want a buildfarm animal testing this platform, we kind of need > to support it on all branches. Having said that, I agree with you > that we are looking for a minimalist fix not a maximalist one. > I think the 0001 patch is about right, but the rest seem to be solving > problems we don't have. The second patch potentially solves the problem of PS_USE_NONE not being tested. Of course you could also set up a buildfarm animal on some other platform, which is sure to fall through to PS_USE_NONE, but that seems to have not worked in the past: Thomas Munro: > I dimly recall that it turned out that PS_USE_NONE was > actually broken for a while without anyone noticing Best, Wolfgang
walther@technowledgy.de writes: > The second patch potentially solves the problem of PS_USE_NONE not being > tested. Of course you could also set up a buildfarm animal on some other > platform, which is sure to fall through to PS_USE_NONE, but that seems > to have not worked in the past: > Thomas Munro: >> I dimly recall that it turned out that PS_USE_NONE was >> actually broken for a while without anyone noticing I think what Thomas is recollecting is this: Author: Tom Lane <tgl@sss.pgh.pa.us> Branch: master Release: REL_15_BR [0fb6954aa] 2022-03-27 12:57:46 -0400 Branch: REL_14_STABLE Release: REL_14_3 [3f7a59c59] 2022-03-27 12:57:52 -0400 Branch: REL_13_STABLE Release: REL_13_7 [9016a2a3d] 2022-03-27 12:57:57 -0400 Fix breakage of get_ps_display() in the PS_USE_NONE case. Commit 8c6d30f21 caused this function to fail to set *displen in the PS_USE_NONE code path. If the variable's previous value had been negative, that'd lead to a memory clobber at some call sites. We'd managed not to notice due to very thin test coverage of such configurations, but this appears to explain buildfarm member lorikeet's recent struggles. Credit to Andrew Dunstan for spotting the problem. Back-patch to v13 where the bug was introduced. Discussion: https://postgr.es/m/136102.1648320427@sss.pgh.pa.us The problem wasn't lack of coverage, it was that the failure was intermittent and erratic enough to be very hard to diagnose. I think that's more bad luck than anything else. regards, tom lane
Peter Eisentraut <peter@eisentraut.org> writes: > On 26.03.24 00:43, Tom Lane wrote: >> I looked at the musl source code you identified and confirmed that >> only the LD_LIBRARY_PATH string is remembered in a static variable; >> LD_PRELOAD is only accessed locally in that initialization function. >> So we only need to do the attached. (I failed to resist the >> temptation to rewrite the comments.) > Yeah, I was more looking for a comment for posterity for *why* we need > to preserve this variable in particular. The updated comment looks > reasonable. OK, pushed after a bit more comment-fiddling. regards, tom lane