Обсуждение: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain
Hello, While reviewing the pg_ctl CreateProcess patch [1], I started looking into handle inheritance on Windows and found something that puzzles me. I think there's an issue with O_CLOEXEC, but wanted to walk through my reasoning to make sure I'm not missing something obvious. [1] https://www.postgresql.org/message-id/flat/TYAPR01MB586654E2D74B838021BE77CAF5EEA@TYAPR01MB5866.jpnprd01.prod.outlook.com The issue appears to be that O_CLOEXEC doesn't actually do anything on Windows. When PostgreSQL opens a WAL file in xlog.c, it specifies O_CLOEXEC in the OpenTransientFile() call, expecting the file handle to be non-inheritable. However, O_CLOEXEC is defined as 0 in win32_port.h, and our pgwin32_open() function in src/port/open.c unconditionally sets sa.bInheritHandle = TRUE at line 80. So the flag is simply ignored, and all file handles are created inheritable. Now, for a handle to actually be inherited by a child process on Windows, two conditions must both be true. First, the handle itself must have been created with bInheritHandle = TRUE (which we do for everything). Second, the parent must call CreateProcess with bInheritHandles = TRUE. So the question becomes: does that second condition ever happen? It does. When archive_command runs, PostgreSQL calls pgwin32_system(), which wraps the command in quotes and passes it to the Microsoft C runtime's system() function. That function needs to make stdio work for the child process, so it has no choice but to call CreateProcess with bInheritHandles = TRUE. This means cmd.exe inherits all our file handles, including any open database or WAL files, and cmd.exe passes them along to copy.exe or whatever command is being run. I wrote a test program that links against libpgport.a to verify this behavior. It opens files with and without O_CLOEXEC using the actual pgwin32_open() function, then spawns a child process with bInheritHandles = TRUE (mimicking what system() does) and tries to access the handles from the child. Both files were accessible from the child process regardless of whether O_CLOEXEC was specified. The flag has no effect. Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites throughout the backend with a comment saying "Our open() replacement does not create inheritable handles, so it is safe to ignore O_CLOEXEC." But that doesn't appear to match what the code actually does. I'm wondering if I've misunderstood something about how handle inheritance works on Windows, or if the comment was based on a misunderstanding of the code path. The practical impact of this seems low. Child processes spawned by archive_command or COPY PROGRAM operate on file paths passed as arguments, not on inherited file descriptors, so they don't actually make use of the handles even though they have them. Even if a child wanted to use an inherited handle, it would need to somehow learn the numeric handle value, which isn't passed through our IPC mechanisms. And Windows users probably employ archive_command less frequently than Unix users anyway. Nonetheless, if this is really a bug, it's a correctness issue. It violates the documented semantics of O_CLOEXEC, contradicts what our own comments claim, and means PostgreSQL behaves differently on Windows than on Unix. It also creates unnecessary handle leaks where files can't be deleted or renamed while child processes are running. While reviewing my pg_ctl patch, I realized it would make handle inheritance more explicit and direct, which made me want to understand whether O_CLOEXEC actually works. The fix would be straightforward if this is indeed wrong. Define O_CLOEXEC to a non-zero value like 0x80000 (in the private use range for open() flags), and then honor it in pgwin32_open() by setting sa.bInheritHandle based on whether the flag is present: sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; We also have to add the O_CLOEXEC to the assertion in open.c that validates that fileFlags only contains known flags. I've tested this change locally and it works as expected. Files opened with O_CLOEXEC are not accessible from child processes, while files opened without it are accessible. So my questions are: Am I correct that both conditions for handle inheritance are met, meaning handles really are being inherited by archive_command children? Is there something in Windows that prevents inheritance that I don't know about? If this is a real bug, would it make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to provide my test code or do additional testing if that would help. For reference, the Microsoft documentation on handle inheritance is at: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa And I confirmed through research that UCRT's system() does use bInheritHandles = TRUE, which makes sense since it needs stdio to work. Bryan Green
Вложения
On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote: > Hello, Catching up with all your emails, and I must say it's great to see some solid investigation of PostgreSQL-on-Windows problems. There are ... more. > Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites > throughout the backend with a comment saying "Our open() replacement > does not create inheritable handles, so it is safe to ignore > O_CLOEXEC." But that doesn't appear to match what the code actually > does. I'm wondering if I've misunderstood something about how handle > inheritance works on Windows, or if the comment was based on a > misunderstanding of the code path. Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me. > The fix would be straightforward if this is indeed wrong. Define > O_CLOEXEC to a non-zero value like 0x80000 (in the private use range > for open() flags), and then honor it in pgwin32_open() by setting > sa.bInheritHandle based on whether the flag is present: > > sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; Looking at fcntl.h, that's the next free bit, but also the one they'll presumably define next (I guess "private use range" is just a turn of phrase and not a real thing?), so why not use the highest free bit after O_DIRECT? We have three fake open flags, one of which cybersquats a real flag from fcntl.h, ironically the one that actually means O_CLOEXEC. We can't change existing values in released branches, so that'd give: #define O_DIRECT 0x80000000 #define O_CLOEXEC 0x04000000 #define O_DSYNC _O_NO_INHERIT Perhaps in master we could rearrange them: #define O_DIRECT 0x80000000 #define O_DSYNC 0x04000000 #define O_CLOEXEC _O_NO_INHERIT > So my questions are: Am I correct that both conditions for handle > inheritance are met, meaning handles really are being inherited by > archive_command children? Is there something in Windows that prevents > inheritance that I don't know about? If this is a real bug, would it > make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to > provide my test code or do additional testing if that would help. Yeah, seems like it, and like we should back-patch this. I vote for doing that after the upcoming minor releases. Holding files open on Windows unintentionally is worse on Windows than on Unix (preventing directories from being unlinked etc). Of course we've done that for decades so I doubt it's really a big deal, but we should clean up this mistake.
On 11/6/2025 7:43 AM, Thomas Munro wrote:
> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote:
>> Hello,
>
> Catching up with all your emails, and I must say it's great to see
> some solid investigation of PostgreSQL-on-Windows problems. There are
> ... more.
>
>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
>> throughout the backend with a comment saying "Our open() replacement
>> does not create inheritable handles, so it is safe to ignore
>> O_CLOEXEC." But that doesn't appear to match what the code actually
>> does. I'm wondering if I've misunderstood something about how handle
>> inheritance works on Windows, or if the comment was based on a
>> misunderstanding of the code path.
>
> Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me.
>
>> The fix would be straightforward if this is indeed wrong. Define
>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
>> for open() flags), and then honor it in pgwin32_open() by setting
>> sa.bInheritHandle based on whether the flag is present:
>>
>> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
>
> Looking at fcntl.h, that's the next free bit, but also the one they'll
> presumably define next (I guess "private use range" is just a turn of
> phrase and not a real thing?), so why not use the highest free bit
> after O_DIRECT? We have three fake open flags, one of which
> cybersquats a real flag from fcntl.h, ironically the one that actually
> means O_CLOEXEC. We can't change existing values in released
> branches, so that'd give:
>
> #define O_DIRECT 0x80000000
> #define O_CLOEXEC 0x04000000
> #define O_DSYNC _O_NO_INHERIT
>
> Perhaps in master we could rearrange them:
>
> #define O_DIRECT 0x80000000
> #define O_DSYNC 0x04000000
> #define O_CLOEXEC _O_NO_INHERIT
>
>> So my questions are: Am I correct that both conditions for handle
>> inheritance are met, meaning handles really are being inherited by
>> archive_command children? Is there something in Windows that prevents
>> inheritance that I don't know about? If this is a real bug, would it
>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
>> provide my test code or do additional testing if that would help.
>
> Yeah, seems like it, and like we should back-patch this. I vote for
> doing that after the upcoming minor releases. Holding files open on
> Windows unintentionally is worse on Windows than on Unix (preventing
> directories from being unlinked etc). Of course we've done that for
> decades so I doubt it's really a big deal, but we should clean up this
> mistake.
Thanks for reviewing this and confirming the analysis. Good to know I
wasn't missing something about Windows handle inheritance.
Your point about the bit value makes sense - using 0x04000000 (highest
free bit after O_DIRECT) is definitely safer than 0x80000 which could
collide with future fcntl.h additions. I also appreciate the irony you
pointed out - we're currently using _O_NO_INHERIT (which literally
prevents handle inheritance on Windows) for O_DSYNC instead of
O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it
actually means semantically makes a lot of sense.
So the plan would be:
Backport branches (v16+):
#define O_DIRECT 0x80000000
#define O_CLOEXEC 0x04000000
#define O_DSYNC _O_NO_INHERIT
Master:
#define O_DIRECT 0x80000000
#define O_DSYNC 0x04000000
#define O_CLOEXEC _O_NO_INHERIT
And then in pgwin32_open():
sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
I will prepare a new version of the patch that implements the suggested
change for master.
--
Bryan Green
EDB: https://www.enterprisedb.com
On 11/6/2025 8:42 AM, Bryan Green wrote: > On 11/6/2025 7:43 AM, Thomas Munro wrote: >> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote: >>> Hello, >> >> Catching up with all your emails, and I must say it's great to see >> some solid investigation of PostgreSQL-on-Windows problems. There are >> ... more. >> >>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites >>> throughout the backend with a comment saying "Our open() replacement >>> does not create inheritable handles, so it is safe to ignore >>> O_CLOEXEC." But that doesn't appear to match what the code actually >>> does. I'm wondering if I've misunderstood something about how handle >>> inheritance works on Windows, or if the comment was based on a >>> misunderstanding of the code path. >> >> Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me. >> >>> The fix would be straightforward if this is indeed wrong. Define >>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range >>> for open() flags), and then honor it in pgwin32_open() by setting >>> sa.bInheritHandle based on whether the flag is present: >>> >>> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> Looking at fcntl.h, that's the next free bit, but also the one they'll >> presumably define next (I guess "private use range" is just a turn of >> phrase and not a real thing?), so why not use the highest free bit >> after O_DIRECT? We have three fake open flags, one of which >> cybersquats a real flag from fcntl.h, ironically the one that actually >> means O_CLOEXEC. We can't change existing values in released >> branches, so that'd give: >> >> #define O_DIRECT 0x80000000 >> #define O_CLOEXEC 0x04000000 >> #define O_DSYNC _O_NO_INHERIT >> >> Perhaps in master we could rearrange them: >> >> #define O_DIRECT 0x80000000 >> #define O_DSYNC 0x04000000 >> #define O_CLOEXEC _O_NO_INHERIT >> >>> So my questions are: Am I correct that both conditions for handle >>> inheritance are met, meaning handles really are being inherited by >>> archive_command children? Is there something in Windows that prevents >>> inheritance that I don't know about? If this is a real bug, would it >>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to >>> provide my test code or do additional testing if that would help. >> >> Yeah, seems like it, and like we should back-patch this. I vote for >> doing that after the upcoming minor releases. Holding files open on >> Windows unintentionally is worse on Windows than on Unix (preventing >> directories from being unlinked etc). Of course we've done that for >> decades so I doubt it's really a big deal, but we should clean up this >> mistake. > > Thanks for reviewing this and confirming the analysis. Good to know I > wasn't missing something about Windows handle inheritance. > > Your point about the bit value makes sense - using 0x04000000 (highest > free bit after O_DIRECT) is definitely safer than 0x80000 which could > collide with future fcntl.h additions. I also appreciate the irony you > pointed out - we're currently using _O_NO_INHERIT (which literally > prevents handle inheritance on Windows) for O_DSYNC instead of > O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it > actually means semantically makes a lot of sense. > > So the plan would be: > > Backport branches (v16+): > #define O_DIRECT 0x80000000 > #define O_CLOEXEC 0x04000000 > #define O_DSYNC _O_NO_INHERIT > > Master: > #define O_DIRECT 0x80000000 > #define O_DSYNC 0x04000000 > #define O_CLOEXEC _O_NO_INHERIT > > And then in pgwin32_open(): > sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; > > I will prepare a new version of the patch that implements the suggested > change for master. > > The changes for master, along with a tap test, are provided with the attached patch. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On 11/7/2025 11:28 AM, Bryan Green wrote: > On 11/6/2025 8:42 AM, Bryan Green wrote: >> On 11/6/2025 7:43 AM, Thomas Munro wrote: ... >> So the plan would be: >> >> Backport branches (v16+): >> #define O_DIRECT 0x80000000 >> #define O_CLOEXEC 0x04000000 >> #define O_DSYNC _O_NO_INHERIT >> >> Master: >> #define O_DIRECT 0x80000000 >> #define O_DSYNC 0x04000000 >> #define O_CLOEXEC _O_NO_INHERIT >> >> And then in pgwin32_open(): >> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> I will prepare a new version of the patch that implements the suggested >> change for master. >> >> > The changes for master, along with a tap test, are provided with the > attached patch. > Thanks to CI discovered a mistake in the makefile and meson.build file for the tests. New patch attached. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: > [v3] "The original commit included a comment suggesting that our open() replacement doesn't create inheritable handles, but it appears this may have been based on a misunderstanding of the code path. In practice, the code was creating inheritable handles in all cases." s/it appears this may have been been/was/ :-) "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private use range for open() flags). Then honor it in pgwin32_open_handle()" Out of date, maybe skip mentioning the value in the commit message? Maybe add a note here about the back-branches preserving existing values and master cleaning up? Do you happen to have a fixup that supplies the difference in the back-branches so we can see it passing in CI? + * Note: We could instead use SetHandleInformation() after CreateFile() to + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler + * and achieves the same result. + */ It also wouldn't be thread-safe. That is meaningful today for frontend programs (and hopefully some day soon also in the backend). + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in Windowsian code, not false and true and not reduced to eg !(fileFlags & O_CLOEXEC). Is that a code convention thing from somewhere in Windows-land? +++ b/src/test/modules/test_cloexec/test_cloexec.c I like the test. Very helpful for people reliant on CI for Windows coverage. Independently of all this, and just on the off-chance that it might be interesting to you in future, I have previously tried to write tests for our whole Windows filesystem shim layer and found lots of bugs and understood lots of quirks that way, but never got it to be good enough for inclusion in the tree: https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com There is some overlap with several of your recent patches, as I was testing some of the same wrappers. One of the main things we've battled with in this project is the whole asynchronous-unlink thing, and generally the NT/VMS file locking concept which can't quite be entirely emulated away, and that was one of my main focuses there after we got CI and started debugging the madness. Doing so revealed a bunch of interesting differences in Windows versions and file systems, and to this day we don't really have a project policy on which Windows filesystems PostgreSQL supports (cf your mention of needing NTFS for sparse files in one of your other threads, though I can't imagine that ReFS AKA DevDrive doesn't have those too being a COW system). Speaking of file I/O, and just as an FYI, I have a bunch of semi-working unfinished patches that bring true scatter/gather I/O (instead of the simple looping fallback in pg_preadv()) and native async I/O (for files, but actually also pipes and sockets but let me stick to talking about files for now) to Windows (traditional OVERLAPPED and/or IoRing.h, neither can do everything we need would you believe). Development via trial-by-CI from the safety of my Unix box is slow and painful going, but... let's say a real Windows programmer with a systems programming bent showed up and were interested in this stuff, I would be more than happy to write down everything I think I know about those subjects and post the unfinished work and then I bet development would go fast... just sayin'.
On 11/11/2025 8:34 PM, Thomas Munro wrote: > On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: >> [v3] > > "The original commit included a comment suggesting that our open() > replacement doesn't create inheritable handles, but it appears this > may have been based on a misunderstanding of the code path. In > practice, the code was creating inheritable handles in all cases." > > s/it appears this may have been been/was/ :-) > Changed. > "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private > use range for open() flags). Then honor it in pgwin32_open_handle()" > Removed. > Out of date, maybe skip mentioning the value in the commit message? > Maybe add a note here about the back-branches preserving existing > values and master cleaning up? Do you happen to have a fixup that > supplies the difference in the back-branches so we can see it passing > in CI? > I have attached a back-patch for v16-v18. > + * Note: We could instead use SetHandleInformation() after CreateFile() to > + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler > + * and achieves the same result. > + */ > > It also wouldn't be thread-safe. That is meaningful today for > frontend programs (and hopefully some day soon also in the backend). > > + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; > > Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in > Windowsian code, not false and true and not reduced to eg !(fileFlags > & O_CLOEXEC). Is that a code convention thing from somewhere in > Windows-land? > Yes, old habits die hard. I learned this pattern on Windows. Interestingly, enough when I am not on Windows I write the way you suggest. > +++ b/src/test/modules/test_cloexec/test_cloexec.c > > I like the test. Very helpful for people reliant on CI for Windows coverage. > > Independently of all this, and just on the off-chance that it might be > interesting to you in future, I have previously tried to write tests > for our whole Windows filesystem shim layer and found lots of bugs and > understood lots of quirks that way, but never got it to be good enough > for inclusion in the tree: > > https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com > I shall take a look. > There is some overlap with several of your recent patches, as I was > testing some of the same wrappers. One of the main things we've > battled with in this project is the whole asynchronous-unlink thing, > and generally the NT/VMS file locking concept which can't quite be > entirely emulated away, and that was one of my main focuses there > after we got CI and started debugging the madness. Doing so revealed > a bunch of interesting differences in Windows versions and file > systems, and to this day we don't really have a project policy on > which Windows filesystems PostgreSQL supports (cf your mention of > needing NTFS for sparse files in one of your other threads, though I > can't imagine that ReFS AKA DevDrive doesn't have those too being a > COW system). > > Speaking of file I/O, and just as an FYI, I have a bunch of > semi-working unfinished patches that bring true scatter/gather I/O > (instead of the simple looping fallback in pg_preadv()) and native > async I/O (for files, but actually also pipes and sockets but let me > stick to talking about files for now) to Windows (traditional > OVERLAPPED and/or IoRing.h, neither can do everything we need would > you believe). Development via trial-by-CI from the safety of my Unix > box is slow and painful going, but... let's say a real Windows > programmer with a systems programming bent showed up and were > interested in this stuff, I would be more than happy to write down > everything I think I know about those subjects and post the unfinished > work and then I bet development would go fast... just sayin'. I would absolutely love to read everything you think you know about those subjects and contribute to the work. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On 11/12/2025 4:10 PM, Bryan Green wrote: > On 11/11/2025 8:34 PM, Thomas Munro wrote: >> On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: >>> [v3] >> >> "The original commit included a comment suggesting that our open() >> replacement doesn't create inheritable handles, but it appears this >> may have been based on a misunderstanding of the code path. In >> practice, the code was creating inheritable handles in all cases." >> >> s/it appears this may have been been/was/ :-) >> > > Changed. > >> "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private >> use range for open() flags). Then honor it in pgwin32_open_handle()" >> > > Removed. >> Out of date, maybe skip mentioning the value in the commit message? >> Maybe add a note here about the back-branches preserving existing >> values and master cleaning up? Do you happen to have a fixup that >> supplies the difference in the back-branches so we can see it passing >> in CI? >> > > I have attached a back-patch for v16-v18. > >> + * Note: We could instead use SetHandleInformation() after CreateFile() to >> + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler >> + * and achieves the same result. >> + */ >> >> It also wouldn't be thread-safe. That is meaningful today for >> frontend programs (and hopefully some day soon also in the backend). >> >> + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in >> Windowsian code, not false and true and not reduced to eg !(fileFlags >> & O_CLOEXEC). Is that a code convention thing from somewhere in >> Windows-land? >> > > Yes, old habits die hard. I learned this pattern on Windows. > Interestingly, enough when I am not on Windows I write the way you suggest. > >> +++ b/src/test/modules/test_cloexec/test_cloexec.c >> >> I like the test. Very helpful for people reliant on CI for Windows coverage. >> >> Independently of all this, and just on the off-chance that it might be >> interesting to you in future, I have previously tried to write tests >> for our whole Windows filesystem shim layer and found lots of bugs and >> understood lots of quirks that way, but never got it to be good enough >> for inclusion in the tree: >> >> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com >> > > I shall take a look. > >> There is some overlap with several of your recent patches, as I was >> testing some of the same wrappers. One of the main things we've >> battled with in this project is the whole asynchronous-unlink thing, >> and generally the NT/VMS file locking concept which can't quite be >> entirely emulated away, and that was one of my main focuses there >> after we got CI and started debugging the madness. Doing so revealed >> a bunch of interesting differences in Windows versions and file >> systems, and to this day we don't really have a project policy on >> which Windows filesystems PostgreSQL supports (cf your mention of >> needing NTFS for sparse files in one of your other threads, though I >> can't imagine that ReFS AKA DevDrive doesn't have those too being a >> COW system). >> >> Speaking of file I/O, and just as an FYI, I have a bunch of >> semi-working unfinished patches that bring true scatter/gather I/O >> (instead of the simple looping fallback in pg_preadv()) and native >> async I/O (for files, but actually also pipes and sockets but let me >> stick to talking about files for now) to Windows (traditional >> OVERLAPPED and/or IoRing.h, neither can do everything we need would >> you believe). Development via trial-by-CI from the safety of my Unix >> box is slow and painful going, but... let's say a real Windows >> programmer with a systems programming bent showed up and were >> interested in this stuff, I would be more than happy to write down >> everything I think I know about those subjects and post the unfinished >> work and then I bet development would go fast... just sayin'. > > I would absolutely love to read everything you think you know about > those subjects and contribute to the work. > > Corrected master patch and back patch for v16-v18. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On Thu, Nov 13, 2025 at 11:17 AM Bryan Green <dbryan.green@gmail.com> wrote: > Corrected master patch and back patch for v16-v18. Thanks. I wondered what system-generated new handles might appear in a child process and potentially collide with a non-inherited handle's numerical value (perhaps a thread handle or something like that?), but I guess it'd also have to accept a write to confuse the test, which seems unlikely, so that's probably OK. I also hope that the new test could eventually be merged with a general port layer test suite as mentioned earlier. What do you think about these improvements? See attached. * moved and adjusted new comment about flag conversion to cover all three flags, since it's true for all of them * adjusted the comment about why we don't use SetHandleInformation() to highlight that it would be (slightly) leaky * removed O_DIRECT's extra definition from port.h, since it's now in win32_port.h One question is why on earth the open() redirection is in port.h while the "supplements to fcntl.h" are in win32_port.h. Obviously those are tightly coupled. As far as I know there are two forces keeping some Windows porting magic in port.h that we'd ideally isolate in win32_port.h, and this case doesn't appear to qualify for either as far as I can guess, anyway: * some sleep/retry wrappers were thought to be needed by Cygwin too: API-wise it's a POSIX, but it couldn't hide the underlying NT file locking semantics * sometimes we need a later definition time: I recall battling that for #define ftruncate and/or lseek (if you define them before certain system headers are included, you break them) Cygwin's <fcntl.h> defines these flags, if I've found the right file[1], and has its own open() that we're using directly. If it didn't, our code would have failed to compile when Cygwin was being tested in the build farm up until a year or so ago (and it will be tested again soon[2]). So we could probably move at least open() into win32_port.h, in a separate commit. I think it's also quite likely that Cygwin turns on the Windows 10 POSIX directory entry semantics, so even rename() etc could probably be moved over too. (Whether our own porting layer should turn that stuff on too and delete the retry stuff entirely is an open question which no Windows expert has ever opined on, only us Unix hackers battling against random failures in the build farm.) We should probably also set up a CI task for Cygwin if we're keeping support. [1] https://github.com/cygwin/cygwin/blob/main/newlib/libc/include/sys/_default_fcntl.h [2] https://www.postgresql.org/message-id/flat/916d0fd1-a99b-41c4-a017-ff2428bf8cca%40dunslane.net
Вложения
On Sun, Nov 30, 2025 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote: > What do you think about these improvements? See attached. > > * moved and adjusted new comment about flag conversion to cover all > three flags, since it's true for all of them > * adjusted the comment about why we don't use SetHandleInformation() > to highlight that it would be (slightly) leaky > * removed O_DIRECT's extra definition from port.h, since it's now in > win32_port.h Hearing nothing, pushed. I realised while back-patching that REL_16_STABLE was the last release that also had the old Windows-only build/test machinery under src/tools/msvc. It never ran all the tests anyway, and I don't think we'd learn anything new by adding it, given that CI uses meson on that branch, so I didn't worry about remembering how to adjust that for now. If someone feels strongly about it, of course we can.
On 12/9/2025 3:03 PM, Thomas Munro wrote: > On Sun, Nov 30, 2025 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> What do you think about these improvements? See attached. >> >> * moved and adjusted new comment about flag conversion to cover all >> three flags, since it's true for all of them >> * adjusted the comment about why we don't use SetHandleInformation() >> to highlight that it would be (slightly) leaky >> * removed O_DIRECT's extra definition from port.h, since it's now in >> win32_port.h > > Hearing nothing, pushed. > > I realised while back-patching that REL_16_STABLE was the last release > that also had the old Windows-only build/test machinery under > src/tools/msvc. It never ran all the tests anyway, and I don't think > we'd learn anything new by adding it, given that CI uses meson on that > branch, so I didn't worry about remembering how to adjust that for > now. If someone feels strongly about it, of course we can. Well, my drafts folder had nothing but agreement in it... Thanks, -- Bryan Green EDB: https://www.enterprisedb.com
Thomas Munro <thomas.munro@gmail.com> writes:
> Hearing nothing, pushed.
fairywren is unimpressed:
../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests':
../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos' [-Wunused-variable]
137 | char *space_pos;
| ^~~~~~~~~
It's right, but it seems to me that this stanza needs more help than
that:
/*
* Spawn child process with bInheritHandles=TRUE, passing handle values as
* hex strings
*/
snprintf(cmdline, sizeof(cmdline), "\"%s\" %p %p",
GetCommandLine(), h1, h2);
/*
* Find the actual executable path by removing any arguments from
* GetCommandLine().
*/
{
char exe_path[MAX_PATH];
char *space_pos;
GetModuleFileName(NULL, exe_path, sizeof(exe_path));
snprintf(cmdline, sizeof(cmdline), "\"%s\" %p %p",
exe_path, h1, h2);
}
What is the point of that first snprintf(cmdline, ...), when its
result is guaranteed to be overwritten just below?
I'm also dubious about using MAX_PATH here; see the commentary
about MAXPGPATH in pg_config_manual.h. Also, what's the point of
using MAX_PATH when the result is going to be transferred into
cmdline (with a hardwired size of 1024)?
regards, tom lane
On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > Hearing nothing, pushed. > > fairywren is unimpressed: > > ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests': > ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos' [-Wunused-variable] > 137 | char *space_pos; > | ^~~~~~~~~ The CI MinGW task also shows this warning, but it doesn't use -Werror. The separate CompileWarnings task does, being its purpose, and it includes a MinGW cross-build step, but that uses configure, and this test is built only by meson. That wasn't a great idea... we knew we were only dealing with Windows but forgot about MinGW, so I'll go and write a patch to fix that aspect later today so we're covered for warnings. I'll also think about whether it's worth checking for MinGW warnings in both assert and non-assert builds (as we do for regular Linux gcc/clang), and I'd also like to try to catch warnings from MSVC and had an idea for how to do that... I might also try to think about meson-vs-configure cross checks... > What is the point of that first snprintf(cmdline, ...), when its > result is guaranteed to be overwritten just below? > > I'm also dubious about using MAX_PATH here; see the commentary > about MAXPGPATH in pg_config_manual.h. Also, what's the point of > using MAX_PATH when the result is going to be transferred into > cmdline (with a hardwired size of 1024)? Fair points, I'll wait and see if Bryan is free to write a patch on Monday (US), and otherwise write one myself.
On 12/12/2025 9:57 PM, Thomas Munro wrote: > On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> Hearing nothing, pushed. >> >> fairywren is unimpressed: >> >> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests': >> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos' [-Wunused-variable] >> 137 | char *space_pos; >> | ^~~~~~~~~ > > The CI MinGW task also shows this warning, but it doesn't use -Werror. > The separate CompileWarnings task does, being its purpose, and it > includes a MinGW cross-build step, but that uses configure, and this > test is built only by meson. That wasn't a great idea... we knew we > were only dealing with Windows but forgot about MinGW, so I'll go and > write a patch to fix that aspect later today so we're covered for > warnings. I'll also think about whether it's worth checking for MinGW > warnings in both assert and non-assert builds (as we do for regular > Linux gcc/clang), and I'd also like to try to catch warnings from MSVC > and had an idea for how to do that... I might also try to think about > meson-vs-configure cross checks... > >> What is the point of that first snprintf(cmdline, ...), when its >> result is guaranteed to be overwritten just below? >> >> I'm also dubious about using MAX_PATH here; see the commentary >> about MAXPGPATH in pg_config_manual.h. Also, what's the point of >> using MAX_PATH when the result is going to be transferred into >> cmdline (with a hardwired size of 1024)? > > Fair points, I'll wait and see if Bryan is free to write a patch on > Monday (US), and otherwise write one myself. I will write a patch tonight. This was my sloppiness from doing incremental changes and not cleaning up behind myself. I'll clean it up. Thanks for the checks... -- Bryan Green EDB: https://www.enterprisedb.com
On 12/12/2025 9:57 PM, Thomas Munro wrote: > On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Thomas Munro <thomas.munro@gmail.com> writes: >>> Hearing nothing, pushed. >> >> fairywren is unimpressed: >> >> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests': >> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos' [-Wunused-variable] >> 137 | char *space_pos; >> | ^~~~~~~~~ > > The CI MinGW task also shows this warning, but it doesn't use -Werror. > The separate CompileWarnings task does, being its purpose, and it > includes a MinGW cross-build step, but that uses configure, and this > test is built only by meson. That wasn't a great idea... we knew we > were only dealing with Windows but forgot about MinGW, so I'll go and > write a patch to fix that aspect later today so we're covered for > warnings. I'll also think about whether it's worth checking for MinGW > warnings in both assert and non-assert builds (as we do for regular > Linux gcc/clang), and I'd also like to try to catch warnings from MSVC > and had an idea for how to do that... I might also try to think about > meson-vs-configure cross checks... > >> What is the point of that first snprintf(cmdline, ...), when its >> result is guaranteed to be overwritten just below? >> >> I'm also dubious about using MAX_PATH here; see the commentary >> about MAXPGPATH in pg_config_manual.h. Also, what's the point of >> using MAX_PATH when the result is going to be transferred into >> cmdline (with a hardwired size of 1024)? > > Fair points, I'll wait and see if Bryan is free to write a patch on > Monday (US), and otherwise write one myself. Thomas, A sanity check would be appreciated after the somewhat embarrassing sloppy code. I removed the useless snprintf() call that was using GetCommandLine(). That was left in place when I moved to GetModuleFileName(). Also, removed the unused 'space_pos' variable and the unneeded scope block. I decided to just use 1024 for the exe_path size since that is what cmdline is set to use. I also removed some self-evident comments that were leftover from my practice of writing comments and then coding. Apologies for the loss of time. Thanks, -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
Bryan Green <dbryan.green@gmail.com> writes:
> I removed the useless snprintf() call that was using GetCommandLine().
> That was left in place when I moved to GetModuleFileName(). Also,
> removed the unused 'space_pos' variable and the unneeded scope block.
All good to my eye.
> I decided to just use 1024 for the exe_path size since that is what
> cmdline is set to use.
Personally I'd have gone the other way, say
char exe_path[MAXPGPATH];
char cmdline[MAXPGPATH + 100];
> I also removed some self-evident comments that
> were leftover from my practice of writing comments and then coding.
I think you went way overboard on removing "self-evident" comments.
Signposts as to what the code intends to do are pretty helpful IMO.
They do have to be accurate though, for instance this previous
comment:
- * Find the actual executable path by removing any arguments from
- * GetCommandLine().
didn't seem to convey what the code was doing (which I neglected
to complain about before).
BTW, pgindent will undo some of the whitespace changes you made
here.
regards, tom lane
On Sat, Dec 13, 2025 at 7:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bryan Green <dbryan.green@gmail.com> writes: > > I removed the useless snprintf() call that was using GetCommandLine(). > > That was left in place when I moved to GetModuleFileName(). Also, > > removed the unused 'space_pos' variable and the unneeded scope block. > > All good to my eye. Thanks both. > > I decided to just use 1024 for the exe_path size since that is what > > cmdline is set to use. > > Personally I'd have gone the other way, say > > char exe_path[MAXPGPATH]; > char cmdline[MAXPGPATH + 100]; Done in this version. > > I also removed some self-evident comments that > > were leftover from my practice of writing comments and then coding. > > I think you went way overboard on removing "self-evident" comments. > Signposts as to what the code intends to do are pretty helpful IMO. > They do have to be accurate though, for instance this previous > comment: > > - * Find the actual executable path by removing any arguments from > - * GetCommandLine(). > > didn't seem to convey what the code was doing (which I neglected > to complain about before). Comments restored in the attached version of Bryan's patch. My earlier guess about the Makefile was wrong, and when I looked into it the actual problems were (1) that the CompilerWarnings task in CI runs make world-bin, which doesn't descend into src/test, and (2) that the test ifeq ($(PORTNAME), win32) was not satisfied due to make's rules for variable evaluation. I thought about how to fix that but realised that this is going to be much easier to maintain if it's not different on Unix, so here are some fixes in that direction. With just 0001 and 0002 applied, we'd have known about the compiler warning before commit, with a failure like this: https://cirrus-ci.com/task/5863371716689920 With 0003 applied on top, it's green and there are no warnings from either Windows task: https://cirrus-ci.com/build/4775547869331456 I also changed the comment style of some single-line comments. replaced the memset() with initializer syntax and ran pgindent which undid a change or two.
Вложения
On Sun, Dec 14, 2025 at 6:09 PM Thomas Munro <thomas.munro@gmail.com> wrote: > My earlier guess about the Makefile was wrong, and when I looked into > it the actual problems were (1) that the CompilerWarnings task in CI > runs make world-bin, which doesn't descend into src/test, and (2) that > the test ifeq ($(PORTNAME), win32) was not satisfied due to make's > rules for variable evaluation. I thought about how to fix that but > realised that this is going to be much easier to maintain if it's not > different on Unix, so here are some fixes in that direction. With > just 0001 and 0002 applied, we'd have known about the compiler warning > before commit, with a failure like this: I pushed the cleanup patch. I wondered if there might be any other C code that could be checked for compiler warnings by CI and isn't yet, and the only thing I have some up with so far is the .pgc -> .c stuff. Here is a new version that does that too. I also back-patched a fix for a warning (see eab2323c) that would break if we back-patched this. Is there anything else like this hiding somewhere?