Обсуждение: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

Поиск
Список
Период
Сортировка

[PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
Hello,

While reviewing the pg_ctl CreateProcess patch [1], I started looking 
into handle inheritance on Windows and found something that puzzles me. 
I think there's an issue with O_CLOEXEC, but wanted to walk through my 
reasoning to make sure I'm not missing something obvious.

[1] 

https://www.postgresql.org/message-id/flat/TYAPR01MB586654E2D74B838021BE77CAF5EEA@TYAPR01MB5866.jpnprd01.prod.outlook.com

The issue appears to be that O_CLOEXEC doesn't actually do anything on
Windows. When PostgreSQL opens a WAL file in xlog.c, it specifies
O_CLOEXEC in the OpenTransientFile() call, expecting the file handle to
be non-inheritable. However, O_CLOEXEC is defined as 0 in win32_port.h,
and our pgwin32_open() function in src/port/open.c unconditionally sets
sa.bInheritHandle = TRUE at line 80. So the flag is simply ignored, and
all file handles are created inheritable.

Now, for a handle to actually be inherited by a child process on 
Windows, two conditions must both be true. First, the handle itself must 
have been created with bInheritHandle = TRUE (which we do for 
everything). Second, the parent must call CreateProcess with 
bInheritHandles = TRUE. So the question becomes: does that second 
condition ever happen?

It does. When archive_command runs, PostgreSQL calls pgwin32_system(),
which wraps the command in quotes and passes it to the Microsoft C
runtime's system() function. That function needs to make stdio work for
the child process, so it has no choice but to call CreateProcess with
bInheritHandles = TRUE. This means cmd.exe inherits all our file 
handles, including any open database or WAL files, and cmd.exe passes 
them along to copy.exe or whatever command is being run.

I wrote a test program that links against libpgport.a to verify this
behavior. It opens files with and without O_CLOEXEC using the actual
pgwin32_open() function, then spawns a child process with 
bInheritHandles = TRUE (mimicking what system() does) and tries to
access the handles from the child. Both files were accessible from the
child process regardless of whether O_CLOEXEC was specified. The flag
has no effect.

Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
throughout the backend with a comment saying "Our open() replacement
does not create inheritable handles, so it is safe to ignore
O_CLOEXEC." But that doesn't appear to match what the code actually
does. I'm wondering if I've misunderstood something about how handle
inheritance works on Windows, or if the comment was based on a
misunderstanding of the code path.

The practical impact of this seems low. Child processes spawned by
archive_command or COPY PROGRAM operate on file paths passed as
arguments, not on inherited file descriptors, so they don't actually
make use of the handles even though they have them. Even if a child
wanted to use an inherited handle, it would need to somehow learn the
numeric handle value, which isn't passed through our IPC mechanisms.
And Windows users probably employ archive_command less frequently than
Unix users anyway.

Nonetheless, if this is really a bug, it's a correctness issue. It
violates the documented semantics of O_CLOEXEC, contradicts what our
own comments claim, and means PostgreSQL behaves differently on Windows
than on Unix. It also creates unnecessary handle leaks where files
can't be deleted or renamed while child processes are running. While
reviewing my pg_ctl patch, I realized it would make handle inheritance
more explicit and direct, which made me want to understand whether
O_CLOEXEC actually works.

The fix would be straightforward if this is indeed wrong. Define
O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
for open() flags), and then honor it in pgwin32_open() by setting
sa.bInheritHandle based on whether the flag is present:

     sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;

We also have to add the O_CLOEXEC to the assertion in open.c that 
validates that fileFlags only contains known flags.

I've tested this change locally and it works as expected. Files opened
with O_CLOEXEC are not accessible from child processes, while files
opened without it are accessible.

So my questions are: Am I correct that both conditions for handle
inheritance are met, meaning handles really are being inherited by
archive_command children? Is there something in Windows that prevents
inheritance that I don't know about? If this is a real bug, would it
make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
provide my test code or do additional testing if that would help.

For reference, the Microsoft documentation on handle inheritance is at:
https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa

And I confirmed through research that UCRT's system() does use
bInheritHandles = TRUE, which makes sense since it needs stdio to work.

Bryan Green
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote:
> Hello,

Catching up with all your emails, and I must say it's great to see
some solid investigation of PostgreSQL-on-Windows problems.  There are
... more.

> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
> throughout the backend with a comment saying "Our open() replacement
> does not create inheritable handles, so it is safe to ignore
> O_CLOEXEC." But that doesn't appear to match what the code actually
> does. I'm wondering if I've misunderstood something about how handle
> inheritance works on Windows, or if the comment was based on a
> misunderstanding of the code path.

Yeah, it looks like I was just wrong.  Oops.  Your analysis looks good to me.

> The fix would be straightforward if this is indeed wrong. Define
> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
> for open() flags), and then honor it in pgwin32_open() by setting
> sa.bInheritHandle based on whether the flag is present:
>
>      sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;

Looking at fcntl.h, that's the next free bit, but also the one they'll
presumably define next (I guess "private use range" is just a turn of
phrase and not a real thing?), so why not use the highest free bit
after O_DIRECT?  We have three fake open flags, one of which
cybersquats a real flag from fcntl.h, ironically the one that actually
means O_CLOEXEC.  We can't change existing values in released
branches, so that'd give:

#define     O_DIRECT    0x80000000
#define     O_CLOEXEC   0x04000000
#define     O_DSYNC     _O_NO_INHERIT

Perhaps in master we could rearrange them:

#define     O_DIRECT    0x80000000
#define     O_DSYNC     0x04000000
#define     O_CLOEXEC   _O_NO_INHERIT

> So my questions are: Am I correct that both conditions for handle
> inheritance are met, meaning handles really are being inherited by
> archive_command children? Is there something in Windows that prevents
> inheritance that I don't know about? If this is a real bug, would it
> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
> provide my test code or do additional testing if that would help.

Yeah, seems like it, and like we should back-patch this.  I vote for
doing that after the upcoming minor releases.  Holding files open on
Windows unintentionally is worse on Windows than on Unix (preventing
directories from being unlinked etc).  Of course we've done that for
decades so I doubt it's really a big deal, but we should clean up this
mistake.



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 11/6/2025 7:43 AM, Thomas Munro wrote:
> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote:
>> Hello,
> 
> Catching up with all your emails, and I must say it's great to see
> some solid investigation of PostgreSQL-on-Windows problems.  There are
> ... more.
> 
>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
>> throughout the backend with a comment saying "Our open() replacement
>> does not create inheritable handles, so it is safe to ignore
>> O_CLOEXEC." But that doesn't appear to match what the code actually
>> does. I'm wondering if I've misunderstood something about how handle
>> inheritance works on Windows, or if the comment was based on a
>> misunderstanding of the code path.
> 
> Yeah, it looks like I was just wrong.  Oops.  Your analysis looks good to me.
> 
>> The fix would be straightforward if this is indeed wrong. Define
>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
>> for open() flags), and then honor it in pgwin32_open() by setting
>> sa.bInheritHandle based on whether the flag is present:
>>
>>      sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
> 
> Looking at fcntl.h, that's the next free bit, but also the one they'll
> presumably define next (I guess "private use range" is just a turn of
> phrase and not a real thing?), so why not use the highest free bit
> after O_DIRECT?  We have three fake open flags, one of which
> cybersquats a real flag from fcntl.h, ironically the one that actually
> means O_CLOEXEC.  We can't change existing values in released
> branches, so that'd give:
> 
> #define     O_DIRECT    0x80000000
> #define     O_CLOEXEC   0x04000000
> #define     O_DSYNC     _O_NO_INHERIT
> 
> Perhaps in master we could rearrange them:
> 
> #define     O_DIRECT    0x80000000
> #define     O_DSYNC     0x04000000
> #define     O_CLOEXEC   _O_NO_INHERIT
> 
>> So my questions are: Am I correct that both conditions for handle
>> inheritance are met, meaning handles really are being inherited by
>> archive_command children? Is there something in Windows that prevents
>> inheritance that I don't know about? If this is a real bug, would it
>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
>> provide my test code or do additional testing if that would help.
> 
> Yeah, seems like it, and like we should back-patch this.  I vote for
> doing that after the upcoming minor releases.  Holding files open on
> Windows unintentionally is worse on Windows than on Unix (preventing
> directories from being unlinked etc).  Of course we've done that for
> decades so I doubt it's really a big deal, but we should clean up this
> mistake.

Thanks for reviewing this and confirming the analysis. Good to know I
wasn't missing something about Windows handle inheritance.

Your point about the bit value makes sense - using 0x04000000 (highest
free bit after O_DIRECT) is definitely safer than 0x80000 which could
collide with future fcntl.h additions. I also appreciate the irony you
pointed out - we're currently using _O_NO_INHERIT (which literally
prevents handle inheritance on Windows) for O_DSYNC instead of
O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it
actually means semantically makes a lot of sense.

So the plan would be:

Backport branches (v16+):
#define O_DIRECT    0x80000000
#define O_CLOEXEC   0x04000000
#define O_DSYNC     _O_NO_INHERIT

Master:
#define O_DIRECT    0x80000000
#define O_DSYNC     0x04000000
#define O_CLOEXEC   _O_NO_INHERIT

And then in pgwin32_open():
    sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;

I will prepare a new version of the patch that implements the suggested
change for master.


-- 
Bryan Green
EDB: https://www.enterprisedb.com



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 11/6/2025 8:42 AM, Bryan Green wrote:
> On 11/6/2025 7:43 AM, Thomas Munro wrote:
>> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote:
>>> Hello,
>>
>> Catching up with all your emails, and I must say it's great to see
>> some solid investigation of PostgreSQL-on-Windows problems.  There are
>> ... more.
>>
>>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
>>> throughout the backend with a comment saying "Our open() replacement
>>> does not create inheritable handles, so it is safe to ignore
>>> O_CLOEXEC." But that doesn't appear to match what the code actually
>>> does. I'm wondering if I've misunderstood something about how handle
>>> inheritance works on Windows, or if the comment was based on a
>>> misunderstanding of the code path.
>>
>> Yeah, it looks like I was just wrong.  Oops.  Your analysis looks good to me.
>>
>>> The fix would be straightforward if this is indeed wrong. Define
>>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
>>> for open() flags), and then honor it in pgwin32_open() by setting
>>> sa.bInheritHandle based on whether the flag is present:
>>>
>>>      sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
>>
>> Looking at fcntl.h, that's the next free bit, but also the one they'll
>> presumably define next (I guess "private use range" is just a turn of
>> phrase and not a real thing?), so why not use the highest free bit
>> after O_DIRECT?  We have three fake open flags, one of which
>> cybersquats a real flag from fcntl.h, ironically the one that actually
>> means O_CLOEXEC.  We can't change existing values in released
>> branches, so that'd give:
>>
>> #define     O_DIRECT    0x80000000
>> #define     O_CLOEXEC   0x04000000
>> #define     O_DSYNC     _O_NO_INHERIT
>>
>> Perhaps in master we could rearrange them:
>>
>> #define     O_DIRECT    0x80000000
>> #define     O_DSYNC     0x04000000
>> #define     O_CLOEXEC   _O_NO_INHERIT
>>
>>> So my questions are: Am I correct that both conditions for handle
>>> inheritance are met, meaning handles really are being inherited by
>>> archive_command children? Is there something in Windows that prevents
>>> inheritance that I don't know about? If this is a real bug, would it
>>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
>>> provide my test code or do additional testing if that would help.
>>
>> Yeah, seems like it, and like we should back-patch this.  I vote for
>> doing that after the upcoming minor releases.  Holding files open on
>> Windows unintentionally is worse on Windows than on Unix (preventing
>> directories from being unlinked etc).  Of course we've done that for
>> decades so I doubt it's really a big deal, but we should clean up this
>> mistake.
> 
> Thanks for reviewing this and confirming the analysis. Good to know I
> wasn't missing something about Windows handle inheritance.
> 
> Your point about the bit value makes sense - using 0x04000000 (highest
> free bit after O_DIRECT) is definitely safer than 0x80000 which could
> collide with future fcntl.h additions. I also appreciate the irony you
> pointed out - we're currently using _O_NO_INHERIT (which literally
> prevents handle inheritance on Windows) for O_DSYNC instead of
> O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it
> actually means semantically makes a lot of sense.
> 
> So the plan would be:
> 
> Backport branches (v16+):
> #define O_DIRECT    0x80000000
> #define O_CLOEXEC   0x04000000
> #define O_DSYNC     _O_NO_INHERIT
> 
> Master:
> #define O_DIRECT    0x80000000
> #define O_DSYNC     0x04000000
> #define O_CLOEXEC   _O_NO_INHERIT
> 
> And then in pgwin32_open():
>     sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
> 
> I will prepare a new version of the patch that implements the suggested
> change for master.
> 
> 
The changes for master, along with a tap test, are provided with the
attached patch.

-- 
Bryan Green
EDB: https://www.enterprisedb.com
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 11/7/2025 11:28 AM, Bryan Green wrote:
> On 11/6/2025 8:42 AM, Bryan Green wrote:
>> On 11/6/2025 7:43 AM, Thomas Munro wrote:
...
>> So the plan would be:
>>
>> Backport branches (v16+):
>> #define O_DIRECT    0x80000000
>> #define O_CLOEXEC   0x04000000
>> #define O_DSYNC     _O_NO_INHERIT
>>
>> Master:
>> #define O_DIRECT    0x80000000
>> #define O_DSYNC     0x04000000
>> #define O_CLOEXEC   _O_NO_INHERIT
>>
>> And then in pgwin32_open():
>>     sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
>>
>> I will prepare a new version of the patch that implements the suggested
>> change for master.
>>
>>
> The changes for master, along with a tap test, are provided with the
> attached patch.
> 

Thanks to CI discovered a mistake in the makefile and meson.build file
for the tests. New patch attached.
-- 
Bryan Green
EDB: https://www.enterprisedb.com
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote:
> [v3]

"The original commit included a comment suggesting that our open()
replacement doesn't create inheritable handles, but it appears this
may have been based on a misunderstanding of the code path.  In
practice, the code was creating inheritable handles in all cases."

s/it appears this may have been been/was/ :-)

"To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private
use range for open() flags).  Then honor it in pgwin32_open_handle()"

Out of date, maybe skip mentioning the value in the commit message?
Maybe add a note here about the back-branches preserving existing
values and master cleaning up?  Do you happen to have a fixup that
supplies the difference in the back-branches so we can see it passing
in CI?

+     * Note: We could instead use SetHandleInformation() after CreateFile() to
+     * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler
+     * and achieves the same result.
+     */

It also wouldn't be thread-safe.  That is meaningful today for
frontend programs (and hopefully some day soon also in the backend).

+    sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;

Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in
Windowsian code, not false and true and not reduced to eg !(fileFlags
& O_CLOEXEC).  Is that a code convention thing from somewhere in
Windows-land?

+++ b/src/test/modules/test_cloexec/test_cloexec.c

I like the test.  Very helpful for people reliant on CI for Windows coverage.

Independently of all this, and just on the off-chance that it might be
interesting to you in future, I have previously tried to write tests
for our whole Windows filesystem shim layer and found lots of bugs and
understood lots of quirks that way, but never got it to be good enough
for inclusion in the tree:

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com

There is some overlap with several of your recent patches, as I was
testing some of the same wrappers.  One of the main things we've
battled with in this project is the whole asynchronous-unlink thing,
and generally the NT/VMS file locking concept which can't quite be
entirely emulated away, and that was one of my main focuses there
after we got CI and started debugging the madness.  Doing so revealed
a bunch of interesting differences in Windows versions and file
systems, and to this day we don't really have a project policy on
which Windows filesystems PostgreSQL supports (cf your mention of
needing NTFS for sparse files in one of your other threads, though I
can't imagine that ReFS AKA DevDrive doesn't have those too being a
COW system).

Speaking of file I/O, and just as an FYI, I have a bunch of
semi-working unfinished patches that bring true scatter/gather I/O
(instead of the simple looping fallback in pg_preadv()) and native
async I/O (for files, but actually also pipes and sockets but let me
stick to talking about files for now) to Windows (traditional
OVERLAPPED and/or IoRing.h, neither can do everything we need would
you believe).  Development via trial-by-CI from the safety of my Unix
box is slow and painful going, but... let's say a real Windows
programmer with a systems programming bent showed up and were
interested in this stuff, I would be more than happy to write down
everything I think I know about those subjects and post the unfinished
work and then I bet development would go fast... just sayin'.



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 11/11/2025 8:34 PM, Thomas Munro wrote:
> On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote:
>> [v3]
> 
> "The original commit included a comment suggesting that our open()
> replacement doesn't create inheritable handles, but it appears this
> may have been based on a misunderstanding of the code path.  In
> practice, the code was creating inheritable handles in all cases."
> 
> s/it appears this may have been been/was/ :-)
> 

Changed.

> "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private
> use range for open() flags).  Then honor it in pgwin32_open_handle()"
>

Removed.
> Out of date, maybe skip mentioning the value in the commit message?
> Maybe add a note here about the back-branches preserving existing
> values and master cleaning up?  Do you happen to have a fixup that
> supplies the difference in the back-branches so we can see it passing
> in CI?
> 

I have attached a back-patch for v16-v18.

> +     * Note: We could instead use SetHandleInformation() after CreateFile() to
> +     * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler
> +     * and achieves the same result.
> +     */
> 
> It also wouldn't be thread-safe.  That is meaningful today for
> frontend programs (and hopefully some day soon also in the backend).
> 
> +    sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
> 
> Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in
> Windowsian code, not false and true and not reduced to eg !(fileFlags
> & O_CLOEXEC).  Is that a code convention thing from somewhere in
> Windows-land?
> 

Yes, old habits die hard.  I learned this pattern on Windows.
Interestingly, enough when I am not on Windows I write the way you suggest.

> +++ b/src/test/modules/test_cloexec/test_cloexec.c
> 
> I like the test.  Very helpful for people reliant on CI for Windows coverage.
> 
> Independently of all this, and just on the off-chance that it might be
> interesting to you in future, I have previously tried to write tests
> for our whole Windows filesystem shim layer and found lots of bugs and
> understood lots of quirks that way, but never got it to be good enough
> for inclusion in the tree:
> 
>
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
> 

I shall take a look.

> There is some overlap with several of your recent patches, as I was
> testing some of the same wrappers.  One of the main things we've
> battled with in this project is the whole asynchronous-unlink thing,
> and generally the NT/VMS file locking concept which can't quite be
> entirely emulated away, and that was one of my main focuses there
> after we got CI and started debugging the madness.  Doing so revealed
> a bunch of interesting differences in Windows versions and file
> systems, and to this day we don't really have a project policy on
> which Windows filesystems PostgreSQL supports (cf your mention of
> needing NTFS for sparse files in one of your other threads, though I
> can't imagine that ReFS AKA DevDrive doesn't have those too being a
> COW system).
> 
> Speaking of file I/O, and just as an FYI, I have a bunch of
> semi-working unfinished patches that bring true scatter/gather I/O
> (instead of the simple looping fallback in pg_preadv()) and native
> async I/O (for files, but actually also pipes and sockets but let me
> stick to talking about files for now) to Windows (traditional
> OVERLAPPED and/or IoRing.h, neither can do everything we need would
> you believe).  Development via trial-by-CI from the safety of my Unix
> box is slow and painful going, but... let's say a real Windows
> programmer with a systems programming bent showed up and were
> interested in this stuff, I would be more than happy to write down
> everything I think I know about those subjects and post the unfinished
> work and then I bet development would go fast... just sayin'.

I would absolutely love to read everything you think you know about
those subjects and contribute to the work.


-- 
Bryan Green
EDB: https://www.enterprisedb.com
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 11/12/2025 4:10 PM, Bryan Green wrote:
> On 11/11/2025 8:34 PM, Thomas Munro wrote:
>> On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote:
>>> [v3]
>>
>> "The original commit included a comment suggesting that our open()
>> replacement doesn't create inheritable handles, but it appears this
>> may have been based on a misunderstanding of the code path.  In
>> practice, the code was creating inheritable handles in all cases."
>>
>> s/it appears this may have been been/was/ :-)
>>
> 
> Changed.
> 
>> "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private
>> use range for open() flags).  Then honor it in pgwin32_open_handle()"
>>
> 
> Removed.
>> Out of date, maybe skip mentioning the value in the commit message?
>> Maybe add a note here about the back-branches preserving existing
>> values and master cleaning up?  Do you happen to have a fixup that
>> supplies the difference in the back-branches so we can see it passing
>> in CI?
>>
> 
> I have attached a back-patch for v16-v18.
> 
>> +     * Note: We could instead use SetHandleInformation() after CreateFile() to
>> +     * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler
>> +     * and achieves the same result.
>> +     */
>>
>> It also wouldn't be thread-safe.  That is meaningful today for
>> frontend programs (and hopefully some day soon also in the backend).
>>
>> +    sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
>>
>> Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in
>> Windowsian code, not false and true and not reduced to eg !(fileFlags
>> & O_CLOEXEC).  Is that a code convention thing from somewhere in
>> Windows-land?
>>
> 
> Yes, old habits die hard.  I learned this pattern on Windows.
> Interestingly, enough when I am not on Windows I write the way you suggest.
> 
>> +++ b/src/test/modules/test_cloexec/test_cloexec.c
>>
>> I like the test.  Very helpful for people reliant on CI for Windows coverage.
>>
>> Independently of all this, and just on the off-chance that it might be
>> interesting to you in future, I have previously tried to write tests
>> for our whole Windows filesystem shim layer and found lots of bugs and
>> understood lots of quirks that way, but never got it to be good enough
>> for inclusion in the tree:
>>
>>
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com
>>
> 
> I shall take a look.
> 
>> There is some overlap with several of your recent patches, as I was
>> testing some of the same wrappers.  One of the main things we've
>> battled with in this project is the whole asynchronous-unlink thing,
>> and generally the NT/VMS file locking concept which can't quite be
>> entirely emulated away, and that was one of my main focuses there
>> after we got CI and started debugging the madness.  Doing so revealed
>> a bunch of interesting differences in Windows versions and file
>> systems, and to this day we don't really have a project policy on
>> which Windows filesystems PostgreSQL supports (cf your mention of
>> needing NTFS for sparse files in one of your other threads, though I
>> can't imagine that ReFS AKA DevDrive doesn't have those too being a
>> COW system).
>>
>> Speaking of file I/O, and just as an FYI, I have a bunch of
>> semi-working unfinished patches that bring true scatter/gather I/O
>> (instead of the simple looping fallback in pg_preadv()) and native
>> async I/O (for files, but actually also pipes and sockets but let me
>> stick to talking about files for now) to Windows (traditional
>> OVERLAPPED and/or IoRing.h, neither can do everything we need would
>> you believe).  Development via trial-by-CI from the safety of my Unix
>> box is slow and painful going, but... let's say a real Windows
>> programmer with a systems programming bent showed up and were
>> interested in this stuff, I would be more than happy to write down
>> everything I think I know about those subjects and post the unfinished
>> work and then I bet development would go fast... just sayin'.
> 
> I would absolutely love to read everything you think you know about
> those subjects and contribute to the work.
> 
> 
Corrected master patch and back patch for v16-v18.

-- 
Bryan Green
EDB: https://www.enterprisedb.com
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Thu, Nov 13, 2025 at 11:17 AM Bryan Green <dbryan.green@gmail.com> wrote:
> Corrected master patch and back patch for v16-v18.

Thanks.

I wondered what system-generated new handles might appear in a child
process and potentially collide with a non-inherited handle's
numerical value (perhaps a thread handle or something like that?), but
I guess it'd also have to accept a write to confuse the test, which
seems unlikely, so that's probably OK.  I also hope that the new test
could eventually be merged with a general port layer test suite as
mentioned earlier.

What do you think about these improvements?  See attached.

* moved and adjusted new comment about flag conversion to cover all
three flags, since it's true for all of them
* adjusted the comment about why we don't use SetHandleInformation()
to highlight that it would be (slightly) leaky
* removed O_DIRECT's extra definition from port.h, since it's now in
win32_port.h

One question is why on earth the open() redirection is in port.h while
the "supplements to fcntl.h" are in win32_port.h.  Obviously those are
tightly coupled.  As far as I know there are two forces keeping some
Windows porting magic in port.h that we'd ideally isolate in
win32_port.h, and this case doesn't appear to qualify for either as
far as I can guess, anyway:

* some sleep/retry wrappers were thought to be needed by Cygwin too:
API-wise it's a POSIX, but it couldn't hide the underlying NT file
locking semantics
* sometimes we need a later definition time: I recall battling that
for #define ftruncate and/or lseek (if you define them before certain
system headers are included, you break them)

Cygwin's <fcntl.h> defines these flags, if I've found the right
file[1], and has its own open() that we're using directly.  If it
didn't, our code would have failed to compile when Cygwin was being
tested in the build farm up until a year or so ago (and it will be
tested again soon[2]).  So we could probably move at least open() into
win32_port.h, in a separate commit.  I think it's also quite likely
that Cygwin turns on the Windows 10 POSIX directory entry semantics,
so even rename() etc could probably be moved over too.  (Whether our
own porting layer should turn that stuff on too and delete the retry
stuff entirely is an open question which no Windows expert has ever
opined on, only us Unix hackers battling against random failures in
the build farm.)  We should probably also set up a CI task for Cygwin
if we're keeping support.

[1] https://github.com/cygwin/cygwin/blob/main/newlib/libc/include/sys/_default_fcntl.h
[2] https://www.postgresql.org/message-id/flat/916d0fd1-a99b-41c4-a017-ff2428bf8cca%40dunslane.net

Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Sun, Nov 30, 2025 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> What do you think about these improvements?  See attached.
>
> * moved and adjusted new comment about flag conversion to cover all
> three flags, since it's true for all of them
> * adjusted the comment about why we don't use SetHandleInformation()
> to highlight that it would be (slightly) leaky
> * removed O_DIRECT's extra definition from port.h, since it's now in
> win32_port.h

Hearing nothing, pushed.

I realised while back-patching that REL_16_STABLE was the last release
that also had the old Windows-only build/test machinery under
src/tools/msvc.  It never ran all the tests anyway, and I don't think
we'd learn anything new by adding it, given that CI uses meson on that
branch, so I didn't worry about remembering how to adjust that for
now.  If someone feels strongly about it, of course we can.



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 12/9/2025 3:03 PM, Thomas Munro wrote:
> On Sun, Nov 30, 2025 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> What do you think about these improvements?  See attached.
>>
>> * moved and adjusted new comment about flag conversion to cover all
>> three flags, since it's true for all of them
>> * adjusted the comment about why we don't use SetHandleInformation()
>> to highlight that it would be (slightly) leaky
>> * removed O_DIRECT's extra definition from port.h, since it's now in
>> win32_port.h
> 
> Hearing nothing, pushed.
> 
> I realised while back-patching that REL_16_STABLE was the last release
> that also had the old Windows-only build/test machinery under
> src/tools/msvc.  It never ran all the tests anyway, and I don't think
> we'd learn anything new by adding it, given that CI uses meson on that
> branch, so I didn't worry about remembering how to adjust that for
> now.  If someone feels strongly about it, of course we can.
Well, my drafts folder had nothing but agreement in it...

Thanks,

-- 
Bryan Green
EDB: https://www.enterprisedb.com



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Tom Lane
Дата:
Thomas Munro <thomas.munro@gmail.com> writes:
> Hearing nothing, pushed.

fairywren is unimpressed:

../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests':
../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos' [-Wunused-variable]
  137 |                 char       *space_pos;
      |                             ^~~~~~~~~

It's right, but it seems to me that this stanza needs more help than
that:

    /*
     * Spawn child process with bInheritHandles=TRUE, passing handle values as
     * hex strings
     */
    snprintf(cmdline, sizeof(cmdline), "\"%s\" %p %p",
             GetCommandLine(), h1, h2);

    /*
     * Find the actual executable path by removing any arguments from
     * GetCommandLine().
     */
    {
        char        exe_path[MAX_PATH];
        char       *space_pos;

        GetModuleFileName(NULL, exe_path, sizeof(exe_path));
        snprintf(cmdline, sizeof(cmdline), "\"%s\" %p %p",
                 exe_path, h1, h2);
    }

What is the point of that first snprintf(cmdline, ...), when its
result is guaranteed to be overwritten just below?

I'm also dubious about using MAX_PATH here; see the commentary
about MAXPGPATH in pg_config_manual.h.  Also, what's the point of
using MAX_PATH when the result is going to be transferred into
cmdline (with a hardwired size of 1024)?

            regards, tom lane



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Hearing nothing, pushed.
>
> fairywren is unimpressed:
>
> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests':
> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos'
[-Wunused-variable]
>   137 |                 char       *space_pos;
>       |                             ^~~~~~~~~

The CI MinGW task also shows this warning, but it doesn't use -Werror.
The separate CompileWarnings task does, being its purpose, and it
includes a MinGW cross-build step, but that uses configure, and this
test is built only by meson.  That wasn't a great idea... we knew we
were only dealing with Windows but forgot about MinGW, so I'll go and
write a patch to fix that aspect later today so we're covered for
warnings.  I'll also think about whether it's worth checking for MinGW
warnings in both assert and non-assert builds (as we do for regular
Linux gcc/clang), and I'd also like to try to catch warnings from MSVC
and had an idea for how to do that...  I might also try to think about
meson-vs-configure cross checks...

> What is the point of that first snprintf(cmdline, ...), when its
> result is guaranteed to be overwritten just below?
>
> I'm also dubious about using MAX_PATH here; see the commentary
> about MAXPGPATH in pg_config_manual.h.  Also, what's the point of
> using MAX_PATH when the result is going to be transferred into
> cmdline (with a hardwired size of 1024)?

Fair points, I'll wait and see if Bryan is free to write a patch on
Monday (US), and otherwise write one myself.



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 12/12/2025 9:57 PM, Thomas Munro wrote:
> On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> Hearing nothing, pushed.
>>
>> fairywren is unimpressed:
>>
>> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests':
>> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos'
[-Wunused-variable]
>>   137 |                 char       *space_pos;
>>       |                             ^~~~~~~~~
> 
> The CI MinGW task also shows this warning, but it doesn't use -Werror.
> The separate CompileWarnings task does, being its purpose, and it
> includes a MinGW cross-build step, but that uses configure, and this
> test is built only by meson.  That wasn't a great idea... we knew we
> were only dealing with Windows but forgot about MinGW, so I'll go and
> write a patch to fix that aspect later today so we're covered for
> warnings.  I'll also think about whether it's worth checking for MinGW
> warnings in both assert and non-assert builds (as we do for regular
> Linux gcc/clang), and I'd also like to try to catch warnings from MSVC
> and had an idea for how to do that...  I might also try to think about
> meson-vs-configure cross checks...
> 
>> What is the point of that first snprintf(cmdline, ...), when its
>> result is guaranteed to be overwritten just below?
>>
>> I'm also dubious about using MAX_PATH here; see the commentary
>> about MAXPGPATH in pg_config_manual.h.  Also, what's the point of
>> using MAX_PATH when the result is going to be transferred into
>> cmdline (with a hardwired size of 1024)?
> 
> Fair points, I'll wait and see if Bryan is free to write a patch on
> Monday (US), and otherwise write one myself.
I will write a patch tonight.  This was my sloppiness from doing
incremental changes and not cleaning up behind myself.  I'll clean it
up.  Thanks for the checks...

-- 
Bryan Green
EDB: https://www.enterprisedb.com



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Bryan Green
Дата:
On 12/12/2025 9:57 PM, Thomas Munro wrote:
> On Sat, Dec 13, 2025 at 3:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> Hearing nothing, pushed.
>>
>> fairywren is unimpressed:
>>
>> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c: In function 'run_parent_tests':
>> ../pgsql/src/test/modules/test_cloexec/test_cloexec.c:137:29: warning: unused variable 'space_pos'
[-Wunused-variable]
>>   137 |                 char       *space_pos;
>>       |                             ^~~~~~~~~
> 
> The CI MinGW task also shows this warning, but it doesn't use -Werror.
> The separate CompileWarnings task does, being its purpose, and it
> includes a MinGW cross-build step, but that uses configure, and this
> test is built only by meson.  That wasn't a great idea... we knew we
> were only dealing with Windows but forgot about MinGW, so I'll go and
> write a patch to fix that aspect later today so we're covered for
> warnings.  I'll also think about whether it's worth checking for MinGW
> warnings in both assert and non-assert builds (as we do for regular
> Linux gcc/clang), and I'd also like to try to catch warnings from MSVC
> and had an idea for how to do that...  I might also try to think about
> meson-vs-configure cross checks...
> 
>> What is the point of that first snprintf(cmdline, ...), when its
>> result is guaranteed to be overwritten just below?
>>
>> I'm also dubious about using MAX_PATH here; see the commentary
>> about MAXPGPATH in pg_config_manual.h.  Also, what's the point of
>> using MAX_PATH when the result is going to be transferred into
>> cmdline (with a hardwired size of 1024)?
> 
> Fair points, I'll wait and see if Bryan is free to write a patch on
> Monday (US), and otherwise write one myself.
Thomas,

A sanity check would be appreciated after the somewhat embarrassing
sloppy code.

I removed the useless snprintf() call that was using GetCommandLine().
That was left in place when I moved to GetModuleFileName().  Also,
removed the unused 'space_pos' variable and the unneeded scope block.  I
decided to just use 1024 for the exe_path size since that is what
cmdline is set to use.  I also removed some self-evident comments that
were leftover from my practice of writing comments and then coding.


Apologies for the loss of time.

Thanks,

-- 
Bryan Green
EDB: https://www.enterprisedb.com
Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Tom Lane
Дата:
Bryan Green <dbryan.green@gmail.com> writes:
> I removed the useless snprintf() call that was using GetCommandLine().
> That was left in place when I moved to GetModuleFileName().  Also,
> removed the unused 'space_pos' variable and the unneeded scope block.

All good to my eye.

> I decided to just use 1024 for the exe_path size since that is what
> cmdline is set to use.

Personally I'd have gone the other way, say

    char        exe_path[MAXPGPATH];
     char        cmdline[MAXPGPATH + 100];

> I also removed some self-evident comments that
> were leftover from my practice of writing comments and then coding.

I think you went way overboard on removing "self-evident" comments.
Signposts as to what the code intends to do are pretty helpful IMO.
They do have to be accurate though, for instance this previous
comment:

-     * Find the actual executable path by removing any arguments from
-     * GetCommandLine().

didn't seem to convey what the code was doing (which I neglected
to complain about before).

BTW, pgindent will undo some of the whitespace changes you made
here.

            regards, tom lane



Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Sat, Dec 13, 2025 at 7:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Bryan Green <dbryan.green@gmail.com> writes:
> > I removed the useless snprintf() call that was using GetCommandLine().
> > That was left in place when I moved to GetModuleFileName().  Also,
> > removed the unused 'space_pos' variable and the unneeded scope block.
>
> All good to my eye.

Thanks both.

> > I decided to just use 1024 for the exe_path size since that is what
> > cmdline is set to use.
>
> Personally I'd have gone the other way, say
>
>         char            exe_path[MAXPGPATH];
>         char            cmdline[MAXPGPATH + 100];

Done in this version.

> > I also removed some self-evident comments that
> > were leftover from my practice of writing comments and then coding.
>
> I think you went way overboard on removing "self-evident" comments.
> Signposts as to what the code intends to do are pretty helpful IMO.
> They do have to be accurate though, for instance this previous
> comment:
>
> -        * Find the actual executable path by removing any arguments from
> -        * GetCommandLine().
>
> didn't seem to convey what the code was doing (which I neglected
> to complain about before).

Comments restored in the attached version of Bryan's patch.

My earlier guess about the Makefile was wrong, and when I looked into
it the actual problems were (1) that the CompilerWarnings task in CI
runs make world-bin, which doesn't descend into src/test, and (2) that
the test ifeq ($(PORTNAME), win32) was not satisfied due to make's
rules for variable evaluation.  I thought about how to fix that but
realised that this is going to be much easier to maintain if it's not
different on Unix, so here are some fixes in that direction.  With
just 0001 and 0002 applied, we'd have known about the compiler warning
before commit, with a failure like this:

https://cirrus-ci.com/task/5863371716689920

With 0003 applied on top, it's green and there are no warnings from
either Windows task:

https://cirrus-ci.com/build/4775547869331456

I also changed the comment style of some single-line comments.
replaced the memset() with initializer syntax and ran pgindent which
undid a change or two.

Вложения

Re: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain

От
Thomas Munro
Дата:
On Sun, Dec 14, 2025 at 6:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> My earlier guess about the Makefile was wrong, and when I looked into
> it the actual problems were (1) that the CompilerWarnings task in CI
> runs make world-bin, which doesn't descend into src/test, and (2) that
> the test ifeq ($(PORTNAME), win32) was not satisfied due to make's
> rules for variable evaluation.  I thought about how to fix that but
> realised that this is going to be much easier to maintain if it's not
> different on Unix, so here are some fixes in that direction.  With
> just 0001 and 0002 applied, we'd have known about the compiler warning
> before commit, with a failure like this:

I pushed the cleanup patch.

I wondered if there might be any other C code that could be checked
for compiler warnings by CI and isn't yet, and the only thing I have
some up with so far is the .pgc -> .c stuff.  Here is a new version
that does that too.  I also back-patched a fix for a warning (see
eab2323c) that would break if we back-patched this.  Is there anything
else like this hiding somewhere?

Вложения