Обсуждение: Re: Is this a bug in pg_current_logfile() on Windows?

Поиск
Список
Период
Сортировка

Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
[ redirecting to pghackers ]

Thomas Kellerer <shammat@gmx.net> writes:
> Tom Lane schrieb am 08.07.2020 um 18:41:
>> Somehow, the reading file is being left in binary mode and thus it's
>> failing to convert \r\n back to plain \n.
>> Now the weird thing about that is I'd have expected "r" and "w" modes
>> to imply Windows text mode already, so that I'd have figured that
>> _setmode call to be a useless no-op.  Apparently on some Windows libc
>> implementations, it's not.  How was your installation built exactly?

> That's the build from EnterpriseDB

What I'd momentarily forgotten is that we don't use Windows' native
fopen().  On that platform, we use pgwin32_fopen which defaults to
binary mode (because _open_osfhandle does).  So the _setmode calls in
syslogger.c are *not* no-ops, and the failure in pg_current_logfile()
is clearly explained by the fact that it's doing nothing to strip
carriage returns.

However ... I put in a test case to try to expose this failure, and
our Windows buildfarm critters remain perfectly happy.  So what's up
with that?  After some digging around, I believe the reason is that
PostgresNode::psql is stripping the \r from pg_current_logfile()'s
result, here:

        $$stdout =~ s/\r//g if $TestLib::windows_os;

I'm slightly tempted to extend the test case by verifying on the
server side that the result ends in ".log" with no extra characters.
More generally, I wonder if the above behavior is really a good idea.
It seems to have been added in commit 33f3bbc6d as a hack to avoid
having to think too hard about mingw's behavior, but now I wonder if
it isn't masking other bugs too.  At the very least I think we ought
to tighten the coding to

        $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os;

so that it won't strip carriage returns at random.

Meanwhile, back at the ranch, how shall we fix pg_current_logfile()?
I see two credible alternatives:

1. Insert
#ifdef WIN32
    _setmode(_fileno(fd), _O_TEXT);
#endif
to make this function match the coding in syslogger.c.

2. Manually strip '\r' if present, independent of platform.

The second alternative would conform to the policy we established in
commit b654714f9, that newline-chomping code should uniformly drop \r.
However, that policy is mainly intended to allow non-Windows builds
to cope with text files that might have been made with a Windows text
editor.  Surely we don't need to worry about a cross-platform source
for the log metafile.  So I'm leaning a bit to the first alternative,
so as not to add useless overhead and complexity on non-Windows builds.

Thoughts?

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Andrew Dunstan
Дата:
On 7/8/20 5:26 PM, Tom Lane wrote:
>
> However ... I put in a test case to try to expose this failure, and
> our Windows buildfarm critters remain perfectly happy.  So what's up
> with that?  After some digging around, I believe the reason is that
> PostgresNode::psql is stripping the \r from pg_current_logfile()'s
> result, here:
>
>         $$stdout =~ s/\r//g if $TestLib::windows_os;
>
> I'm slightly tempted to extend the test case by verifying on the
> server side that the result ends in ".log" with no extra characters.
> More generally, I wonder if the above behavior is really a good idea.
> It seems to have been added in commit 33f3bbc6d as a hack to avoid
> having to think too hard about mingw's behavior, but now I wonder if
> it isn't masking other bugs too.  At the very least I think we ought
> to tighten the coding to
>
>         $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os;
>
> so that it won't strip carriage returns at random.
>

Seems reasonable. If we rip it out completely we'll have to find all the
places it breaks and fix them. And we'll almost certainly get new
breakage. If it's hiding a real bug we'll have to do that, but I'd be
reluctant unless there's actual proof.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 7/8/20 5:26 PM, Tom Lane wrote:
>> However ... I put in a test case to try to expose this failure, and
>> our Windows buildfarm critters remain perfectly happy.  So what's up
>> with that?  After some digging around, I believe the reason is that
>> PostgresNode::psql is stripping the \r from pg_current_logfile()'s
>> result, here:
>>     $$stdout =~ s/\r//g if $TestLib::windows_os;
>> I'm slightly tempted to extend the test case by verifying on the
>> server side that the result ends in ".log" with no extra characters.
>> More generally, I wonder if the above behavior is really a good idea.
>> It seems to have been added in commit 33f3bbc6d as a hack to avoid
>> having to think too hard about mingw's behavior, but now I wonder if
>> it isn't masking other bugs too.  At the very least I think we ought
>> to tighten the coding to
>>     $$stdout =~ s/\r\n/\n/g if $TestLib::windows_os;
>> so that it won't strip carriage returns at random.

> Seems reasonable. If we rip it out completely we'll have to find all the
> places it breaks and fix them. And we'll almost certainly get new
> breakage. If it's hiding a real bug we'll have to do that, but I'd be
> reluctant unless there's actual proof.

Hard to tell.  What I propose to do right now is change the \r filters
as shown above, and see if the test I added in 004_logrotate.pl starts
to show failures on Windows.  If it does, and no other place does,
I'm willing to be satisfied with that.  If we see *other* failures then
that'd prove that the problem is real, no?

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
I wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> Seems reasonable. If we rip it out completely we'll have to find all the
>> places it breaks and fix them. And we'll almost certainly get new
>> breakage. If it's hiding a real bug we'll have to do that, but I'd be
>> reluctant unless there's actual proof.

> Hard to tell.  What I propose to do right now is change the \r filters
> as shown above, and see if the test I added in 004_logrotate.pl starts
> to show failures on Windows.  If it does, and no other place does,
> I'm willing to be satisfied with that.  If we see *other* failures then
> that'd prove that the problem is real, no?

So I did that, and the first report is from bowerbird and it's still
green.  Unless I'm completely misinterpreting what's happening (always
a possibility), that means we're still managing to remove "data"
occurrences of \r.

The most likely theory about that, I think, is that IPC::Run::run already
translated any \r\n occurrences in the psql command's output to plain \n.
Then, the \r generated by pg_current_logfile() would butt up against the
line-ending \n, allowing the "fix" in sub psql to remove valid data.

One thing I noticed while making 91bdf499b is that some of these
substitutions are conditioned on "if $TestLib::windows_os" while others
are conditioned on "if $Config{osname} eq 'msys'".  What is the reason
for this difference?  Is it possible that we only really need to do it
in the latter case?

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Andrew Dunstan
Дата:
On 7/8/20 10:40 PM, Tom Lane wrote:
> I wrote:
>> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>>> Seems reasonable. If we rip it out completely we'll have to find all the
>>> places it breaks and fix them. And we'll almost certainly get new
>>> breakage. If it's hiding a real bug we'll have to do that, but I'd be
>>> reluctant unless there's actual proof.
>> Hard to tell.  What I propose to do right now is change the \r filters
>> as shown above, and see if the test I added in 004_logrotate.pl starts
>> to show failures on Windows.  If it does, and no other place does,
>> I'm willing to be satisfied with that.  If we see *other* failures then
>> that'd prove that the problem is real, no?
> So I did that, and the first report is from bowerbird and it's still
> green.  Unless I'm completely misinterpreting what's happening (always
> a possibility), that means we're still managing to remove "data"
> occurrences of \r.
>
> The most likely theory about that, I think, is that IPC::Run::run already
> translated any \r\n occurrences in the psql command's output to plain \n.
> Then, the \r generated by pg_current_logfile() would butt up against the
> line-ending \n, allowing the "fix" in sub psql to remove valid data.


It's possible. I do see some mangling of that kind in IPC::Run's
Win32IO.pm and Win32Pump.pm.


Attached for reference is the IPC::Run package I usually use on Windows.


>
> One thing I noticed while making 91bdf499b is that some of these
> substitutions are conditioned on "if $TestLib::windows_os" while others
> are conditioned on "if $Config{osname} eq 'msys'".  What is the reason
> for this difference?  Is it possible that we only really need to do it
> in the latter case?
>
>             


In general I make the condition for such hacks as restrictive as
possible. I don't guarantee that I have been perfectly consistent about
that, though.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Вложения

Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 7/8/20 10:40 PM, Tom Lane wrote:
>> So I did that, and the first report is from bowerbird and it's still
>> green.  Unless I'm completely misinterpreting what's happening (always
>> a possibility), that means we're still managing to remove "data"
>> occurrences of \r.
>> The most likely theory about that, I think, is that IPC::Run::run already
>> translated any \r\n occurrences in the psql command's output to plain \n.
>> Then, the \r generated by pg_current_logfile() would butt up against the
>> line-ending \n, allowing the "fix" in sub psql to remove valid data.

> It's possible. I do see some mangling of that kind in IPC::Run's
> Win32IO.pm and Win32Pump.pm.

The plot thickens: as of this morning, fairywren and jacana are showing
the failure I expected, while drongo and bowerbird are not.  (Our other
Windows animals are not running the TAP tests, so they're no help here.)

It's not hard to believe that the latter two are using a different libc
implementation, but how would that affect the behavior of the TAP
infrastructure?  Are they also using different Perls?  (By hypothesis,
the pg_current_logfile bug exists across all Windows builds, so we have
to explain why the TAP tests only reveal it on some of these animals.)

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Andrew Dunstan
Дата:
On 7/9/20 10:44 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 7/8/20 10:40 PM, Tom Lane wrote:
>>> So I did that, and the first report is from bowerbird and it's still
>>> green.  Unless I'm completely misinterpreting what's happening (always
>>> a possibility), that means we're still managing to remove "data"
>>> occurrences of \r.
>>> The most likely theory about that, I think, is that IPC::Run::run already
>>> translated any \r\n occurrences in the psql command's output to plain \n.
>>> Then, the \r generated by pg_current_logfile() would butt up against the
>>> line-ending \n, allowing the "fix" in sub psql to remove valid data.
>> It's possible. I do see some mangling of that kind in IPC::Run's
>> Win32IO.pm and Win32Pump.pm.
> The plot thickens: as of this morning, fairywren and jacana are showing
> the failure I expected, while drongo and bowerbird are not.  (Our other
> Windows animals are not running the TAP tests, so they're no help here.)
>
> It's not hard to believe that the latter two are using a different libc
> implementation, but how would that affect the behavior of the TAP
> infrastructure?  Are they also using different Perls?  (By hypothesis,
> the pg_current_logfile bug exists across all Windows builds, so we have
> to explain why the TAP tests only reveal it on some of these animals.)
>
>             



They should use the same libc implementation (msvcrt.dll).


But the perls they are using are indeed different - msys animals have to
use msys' perl for TAP tests because native perl doesn't understand msys
file paths. Conversely, MSVC animals must use native perl (AS or
Strawberry) to run TAP tests. So jacana and fairywren, the two msys
animals, are doing what you expect5ed and drongo and bowerbird, the two
MSVC animals, are not.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 7/9/20 10:44 AM, Tom Lane wrote:
>> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>>> On 7/8/20 10:40 PM, Tom Lane wrote:
>>>> The most likely theory about that, I think, is that IPC::Run::run already
>>>> translated any \r\n occurrences in the psql command's output to plain \n.

>> It's not hard to believe that the latter two are using a different libc
>> implementation, but how would that affect the behavior of the TAP
>> infrastructure?  Are they also using different Perls?  (By hypothesis,
>> the pg_current_logfile bug exists across all Windows builds, so we have
>> to explain why the TAP tests only reveal it on some of these animals.)

> But the perls they are using are indeed different - msys animals have to
> use msys' perl for TAP tests because native perl doesn't understand msys
> file paths. Conversely, MSVC animals must use native perl (AS or
> Strawberry) to run TAP tests. So jacana and fairywren, the two msys
> animals, are doing what you expect5ed and drongo and bowerbird, the two
> MSVC animals, are not.

Ah-hah.  So this leads to the conclusion that in native perl, IPC::Run
is doing \r\n conversion for us while in msys perl it is not.

Therefore, we either should figure out how to get msys perl to do
that conversion (and remove it from our code altogether), or make the
conversions conditional on "is it msys perl?".  I am not quite sure
if the existing tests "if $Config{osname} eq 'msys'" are a legitimate
implementation of that condition or not --- it seems like nominally
they are checking the OS not the Perl, but maybe it's close enough.

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Andrew Dunstan
Дата:
On 7/9/20 11:04 AM, Tom Lane wrote:
> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>> On 7/9/20 10:44 AM, Tom Lane wrote:
>>> Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>>>> On 7/8/20 10:40 PM, Tom Lane wrote:
>>>>> The most likely theory about that, I think, is that IPC::Run::run already
>>>>> translated any \r\n occurrences in the psql command's output to plain \n.
>>> It's not hard to believe that the latter two are using a different libc
>>> implementation, but how would that affect the behavior of the TAP
>>> infrastructure?  Are they also using different Perls?  (By hypothesis,
>>> the pg_current_logfile bug exists across all Windows builds, so we have
>>> to explain why the TAP tests only reveal it on some of these animals.)
>> But the perls they are using are indeed different - msys animals have to
>> use msys' perl for TAP tests because native perl doesn't understand msys
>> file paths. Conversely, MSVC animals must use native perl (AS or
>> Strawberry) to run TAP tests. So jacana and fairywren, the two msys
>> animals, are doing what you expect5ed and drongo and bowerbird, the two
>> MSVC animals, are not.
> Ah-hah.  So this leads to the conclusion that in native perl, IPC::Run
> is doing \r\n conversion for us while in msys perl it is not.
>
> Therefore, we either should figure out how to get msys perl to do
> that conversion (and remove it from our code altogether), or make the
> conversions conditional on "is it msys perl?".  I am not quite sure
> if the existing tests "if $Config{osname} eq 'msys'" are a legitimate
> implementation of that condition or not --- it seems like nominally
> they are checking the OS not the Perl, but maybe it's close enough.
>
>             



If the reported OS is msys (it's a pseudo OS in effect) then the perl
must be msys' perl. Even when called from msys, native perl reports the
OS as MSWin32. So yes, close enough.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 7/9/20 11:04 AM, Tom Lane wrote:
>> Therefore, we either should figure out how to get msys perl to do
>> that conversion (and remove it from our code altogether), or make the
>> conversions conditional on "is it msys perl?".  I am not quite sure
>> if the existing tests "if $Config{osname} eq 'msys'" are a legitimate
>> implementation of that condition or not --- it seems like nominally
>> they are checking the OS not the Perl, but maybe it's close enough.

> If the reported OS is msys (it's a pseudo OS in effect) then the perl
> must be msys' perl. Even when called from msys, native perl reports the
> OS as MSWin32. So yes, close enough.

Cool, I'll go try changing all those conditions to use the msys test.

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
I wrote:
> Cool, I'll go try changing all those conditions to use the msys test.

OK, that worked: all four relevant buildfarm members are now showing
the expected test failure.  So I'll go fix the original bug.

Should we consider back-patching the CRLF filtering changes, ie
91bdf499b + ffb4cee43?  It's not really necessary perhaps, but
I dislike situations where the "same" test on different branches is
testing different things.  Seems like a recipe for future surprises.

            regards, tom lane



Re: Is this a bug in pg_current_logfile() on Windows?

От
Andrew Dunstan
Дата:
On 7/9/20 3:36 PM, Tom Lane wrote:
> I wrote:
>> Cool, I'll go try changing all those conditions to use the msys test.
> OK, that worked: all four relevant buildfarm members are now showing
> the expected test failure.  So I'll go fix the original bug.
>
> Should we consider back-patching the CRLF filtering changes, ie
> 91bdf499b + ffb4cee43?  It's not really necessary perhaps, but
> I dislike situations where the "same" test on different branches is
> testing different things.  Seems like a recipe for future surprises.


Yes please.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Is this a bug in pg_current_logfile() on Windows?

От
Tom Lane
Дата:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> On 7/9/20 3:36 PM, Tom Lane wrote:
>> Should we consider back-patching the CRLF filtering changes, ie
>> 91bdf499b + ffb4cee43?  It's not really necessary perhaps, but
>> I dislike situations where the "same" test on different branches is
>> testing different things.  Seems like a recipe for future surprises.

> Yes please.

Done.

            regards, tom lane