Thread: seahorse again failing

seahorse again failing

From
Stefan Kaltenbrunner
Date:
seahorse just failed again with one of the dreaded "permission denied"
errors that we sporadically see reported on the lists:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=seahorse&dt=2006-08-22%2002:30:01


we seem to attribute those to AV and other security-related software -
except that seahorse does not have (and never had) anything like that
installed.
seahorse is just a stock Windows XP box (with all patches and
service packs applied) and msys/mingw.
There is no other software installed, nor was there ever - maybe there
really is an underlying issue that is causing those sporadic
"permission denied" errors?


Stefan


Re: seahorse again failing

From
Tom Lane
Date:
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> seahorse just failed again with one of the dreaded "permission denied"
> errors that we sporadically see reported on the lists:
> seahorse is just a stock windows XP box (with all patches and
> servicepacks applied) and msys/mingw.
> There is no other software installed or ever was - maybe there is really
> an underlying issue that is causing those sporadic "permission denied"
> errors ?

How repeatable is it?

It would be interesting to know the actual underlying Windows error code
--- I see that win32error.c maps several different codes to EACCES.
        regards, tom lane
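
For readers following along, here is a minimal sketch of what "several different codes map to EACCES" means in practice. It is not a copy of win32error.c: the table covers only a handful of codes chosen for illustration, the names `sketch_map` and `sketch_map_os_error` are hypothetical, and the `EINVAL` fallback for unknown codes is an assumption of this sketch. The numeric values are the documented Windows ones, copied in so the sketch builds without `<windows.h>`.

```c
#include <errno.h>
#include <stddef.h>

/* Documented numeric values of a few Windows error codes, copied here
 * so the sketch compiles on any platform without <windows.h>. */
#define ERROR_FILE_NOT_FOUND      2UL
#define ERROR_PATH_NOT_FOUND      3UL
#define ERROR_ACCESS_DENIED       5UL
#define ERROR_SHARING_VIOLATION  32UL
#define ERROR_LOCK_VIOLATION     33UL

/* One row of the (hypothetical, deliberately incomplete) mapping table. */
struct sketch_map_entry
{
    unsigned long oserr;    /* Windows error code */
    int           err;      /* errno value it is folded into */
};

static const struct sketch_map_entry sketch_map[] = {
    {ERROR_FILE_NOT_FOUND,    ENOENT},
    {ERROR_PATH_NOT_FOUND,    ENOENT},
    /* Three distinct Windows conditions all become "permission denied". */
    {ERROR_ACCESS_DENIED,     EACCES},
    {ERROR_SHARING_VIOLATION, EACCES},
    {ERROR_LOCK_VIOLATION,    EACCES},
};

/* Translate a Windows error code to an errno value.
 * Unknown codes fall back to EINVAL (an assumption of this sketch). */
int
sketch_map_os_error(unsigned long oserr)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++)
    {
        if (sketch_map[i].oserr == oserr)
            return sketch_map[i].err;
    }
    return EINVAL;
}
```

Once the code has been folded into errno, a sharing violation and a genuine access-denied look identical to the caller, which is why the original Windows code is worth logging.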


Re: seahorse again failing

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> > seahorse just failed again with one of the dreaded "permission denied"
> > errors that we sporadically see reported on the lists:
> > seahorse is just a stock windows XP box (with all patches and
> > servicepacks applied) and msys/mingw.
> > There is no other software installed or ever was - maybe there is really
> > an underlying issue that is causing those sporadic "permission denied"
> > errors ?
> 
> How repeatable is it?
> 
> It would be interesting to know the actual underlying Windows error code
> --- I see that win32error.c maps several different codes to EACCES.

It may be a good idea to put an elog(LOG) with the error code in the
failure path of AllocateFile.

This particular problem must be coming from FindMyDatabase (or maybe
RebuildFlatFiles when called from PostgresMain?)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: seahorse again failing

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> seahorse just failed again with one of the dreaded "permission denied"
>> errors that we sporadically see reported on the lists:
>> seahorse is just a stock windows XP box (with all patches and
>> servicepacks applied) and msys/mingw.
>> There is no other software installed or ever was - maybe there is really
>> an underlying issue that is causing those sporadic "permission denied"
>> errors ?
> 
> How repeatable is it?

this seems to be the second time seahorse managed to trigger that
(first was a manual build a while ago) - so unfortunately not very
repeatable :-(

> 
> It would be interesting to know the actual underlying Windows error code
> --- I see that win32error.c maps several different codes to EACCES.

yeah - is there a way to log the actual Windows error code too?


Stefan


Re: seahorse again failing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It would be interesting to know the actual underlying Windows error code
>> --- I see that win32error.c maps several different codes to EACCES.

> It may be a good idea to put a elog(LOG) with the error code in the
> failure path of AllocateFile.

That seems like a plan to me.  I had been thinking of making
win32error.c itself log the conversions, but that would not provide any
context information.  AllocateFile could log the file name along with
the code, which should be enough info to associate a particular log
entry with the actual failure.

Note you should probably save and restore errno around the elog call,
just to be safe.

Could someone with access to Windows code this up and test it?
        regards, tom lane
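
The save-and-restore advice can be sketched as follows. This is not the patch that was eventually applied: `sketch_open_logged` is a hypothetical stand-in for the failure path of AllocateFile, and `sketch_log` is a stand-in for elog(LOG) that writes to stderr and deliberately resets errno to simulate a logging call that clobbers it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for elog(LOG, ...).  It resets errno on purpose
 * to simulate a logging routine that clobbers it. */
static void
sketch_log(const char *name, const char *mode, int code)
{
    fprintf(stderr, "LOG:  open error on '%s' (mode \"%s\"): %d\n",
            name, mode, code);
    errno = 0;
}

/* Open a file; on failure, log the file name along with the error code
 * while keeping errno intact for the caller. */
FILE *
sketch_open_logged(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);

    if (fp == NULL)
    {
        int save_errno = errno;     /* save before logging */

        sketch_log(name, mode, save_errno);
        errno = save_errno;         /* restore for the caller */
    }
    return fp;
}
```

Without the restore, a caller that inspects errno after a NULL return would see whatever the logging call left behind rather than the reason the open failed.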


Re: seahorse again failing

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>   
>> Tom Lane wrote:
>>     
>>> It would be interesting to know the actual underlying Windows error code
>>> --- I see that win32error.c maps several different codes to EACCES.
>>>       
>
>   
>> It may be a good idea to put a elog(LOG) with the error code in the
>> failure path of AllocateFile.
>>     
>
> That seems like a plan to me.  I had been thinking of making
> win32error.c itself log the conversions, but that would not provide any
> context information.  AllocateFile could log the file name along with
> the code, which should be enough info to associate a particular log
> entry with the actual failure.
>
> Note you should probably save and restore errno around the elog call,
> just to be safe.
>
> Could someone with access to Windows code and test this?
>
>   

All this seems good and sensible.

I am just a little suspicious of seahorse, though, as it is running on a 
Xen VM.

I wonder if we should add a VM column to the buildfarm machine specs.

cheers

andrew


Re: seahorse again failing

From
Stefan Kaltenbrunner
Date:
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>  
>>> Tom Lane wrote:
>>>    
>>>> It would be interesting to know the actual underlying Windows error
>>>> code
>>>> --- I see that win32error.c maps several different codes to EACCES.
>>>>       
>>
>>  
>>> It may be a good idea to put a elog(LOG) with the error code in the
>>> failure path of AllocateFile.
>>>     
>>
>> That seems like a plan to me.  I had been thinking of making
>> win32error.c itself log the conversions, but that would not provide any
>> context information.  AllocateFile could log the file name along with
>> the code, which should be enough info to associate a particular log
>> entry with the actual failure.
>>
>> Note you should probably save and restore errno around the elog call,
>> just to be safe.
>>
>> Could someone with access to Windows code and test this?
>>
>>   
> 
> All this seems good and sensible.
> 
> I am just a little suspicious of seahorse, though, as it is running on a
> Xen VM.

indeed seahorse is running under Xen - though I have no reason to
believe that Xen is at fault - the event log shows absolutely no sign of
any trouble, nor does the hypervisor.
The only thing I can think of is that the VM might cause some
subtle timing differences wrt disk access or scheduling (Xen is not
exceptionally bright about CPU scheduling - so it might starve some
guests sometimes).
Other than that, I do seem to recall that we got a number of weird-looking
"permission denied" errors on win32 - improving the error
reporting might help to find out whether there is a pattern involved somewhere.


> 
> I wonder if we should add a VM column to the buildfarm machine specs.

that would be fine with me - maybe we could add an "LDAP" symbol too,
since we just had somebody failing after the ldap-on-windows fix?


Stefan


Re: seahorse again failing

From
Martijn van Oosterhout
Date:
On Tue, Aug 22, 2006 at 10:19:38AM -0400, Tom Lane wrote:
> > It may be a good idea to put a elog(LOG) with the error code in the
> > failure path of AllocateFile.
>
> That seems like a plan to me.  I had been thinking of making
> win32error.c itself log the conversions, but that would not provide any
> context information.  AllocateFile could log the file name along with
> the code, which should be enough info to associate a particular log
> entry with the actual failure.

Would it be possible to get errcode_for_file_access() to report the
results of GetLastError() for Windows, or would that produce spurious
results? At DEBUG level, maybe?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: seahorse again failing

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Would it be possible to get errcode_for_file_access() to report the
> results of GetLastError() for Windows, or would that produce spurious
> results? At DEBUG level, maybe?

It would have to be at LOG level, because otherwise it wouldn't get
logged at all with the default settings that the buildfarm is using.

Also, I think that errcode_for_file_access() may run too late, ie,
we couldn't be sure that we were looking at the same value of
GetLastError.  This could be dealt with by saving GetLastError into
the error data structure at the same place we save errno, but that's
starting to get a bit invasive for a temporary-investigation kluge.

BTW, whoever writes this needs to check that it doesn't change the
default regression test results ...
        regards, tom lane
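
The "save GetLastError at the same place we save errno" idea might look roughly like the sketch below. Everything here is hypothetical: `struct sketch_error_data` and `sketch_error_start` are illustrative names, not PostgreSQL's error data structure or API, and on non-Windows builds the OS error is simply reported as 0 so the sketch still compiles.

```c
#include <errno.h>

#ifdef _WIN32
#include <windows.h>
#define sketch_last_os_error() ((unsigned long) GetLastError())
#else
/* There is no GetLastError() off Windows; report 0 so the sketch builds. */
#define sketch_last_os_error() (0UL)
#endif

/* Hypothetical error record: both codes are captured together, at the
 * moment error reporting begins, before anything can overwrite them. */
struct sketch_error_data
{
    int           saved_errno;
    unsigned long saved_oserr;
};

/* Call this first thing in the failure path. */
void
sketch_error_start(struct sketch_error_data *edata)
{
    edata->saved_errno = errno;
    edata->saved_oserr = sketch_last_os_error();
}
```

Any later reporting step would read the two saved fields instead of calling GetLastError() again, which avoids the "runs too late" problem described above at the cost of touching the error data structure.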


Re: seahorse again failing

From
"Magnus Hagander"
Date:
> >> It may be a good idea to put a elog(LOG) with the error code in
> the
> >> failure path of AllocateFile.
> >>
> >
> > That seems like a plan to me.  I had been thinking of making
> > win32error.c itself log the conversions, but that would not
> provide
> > any context information.  AllocateFile could log the file name
> along
> > with the code, which should be enough info to associate a
> particular
> > log entry with the actual failure.
> >
> > Note you should probably save and restore errno around the elog
> call,
> > just to be safe.
> >
> > Could someone with access to Windows code and test this?
> >
> >
>
> All this seems good and sensible.
>
> I am just a little suspicious of seahorse, though, as it is running
> on a Xen VM.
>
> I wonder if we should add a VM column to the buildfarm machine
> specs.

Definitely. If nothing else, it should at least be listed in the platform
identification. AFAIK, Snake is also a VM, and Dave's other box as
well... But on VMWare (or was it Virtual Server?) and not Xen, but
still.

//Magnus


Re: seahorse again failing

From
"Dave Page"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
> Magnus Hagander
> Sent: 23 August 2006 09:25
> To: Andrew Dunstan; Tom Lane
> Cc: Alvaro Herrera; Stefan Kaltenbrunner; PostgreSQL-development
> Subject: Re: [HACKERS] seahorse again failing
>
> Definitely. If nothing else, it should at least be listed in
> the platform
> identification. AFAIK, Snake is also a VM, and Dave's other box as
> well... But on VMWare (or was it Virtual Server?) and not Xen, but
> still.

No, Snake is real. Bandicoot is a VMWare Server VM running on Snake
though.

/D


Re: seahorse again failing

From
"Magnus Hagander"
Date:
> > Tom Lane wrote:
> >> It would be interesting to know the actual underlying Windows
> error
> >> code
> >> --- I see that win32error.c maps several different codes to
> EACCES.
>
> > It may be a good idea to put a elog(LOG) with the error code in
> the
> > failure path of AllocateFile.
>
> That seems like a plan to me.  I had been thinking of making
> win32error.c itself log the conversions, but that would not provide
> any context information.  AllocateFile could log the file name
> along with the code, which should be enough info to associate a
> particular log entry with the actual failure.
>
> Note you should probably save and restore errno around the elog
> call, just to be safe.
>
> Could someone with access to Windows code and test this?

Do you mean something as simple as this?

compiles, passes regression tests, logs this on startup of a fresh
cluster:
LOG:  win32 open error on 'global/pgstat.stat': 2

(very simple - it's a file-not-found, which is expected..)


//Magnus


Attachments

Re: seahorse again failing

From
Tom Lane
Date:
"Magnus Hagander" <mha@sollentuna.net> writes:
>> Could someone with access to Windows code and test this?

> Do you mean something as simple as this?

> compiles, passes regression tests, logs this on startup of a fresh
> cluster:
> LOG:  win32 open error on 'global/pgstat.stat': 2

Looks good --- I tweaked it to log all the info AllocateFile has access
to, just in case it helps.  Now we wait to capture a failure.
Stefan, do you want to set that box to doing continuous regression
tests?  Or anyone else with a Windows machine that's not doing much?
        regards, tom lane