Thread: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

From
"Nacho Mezzadra"
Date:
The following bug has been logged online:

Bug reference:      5603
Logged by:          Nacho Mezzadra
Email address:      nachomezzadra@gmail.com
PostgreSQL version: 8.3.11
Operating system:   Red Hat Enterprise 5.3
Description:        pg_tblspc and pg_twoface directories get deleted when
starting up service
Details:

This issue happened not very frequently, but it happened to me 3 times, in 3
different Red Hat servers.
The thing is that when stopping the Postgresql service with the
"/sbin/service postgresql-8.3 stop" command, and after that starting it with
the "/sbin/service postgresql-8.3 start" command (haven't tried with the
restart one though), a few times both pg_tblspc and pg_twoface  directories
(inside data directory) get somehow deleted and hence the start service
command fails.  Looking in the log files I find the following error:

2010-07-19 16:54:55 ISTFATAL:  could not open directory "pg_tblspc": No such
file or directory

So I manually create the "pg_tblspc" directory, and then try to start again
the service unsuccessfully, getting this time a similar error, but saying
that pg_twoface directory doesn't exist.

After creating the pg_twoface directory, service can be successfully
started.

Please note that all these always happened running the service command as
root.
All 3 linux boxes are running over a VMWare host.

Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

From
Robert Haas
Date:
On Thu, Aug 5, 2010 at 2:46 PM, Nacho Mezzadra <nachomezzadra@gmail.com> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      5603
> Logged by:          Nacho Mezzadra
> Email address:      nachomezzadra@gmail.com
> PostgreSQL version: 8.3.11
> Operating system:   Red Hat Enterprise 5.3
> Description:        pg_tblspc and pg_twoface directories get deleted when
> starting up service
> Details:
>
> This issue happened not very frequently, but it happened to me 3 times, in 3
> different Red Hat servers.
> The thing is that when stopping the Postgresql service with the
> "/sbin/service postgresql-8.3 stop" command, and after that starting it with
> the "/sbin/service postgresql-8.3 start" command (haven't tried with the
> restart one though), a few times both pg_tblspc and pg_twoface  directories
> (inside data directory) get somehow deleted and hence the start service
> command fails.  Looking in the log files I find the following error:
>
> 2010-07-19 16:54:55 ISTFATAL:  could not open directory "pg_tblspc": No such
> file or directory
>
> So I manually create the "pg_tblspc" directory, and then try to start again
> the service unsuccessfully, getting this time a similar error, but saying
> that pg_twoface directory doesn't exist.
>
> After creating the pg_twoface directory, service can be successfully
> started.
>
> Please note that all these always happened running the service command as
> root.
> All 3 linux boxes are running over a VMWare host.

This is pretty scary, but it's a little hard to believe that Red Hat
would ship a script which had even the faintest chance of obliterating
two critical directories.  Especially since the guy who does the
packaging of PostgreSQL over thereabouts is our most knowledgeable,
experienced, and prolific committer.  So I suspect you've a (broken)
custom script, or a cron job that's doing something evil, or some
other weirdness that is specific to your installations, but you
haven't provided enough details to speculate in detail (for example,
perhaps you could reply to the list and post a copy of the script you
think is doing this).

Also, I'm pretty sure that we don't have a directory called
pg_twoface, though it would be pretty funny if we did.  It's fairly
obvious what this is meant to say, but it doesn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Aug 5, 2010 at 2:46 PM, Nacho Mezzadra <nachomezzadra@gmail.com> wrote:
>> PostgreSQL version: 8.3.11
>> Operating system:   Red Hat Enterprise 5.3
>> Description:        pg_tblspc and pg_twoface directories get deleted when
>> starting up service

> This is pretty scary, but it's a little hard to believe that Red Hat
> would ship a script which had even the faintest chance of obliterating
> two critical directories.  Especially since the guy who does the
> packaging of PostgreSQL over thereabouts is our most knowledgeable,
> experienced, and prolific committer.  So I suspect you've a (broken)
> custom script, or a cron job that's doing something evil, or some
> other weirdness that is specific to your installations, but you
> haven't provided enough details to speculate in detail (for example,
> perhaps you could reply to the list and post a copy of the script you
> think is doing this).

Well, I have to disclaim credit/blame for this, because Red Hat has
never shipped PG 8.3.anything for RHEL-5.  Possibly the OP is running
Devrim's or Command Prompt's RPMs.  That said, the initscript Devrim
uses looks just about like mine, and there's no chance whatever that it
would selectively delete portions of what's under $PGDATA.  I have to
think that there's a loose cannon somewhere else in the OP's system.
We have for example seen some very unfortunate behavior in the past
when the data directory was located on a slow-to-mount NFS server.
(I have no reason to think that that's exactly what this problem is;
I just cite it to illustrate the kind of thing to be looking for.)
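[Editorial illustration, not part of the original message.] One way to guard against the slow-to-mount scenario mentioned above is to verify that the data directory is actually populated before starting the server. The sketch below is only a heuristic: the function names `looks_like_data_dir` and `safe_to_start` are hypothetical, and the check relies on the assumption that a populated data directory contains a `PG_VERSION` file and a `base` subdirectory.

```python
import os
import tempfile


def looks_like_data_dir(path):
    """Heuristic: a populated data directory has a PG_VERSION file and base/."""
    return (os.path.isfile(os.path.join(path, "PG_VERSION"))
            and os.path.isdir(os.path.join(path, "base")))


def safe_to_start(path, expect_mount_point=False):
    """Return False if the directory is unmounted or does not look populated.

    expect_mount_point is an assumption of this sketch: set it only when the
    data directory itself is supposed to be the root of a mounted filesystem.
    """
    if expect_mount_point and not os.path.ismount(path):
        return False
    return looks_like_data_dir(path)


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not a real cluster.
    with tempfile.TemporaryDirectory() as demo:
        print("empty directory:", safe_to_start(demo))
        open(os.path.join(demo, "PG_VERSION"), "w").close()
        os.mkdir(os.path.join(demo, "base"))
        print("populated directory:", safe_to_start(demo))
```

A check like this would only rule out one class of problem; it says nothing about why individual subdirectories might go missing.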

            regards, tom lane

Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

From
Nacho Mezzadra
Date:
Tom, Robert, sorry I am coming back to you after a while, but we still
have the same issue.  This has been happening in our environments, but
now it is also happening in customers' environments -which we do not
set up- and it is also happening.  All environments are always Red Hat
Enterprise 5.3.
As reported in the issue, when starting a service using /sbin/service
postgresql-8.3 start, sometimes the directories data/pg_tblspc and
data/pg_twophase get deleted and PostgreSQL engine won't start up.  As
a workaround, we recreate both directories and PostgreSQL can be
started again, but we need to know why this is happening and if it
ever will harm in any way our data.
Please let me know if you need any more info, or whatever.
Thanks a lot in advance,
Nacho.-
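[Editorial illustration, not part of the original message.] The workaround described above (detect the missing subdirectories, then recreate them) could be sketched roughly as follows. The helper names are hypothetical, only the two subdirectories named in this thread are checked, and the owner-only mode `0o700` is an assumption of this sketch. Ownership is not handled here: if this runs as root, the directories would still need to be handed to the OS user the server runs as.

```python
import os
import tempfile

# Subdirectories named in this thread; a real data directory contains more.
REQUIRED_SUBDIRS = ("pg_tblspc", "pg_twophase")


def missing_subdirs(data_dir, required=REQUIRED_SUBDIRS):
    """Return the names in `required` that are not directories under data_dir."""
    return [name for name in required
            if not os.path.isdir(os.path.join(data_dir, name))]


def recreate_subdirs(data_dir, names):
    """Recreate the given subdirectories with owner-only permissions."""
    for name in names:
        path = os.path.join(data_dir, name)
        os.mkdir(path, 0o700)
        os.chmod(path, 0o700)  # mkdir's mode is filtered by the umask


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not a real cluster.
    with tempfile.TemporaryDirectory() as demo:
        os.mkdir(os.path.join(demo, "pg_tblspc"))
        gone = missing_subdirs(demo)
        print("missing:", gone)
        recreate_subdirs(demo, gone)
        print("missing after repair:", missing_subdirs(demo))
```

Running only `missing_subdirs` on a schedule and logging the result would also help narrow down when the directories disappear, rather than discovering it at the next start.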

On Tue, Aug 10, 2010 at 01:11, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Aug 5, 2010 at 2:46 PM, Nacho Mezzadra <nachomezzadra@gmail.com> wrote:
> >> PostgreSQL version: 8.3.11
> >> Operating system:   Red Hat Enterprise 5.3
> >> Description:        pg_tblspc and pg_twoface directories get deleted when
> >> starting up service
>
> > This is pretty scary, but it's a little hard to believe that Red Hat
> > would ship a script which had even the faintest chance of obliterating
> > two critical directories.  Especially since the guy who does the
> > packaging of PostgreSQL over thereabouts is our most knowledgeable,
> > experienced, and prolific committer.  So I suspect you've a (broken)
> > custom script, or a cron job that's doing something evil, or some
> > other weirdness that is specific to your installations, but you
> > haven't provided enough details to speculate in detail (for example,
> > perhaps you could reply to the list and post a copy of the script you
> > think is doing this).
>
> Well, I have to disclaim credit/blame for this, because Red Hat has
> never shipped PG 8.3.anything for RHEL-5.  Possibly the OP is running
> Devrim's or Command Prompt's RPMs.  That said, the initscript Devrim
> uses looks just about like mine, and there's no chance whatever that it
> would selectively delete portions of what's under $PGDATA.  I have to
> think that there's a loose cannon somewhere else in the OP's system.
> We have for example seen some very unfortunate behavior in the past
> when the data directory was located on a slow-to-mount NFS server.
> (I have no reason to think that that's exactly what this problem is;
> I just cite it to illustrate the kind of thing to be looking for.)
>
>             regards, tom lane



Re: BUG #5603: pg_tblspc and pg_twoface directories get deleted when starting up service

From
Tom Lane
Date:
Nacho Mezzadra <nachomezzadra@gmail.com> writes:
> Tom, Robert, sorry I am coming back to you after a while, but we still
> have the same issue.  This has been happening in our environments, but
> now it is also happening in customers' environments -which we do not
> set up- and it is also happening.  All environments are always Red Hat
> Enterprise 5.3.

You still haven't given any reason to think this is a Postgres bug,
nor indeed any information beyond what you said originally.

One thing that strikes me is that both pg_tblspc and pg_twophase are
empty and unused during normal operation (if you're not using the
relevant features).  They are scanned during postmaster startup though,
which is why you're getting failures then.  I suspect that these
subdirectories are not in fact getting removed during PG shutdown or
restart, but were deleted some time before that.  In particular I wonder
if somebody's loosed an overaggressive tmp-file-cleaning script on your
whole filesystem.  Something that was removing empty directories that
hadn't been accessed in awhile could explain this.
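[Editorial illustration, not part of the original message.] To see which directories a cleaner of the kind hypothesized above would pick up, one could list empty directories whose access time is older than some threshold. This is a sketch under stated assumptions: the function name `stale_empty_dirs` and the age threshold are hypothetical, and how any real cleanup tool selects its targets is not known from this thread. Note that the access time is read before the directory is listed, because listing a directory can itself update its access time on many mount configurations.

```python
import os
import tempfile
import time


def stale_empty_dirs(root, max_age_days, now=None):
    """Return empty directories under root not accessed for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    found = []
    stack = [root]
    while stack:
        path = stack.pop()
        # Read atime first: listing the directory may update it.
        atime = os.stat(path).st_atime
        with os.scandir(path) as it:
            entries = list(it)
        if not entries:
            if atime < cutoff:
                found.append(path)
            continue
        # Descend into real subdirectories only, never through symlinks.
        stack.extend(e.path for e in entries
                     if e.is_dir(follow_symlinks=False))
    return sorted(found)


if __name__ == "__main__":
    # Demonstration on a throwaway tree with artificially aged directories.
    with tempfile.TemporaryDirectory() as demo:
        old = time.time() - 90 * 86400
        for name in ("pg_tblspc", "pg_twophase"):
            path = os.path.join(demo, name)
            os.mkdir(path)
            os.utime(path, (old, old))
        for path in stale_empty_dirs(demo, max_age_days=30):
            print("would be a candidate:", path)
```

If the two directories from this thread show up in such a listing on an affected machine, that would be consistent with the tmp-cleaner theory, though it would not by itself identify the script responsible.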

            regards, tom lane