Discussion: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

From
thomas.eckestad@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      7634
Logged by:          Thomas
Email address:      thomas.eckestad@gmail.com
PostgreSQL version: 9.1.6
Operating system:   Linux
Description:


Hi,

We are using a Postgres server dedicated to unit testing, i.e. for testing
our code interacting with the database. Each unit test may create, use and
then drop one or more test databases. When running the complete test suite, a
lot of databases are created and dropped (>100).

After a couple of days/weeks with frequent unit test activity, DROP DATABASE
eventually triggers errors of the following form:

2012-05-08 08:53:02.512 CEST> LOG:  statement: DROP DATABASE IF EXISTS
"HEAD_test_migrate_group_data_10010018668"
2012-05-08 08:53:02.512 CEST> ERROR:  could not open file "global/12693": No
such file or directory
2012-05-08 08:53:02.512 CEST> STATEMENT:  DROP DATABASE IF EXISTS
"HEAD_test_migrate_group_data_10010018668"

For now we handle this situation by automatically performing a complete
reinstall of the test database server when we detect the error. So we have a
satisfactory workaround in place.

We are using PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.3.4, 64-bit.

Best regards,
Thomas

Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

From
Tom Lane
Date:
thomas.eckestad@gmail.com writes:
> After a couple of days/weeks with frequent unit test activity, DROP DATABASE
> eventually triggers errors of the following form:

> 2012-05-08 08:53:02.512 CEST> LOG:  statement: DROP DATABASE IF EXISTS
> "HEAD_test_migrate_group_data_10010018668"
> 2012-05-08 08:53:02.512 CEST> ERROR:  could not open file "global/12693": No
> such file or directory
> 2012-05-08 08:53:02.512 CEST> STATEMENT:  DROP DATABASE IF EXISTS
> "HEAD_test_migrate_group_data_10010018668"

That is extremely peculiar --- AFAICS, 9.1 should never assign a
relfilenode of 12693.  (OIDs assigned by initdb don't get past about
11900 in that version, and OIDs assigned after normal postmaster start
should always be above 16384.)  Is it always exactly "global/12693"
that's complained of?  Could you monitor the contents of $PGDATA/global
and see if the set of filenames present changes while you're running
these tests?
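
One way to do that monitoring is to take periodic snapshots of the directory listing and report the differences. The following is only a minimal sketch, assuming it runs as a user allowed to read the data directory; the names `snapshot`, `diff_snapshots` and `watch`, and the 5-second interval, are illustrative choices and not part of PostgreSQL:

```python
import os
import time


def snapshot(global_dir):
    """Return the set of file names currently present in global_dir."""
    return set(os.listdir(global_dir))


def diff_snapshots(before, after):
    """Return (added, removed) file names between two snapshots, each sorted."""
    return sorted(after - before), sorted(before - after)


def watch(global_dir, interval_seconds=5.0, rounds=None):
    """Poll global_dir and print a timestamped line whenever its file set changes.

    rounds=None polls forever; an integer limits the number of polls.
    """
    previous = snapshot(global_dir)
    polls = 0
    while rounds is None or polls < rounds:
        time.sleep(interval_seconds)
        current = snapshot(global_dir)
        added, removed = diff_snapshots(previous, current)
        if added or removed:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} added={added} removed={removed}")
        previous = current
        polls += 1
```

Calling `watch` with the path of $PGDATA/global while the test suite runs would log each change with a timestamp that can be matched against the server log. Polling only sees the state at each poll, so a file that appears and disappears between two polls goes unnoticed.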

            regards, tom lane

Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

From
Tom Lane
Date:
thomas.eckestad@gmail.com writes:
> After a couple of days/weeks with frequent unit test activity, DROP DATABASE
> eventually triggers errors of the following form:

> 2012-05-08 08:53:02.512 CEST> LOG:  statement: DROP DATABASE IF EXISTS
> "HEAD_test_migrate_group_data_10010018668"
> 2012-05-08 08:53:02.512 CEST> ERROR:  could not open file "global/12693": No
> such file or directory
> 2012-05-08 08:53:02.512 CEST> STATEMENT:  DROP DATABASE IF EXISTS
> "HEAD_test_migrate_group_data_10010018668"

FWIW, I ran about 40000 cycles of CREATE/DROP DATABASE on 9.1 branch tip
without seeing anything odd.  So it's fairly clear that there's
something you've not mentioned that's necessary to trigger this.
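
The message does not say how those cycles were driven. For anyone wanting to repeat a similar test, one possible approach is to generate the statements and feed them to a client such as psql. This is a rough sketch under that assumption; the function name `create_drop_cycle_sql` and the `cycle_test` prefix are made up for illustration, and the prefix is assumed not to contain double quotes:

```python
def create_drop_cycle_sql(cycles, prefix="cycle_test"):
    """Yield CREATE DATABASE / DROP DATABASE statement pairs, one pair per cycle."""
    for i in range(cycles):
        # Quote the identifier so mixed-case prefixes survive unchanged.
        name = f'"{prefix}_{i}"'
        yield f"CREATE DATABASE {name};"
        yield f"DROP DATABASE IF EXISTS {name};"
```

Printing each yielded statement on its own line gives a script that can be run against a scratch server. It exercises only the bare CREATE/DROP path, so it will not reproduce whatever additional activity the failing setup involves.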

            regards, tom lane

Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

From
Thomas Eckestad
Date:
2012/11/1 Tom Lane <tgl@sss.pgh.pa.us>

>
> That is extremely peculiar --- AFAICS, 9.1 should never assign a
> relfilenode of 12693.  (OIDs assigned by initdb don't get past about
> 11900 in that version, and OIDs assigned after normal postmaster start
> should always be above 16384.)  Is it always exactly "global/12693"
> that's complained of?  Could you monitor the contents of $PGDATA/global
> and see if the set of filenames present changes while you're running
> these tests?
>
>                         regards, tom lane
>

No, it is not always global/12693. A few days ago it was global/12589 that
got lost.

I am afraid that I cannot guarantee that the example that I posted
(global/12693) was triggered with version 9.1.6. It might have been 9.0.x or
9.1.x, if that makes a difference. I am sure though that global/12589 was
triggered using 9.1.5 (upgraded to 9.1.6 just a few days ago).

Sorry for the version confusion.

I will monitor global/ and try to trigger the bug and get back to you next
week.

Regards,
Thomas

Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

From
Tom Lane
Date:
Thomas Eckestad <thomas.eckestad@gmail.com> writes:
> 2012/11/1 Tom Lane <tgl@sss.pgh.pa.us>
>> That is extremely peculiar --- AFAICS, 9.1 should never assign a
>> relfilenode of 12693.

> I am afraid that I cannot guarantee that the example that I posted
> (global/12693) was triggered with version 9.1.6. It might have been 9.0.x or
> 9.1.x, if that makes a difference. I am sure though that global/12589 was
> triggered using 9.1.5 (upgraded to 9.1.6 just a few days ago).

I realized that these numbers are actually quite a lot more
platform-specific than I'd been thinking, since in 9.1 they will vary
depending on how many OIDs got consumed for pg_collation entries,
and that will depend not only on your operating system but how many
locales you've seen fit to install.  So I'm probably wrong to have
guessed that this might represent a mistaken access to a relfilenode
that never should have existed.

What I'd suggest doing is monitoring the output of this query:

select relname, pg_relation_filenode(oid) from pg_class where relisshared;

which will tell you what filenames *ought* to be present in
$PGDATA/global, and then when something goes missing it'll be possible
to figure out which table or index it was.  That might provide at least
the first clue what's wrong.
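
To make the comparison mechanical, the rows from that query can be checked against a directory listing. The following is a minimal sketch, assuming the query result has already been fetched by some client as (relname, filenode) pairs and that the script can read the data directory; the function name `missing_shared_relations` is hypothetical:

```python
import os


def missing_shared_relations(rows, global_dir):
    """Return the (relname, filenode) pairs whose main file is absent from global_dir.

    rows is an iterable of (relname, filenode) pairs taken from the query above.
    A filenode of None is skipped as a defensive measure.
    """
    present = set(os.listdir(global_dir))
    return [
        (relname, filenode)
        for relname, filenode in rows
        if filenode is not None and str(filenode) not in present
    ]
```

Only the main file of each relation is checked, i.e. the one named exactly after the filenode; files with suffixes and other files in the directory are ignored. An empty result means every shared relation reported by the query still has its file.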

            regards, tom lane