Обсуждение: Bug (#3484) - Missing pg_clog/0AE6

Поиск
Список
Период
Сортировка

Bug (#3484) - Missing pg_clog/0AE6

От
Alexandra Nitzschke
Дата:
We did some analyzing and found out some mysterious things.

First we used the pg_check tool to analyze the table.
The result is:

# select pgcheck_page( 211593798 );
WARNING:  relation 'adresse_080103', tuple (46,11): start of tuple is not aligned properly
WARNING:  relation 'adresse_080103', tuple (46,11): has t_xmax < t_xmin
   pgcheck_page
--------------
           9808
(1 row)

# select pgcheck_index_full( 211593798 );
NOTICE:  name of tested relation: 'adresse_080103':
NOTICE:  relation 'adresse_080103' doesn't have index
   pgcheck_index_full
--------------------
            211593798
(1 row)

So we looked up the row: SELECT * FROM public.adresse_080103 where ctid = '(46,10)'::tid;
and did an pg_filedump on the database file and searched for the data of that recordset.
After that recordset we found corrupted data!
It seems to be corrupted in the same way as the error before.
The dump of the corrupted data is appended. ( We replaced some sensitive data with '*' at the beinning and the end  )

We went on with analyzing:
- the table was created at 2008/01/03 17:56h
- the nightly dump started at 2008/01/03 22:00h
- it tried to copy the table 'adresse_080103' at 22:00:08
- the dump crashed at 22:32:10 ( because of the error we reported 2007/12/14; we repaired the database not till
2008/01/11) 

The stat of the database file returns this:
    File: "/postgres/database/data/base/23144/211593798"
    Size: 1835008         Blocks: 3592       IO Block: 4096   reguläre Datei
Device: 811h/2065d      Inode: 18121638    Links: 1
Access: (0600/-rw-------)  Uid: ( 1001/postgres)   Gid: (    2/  daemon)
Access: 2008-02-15 18:19:44.000000000 +0100
Modify: 2008-01-03 22:00:34.000000000 +0100
Change: 2008-01-03 22:00:34.000000000 +0100

We are wondering, that the pg_dump seems to have modified the file.

But we think, that the file couldn't be corrupted at this time.
Each night from 2008/01/11 (the repair day) up to 2008/02/01 the nightly dump ran successful.
Suddenly at 2008/02/02 it crashed.
It seems that something corrupted the data without changing the modification time of the file.
At the moment we have no idea, how this situation comes.

Could you please answer, if the pg_dump modifies the access timestamp in some cases?

Thanks in advance,

       Alex

Re: Bug (#3484) - Missing pg_clog/0AE6

От
Zdenek Kotala
Дата:
Alexandra Nitzschke napsal(a):

<snip>

>
> We went on with analyzing:
> - the table was created at 2008/01/03 17:56h
> - the nightly dump started at 2008/01/03 22:00h
> - it tried to copy the table 'adresse_080103' at 22:00:08
> - the dump crashed at 22:32:10 ( because of the error we reported
> 2007/12/14; we repaired the database not till 2008/01/11 )
>
> The stat of the database file returns this:
>    File: "/postgres/database/data/base/23144/211593798"
>    Size: 1835008         Blocks: 3592       IO Block: 4096   reguläre Datei
> Device: 811h/2065d      Inode: 18121638    Links: 1
> Access: (0600/-rw-------)  Uid: ( 1001/postgres)   Gid: (    2/  daemon)
> Access: 2008-02-15 18:19:44.000000000 +0100
> Modify: 2008-01-03 22:00:34.000000000 +0100
> Change: 2008-01-03 22:00:34.000000000 +0100
>
> We are wondering, that the pg_dump seems to have modified the file.
>

<snip>

>
> Could you please answer, if the pg_dump modifies the access timestamp in
> some cases?
>

Just a idea that pg_dump invoked checkpoint but I don't expect that
table data spent four hour in a buffer cache. Especially in case when
max checkpoint_timeout is one hour.

        Zdenek