Thread: BUG #13010: After promote postgres try to send old timeline WALs to archive

BUG #13010: After promote postgres try to send old timeline WALs to archive

From
eshkinkot@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      13010
Logged by:          Sergey Burladyan
Email address:      eshkinkot@gmail.com
PostgreSQL version: 9.2.10
Operating system:   Slackware 14.1
Description:

Hello, I have a problem with WAL archiving after promotion.

I do not use streaming replication, only the WAL archive. After I promote
the standby, it tries to archive WAL segments from the old timeline, but
they are already in the archive (from the old master), so WAL archiving
on the new master stops.

pg_xlog on the standby before promotion:
pg_xlog/
|-- 000000010000000000000014
|-- 000000010000000000000015
|-- 000000010000000000000016
|-- 000000010000000000000017
|-- 000000010000000000000018
|-- 000000010000000000000019
|-- 00000001000000000000001A
|-- 00000001000000000000001B
|-- 00000001000000000000001C
|-- 00000001000000000000001D
|-- 00000001000000000000001E
|-- 00000001000000000000001F
|-- 000000010000000000000020
|-- 000000010000000000000021
`-- archive_status
    |-- 000000010000000000000014.done
    |-- 000000010000000000000015.done
    |-- 000000010000000000000016.done
    |-- 000000010000000000000017.done
    |-- 000000010000000000000018.done
    |-- 000000010000000000000019.done
    |-- 00000001000000000000001A.done
    |-- 00000001000000000000001B.done
    |-- 00000001000000000000001C.done
    |-- 00000001000000000000001D.done
    |-- 00000001000000000000001E.done
    |-- 00000001000000000000001F.done
    |-- 000000010000000000000020.done
    `-- 000000010000000000000021.done

pg_xlog on the standby right after promotion:
pg_xlog/
|-- 000000010000000000000020
|-- 000000010000000000000021
|-- 000000010000000000000022
|-- 000000010000000000000023
|-- 000000010000000000000024
|-- 000000010000000000000025
|-- 000000010000000000000026
|-- 000000010000000000000027
|-- 00000002.history
|-- 000000020000000000000024
|-- 000000020000000000000025
|-- 000000020000000000000026
|-- 000000020000000000000027
|-- 000000020000000000000028
|-- 000000020000000000000029
|-- 00000002000000000000002A
|-- 00000002000000000000002B
|-- 00000002000000000000002C
`-- archive_status
    |-- 000000010000000000000020.done
    |-- 000000010000000000000021.done
    |-- 000000010000000000000022.done
    |-- 000000010000000000000023.done
    |-- 000000010000000000000024.done
    `-- 00000002.history.done

pg_xlog some time after promotion:
pg_xlog/
|-- 000000010000000000000025
|-- 000000010000000000000026
|-- 000000010000000000000027
|-- 00000002.history
|-- 000000020000000000000028
|-- 000000020000000000000029
|-- 00000002000000000000002A
|-- 00000002000000000000002B
|-- 00000002000000000000002C
|-- 00000002000000000000002D
|-- 00000002000000000000002E
|-- 00000002000000000000002F
|-- 000000020000000000000030
|-- 000000020000000000000031
|-- 000000020000000000000032
|-- 000000020000000000000033
|-- 000000020000000000000034
`-- archive_status
    |-- 000000010000000000000025.ready
    |-- 000000010000000000000026.ready
    |-- 000000010000000000000027.ready
    |-- 00000002.history.done
    |-- 000000020000000000000028.done
    |-- 000000020000000000000029.done
    |-- 00000002000000000000002A.done
    |-- 00000002000000000000002B.done
    |-- 00000002000000000000002C.done
    `-- 00000002000000000000002D.ready

Now WAL archiving has stopped with these messages:

2015-04-08 20:49:31 MSK LOG:  archive command failed with exit code 1
2015-04-08 20:49:31 MSK DETAIL:  The failed archive command was: test ! -f ~/tmp/pg-slave-switch/w/000000010000000000000025 && cp pg_xlog/000000010000000000000025 ~/tmp/pg-slave-switch/w/000000010000000000000025
2015-04-08 20:49:32 MSK LOG:  archive command failed with exit code 1
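
The exit code 1 in the log follows from how the archive command is written: `test ! -f DEST` fails whenever the destination file already exists, so `cp` never runs. Below is a minimal Python sketch of that logic, assuming the old master had already archived its own copy of segment 000000010000000000000025, as the report states. The function names, directory layout and file contents are hypothetical and only illustrate the behavior; this is not the reporter's setup. One difference from the shell: a failed `cp` returns a non-zero status, while `shutil.copyfile` raises an exception.

```python
import os
import shutil
import tempfile


def archive_segment(src_path, archive_dir):
    """Mimic `test ! -f DEST && cp SRC DEST` and return a shell-style exit code."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.isfile(dest):
        # `test ! -f DEST` fails, so the `cp` after `&&` is never executed.
        return 1
    shutil.copyfile(src_path, dest)
    return 0


def demo():
    """Replay the situation from the report inside a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        pg_xlog = os.path.join(root, "pg_xlog")
        archive = os.path.join(root, "archive")
        os.mkdir(pg_xlog)
        os.mkdir(archive)
        name = "000000010000000000000025"

        # Assumption: the old master already archived this segment name.
        with open(os.path.join(archive, name), "wb") as f:
            f.write(b"from old master")
        # The promoted standby holds a local file with the same name.
        with open(os.path.join(pg_xlog, name), "wb") as f:
            f.write(b"recycled segment")

        first = archive_segment(os.path.join(pg_xlog, name), archive)
        second = archive_segment(os.path.join(pg_xlog, name), archive)  # a retry
        with open(os.path.join(archive, name), "rb") as f:
            kept = f.read()
        return first, second, kept


print(demo())  # (1, 1, b'from old master')
```

Every retry returns 1 while the archived copy stays untouched, which matches the repeated "archive command failed with exit code 1" lines above.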

Re: BUG #13010: After promote postgres try to send old timeline WALs to archive

From
Michael Paquier
Date:
On Thu, Apr 9, 2015 at 11:02 PM,  <eshkinkot@gmail.com> wrote:
> I do not use streaming, only WAL archive and after promote standby
> it try to archive WALs with old timeline, but I already have it in
> archive (from old master) and at new master WAL archiving stopped.
>
> [...]
> now WAL archiving stopped with messages:
>
> 2015-04-08 20:49:31 MSK LOG:  archive command failed with exit code 1
> 2015-04-08 20:49:31 MSK DETAIL:  The failed archive command was: test ! -f
> ~/tmp/pg-slave-switch/w/000000010000000000000025 && cp
> pg_xlog/000000010000000000000025
> ~/tmp/pg-slave-switch/w/000000010000000000000025
> 2015-04-08 20:49:32 MSK LOG:  archive command failed with exit code 1

The standby recycled some WAL segments ahead of time, expecting to reuse
them, and at promotion those segments became bogus. It is expected
behavior for a promoted standby to archive the files that it thinks are
not archived yet, even if they do not belong to its own timeline, but
those bogus segments should never be archived. See this thread, which
includes a patch, for an example:
http://www.postgresql.org/message-id/54942034.7080303@vmware.com
Note that this patch has been on my list of things to look at for some
time; perhaps it is time to speed that up.
Regards,
--
Michael
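
To make the timeline distinction concrete: the first 8 hex digits of a WAL segment file name are the timeline ID. Here is a minimal Python sketch that picks out the segments still marked `.ready` on a timeline older than the current one. It uses the `archive_status` entries copied from the last pg_xlog listing in the original report. The function names are hypothetical and not part of PostgreSQL, and the check only flags candidates by name; it does not tell whether a segment's contents are valid.

```python
import re

# A WAL segment file name is 24 hex digits: the first 8 are the timeline ID,
# the remaining 16 locate the segment within the WAL stream.
WAL_SEGMENT_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{16})$")


def timeline_of(name):
    """Return the timeline ID encoded in a WAL segment name, or None."""
    match = WAL_SEGMENT_NAME.match(name)
    return int(match.group(1), 16) if match else None


def old_timeline_ready_segments(status_entries, current_timeline):
    """List segments marked .ready whose timeline precedes the current one."""
    found = []
    for entry in status_entries:
        if not entry.endswith(".ready"):
            continue  # .done entries are already archived
        segment = entry[: -len(".ready")]
        timeline = timeline_of(segment)
        # History files and other non-segment names yield None and are skipped.
        if timeline is not None and timeline < current_timeline:
            found.append(segment)
    return sorted(found)


# archive_status entries from the "after promotion" listing in the report.
STATUS_ENTRIES = [
    "000000010000000000000025.ready",
    "000000010000000000000026.ready",
    "000000010000000000000027.ready",
    "00000002.history.done",
    "000000020000000000000028.done",
    "000000020000000000000029.done",
    "00000002000000000000002A.done",
    "00000002000000000000002B.done",
    "00000002000000000000002C.done",
    "00000002000000000000002D.ready",
]

print(old_timeline_ready_segments(STATUS_ENTRIES, current_timeline=2))
# ['000000010000000000000025', '000000010000000000000026', '000000010000000000000027']
```

The three timeline-1 names it reports are the ones queued ahead of 00000002000000000000002D, and the first of them is the file named in the failing archive command.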

Re: BUG #13010: After promote postgres try to send old timeline WALs to archive

From
Michael Paquier
Date:
On Fri, Apr 10, 2015 at 2:16 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Apr 9, 2015 at 11:02 PM,  <eshkinkot@gmail.com> wrote:
>> I do not use streaming, only WAL archive and after promote standby
>> it try to archive WALs with old timeline, but I already have it in
>> archive (from old master) and at new master WAL archiving stopped.
>>
>> [...]
>> now WAL archiving stopped with messages:
>>
>> 2015-04-08 20:49:31 MSK LOG:  archive command failed with exit code 1
>> 2015-04-08 20:49:31 MSK DETAIL:  The failed archive command was: test ! -f
>> ~/tmp/pg-slave-switch/w/000000010000000000000025 && cp
>> pg_xlog/000000010000000000000025
>> ~/tmp/pg-slave-switch/w/000000010000000000000025
>> 2015-04-08 20:49:32 MSK LOG:  archive command failed with exit code 1
>
> The standby has recycled some WAL segments ahead thinking to reuse
> them and at promotion they became actually bogus.

FYI, a fix for this issue has been committed here:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b2a5545bd63fc94a71b1e97ecdd03c605d97a438
--
Michael