Re: [HACKERS] emergency outage requiring database restart

From	Merlin Moncure
Subject	Re: [HACKERS] emergency outage requiring database restart
Date	January 18, 2017, 17:33:50
Msg-id	CAHyXU0ypCaDJMJ78H6EdKztZeh5oEkGu+j5HpwmfzOpWB4q1zg@mail.gmail.com
In reply to	Re: [HACKERS] emergency outage requiring database restart (Ants Aasma <ants.aasma@eesti.ee>)
List	pgsql-hackers

On Wed, Jan 18, 2017 at 4:11 AM, Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Wed, Jan 4, 2017 at 5:36 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Still getting checksum failures.   Over the last 30 days, I see the
>> following.  Since enabling checksums FWICT none of the damage is
>> permanent and rolls back with the transaction.   So creepy!
>
> The checksums still differ only in the least significant digits, which
> pretty much means that there is a block number mismatch. So if you
> rule out the filesystem not doing its job correctly and transposing
> blocks, it could be something else that is resulting in blocks getting
> read from a location that happens to differ by a small multiple of
> page size. Maybe somebody is racily mucking with table fd's between
> seeking and reading. That would explain the issue disappearing after a
> retry.
>
> Maybe you can arrange for the RelFileNode and block number to be
> logged for the checksum failures and check what the actual checksums
> are in data files surrounding the failed page. If the requested block
> number contains something else entirely, but the page that follows
> contains the expected checksum value, then it would support this
> theory.

Will do. The main challenge is swapping in a hand-compiled server so
that libdir continues to work. Getting access to the server is
difficult, as is getting a maintenance window. I'll post back ASAP.

merlin
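
The check Ants suggests (comparing the checksums stored in the pages around the failed block) could be sketched roughly as below. This is only an illustration, not something posted in the thread: the function names `stored_checksums` and `blocks_matching` are hypothetical, and it assumes the default 8192-byte block size, a little-endian host, a block number relative to the start of the segment file being read, and a file that is not being written concurrently. It relies on pd_checksum being the 16-bit field right after the 8-byte pd_lsn in the page header.

```python
import struct

BLCKSZ = 8192            # assumption: default PostgreSQL block size
PD_CHECKSUM_OFFSET = 8   # pd_checksum follows the 8-byte pd_lsn in the page header


def stored_checksums(path, blkno, radius=2):
    """Return {block number: stored pd_checksum} for blocks near blkno.

    path is a relation segment file; blkno is relative to that file.
    """
    result = {}
    first = max(0, blkno - radius)
    with open(path, "rb") as f:
        for n in range(first, blkno + radius + 1):
            f.seek(n * BLCKSZ)
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break  # reached the end of the file
            # "<H" = little-endian unsigned 16-bit (assumed host byte order)
            (checksum,) = struct.unpack_from("<H", page, PD_CHECKSUM_OFFSET)
            result[n] = checksum
    return result


def blocks_matching(path, blkno, expected, radius=2):
    """List nearby blocks whose stored checksum equals the value from the log."""
    nearby = stored_checksums(path, blkno, radius)
    return [n for n, checksum in nearby.items() if checksum == expected]
```

If the value reported as "expected" in the checksum failure message turns up in a neighbouring block rather than in the requested one, that would be consistent with the theory that the read landed a small multiple of the page size away from where it should have.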
