Re: Corruption during WAL replay

Поиск

Список

Период

Сортировка

От	Masahiko Sawada
Тема	Re: Corruption during WAL replay
Дата	13 апреля 2020 г. 09:24:55
Msg-id	CA+fd4k43sPrf3sRZmUHpDErHPfkbfa3rhQ9Y98jTCUzj9Y1VaQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Corruption during WAL replay (Teja Mupparti <tejeswarm@hotmail.com>)
Ответы	Re: Corruption during WAL replay (Andres Freund <andres@anarazel.de>)
Список	pgsql-hackers

Дерево обсуждения

On Sat, 11 Apr 2020 at 09:00, Teja Mupparti <tejeswarm@hotmail.com> wrote:
>
> Thanks Andres and Kyotaro for the quick review.  I have fixed the typos and also included the critical section
(emulatedit with try-catch block since palloc()s are causing issues in the truncate code). This time I used git
format-patch.
>

I briefly looked at the latest patch but I'm not sure it's the right
thing here to use PG_TRY/PG_CATCH to report the PANIC error. For
example, with the following code you changed, we will always end up
with emitting a PANIC "failed to truncate the relation" regardless of
the actual cause of the error.

+   PG_CATCH();
+   {
+       ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR),
+           errmsg("failed to truncate the relation")));
+   }
+   PG_END_TRY();

And the comments of RelationTruncate() mentions:

/*
 * We WAL-log the truncation before actually truncating, which means
 * trouble if the truncation fails. If we then crash, the WAL replay
 * likely isn't going to succeed in the truncation either, and cause a
 * PANIC. It's tempting to put a critical section here, but that cure
 * would be worse than the disease. It would turn a usually harmless
 * failure to truncate, that might spell trouble at WAL replay, into a
 * certain PANIC.
 */

As a second idea, I wonder if we can defer truncation until commit
time like smgrDoPendingDeletes mechanism. The sequence would be:

At RelationTruncate(),
1. WAL logging.
2. Remember buffers to be dropped.

At CommitTransaction(),
3. Revisit the remembered buffers to check if the buffer still has
table data that needs to be truncated.
4-a, If it has, we mark it as IO_IN_PROGRESS.
4-b, If it already has different table data, ignore it.
5, Truncate physical files.
6, Mark the buffer we marked at #4-a as invalid.

If an error occurs between #3 and #6 or in abort case, we revert all
IO_IN_PROGRESS flags on the buffers.

In the above idea, remembering all buffers having to-be-truncated
table at RelationTruncate(), we reduce the time for checking buffers
at the commit time. Since we acquire AccessExclusiveLock the number of
buffers having to-be-truncated table's data never increases. A
downside would be that since we can truncate multiple relations we
need to remember all buffers of each truncated relations, which is up
to (sizeof(int) * NBuffers) in total.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Amit Kapila
Дата: 13 апреля 2020 г., 09:10:51
Сообщение: Re: WAL usage calculation patch

Следующее

От: Kyotaro Horiguchi
Дата: 13 апреля 2020 г., 09:31:01
Сообщение: Re: Race condition in SyncRepGetSyncStandbysPriority

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Corruption during WAL replay

Предыдущее

Следующее