Re: Corruption during WAL replay

Поиск

Список

Период

Сортировка

От	Andres Freund
Тема	Re: Corruption during WAL replay
Дата	13 апреля 2020 г. 08:40:31
Msg-id	20200413084031.krnrhh74qtywsz2f@alap3.anarazel.de обсуждение исходный текст
Ответ на	Re: Corruption during WAL replay (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Ответы	Re: Corruption during WAL replay
Список	pgsql-hackers

Дерево обсуждения

Hi,

On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote:
> On Sat, 11 Apr 2020 at 09:00, Teja Mupparti <tejeswarm@hotmail.com> wrote:
> >
> > Thanks Andres and Kyotaro for the quick review.  I have fixed the typos and also included the critical section
(emulatedit with try-catch block since palloc()s are causing issues in the truncate code). This time I used git
format-patch.
> >
> 
> I briefly looked at the latest patch but I'm not sure it's the right
> thing here to use PG_TRY/PG_CATCH to report the PANIC error. For
> example, with the following code you changed, we will always end up
> with emitting a PANIC "failed to truncate the relation" regardless of
> the actual cause of the error.
> 
> +   PG_CATCH();
> +   {
> +       ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR),
> +           errmsg("failed to truncate the relation")));
> +   }
> +   PG_END_TRY();
> 
> And the comments of RelationTruncate() mentions:

I think that's just a workaround for mdtruncate not being usable in
critical sections.


> /*
>  * We WAL-log the truncation before actually truncating, which means
>  * trouble if the truncation fails. If we then crash, the WAL replay
>  * likely isn't going to succeed in the truncation either, and cause a
>  * PANIC. It's tempting to put a critical section here, but that cure
>  * would be worse than the disease. It would turn a usually harmless
>  * failure to truncate, that might spell trouble at WAL replay, into a
>  * certain PANIC.
>  */

Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL
log something and then not perform the action. This leads to to primary
/ replica getting out of sync, crash recovery potentially not completing
(because of records referencing the should-be-truncated pages), ...


> As a second idea, I wonder if we can defer truncation until commit
> time like smgrDoPendingDeletes mechanism. The sequence would be:

This is mostly an issue during [auto]vacuum partially truncating the end
of the file. We intentionally release the AEL regularly to allow other
accesses to continue.

For transactional truncations we don't go down this path (as we create a
new relfilenode).


> At RelationTruncate(),
> 1. WAL logging.
> 2. Remember buffers to be dropped.

You definitely cannot do that, as explained above.


> At CommitTransaction(),
> 3. Revisit the remembered buffers to check if the buffer still has
> table data that needs to be truncated.
> 4-a, If it has, we mark it as IO_IN_PROGRESS.
> 4-b, If it already has different table data, ignore it.
> 5, Truncate physical files.
> 6, Mark the buffer we marked at #4-a as invalid.
> 
> If an error occurs between #3 and #6 or in abort case, we revert all
> IO_IN_PROGRESS flags on the buffers.

What would this help with? If we still need the more complicated
truncation sequence?

Greetings,

Andres Freund

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Corruption during WAL replay