Re: Corruption during WAL replay

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Corruption during WAL replay
Дата
Msg-id 20200413084031.krnrhh74qtywsz2f@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Corruption during WAL replay  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Ответы Re: Corruption during WAL replay  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Список pgsql-hackers
Hi,

On 2020-04-13 15:24:55 +0900, Masahiko Sawada wrote:
> On Sat, 11 Apr 2020 at 09:00, Teja Mupparti <tejeswarm@hotmail.com> wrote:
> >
> > Thanks Andres and Kyotaro for the quick review.  I have fixed the typos and also included the critical section
(emulatedit with try-catch block since palloc()s are causing issues in the truncate code). This time I used git
format-patch.
> >
> 
> I briefly looked at the latest patch but I'm not sure it's the right
> thing here to use PG_TRY/PG_CATCH to report the PANIC error. For
> example, with the following code you changed, we will always end up
> with emitting a PANIC "failed to truncate the relation" regardless of
> the actual cause of the error.
> 
> +   PG_CATCH();
> +   {
> +       ereport(PANIC, (errcode(ERRCODE_INTERNAL_ERROR),
> +           errmsg("failed to truncate the relation")));
> +   }
> +   PG_END_TRY();
> 
> And the comments of RelationTruncate() mentions:

I think that's just a workaround for mdtruncate not being usable in
critical sections.


> /*
>  * We WAL-log the truncation before actually truncating, which means
>  * trouble if the truncation fails. If we then crash, the WAL replay
>  * likely isn't going to succeed in the truncation either, and cause a
>  * PANIC. It's tempting to put a critical section here, but that cure
>  * would be worse than the disease. It would turn a usually harmless
>  * failure to truncate, that might spell trouble at WAL replay, into a
>  * certain PANIC.
>  */

Yea, but that reasoning is just plain *wrong*. It's *never* ok to WAL
log something and then not perform the action. This leads to to primary
/ replica getting out of sync, crash recovery potentially not completing
(because of records referencing the should-be-truncated pages), ...


> As a second idea, I wonder if we can defer truncation until commit
> time like smgrDoPendingDeletes mechanism. The sequence would be:

This is mostly an issue during [auto]vacuum partially truncating the end
of the file. We intentionally release the AEL regularly to allow other
accesses to continue.

For transactional truncations we don't go down this path (as we create a
new relfilenode).


> At RelationTruncate(),
> 1. WAL logging.
> 2. Remember buffers to be dropped.

You definitely cannot do that, as explained above.


> At CommitTransaction(),
> 3. Revisit the remembered buffers to check if the buffer still has
> table data that needs to be truncated.
> 4-a, If it has, we mark it as IO_IN_PROGRESS.
> 4-b, If it already has different table data, ignore it.
> 5, Truncate physical files.
> 6, Mark the buffer we marked at #4-a as invalid.
> 
> If an error occurs between #3 and #6 or in abort case, we revert all
> IO_IN_PROGRESS flags on the buffers.

What would this help with? If we still need the more complicated
truncation sequence?

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: tushar
Дата:
Сообщение: Re: [Proposal] Global temporary tables
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: pg_basebackup, manifests and backends older than ~12