Recovery performance of standby for multiple concurrent truncates on large tables

From: Jamison, Kirk
Subject: Recovery performance of standby for multiple concurrent truncates on large tables
Date:
Msg-id: D09B13F772D2274BB348A310EE3027C62FD6E6@g01jpexmbkw24
Responses: Re: Recovery performance of standby for multiple concurrent truncates on large tables  (Andres Freund <andres@anarazel.de>)
           Re: Recovery performance of standby for multiple concurrent truncates on large tables  (Ants Aasma <ants.aasma@eesti.ee>)
List: pgsql-hackers

Hello hackers,

Recently, the performance problem of executing multiple DROP/TRUNCATE TABLE statements in a single transaction with large shared_buffers (as shown below) was solved by commit b416691.

              BEGIN;
              truncate tbl001;
              ...
              truncate tbl050;
              COMMIT;

However, we have a customer who needs to execute many concurrent TRUNCATEs (40-50) on different large tables (as shown below) in a shorter amount of time. This case is not covered by the previous commit's improvement.

              BEGIN;
              truncate tbl001;
              COMMIT;

              ...

              BEGIN;
              truncate tbl050;
              COMMIT;


[Problem]

Currently, when the standby replays the WAL for TRUNCATE/DROP TABLE, each record leads to a separate scan of the whole shared buffers to check whether any pages of the table being deleted are cached there. If shared_buffers is large (e.g. 300GB) and the primary server fails during the replay, it therefore takes a long while for the standby to complete recovery.
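
To illustrate the cost, below is a minimal sketch of the kind of scan each replayed TRUNCATE/DROP TABLE triggers today. This is not the actual DropRelFileNodeBuffers source; BufferDescSketch, BufferDescriptorsSketch, lock_buffer_header, and invalidate_buffer are simplified placeholders, while NBuffers, RelFileNode, and RelFileNodeEquals are existing PostgreSQL names.

    /*
     * Simplified sketch, not the actual DropRelFileNodeBuffers source:
     * replaying ONE truncate/drop walks every shared buffer.  With 300GB
     * of shared_buffers (~39 million 8KB pages), 50 truncates mean
     * roughly 2 billion buffer-tag comparisons during recovery.
     */
    static void
    drop_rel_buffers_sketch(RelFileNode rnode)  /* relation being dropped */
    {
        int     buf_id;

        for (buf_id = 0; buf_id < NBuffers; buf_id++)
        {
            BufferDescSketch *buf = &BufferDescriptorsSketch[buf_id];  /* placeholder */

            /* cheap unlocked pre-check; recheck under the header lock */
            if (!RelFileNodeEquals(buf->tag.rnode, rnode))
                continue;

            lock_buffer_header(buf);            /* simplified locking */
            if (RelFileNodeEquals(buf->tag.rnode, rnode))
                invalidate_buffer(buf);         /* discard the cached page */
            unlock_buffer_header(buf);
        }
    }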

[Idea]

In the current implementation, the replay of each TRUNCATE/DROP TABLE scans the whole shared buffers.

One approach (though the idea is not fully developed yet) is to improve recovery by delaying the shared buffer scan and invalidation (DropRelFileNodeBuffers) until after the next checkpoint (after failover completion). The replay of TRUNCATE/DROP TABLE would just make the checkpointer process remember which relations should be invalidated in the shared buffers, and the checkpointer would then scan the shared buffers only once during the subsequent checkpoint to invalidate the buffers of all relations that were dropped or truncated.
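
A rough sketch of how this could look (hypothetical names, not a patch; List, lappend, palloc, foreach, lfirst, and list_free_deep are existing PostgreSQL list APIs, while the buffer-header placeholders are the same as in the sketch above):

    /* Relfilenodes queued during replay instead of being scanned for immediately */
    static List *pending_drops = NIL;

    /* Called from TRUNCATE/DROP TABLE replay in place of the immediate scan */
    static void
    remember_dropped_rel(RelFileNode rnode)
    {
        RelFileNode *copy = palloc(sizeof(RelFileNode));

        *copy = rnode;
        pending_drops = lappend(pending_drops, copy);
    }

    /* Called once by the checkpointer: a single pass over shared buffers */
    static void
    flush_pending_drops(void)
    {
        int     buf_id;

        for (buf_id = 0; buf_id < NBuffers; buf_id++)
        {
            BufferDescSketch *buf = &BufferDescriptorsSketch[buf_id];  /* placeholder */
            ListCell   *lc;

            foreach(lc, pending_drops)
            {
                RelFileNode *rnode = (RelFileNode *) lfirst(lc);

                if (RelFileNodeEquals(buf->tag.rnode, *rnode))
                {
                    lock_buffer_header(buf);    /* simplified locking */
                    if (RelFileNodeEquals(buf->tag.rnode, *rnode))
                        invalidate_buffer(buf);
                    unlock_buffer_header(buf);
                    break;
                }
            }
        }

        list_free_deep(pending_drops);
        pending_drops = NIL;
    }

With a plain list, the per-buffer check costs O(number of pending drops); using a hash table keyed by relfilenode instead would make it O(1), so the total invalidation work stays at a single scan of shared buffers no matter how many relations were dropped.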

However, this is still a rough idea, so I am not sure if it is feasible. I would like to know if the community has advice or alternative solutions on how to address this.

Any insights, advice, or feedback?

Thank you in advance.

Regards,

Kirk Jamison
