Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From:        Kyotaro Horiguchi
Subject:     Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Date:
Msg-id:      20210729.162038.534808353849568395.horikyota.ntt@gmail.com
In reply to: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size (Tom Lane <tgl@sss.pgh.pa.us>)
Replies:     Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
             Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
List:        pgsql-bugs
At Wed, 28 Jul 2021 11:38:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
> This test is still unstable :-(
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
>
> These all look like
>
> # Failed test 'check that segments have been removed'
> #   at t/019_replslot_limit.pl line 226.
> #          got: '000000010000000000000020'
> #     expected: '000000010000000000000024'
> # Looks like you failed 1 test of 16.
>
> with varying values mentioned.  It looks to me like WAL file cleanup
> is not as synchronous with slot creation as the test thinks.
> Maybe it needs to loop until the oldest WAL file matches what it
> expects?

Sorry for the kludge.

Mmm. In the failure cases, the directory scan (@16:52:22.036) runs
before the targeted checkpoint completes (@16:52:22.144).

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01

16:52:17.328 LOG: checkpoint starting: wal
16:52:19.140 LOG: invalidating slot "rep1" because its restart_lsn 0/1D00000 exceeds max_slot_wal_keep_size
16:52:19.316 019_replslot_limit.pl LOG: statement: SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2', true)
16:52:22.036 019_replslot_limit.pl LOG: statement: SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}$' ORDER BY 1 LIMIT 1
16:52:22.077 019_replslot_limit.pl LOG: statement: SELECT pg_drop_replication_slot('s2')
16:52:22.144 LOG: checkpoint complete: wrote 18 buffers (14.1%); 0 WAL file(s) added, 4 removed, 3 recycled; write=1.806 s, sync=0.001 s, total=4.817 s; sync files=0, longest=0.000 s, average=0.000 s; distance=3072 kB, estimate=3072 kB

The reason is that the previous checkpoint completed after the test
started to advance segments:

>  my $logstart = get_log_size($node_primary);
>  advance_wal($node_primary, 7);
!!!! another checkpoint runs/ends ...          16:52:19.140
#  check for "invalidate slots" in log.
#  check for the "lost" state in pg_replication_slots.
!! check for "checkpoint complete" in log.     16:52:22.077
#  read redo segment and oldest wal.           16:52:22.144
!! The target checkpoint ends.

So it seems to me that we need to explicitly prevent unexpected
checkpoints from happening, maybe by enlarging max_wal_size
temporarily.  I'll go that way.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
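For reference, the "loop until the oldest WAL file matches" approach
suggested upthread could look roughly like the sketch below. It is
untested: $node_primary and the test name come from
019_replslot_limit.pl, while the retry cap, the $redo_wal_file
variable, and the Time::HiRes import are my assumptions.

    use Test::More;
    use Time::HiRes qw(usleep);

    # Retry instead of checking only once: WAL removal happens at the
    # end of the checkpoint, which may not have finished yet.
    # $node_primary and $redo_wal_file (the expected oldest segment)
    # are assumed to be set up earlier in the test.
    my $oldest = '';
    for (my $i = 0; $i < 180; $i++)
    {
        $oldest = $node_primary->safe_psql('postgres',
            "SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') " .
            "WHERE pg_ls_dir ~ '^[0-9A-F]{24}\$' ORDER BY 1 LIMIT 1");
        last if $oldest eq $redo_wal_file;
        usleep(100_000);    # wait 100ms before rescanning pg_wal
    }
    is($oldest, $redo_wal_file, 'check that segments have been removed');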
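And the max_wal_size idea would be along these lines, again only an
untested sketch; the 2GB value is arbitrary, and advance_wal() is the
helper the test already uses.

    # Temporarily enlarge max_wal_size so that advancing WAL does not
    # trigger an extra wal-based checkpoint of its own.  max_wal_size
    # is a sighup GUC, so a reload is enough to apply it.
    $node_primary->safe_psql('postgres',
        "ALTER SYSTEM SET max_wal_size = '2GB'");
    $node_primary->reload;

    advance_wal($node_primary, 7);
    # ... run the invalidation and checkpoint checks here ...

    $node_primary->safe_psql('postgres',
        "ALTER SYSTEM RESET max_wal_size");
    $node_primary->reload;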