Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
From:        Kyotaro Horiguchi
Subject:     Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
Date:
Msg-id:      20210729.162038.534808353849568395.horikyota.ntt@gmail.com
In reply to: Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size (Tom Lane <tgl@sss.pgh.pa.us>)
Replies:     Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
             Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size
List:        pgsql-bugs
At Wed, 28 Jul 2021 11:38:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
> This test is still unstable :-(
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tern&dt=2021-07-20%2012%3A46%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-20%2015%3A05%3A39
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01
>
> These all look like
>
> # Failed test 'check that segments have been removed'
> #   at t/019_replslot_limit.pl line 226.
> #          got: '000000010000000000000020'
> #     expected: '000000010000000000000024'
> # Looks like you failed 1 test of 16.
>
> with varying values mentioned.  It looks to me like WAL file cleanup
> is not as synchronous with slot creation as the test thinks.
> Maybe it needs to loop until the oldest WAL file matches what it
> expects?

Sorry for the kludge.

Mmm. In the failure cases, the directory scan (@16:52:22.036) runs
before the targeted checkpoint completes (@16:52:22.144).

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-28%2014%3A33%3A01

16:52:17.328 LOG: checkpoint starting: wal
16:52:19.140 LOG: invalidating slot "rep1" because its restart_lsn 0/1D00000 exceeds max_slot_wal_keep_size
16:52:19.316 019_replslot_limit.pl LOG: statement: SELECT pg_walfile_name(lsn) FROM pg_create_physical_replication_slot('s2', true)
16:52:22.036 019_replslot_limit.pl LOG: statement: SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') WHERE pg_ls_dir ~ '^[0-9A-F]{24}$' ORDER BY 1 LIMIT 1
16:52:22.077 019_replslot_limit.pl LOG: statement: SELECT pg_drop_replication_slot('s2')
16:52:22.144 LOG: checkpoint complete: wrote 18 buffers (14.1%); 0 WAL file(s) added, 4 removed, 3 recycled; write=1.806 s, sync=0.001 s, total=4.817 s; sync files=0, longest=0.000 s, average=0.000 s; distance=3072 kB, estimate=3072 kB

The reason is that the previous checkpoint completed after the test
started to advance segments:

>  my $logstart = get_log_size($node_primary);
>  advance_wal($node_primary, 7);
!!!! another checkpoint runs/ends ...          16:52:19.140
#  check for "invalidate slots" in log.
#  check for the "lost" state in pg_replication_slots.
!! check for "checkpoint complete" in log.     16:52:22.077
#  read redo segment and oldest wal.           16:52:22.144
!! The target checkpoint ends.

So it seems to me that we need to explicitly prevent unexpected
checkpoints from happening, maybe by enlarging max_wal_size
temporarily.  I'll go that way.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
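For reference, the "loop until the oldest WAL file matches" approach
suggested upthread could look roughly like the sketch below. It is
untested: $node_primary and the test name come from
019_replslot_limit.pl, while the retry cap, the $redo_wal_file
variable, and the Time::HiRes import are my assumptions.

    use Test::More;
    use Time::HiRes qw(usleep);

    # Retry instead of checking only once: WAL removal happens at the
    # end of the checkpoint, which may not have finished yet.
    # $node_primary and $redo_wal_file (the expected oldest segment)
    # are assumed to be set up earlier in the test.
    my $oldest = '';
    for (my $i = 0; $i < 180; $i++)
    {
        $oldest = $node_primary->safe_psql('postgres',
            "SELECT pg_ls_dir AS f FROM pg_ls_dir('pg_wal') " .
            "WHERE pg_ls_dir ~ '^[0-9A-F]{24}\$' ORDER BY 1 LIMIT 1");
        last if $oldest eq $redo_wal_file;
        usleep(100_000);    # wait 100ms before rescanning pg_wal
    }
    is($oldest, $redo_wal_file, 'check that segments have been removed');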
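And the max_wal_size idea would be along these lines, again only an
untested sketch; the 2GB value is arbitrary, and advance_wal() is the
helper the test already uses.

    # Temporarily enlarge max_wal_size so that advancing WAL does not
    # trigger an extra wal-based checkpoint of its own.  max_wal_size
    # is a sighup GUC, so a reload is enough to apply it.
    $node_primary->safe_psql('postgres',
        "ALTER SYSTEM SET max_wal_size = '2GB'");
    $node_primary->reload;

    advance_wal($node_primary, 7);
    # ... run the invalidation and checkpoint checks here ...

    $node_primary->safe_psql('postgres',
        "ALTER SYSTEM RESET max_wal_size");
    $node_primary->reload;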