RE: Random pg_upgrade test failure on drongo
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: Random pg_upgrade test failure on drongo |
Date | |
Msg-id | TY3PR01MB98894D8BE99AE53217C96C0AF56A2@TY3PR01MB9889.jpnprd01.prod.outlook.com |
In reply to | Re: Random pg_upgrade test failure on drongo (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Random pg_upgrade test failure on drongo
|
List | pgsql-hackers |
Dear Amit, Alexander,

> > We get the effect discussed when the background writer process decides to
> > flush a file buffer for pg_largeobject during stage 1.
> > (Thus, if a checkpoint somehow happened to occur during CREATE DATABASE,
> > the result must be the same.)
> > And another important factor is shared_buffers = 1MB (set during the test).
> > With the default setting of 128MB I couldn't see the failure.
> >
> > It can be reproduced easily (on old Windows versions) just by running
> > pg_upgrade in a loop (I've got failures on iterations 22, 37, 17 (with the
> > default cluster)).
> > If an old cluster contains a dozen databases, this increases the failure
> > probability significantly (with 10 additional databases I've got failures
> > on iterations 4, 1, 6).
> >
>
> I don't have an old Windows environment to test but I agree with your
> analysis and theory. The question is what should we do for these new
> random BF failures? I think we should set bgwriter_lru_maxpages to 0
> and checkpoint_timeout to 1hr for these new tests. Doing some invasive
> fix as part of this doesn't sound reasonable because this is an
> existing problem and there seems to be another patch by Thomas that
> probably deals with the root cause of the existing problem [1] as
> pointed out by you.
>
> [1] - https://commitfest.postgresql.org/40/3951/

Based on Amit's suggestion, I have created a patch with the alternative approach. It only changes GUC settings. The reported failure occurs only in 003_logical_slots, but the patch also includes changes for the recently added test, 004_subscription. IIUC, 004 could fail in the same way.

As we understand it, this patch should prevent these random failures. Alexander, could you test it to confirm?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
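For readers without the attachment: the settings Amit suggested for the old cluster in these tests would look roughly like the postgresql.conf fragment below. This is a sketch only; the exact values and the way the attached patch applies them to 003_logical_slots and 004_subscription are not shown here and may differ.

```
# Sketch of the suggested test-only settings (assumed values, not the patch itself)

# 0 disables background writing, so the background writer will not
# flush dirty buffers (e.g. for pg_largeobject) during the test
bgwriter_lru_maxpages = 0

# A long interval makes a time-driven checkpoint during the test unlikely
checkpoint_timeout = 1h
```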
Attachments