Re: pg_rewind WAL segments deletion pitfall
От | torikoshia |
---|---|
Тема | Re: pg_rewind WAL segments deletion pitfall |
Дата | |
Msg-id | 41e7c31b85aad03a4a2cd14daad31acd@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: pg_rewind WAL segments deletion pitfall (Michael Paquier <michael@paquier.xyz>) |
Список | pgsql-hackers |
On 2023-08-22 14:32, Michael Paquier wrote: Thanks for your review! > On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote: >> Thanks for the patch, I've marked this as ready-for-committer. >> >> BTW, this issue can be considered a bug, right? >> I think it would be appropriate to provide backpatch. > > Hmm, I agree that there is a good argument in back-patching as we have > the WAL files between the redo LSN and the divergence LSN, but > pg_rewind is not smart enough to keep them around. If the archives of > the primary were not able to catch up, the old primary is as good as > kaput, and restore_command won't help here. True. I also imagine that in the typical failover scenario where the target cluster was shut down soon after the divergence and pg_rewind was executed without much time, we can avoid this kind of 'requested WAL segment has already removed' error by preventing pg_rewind from deleting necessary WALs. > I don't like much this patch. While it takes correctly advantage of > the backward record read logic from SimpleXLogPageRead() able to > handle correctly timeline jumps, it creates a hidden dependency in the > code between the hash table from filemap.c and the page callback. > Wouldn't it be simpler to build a list of the segment names using the > information from WALOpenSegment and build this list in > findLastCheckpoint()? Also, I am wondering if we should be smarter > with any potential conflict handling between the source and the > target, rather than just enforcing a FILE_ACTION_NONE for all these > files. In short, could it be better to copy the WAL file from the > source if it exists there? > > + /* > + * Some entries (WAL segments) already have an action assigned > + * (see SimpleXLogPageRead()). > + */ > + if (entry->action == FILE_ACTION_UNDECIDED) > + entry->action = decide_file_action(entry); > > This change makes me a bit uneasy, per se my previous comment with the > additional code dependencies. > > I think that this scenario deserves a test case. If one wants to > emulate a delay in WAL archiving, it is possible to set > archive_command to a command that we know will fail, for instance. > -- > Michael Bungina, are you going to respond to these comments? -- Regards, -- Atsushi Torikoshi NTT DATA Group Corporation
В списке pgsql-hackers по дате отправления: