Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
От | Heikki Linnakangas |
---|---|
Тема | Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? |
Дата | |
Msg-id | 28f5a502-8f27-94af-6d5a-ed939be87125@iki.fi обсуждение исходный текст |
Ответ на | Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)? (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Ответы |
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
|
Список | pgsql-hackers |
On 24/04/18 13:57, Kyotaro HORIGUCHI wrote: > At Mon, 23 Apr 2018 03:41:47 -0400, Heikki Linnakangas <hlinnaka@iki.fi> wrote in <89e33d4f-5c75-0738-3dcb-44c4df59e920@iki.fi> >> Looking at the patch linked above >> (https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp): >> >>> --- a/src/backend/access/transam/xlog.c >>> +++ b/src/backend/access/transam/xlog.c >>> @@ -11693,6 +11693,10 @@ retry: >>> Assert(reqLen <= readLen); >>> *readTLI = curFileTLI; >>> + >>> + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, >>> readBuf)) >>> + goto next_record_is_invalid; >>> + >>> return readLen; >>> next_record_is_invalid: >> >> What is the point of adding this XLogReaderValidatePageHeader() call? >> The caller of this callback function, ReadPageInternal(), checks the >> page header anyway. Earlier in this thread, you said: > > Without the lines, server actually fails to start replication. > > (I try to remember the details...) > > The difference is whether the function can cause retry for the > same portion of a set of continued records (without changing the > plugin API). XLogPageRead can do that. On the other hand all > ReadPageInternal can do is just letting the caller ReadRecords > retry from the beginning of the set of continued records since > the caller side handles only complete records. > > In the failure case, in XLogPageRead, when read() fails, it can > try the next source midst of a continued records. On the other > hand if the segment was read but it was recycled one, it passes > "success" to ReadPageInternal and leads to retring from the > beginning of the recrod. Infinite loop. I see. You have the same problem if you have a WAL file that's corrupt in some other way, but one of the sources would have a correct copy. ValidXLogPageHeader() only checks the page header, after all. But unlike a recycled WAL segment, that's not supposed to happen as part of normal operation, so I guess we can live with that. > Calling XLogReaderValidatePageHeader in ReadPageInternal is > redundant, but removing it may be interface change of xlogreader > plugin and I am not sure that is allowed. We should avoid doing that in back-branches, at least. But in 'master', I wouldn't mind redesigning the API. Dealing with all the retrying is pretty complicated as it is, if we can simplify that somehow, I think that'd be good. - Heikki
В списке pgsql-hackers по дате отправления: