Re: Minimal logical decoding on standbys
От | Drouvot, Bertrand |
---|---|
Тема | Re: Minimal logical decoding on standbys |
Дата | |
Msg-id | 26c6f320-98f0-253c-f8b5-df1e7c1f6315@amazon.com обсуждение исходный текст |
Ответ на | Re: Minimal logical decoding on standbys (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys
|
Список | pgsql-hackers |
Hi Andres, On 4/8/21 5:47 AM, Andres Freund wrote: > Hi, > > On 2021-04-07 13:32:18 -0700, Andres Freund wrote: >> While working on this I found a, somewhat substantial, issue: >> >> When the primary is idle, on the standby logical decoding via walsender >> will typically not process the records until further WAL writes come in >> from the primary, or until a 10s lapsed. >> >> The problem is that WalSndWaitForWal() waits for the *replay* LSN to >> increase, but gets woken up by walreceiver when new WAL has been >> flushed. Which means that typically walsenders will get woken up at the >> same time that the startup process will be - which means that by the >> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely >> that the startup process already replayed the record and updated >> XLogCtl->lastReplayedEndRecPtr. >> >> I think fixing this would require too invasive changes at this point. I >> think we might be able to live with 10s delay issue for one release, but >> it sure is ugly :(. > This is indeed pretty painful. It's a lot more regularly occuring if you > either have a slot disk, or you switch around the order of > WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush(). > > - There's about which timeline to use. If you use pg_recvlogical and you > restart the server, you'll see errors like: > > pg_recvlogical: error: unexpected termination of replication stream: ERROR: requested WAL segment 000000000000000000000003has already been removed > > the real filename is 000000010000000000000003 - i.e. the timeline is > 0. > > This isn't too hard to fix, but definitely needs fixing. Thanks, nice catch! From what I have seen, we are not going through InitXLOGAccess() on a Standby and in some cases (like the one you mentioned) StartLogicalReplication() is called without IdentifySystem() being called previously: this lead to ThisTimeLineID still set to 0. I am proposing a fix in the attached v18 by adding a check in StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved. > > - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially > leading us to drop a slot that has been created since we signalled a > recovery conflict. See > https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de > for some very similar issues. I have rewritten this part by following the same logic as the one used in 96540f80f8 (the commit linked to the thread you mentioned). > > - Given the precedent of max_slot_wal_keep_size, I think it's wrong to > just drop the logical slots. Instead we should just mark them as > invalid, like InvalidateObsoleteReplicationSlots(). Makes fully sense and done that way in the attached patch. I am setting the slot's data.xmin and data.catalog_xmin as InvalidTransactionId to mark the slot(s) as invalid in case of conflict. > - There's no tests covering timeline switches, what happens if there's a > promotion if logical decoding is currently ongoing. I'll now work on the tests. > > - The way ResolveRecoveryConflictWithLogicalSlots() builds the error > message is not good (and I've complained about it before...). I changed it and made it more simple. I also removed the details around mentioning xmin or catalog xmin (as I am not sure of the added value and they are currently also not mentioned during standby recovery snapshot conflict). > > Unfortunately I think the things I have found are too many for me to > address within the given time. I'll send a version with a somewhat > polished set of the changes I made in the next few days... Thanks for the review and feedback. Please find enclosed v18 with the changes I worked on. I still need to have a look on the tests. There is also the 10s delay to work on, do you already have an idea on how we should handle it? Thanks Bertrand
Вложения
В списке pgsql-hackers по дате отправления: