Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
From | Masahiko Sawada |
---|---|
Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date | |
Msg-id | CAD21AoC6zSkzpdmmeCUn1+YmcAw4pDc21OehtpDp0Rd=cF2TYw@mail.gmail.com |
In reply to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
List | pgsql-hackers |
On Mon, Jun 16, 2025 at 11:48 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jun 11, 2025 at 2:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think it's the user's responsibility to keep at least one logical
> > slot. It seems that setting wal_level to 'logical' would be the most
> > reliable solution for this case. We might want to provide a way to
> > keep 'logical' WAL level somehow but I don't have a good idea for now.
> >
>
> Okay, Let me think more on this.
>
> >
> > Considering cascading replication cases too, 2) could be tricky as
> > cascaded standbys need to propagate the information of logical slot
> > creation up to the most upstream server.
> >
>
> Yes, I understand the challenges here.
>
> Thanks for the v2 patches, few concerns:

Thank you for the comments!

> 1)
> Now when the slot on standby is invalidated due to effective_wal_level
> switched back to replica and if we restart standby, it fails to
> restart even if wal_level is explicitly changed to logical in conf
> file.
>
> FATAL: logical replication slot "slot_st" exists, but logical
> decoding is not enabled
> HINT: Change "wal_level" to be "replica" or higher.

Good catch, we should fix it.

>
> 2)
> I see that when primary switches back its effective wal_level to
> replica while standby has wal_level=logical in conf file, then standby
> has this status:
>
> postgres=# show wal_level;
>  wal_level
> -----------
>  logical
>
> postgres=# show effective_wal_level;
>  effective_wal_level
> ---------------------
>  replica
>
> Is this correct? Can effective_wal_level be < wal_level anytime? I
> feel it can be greater but never lesser.

Hmm, I think we need to define what value we should show in
effective_wal_level on standbys because the standbys actually are not
writing any WALs and whether or not the logical decoding is enabled on
the standbys depends on the primary.

In the previous version of the patch, the standby's effective_wal_level
value depended solely on the standby's wal_level value. However, that
was confusing because logical decoding could be available even though
effective_wal_level is 'replica', if the primary has already enabled
it. One idea is that, given that logical decoding availability and the
effective_wal_level value are independent in principle, it's better to
provide a SQL function that returns the logical decoding status so
that users can check availability without looking at
effective_wal_level. With that function, it might make sense to revert
to the previous behavior. That is, on the primary the
effective_wal_level value is always greater than or equal to wal_level,
whereas on the standbys it's always the same as wal_level, and users
would be able to check the logical decoding availability using the SQL
function. Or it might also be worth considering showing
effective_wal_level as NULL on standbys.

> 3)
> When standby invalidate obsolete slots due to effective_wal_level on
> primary changed to replica, it dumps below:
> LOG: invalidating obsolete replication slot "slot_st2"
> DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
> on the primary server
>
> Shall we update this message as well to convey about slot-presence on primary.
> DETAIL: Logical decoding on standby requires "wal_level" >= "logical"
> or presence of logical slot on the primary server.

Will fix.

> 4)
> I see that the slotsync worker is running all the time now as against
> the previous behaviour where it will not start if wal_level is less
> than logical or switched to '< logical' anytime. Even with wal_level
> and effective_wal_level set to replica, slot-sync keeps on attempting
> synchronization. This does not look correct. I think we need to find a
> way to stop slot-sync worker when effective_wal_level is switched to
> replica from logical.

Right, will fix.

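The effective_wal_level semantics floated in the reply to point 2 can be sketched as a small Python model. This is only an illustration of the proposal as described in this message, not code from the patch: the names `WalLevel`, `effective_wal_level`, and `logical_decoding_available` are hypothetical, and the rule that a logical slot raises the primary's effective level only from 'replica' is an assumption based on the thread subject.

```python
from enum import IntEnum


class WalLevel(IntEnum):
    MINIMAL = 0
    REPLICA = 1
    LOGICAL = 2


def effective_wal_level(wal_level, is_standby, has_logical_slot=False):
    """Model of the proposed reporting rule (hypothetical helper).

    Standby: always mirrors its own wal_level setting.
    Primary: never lower than wal_level; assumed to be raised to LOGICAL
    while wal_level is REPLICA and at least one logical slot exists.
    """
    if is_standby:
        return wal_level
    if wal_level == WalLevel.REPLICA and has_logical_slot:
        return WalLevel.LOGICAL
    return wal_level


def logical_decoding_available(is_standby, local_effective_level,
                               primary_decoding_enabled):
    """Stand-in for the proposed SQL status function (hypothetical).

    On a standby the answer depends on the primary, not on the local
    effective_wal_level; on the primary it follows the effective level.
    """
    if is_standby:
        return primary_decoding_enabled
    return local_effective_level == WalLevel.LOGICAL
```

Under this model a standby configured with wal_level='replica' reports effective_wal_level as 'replica' while the status function still returns true once the primary has enabled logical decoding, which is the reason for separating the two values.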
> 5)
> Can you please help me understand the changes at [1].
>
> a) Why is it needed when we have code logic at [2]

This is because we use the XLOG_LOGICAL_DECODING_STATUS_CHANGE record
only for changing the logical decoding status online (i.e., without
restarting the server). So I think we still need this part of the code
for cases where we enable/disable logical decoding by changing the
wal_level value and restarting the server.

Suppose that both the primary and the standby set wal_level='replica',
so logical decoding is not available on either side. If the primary
restarts with wal_level='logical', it doesn't write an
XLOG_LOGICAL_DECODING_STATUS_CHANGE record. Another case: suppose that
the primary sets wal_level='logical' and the standby sets
wal_level='replica', so logical decoding is available on both sides.
If the primary restarts with wal_level='replica', we need to somehow
tell the standby that logical decoding has been disabled. (BTW I
realized we need to invalidate the logical slots in this case too.)

> b) in [1], why do we check n_inuse_logical_slots on standby and then
> make decisions? Why not to disable logical-decoding directly just like
> [2]

It seems the code is incorrect. We should disable the logical decoding
anyway if the primary disables it. Will fix.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
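The two paths discussed under point 5 (an online status change versus a primary restart with a different wal_level) can be sketched as a toy standby model in Python. This is a hedged illustration, not the patch's logic: the class `Standby` and its methods are hypothetical, and how the primary's post-restart wal_level actually reaches the standby is assumed here rather than stated in this message. The sketch also ignores the case where a logical slot on the primary keeps decoding enabled.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    # Whether logical decoding is currently usable on this standby.
    logical_decoding_enabled: bool = False
    # Slot name -> True once the slot has been invalidated.
    logical_slots: dict = field(default_factory=dict)

    def _apply_status(self, enabled):
        """Follow the primary's status unconditionally.

        When decoding is disabled, every logical slot is invalidated,
        without first counting in-use slots on the standby.
        """
        self.logical_decoding_enabled = enabled
        if not enabled:
            for name in self.logical_slots:
                self.logical_slots[name] = True

    def replay_status_change(self, enabled):
        """Online path: models replay of a status-change WAL record."""
        self._apply_status(enabled)

    def replay_primary_restart(self, primary_wal_level):
        """Restart path: no status-change record is written, so the
        status is derived from the primary's new wal_level (assumed to
        arrive through some other record)."""
        self._apply_status(primary_wal_level == "logical")
```

For example, a standby holding slot "slot_st" that replays a primary restart with 'logical' and then one with 'replica' ends up with decoding disabled and the slot marked invalidated, matching the note that slots must be invalidated in the restart case too.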