Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

From Masahiko Sawada
Subject Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Date
Msg-id CAD21AoC6zSkzpdmmeCUn1+YmcAw4pDc21OehtpDp0Rd=cF2TYw@mail.gmail.com
In response to Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
List pgsql-hackers
On Mon, Jun 16, 2025 at 11:48 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Jun 11, 2025 at 2:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I think it's the user's responsibility to keep at least one logical
> > slot. It seems that setting wal_level to 'logical' would be the most
> > reliable solution for this case. We might want to provide a way to
> > keep 'logical' WAL level somehow but I don't have a good idea for now.
> >
>
> Okay, let me think more on this.
>
> >
> > Considering cascading replication cases too, 2) could be tricky as
> > cascaded standbys need to propagate the information of logical slot
> > creation up to the most upstream server.
> >
>
> Yes, I understand the challenges here.
>
> Thanks for the v2 patches, few concerns:

Thank you for the comments!

> 1)
> Now when the slot on standby is invalidated due to effective_wal_level
> switched back to replica and if we restart standby, it fails to
> restart even if wal_level is explicitly changed to logical in conf
> file.
>
> FATAL:  logical replication slot "slot_st" exists, but logical
> decoding is not enabled
> HINT:  Change "wal_level" to be "replica" or higher.

Good catch, we should fix it.

>
> 2)
> I see that when primary switches back its effective wal_level to
> replica while standby has wal_level=logical in conf file, then standby
> has this status:
>
> postgres=# show wal_level;
>  wal_level
> -----------
>  logical
>
> postgres=# show effective_wal_level;
>  effective_wal_level
> ---------------------
>  replica
>
> Is this correct? Can effective_wal_level be < wal_level anytime? I
> feel it can be greater but never lesser.

Hmm, I think we need to define what value effective_wal_level should
show on standbys, because standbys do not actually write any WAL and
whether or not logical decoding is enabled on a standby depends on
the primary.

In the previous version of the patch, the standby's effective_wal_level
value depended solely on the standby's wal_level value. However, that
was confusing in a sense, because logical decoding could be available
even though effective_wal_level was 'replica' if the primary had
already enabled it. One idea: given that logical decoding availability
and the effective_wal_level value are independent in principle, it
would be better to provide a SQL function that returns the logical
decoding status, so that users can check logical decoding availability
without checking effective_wal_level. With that function, it might
make sense to revert to the previous behavior. That is, on the primary
the effective_wal_level value is always greater than or equal to
wal_level, whereas on standbys it's always the same as wal_level, and
users would be able to check logical decoding availability using the
SQL function. Alternatively, it might be worth considering showing
effective_wal_level as NULL on standbys.
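
To make the proposed split concrete, here is a minimal Python model of
it, not the patch's code. The Server class and both function names are
hypothetical, and the rule that a primary with wal_level='replica'
reports 'logical' while it has a logical slot is my reading of the
behavior discussed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    wal_level: str                      # configured value: 'replica' or 'logical'
    is_standby: bool = False
    n_logical_slots: int = 0            # consulted only on the primary
    upstream: Optional["Server"] = None  # server this standby replicates from


def effective_wal_level(s: Server) -> str:
    """Hypothetical: the value SHOW effective_wal_level would report."""
    if s.is_standby:
        # Proposed behavior: a standby simply mirrors its own wal_level.
        return s.wal_level
    # Assumed primary behavior: a logical slot raises 'replica' to 'logical',
    # so the result is never lower than wal_level.
    if s.wal_level == "replica" and s.n_logical_slots > 0:
        return "logical"
    return s.wal_level


def logical_decoding_enabled(s: Server) -> bool:
    """Hypothetical stand-in for the proposed SQL status function."""
    if s.is_standby:
        # Availability follows the most upstream server, including
        # through cascaded standbys.
        return logical_decoding_enabled(s.upstream)
    return effective_wal_level(s) == "logical"


primary = Server(wal_level="replica", n_logical_slots=0)
standby = Server(wal_level="logical", is_standby=True, upstream=primary)
print(effective_wal_level(standby), logical_decoding_enabled(standby))
```

In this model the standby reports 'logical' while decoding is
unavailable, which is why a separate status function would be needed.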

>
> 3)
> When standby invalidate obsolete slots due to effective_wal_level on
> primary changed to replica, it dumps below:
> LOG:  invalidating obsolete replication slot "slot_st2"
> DETAIL:  Logical decoding on standby requires "wal_level" >= "logical"
> on the primary server
>
> Shall we update this message as well to convey about slot-presence on primary.
> DETAIL:  Logical decoding on standby requires "wal_level" >= "logical"
> or presence of logical slot on the primary server.

Will fix.

> 4)
> I see that the slotsync worker is running all the time now as against
> the previous behaviour where it will not start if wal_level is less
> than logical or switched to '< logical' anytime. Even with wal_level
> and effective_wal_level set to replica, slot-sync keeps on attempting
> synchronization. This does not look correct. I think we need to find a
> way to stop slot-sync worker when effective_wal_level is switched to
> replica from logical.

Right, will fix.

> 5)
> Can you please help me understand the changes at [1].
>
> a) Why is it needed when we have code logic at [2]

This is because we use the XLOG_LOGICAL_DECODING_STATUS_CHANGE record
only for changing the logical decoding status online (i.e., without
restarting the server). So I think we still need this part of the code
for cases where we enable/disable logical decoding by changing the
wal_level value and restarting the server.

Suppose that both the primary and the standby set wal_level='replica',
so logical decoding is not available on either side. If the primary
restarts with wal_level='logical', it doesn't write an
XLOG_LOGICAL_DECODING_STATUS_CHANGE record.

Another case: suppose the primary sets wal_level='logical' and the
standby sets wal_level='replica', so logical decoding is available on
both sides. If the primary restarts with wal_level='replica', we need
to somehow tell the standby that logical decoding has been disabled.
(BTW, I realized we need to invalidate the logical slots in this case
too.)

> b) in [1], why do we check n_inuse_logical_slots on standby and then
> make decisions? Why not to disable logical-decoding directly just like
> [2]

It seems the code is incorrect. We should disable logical decoding
unconditionally if the primary disables it. Will fix.
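
The standby-side behavior described in the answers to 5a and 5b can be
sketched as a small Python model. This is only an illustration under
assumptions, not the patch's code: the Standby class is hypothetical,
and "PRIMARY_RESTARTED" is a made-up stand-in for whatever record
carries the primary's wal_level after a restart. Only the
XLOG_LOGICAL_DECODING_STATUS_CHANGE name comes from this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Standby:
    logical_decoding_enabled: bool = False
    # slot name -> "valid" or "invalidated"
    slots: Dict[str, str] = field(default_factory=dict)

    def _set_status(self, enabled: bool) -> None:
        self.logical_decoding_enabled = enabled
        if not enabled:
            # Disable unconditionally and invalidate every logical slot,
            # without first checking how many slots are in use.
            for name in self.slots:
                self.slots[name] = "invalidated"

    def replay(self, record: Tuple[str, object]) -> None:
        kind, payload = record
        if kind == "LOGICAL_DECODING_STATUS_CHANGE":
            # Online change on the primary: payload is the new status (bool).
            self._set_status(bool(payload))
        elif kind == "PRIMARY_RESTARTED":
            # Hypothetical restart path: payload is the primary's wal_level.
            # No status-change record was written, so derive the status here.
            self._set_status(payload == "logical")


standby = Standby(logical_decoding_enabled=True, slots={"slot_st": "valid"})
standby.replay(("PRIMARY_RESTARTED", "replica"))
print(standby.logical_decoding_enabled, standby.slots)
```

Both paths funnel into the same status update, so a slot on the
standby is invalidated whether the primary disabled logical decoding
online or through a restart.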

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


