Re: Switching XLog source from archive to streaming when primary available

From Nathan Bossart
Subject Re: Switching XLog source from archive to streaming when primary available
Msg-id 20240305020452.GA3373526@nathanxps13
In reply to Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Switching XLog source from archive to streaming when primary available  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
cfbot claims that this one needs another rebase.

I've spent some time thinking about this one.  I'll admit I'm a bit worried
about adding more complexity to this state machine, but I also haven't
thought of any other viable approaches, and this still seems like a useful
feature.  So, for now, I think we should continue with the current
approach.

+        fails to switch to stream mode, it falls back to archive mode. If this
+        parameter value is specified without units, it is taken as
+        milliseconds. Default is <literal>5min</literal>. With a lower value

Does this really need to be milliseconds?  I would think that any
reasonable setting would be at least on the order of seconds.

+        attempts. To avoid this, it is recommended to set a reasonable value.

I think we might want to suggest what a "reasonable value" is.

+    static bool canSwitchSource = false;
+    bool        switchSource = false;

IIUC "canSwitchSource" indicates that we are trying to force a switch to
streaming, but we are currently exhausting anything that's present in the
pg_wal directory, while "switchSource" indicates that we should force a
switch to streaming right now.  Furthermore, "canSwitchSource" is static
while "switchSource" is not.  Is there any way to simplify this?  For
example, would it be possible to make an enum that tracks the
streaming_replication_retry_interval state?
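
To make the enum idea a bit more concrete, here is a rough, untested sketch
of the shape it could take.  It assumes my reading of the two booleans above
is correct.  Every name in it (WalSourceSwitchState, the SWITCH_* values, and
the helper functions) is hypothetical and does not appear in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical replacement for the canSwitchSource/switchSource pair.
 * None of these names come from the patch.
 */
typedef enum WalSourceSwitchState
{
    SWITCH_NONE,            /* no forced switch to streaming in progress */
    SWITCH_DRAIN_PG_WAL,    /* interval elapsed; still consuming pg_wal */
    SWITCH_TO_STREAM_NOW    /* pg_wal exhausted; switch to streaming now */
} WalSourceSwitchState;

/* Corresponds to "canSwitchSource" as I understand it. */
static bool
can_switch_source(WalSourceSwitchState state)
{
    return state != SWITCH_NONE;
}

/* Corresponds to "switchSource" as I understand it. */
static bool
switch_source_now(WalSourceSwitchState state)
{
    return state == SWITCH_TO_STREAM_NOW;
}

/*
 * Advance the state by at most one step.  The three inputs are assumed
 * to be computed by the caller; they are illustrative only.
 */
static WalSourceSwitchState
next_switch_state(WalSourceSwitchState state,
                  bool interval_elapsed,
                  bool pg_wal_exhausted,
                  bool switched_to_stream)
{
    switch (state)
    {
        case SWITCH_NONE:
            if (interval_elapsed)
                return SWITCH_DRAIN_PG_WAL;
            break;
        case SWITCH_DRAIN_PG_WAL:
            if (pg_wal_exhausted)
                return SWITCH_TO_STREAM_NOW;
            break;
        case SWITCH_TO_STREAM_NOW:
            if (switched_to_stream)
                return SWITCH_NONE;     /* reset once the switch is done */
            break;
    }
    return state;
}
```

With a single static variable of this type, the "reset the WAL source switch
state" step would become one assignment, and the combination "switchSource
set while canSwitchSource is not" could no longer be represented.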

             /*
              * Don't allow any retry loops to occur during nonblocking
-             * readahead.  Let the caller process everything that has been
-             * decoded already first.
+             * readahead if we failed to read from the current source. Let the
+             * caller process everything that has been decoded already first.
              */
-            if (nonblocking)
+            if (nonblocking && lastSourceFailed)
                 return XLREAD_WOULDBLOCK;

Why do we skip this when "switchSource" is set?

+            /* Reset the WAL source switch state */
+            if (switchSource)
+            {
+                Assert(canSwitchSource);
+                Assert(currentSource == XLOG_FROM_STREAM);
+                Assert(oldSource == XLOG_FROM_ARCHIVE);
+                switchSource = false;
+                canSwitchSource = false;
+            }

How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE?  Is
there no way it could be XLOG_FROM_PG_WAL?

+#streaming_replication_retry_interval = 5min    # time after which standby
+                    # attempts to switch WAL source from archive to
+                    # streaming replication
+                    # in milliseconds; 0 disables

I think we might want to turn this feature off by default, at least for the
first release.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


