Thread: [PoC] pg_upgrade: allow to upgrade publisher node

[PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 4, 2023, 10:00:01

Dear hackers,
(CC: Amit and Julien)

This is a fork of Julien's thread, which allows subscribers to be upgraded
without losing changes [1].

I have quickly implemented a prototype that allows a publisher node to be upgraded.
IIUC the key missing piece was that replication slots used for logical replication
could not be copied to the new node by pg_upgrade, so this patch allows that.
The feature is enabled when '--include-replication-slot' is specified. I also
added a small test for the typical case, which may help in understanding it.

pg_upgrade internally executes pg_dump to dump database objects from the old cluster.
This feature follows that approach and adds a new option, '--slot-only', to pg_dump.
When specified, it extracts the needed information from the old node and generates
an SQL file that executes pg_create_logical_replication_slot().

The notable difference from the existing behavior is that slots are restored at a
different time. Currently pg_upgrade works in the following steps:

...
1. dump schema from old nodes
2. do pg_resetwal several times to new node
3. restore schema to new node
4. do pg_resetwal again to new node
...

The problem is that if we create replication slots at step 3, restart_lsn and
confirmed_flush_lsn are set to current_wal_insert_lsn at that time, whereas the
subsequent pg_resetwal discards those WAL files. Such slots cannot extract changes.
To handle this, restoring is separated into two phases. In the first phase,
everything except replication slots is restored at step 3. In the second phase,
replication slots are restored in an additional step 5, after the pg_resetwal of step 4.

Before upgrading a publisher node, all the changes generated on the publisher must
be sent to and applied on the subscriber. This is because restart_lsn and
confirmed_flush_lsn of the copied replication slots are set to current_wal_insert_lsn
of the new node. The new node thus resets the information about which WAL has really
been applied on the subscriber and starts over.
Basically this is not a problem, because before the publisher shuts down, its
walsender processes confirm that all data has been replicated. See WalSndDone() and
related code.

Currently physical slots are ignored because they are out of scope for me.
I have not done any analysis of them.

[1]: https://www.postgresql.org/message-id/flat/20230217075433.u5mjly4d5cr4hcfe%40jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

0001-pg_upgrade-Add-include-replication-slot-option.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

April 6, 2023, 11:23:33

Hi Kuroda-san.

This is a WIP review. I'm yet to do more testing and more study of the
POC patch's design.

While reading the code I kept a local list of my review comments.
Meanwhile, there is a long weekend coming up here, so I thought it
would be better to pass these to you now rather than next week in case
you want to address them.

======
General

1.
Since these two new options are made to work together, I think the
names should be more similar. e.g.

pg_dump: "--slot-only" --> "--replication-slots-only"
pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"

help/comments/commit-message all should change accordingly, but I did
not give separate review comments for each of these.

~~~

2.
I felt there maybe should be some pg_dump test cases for that new
option, rather than the current patch where it only seems to be
testing the new pg_dump option via the pg_upgrade TAP tests.

======
Commit message

3.
This commit introduces a new option called "--include-replication-slot".
This allows nodes with logical replication slots to be upgraded. The commit can
be divided into two parts: one for pg_dump and another for pg_upgrade.

~

"new option" --> "new pg_upgrade option"

~~~

4.
For pg_upgrade, when '--include-replication-slot' is specified, it
executes pg_dump
with added option and restore from the dump. Apart from restoring
schema, pg_resetwal
must not be called after restoring replicaiton slots. This is because
the command
discards WAL files and starts from a new segment, even if they are required by
replication slots. This leads an ERROR: "requested WAL segment XXX has already
been removed". To avoid this, replication slots are restored at a different time
than other objects, after running pg_resetwal.

~

4a.
"with added option and restore from the dump" --> "with the new
"--slot-only" option and restores from the dump"

~

4b.
Typo: /replicaiton/replication/

~

4c
"leads an ERROR" --> "leads to an ERROR"

======

doc/src/sgml/ref/pg_dump.sgml

5.
+     <varlistentry>
+      <term><option>--slot-only</option></term>
+      <listitem>
+       <para>
+        Dump only replication slots, neither the schema (data definitions) nor
+        data. Mainly this is used for upgrading nodes.
+       </para>
+      </listitem>

SUGGESTION
Dump only replication slots; not the schema (data definitions), nor
data. This is mainly used when upgrading nodes.

======

doc/src/sgml/ref/pgupgrade.sgml

6.
+       <para>
+        Transport replication slots. Currently this can work only for logical
+        slots, and temporary slots are ignored. Note that pg_upgrade does not
+        check the installation of plugins.
+       </para>

SUGGESTION
Upgrade replication slots. Only logical replication slots are
currently supported, and temporary slots are ignored. Note that...

======

src/bin/pg_dump/pg_dump.c

7. main
  {"exclude-table-data-and-children", required_argument, NULL, 14},
-
+ {"slot-only", no_argument, NULL, 15},
  {NULL, 0, NULL, 0}

The blank line is misplaced.

~~~

8. main
+ case 15: /* dump onlu replication slot(s) */
+ dopt.slot_only = true;
+ dopt.include_everything = false;
+ break;

typo: /onlu/only/

~~~

9. main
+ if (dopt.slot_only && dopt.dataOnly)
+ pg_fatal("options --replicatin-slots and -a/--data-only cannot be
used together");
+ if (dopt.slot_only && dopt.schemaOnly)
+ pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
used together");
+

9a.
typo: /replicatin/replication/

~

9b.
I am wondering if these checks are enough. E.g. is "slots-only"
compatible with "no-publications" ?

~~~

10. main
+ /*
+ * If dumping replication slots are request, dumping them and skip others.
+ */
+ if (dopt.slot_only)
+ {
+ getRepliactionSlots(fout);
+ goto dump;
+ }

10a.
SUGGESTION
If dump replication-slots-only was requested, dump only them and skip
everything else.

~

10b.
This code seems mutually exclusive to every other option. I'm
wondering if this code even needs 'collectRoleNames', or should the
slots option check be moved  above that (and also above the 'Dumping
LOs' etc...)

~~~

11. help

+ printf(_("  --slot-only                  dump only replication
slots, no schema and data\n"));

11a.
SUGGESTION
"no schema and data" --> "no schema or data"

~

11b.
This help is misplaced. It should be in alphabetical order consistent
with all the other help.

~~~
12. getRepliactionSlots

+/*
+ * getRepliactionSlots
+ *   get information about replication slots
+ */
+static void
+getRepliactionSlots(Archive *fout)

Function name typo / getRepliactionSlots/ getReplicationSlots/
(also in the comment)

~~~

13. getRepliactionSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 && !dopt->slot_only)
+ return;

Hmmm, is that condition correct? Shouldn't the && be || here?
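
The two conditions can be compared side by side. The helper names below are
hypothetical; only the two boolean expressions come from the quoted hunk and
from the suggested fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as posted: return early only when BOTH tests are true. */
static bool
toy_skip_slots_as_posted(int remote_version, bool slot_only)
{
    assert(remote_version > 0);
    return remote_version < 160000 && !slot_only;
}

/* Suggested condition: return early when EITHER test is true. */
static bool
toy_skip_slots_with_or(int remote_version, bool slot_only)
{
    assert(remote_version > 0);
    return remote_version < 160000 || !slot_only;
}
```

With &&, a pre-16 server with the option set and a v16 server without the
option both fall through and go on to collect slots. With ||, only a v16 or
later server with the option set proceeds.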

~~~

14. dumpReplicationSlot

+static void
+dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+ PQExpBuffer query;
+ char *slotname;
+
+ if (!dopt->slot_only)
+ return;
+
+ slotname = pg_strdup(slotinfo->dobj.name);
+ query = createPQExpBuffer();
+
+ /*
+ * XXX: For simplification, pg_create_logical_replication_slot() is used.
+ * Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
+   slotname);
+ appendStringLiteralAH(query, slotinfo->plugin, fout);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralAH(query, slotinfo->twophase, fout);
+ appendPQExpBuffer(query, ");");
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+ ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
+ ARCHIVE_OPTS(.tag = slotname,
+   .description = "REPICATION SLOT",
+   .section = SECTION_POST_DATA,
+   .createStmt = query->data));
+
+ /* XXX: do we have to dump security label? */
+
+ if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
+ dumpComment(fout, "REPICATION SLOT", slotname,
+ NULL, NULL,
+ slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
+
+ pfree(slotname);
+ destroyPQExpBuffer(query);
+}

14a.
Wouldn't it be better to check the "slotinfo->dobj.dump &
DUMP_COMPONENT_DEFINITION" condition first, before building the query?
For example, see other function dumpIndexAttach().
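
The "check first, then build" shape could look roughly like the standalone
sketch below. It does not use pg_dump's PQExpBuffer or ArchiveEntry APIs; all
toy_* names and TOY_COMPONENT_DEFINITION are hypothetical stand-ins. The
argument order (temporary before twophase) follows the documented signature of
pg_create_logical_replication_slot() in PostgreSQL 14 to 16 and is an
assumption here, not something taken from the patch. Quote doubling alone is
only sufficient when standard_conforming_strings is on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_COMPONENT_DEFINITION 0x01   /* stand-in for DUMP_COMPONENT_DEFINITION */

/* Append one character, keeping dst NUL-terminated; false if dst is full. */
static bool
toy_putc(char *dst, size_t dstsize, size_t *len, char c)
{
    if (*len + 1 >= dstsize)
        return false;
    dst[(*len)++] = c;
    dst[*len] = '\0';
    return true;
}

/* Append src verbatim. */
static bool
toy_append(char *dst, size_t dstsize, const char *src)
{
    size_t      len = strlen(dst);

    for (; *src != '\0'; src++)
        if (!toy_putc(dst, dstsize, &len, *src))
            return false;
    return true;
}

/* Append src as a single-quoted SQL literal, doubling embedded quotes. */
static bool
toy_append_literal(char *dst, size_t dstsize, const char *src)
{
    size_t      len = strlen(dst);

    if (!toy_putc(dst, dstsize, &len, '\''))
        return false;
    for (; *src != '\0'; src++)
    {
        if (*src == '\'' && !toy_putc(dst, dstsize, &len, '\''))
            return false;
        if (!toy_putc(dst, dstsize, &len, *src))
            return false;
    }
    return toy_putc(dst, dstsize, &len, '\'');
}

/* Returns true if a statement was built; false if skipped or truncated. */
static bool
toy_build_create_slot(char *buf, size_t bufsize, unsigned dump_flags,
                      const char *slotname, const char *plugin, bool twophase)
{
    assert(buf != NULL && bufsize > 0);
    buf[0] = '\0';

    /* Bail out before doing any work if the definition is not wanted. */
    if ((dump_flags & TOY_COMPONENT_DEFINITION) == 0)
        return false;

    return toy_append(buf, bufsize, "SELECT pg_create_logical_replication_slot(") &&
        toy_append_literal(buf, bufsize, slotname) &&
        toy_append(buf, bufsize, ", ") &&
        toy_append_literal(buf, bufsize, plugin) &&
        toy_append(buf, bufsize, twophase ? ", false, true);" : ", false, false);");
}
```

The early return means no buffer is filled and nothing needs to be released
when the object's definition is not being dumped.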

~

14b.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS description.

~

14c.
Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

======

src/bin/pg_dump/pg_dump.h

15. DumpableObjectType

@@ -82,7 +82,8 @@ typedef enum
  DO_PUBLICATION,
  DO_PUBLICATION_REL,
  DO_PUBLICATION_TABLE_IN_SCHEMA,
- DO_SUBSCRIPTION
+ DO_SUBSCRIPTION,
+ DO_REPICATION_SLOT
 } DumpableObjectType;

Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

======

src/bin/pg_upgrade/dump.c

16. generate_old_dump

+ /*
+ * Dump replicaiton slots if needed.
+ *
+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring replication
+ * slots and other objects. Replication slots, in particular, should
+ * not be restored before executing the pg_resetwal command because it
+ * will remove WALs that are required by the slots.
+ */

Typo: /replicaiton/replication/

======

src/bin/pg_upgrade/pg_upgrade.c

17. main

+ /*
+ * Create replication slots if requested.
+ *
+ * XXX This must be done after doing pg_resetwal command because the
+ * command will remove required WALs.
+ */
+ if (user_opts.include_slots)
+ {
+ start_postmaster(&new_cluster, true);
+ create_replicaiton_slots();
+ stop_postmaster(false);
+ }
+

I don't think that warrants a "XXX" style comment. It is just a "Note:".

~~~

18. create_replicaiton_slots
+
+/*
+ * create_replicaiton_slots()
+ *
+ * Similar to create_new_objects() but only restores replication slots.
+ */
+static void
+create_replicaiton_slots(void)

Typo: /create_replicaiton_slots/create_replication_slots/

(Function name and comment)

~~~

19. create_replicaiton_slots

+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ char slots_file_name[MAXPGPATH],
+ log_file_name[MAXPGPATH];
+ DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
+ char    *opts;
+
+ pg_log(PG_STATUS, "%s", old_db->db_name);
+
+ snprintf(slots_file_name, sizeof(slots_file_name),
+ DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
+ snprintf(log_file_name, sizeof(log_file_name),
+ DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
+
+ opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
+
+ parallel_exec_prog(log_file_name,
+    NULL,
+    "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
+    new_cluster.bindir,
+    cluster_conn_opts(&new_cluster),
+    opts,
+    old_db->db_name,
+    log_opts.dumpdir,
+    slots_file_name);
+ }

That 'opts' variable seems unnecessary. Why not just pass the string
literal directly when invoking parallel_exec_prog()?

Or if not removed, then at least make it const char *psql_opts =
"--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

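For illustration, the const-pointer variant might be shaped like the standalone
sketch below. It uses snprintf instead of pg_upgrade's parallel_exec_prog(),
and toy_psql_opts and toy_build_psql_command are hypothetical names; the format
string and the psql options are the ones quoted above. Shell quoting of the
database name is not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The options never change, so a const pointer to const data is enough. */
static const char *const toy_psql_opts =
    "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

/* Build the psql command line; returns false if it does not fit in buf. */
static bool
toy_build_psql_command(char *buf, size_t bufsize,
                       const char *bindir, const char *conn_opts,
                       const char *dbname, const char *dumpdir,
                       const char *slots_file)
{
    int         n;

    assert(buf != NULL && bufsize > 0);
    assert(strlen(slots_file) > 0);

    n = snprintf(buf, bufsize,
                 "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
                 bindir, conn_opts, toy_psql_opts, dbname,
                 dumpdir, slots_file);

    /* snprintf reports the length it wanted; anything >= bufsize was cut. */
    return n >= 0 && (size_t) n < bufsize;
}
```
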
======

src/bin/pg_upgrade/pg_upgrade.h

20.
+#define DB_DUMP_FILE_MASK_FOR_SLOTS "pg_upgrade_dump_%u_slots.custom"

20a.
For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
probably this should be called DB_DUMP_SLOTS_FILE_MASK.

~

20b.
Because the content of this dump/restore file is SQL (not custom
binary) wouldn't a filename suffix ".sql" be better?

======

.../pg_upgrade/t/003_logical_replication.pl

21.
Some parts (formatting, comments, etc) in this file are inconsistent.

21a
");" is sometimes alone on a line, sometimes not

~

21b.
"Init" versus "Create" nodes.

~

21c.
# Check whether changes on new publisher are shipped to subscriber

SUGGESTION
Check whether changes on the new publisher get replicated to the subscriber
~

21d.
$result =
  $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
is($result, qq(20),
    'check changes are shipped to subscriber');

For symmetry with before/after, I think it would be better to do this
same command before the upgrade to confirm q(10) rows.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

April 7, 2023, 05:48:23

Hi,

On Tue, Apr 04, 2023 at 07:00:01AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Dear hackers,
> (CC: Amit and Julien)

(thanks for the Cc)

> This is a fork of Julien's thread, which allows subscribers to be upgraded
> without losing changes [1].
>
> I have quickly implemented a prototype that allows a publisher node to be upgraded.
> IIUC the key missing piece was that replication slots used for logical replication
> could not be copied to the new node by pg_upgrade, so this patch allows that.
> The feature is enabled when '--include-replication-slot' is specified. I also
> added a small test for the typical case, which may help in understanding it.
>
> pg_upgrade internally executes pg_dump to dump database objects from the old cluster.
> This feature follows that approach and adds a new option, '--slot-only', to pg_dump.
> When specified, it extracts the needed information from the old node and generates
> an SQL file that executes pg_create_logical_replication_slot().
>
> The notable difference from the existing behavior is that slots are restored at a
> different time. Currently pg_upgrade works in the following steps:
>
> ...
> 1. dump schema from old nodes
> 2. do pg_resetwal several times to new node
> 3. restore schema to new node
> 4. do pg_resetwal again to new node
> ...
>
> The problem is that if we create replication slots at step 3, restart_lsn and
> confirmed_flush_lsn are set to current_wal_insert_lsn at that time, whereas the
> subsequent pg_resetwal discards those WAL files. Such slots cannot extract changes.
> To handle this, restoring is separated into two phases. In the first phase,
> everything except replication slots is restored at step 3. In the second phase,
> replication slots are restored in an additional step 5, after the pg_resetwal of step 4.
>
> Before upgrading a publisher node, all the changes generated on the publisher must
> be sent to and applied on the subscriber. This is because restart_lsn and
> confirmed_flush_lsn of the copied replication slots are set to current_wal_insert_lsn
> of the new node. The new node thus resets the information about which WAL has really
> been applied on the subscriber and starts over.
> Basically this is not a problem, because before the publisher shuts down, its
> walsender processes confirm that all data has been replicated. See WalSndDone() and
> related code.

As I mentioned in my original thread, I'm not very familiar with that code, but
I'm a bit worried about "all the changes generated on publisher must be sent
and applied".  Is that a hard requirement for the feature to work reliably?  If
yes, how does this work if some subscriber node isn't connected when the
publisher node is stopped?  I guess you could add a check in pg_upgrade to make
sure that all logical slots are indeed caught up and fail if that's not the case
rather than assuming that a clean shutdown implies it.  It would be good to
cover that in the TAP test, and also cover some corner cases, such as checking
that any new row added on the publisher node after the pg_upgrade but before the
subscriber is reconnected is also replicated as expected.
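
A check of that kind could be as simple as the sketch below. It is purely
illustrative: ToySlotState and toy_find_lagging_slot are hypothetical names,
LSNs are plain integers, and how pg_upgrade would obtain each slot's
confirmed_flush_lsn (for example from the pg_replication_slots view) and the
end-of-WAL position is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToySlotState
{
    const char *name;
    uint64_t    confirmed_flush_lsn;
} ToySlotState;

/*
 * Return the first slot whose confirmed_flush_lsn is behind end_of_wal,
 * or NULL if every slot has caught up.
 */
static const ToySlotState *
toy_find_lagging_slot(const ToySlotState *slots, size_t nslots,
                      uint64_t end_of_wal)
{
    assert(slots != NULL || nslots == 0);

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].confirmed_flush_lsn < end_of_wal)
            return &slots[i];
    }
    return NULL;
}
```

A caller would report the returned slot's name and refuse to upgrade, instead
of trusting that a clean shutdown implies every subscriber has caught up.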
>
> Currently physical slots are ignored because this is out-of-scope for me.
> I did not any analysis about it.

Agreed, but then shouldn't the option be named "--logical-slots-only" or
something like that, same for all internal function names?

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 7, 2023, 12:40:14

Dear Julien,

Thank you for giving comments!

> As I mentioned in my original thread, I'm not very familiar with that code, but
> I'm a bit worried about "all the changes generated on publisher must be sent
> and applied".  Is that a hard requirement for the feature to work reliably?

I think the requirement is necessary because the existing WAL on the old node cannot
be transported to the new instance. The WAL hole from confirmed_flush to the current
position cannot be filled by the new instance.

> If
> yes, how does this work if some subscriber node isn't connected when the
> publisher node is stopped?  I guess you could add a check in pg_upgrade to make
> sure that all logical slots are indeed caught up and fail if that's not the case
> rather than assuming that a clean shutdown implies it.  It would be good to
> cover that in the TAP test, and also cover some corner cases, like any new row
> added on the publisher node after the pg_upgrade but before the subscriber is
> reconnected is also replicated as expected.

Hmm, good point. The current patch cannot handle that case because walsenders
for such slots do not exist. I tested your approach; however, I found that a
CHECKPOINT_SHUTDOWN record was generated twice when the publisher was
shut down and started. As a result, the confirmed_lsn of the slots was always behind
the WAL insert location, and the upgrade failed every time.
I do not have a good idea for solving this at the moment... Does anyone have one?

> Agreed, but then shouldn't the option be named "--logical-slots-only" or
> something like that, same for all internal function names?

Seems right. Will be fixed in next version. Maybe "--logical-replication-slots-only"
will be used, per Peter's suggestion [1].

[1]: https://www.postgresql.org/message-id/CAHut%2BPvpBsyxj9SrB1ZZ9gP7r1AA5QoTYjpzMcVSjQO2xQy7aw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 7, 2023, 15:51:51

Dear Julien,

> > Agreed, but then shouldn't the option be named "--logical-slots-only" or
> > something like that, same for all internal function names?
>
> Seems right. Will be fixed in next version. Maybe
> "--logical-replication-slots-only"
> will be used, per Peter's suggestion [1].

After more consideration, I decided not to include the word "logical" in the option
at this point. This is because we have not yet decided whether to dump physical
replication slots. The current restriction exists only because of a lack of
analysis and consideration. If we decide not to do that, the options will
be renamed accordingly.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 7, 2023, 16:59:58

Dear Peter,

Thank you for the quick review. Please see the attached new version.
If you can, I would also like to ask your opinion about the check in pg_upgrade [1].

> ======
> General
> 
> 1.
> Since these two new options are made to work together, I think the
> names should be more similar. e.g.
> 
> pg_dump: "--slot_only" --> "--replication-slots-only"
> pg_upgrade: "--include-replication-slot" --> "--include-replication-slots"
> 
> help/comments/commit-message all should change accordingly, but I did
> not give separate review comments for each of these.

OK, I renamed them. By the way, what do you think about the suggestion raised by Julien?
I have not addressed it yet because the restriction was caused just by a lack of
analysis, and the community may not agree with it.
Or should we keep the name anyway?

> 2.
> I felt there maybe should be some pg_dump test cases for that new
> option, rather than the current patch where it only seems to be
> testing the new pg_dump option via the pg_upgrade TAP tests.

Hmm, I assumed that the option should be used only for upgrading, so I'm not sure
it needs to be tested with pg_dump alone.

> Commit message
> 
> 3.
> This commit introduces a new option called "--include-replication-slot".
> This allows nodes with logical replication slots to be upgraded. The commit can
> be divided into two parts: one for pg_dump and another for pg_upgrade.
> 
> ~
> 
> "new option" --> "new pg_upgrade" option

Fixed.

> 4.
> For pg_upgrade, when '--include-replication-slot' is specified, it
> executes pg_dump
> with added option and restore from the dump. Apart from restoring
> schema, pg_resetwal
> must not be called after restoring replicaiton slots. This is because
> the command
> discards WAL files and starts from a new segment, even if they are required by
> replication slots. This leads an ERROR: "requested WAL segment XXX has already
> been removed". To avoid this, replication slots are restored at a different time
> than other objects, after running pg_resetwal.
> 
> ~
> 
> 4a.
> "with added option and restore from the dump" --> "with the new
> "--slot-only" option and restores from the dump"

Fixed.

> 4b.
> Typo: /replicaiton/replication/

Fixed.

> 4c
> "leads an ERROR" --> "leads to an ERROR"

Fixed.

> doc/src/sgml/ref/pg_dump.sgml
> 
> 5.
> +     <varlistentry>
> +      <term><option>--slot-only</option></term>
> +      <listitem>
> +       <para>
> +        Dump only replication slots, neither the schema (data definitions) nor
> +        data. Mainly this is used for upgrading nodes.
> +       </para>
> +      </listitem>
> 
> SUGGESTION
> Dump only replication slots; not the schema (data definitions), nor
> data. This is mainly used when upgrading nodes.

Fixed.

> doc/src/sgml/ref/pgupgrade.sgml
> 
> 6.
> +       <para>
> +        Transport replication slots. Currently this can work only for logical
> +        slots, and temporary slots are ignored. Note that pg_upgrade does not
> +        check the installation of plugins.
> +       </para>
> 
> SUGGESTION
> Upgrade replication slots. Only logical replication slots are
> currently supported, and temporary slots are ignored. Note that...

Fixed.

> src/bin/pg_dump/pg_dump.c
> 
> 7. main
>   {"exclude-table-data-and-children", required_argument, NULL, 14},
> -
> + {"slot-only", no_argument, NULL, 15},
>   {NULL, 0, NULL, 0}
> 
> The blank line is misplaced.

Fixed.

> 8. main
> + case 15: /* dump onlu replication slot(s) */
> + dopt.slot_only = true;
> + dopt.include_everything = false;
> + break;
> 
> typo: /onlu/only/

Fixed.

> 9. main
> + if (dopt.slot_only && dopt.dataOnly)
> + pg_fatal("options --replicatin-slots and -a/--data-only cannot be
> used together");
> + if (dopt.slot_only && dopt.schemaOnly)
> + pg_fatal("options --replicatin-slots and -s/--schema-only cannot be
> used together");
> +
> 
> 9a.
> typo: /replicatin/replication/

Fixed. Additionally, a wrong parameter reference was also fixed.

> 9b.
> I am wondering if these checks are enough. E.g. is "slots-only"
> compatible with "no-publications" ?

I think there is more that should be checked. But I'm not sure about
"no-publications". Non-core logical replication might be in use, in which case
these options do not contradict each other.

> 10. main
> + /*
> + * If dumping replication slots are request, dumping them and skip others.
> + */
> + if (dopt.slot_only)
> + {
> + getRepliactionSlots(fout);
> + goto dump;
> + }
> 
> 10a.
> SUGGESTION
> If dump replication-slots-only was requested, dump only them and skip
> everything else.

Fixed.

> 10b.
> This code seems mutually exclusive to every other option. I'm
> wondering if this code even needs 'collectRoleNames', or should the
> slots option check be moved  above that (and also above the 'Dumping
> LOs' etc...)

I read the code again and found that the collected role names are used to check the
owner of objects. IIUC replication slots are not owned by database users, so it is not
needed. Also, LOs should not be dumped here. Based on that, I moved getRepliactionSlots()
above them.

> 11. help
> 
> + printf(_("  --slot-only                  dump only replication
> slots, no schema and data\n"));
> 
> 11a.
> SUGGESTION
> "no schema and data" --> "no schema or data"

Fixed.

> 11b.
> This help is misplaced. It should be in alphabetical order consistent
> with all the other help.
> 
> ~~~
> 12. getRepliactionSlots
> 
> +/*
> + * getRepliactionSlots
> + *   get information about replication slots
> + */
> +static void
> +getRepliactionSlots(Archive *fout)
> 
> Function name typo / getRepliactionSlots/ getReplicationSlots/
> (also in the comment)

Fixed.

> 13. getRepliactionSlots
> 
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 160000 && !dopt->slot_only)
> + return;
> 
> Hmmm, is that condition correct? Shouldn't the && be || here?

Right, fixed.

> 14. dumpReplicationSlot
> 
> +static void
> +dumpReplicationSlot(Archive *fout, const ReplicationSlotInfo *slotinfo)
> +{
> + DumpOptions *dopt = fout->dopt;
> + PQExpBuffer query;
> + char *slotname;
> +
> + if (!dopt->slot_only)
> + return;
> +
> + slotname = pg_strdup(slotinfo->dobj.name);
> + query = createPQExpBuffer();
> +
> + /*
> + * XXX: For simplification, pg_create_logical_replication_slot() is used.
> + * Is it sufficient?
> + */
> + appendPQExpBuffer(query, "SELECT pg_create_logical_replication_slot('%s', ",
> +   slotname);
> + appendStringLiteralAH(query, slotinfo->plugin, fout);
> + appendPQExpBuffer(query, ", ");
> + appendStringLiteralAH(query, slotinfo->twophase, fout);
> + appendPQExpBuffer(query, ");");
> +
> + if (slotinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
> + ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
> + ARCHIVE_OPTS(.tag = slotname,
> +   .description = "REPICATION SLOT",
> +   .section = SECTION_POST_DATA,
> +   .createStmt = query->data));
> +
> + /* XXX: do we have to dump security label? */
> +
> + if (slotinfo->dobj.dump & DUMP_COMPONENT_COMMENT)
> + dumpComment(fout, "REPICATION SLOT", slotname,
> + NULL, NULL,
> + slotinfo->dobj.catId, 0, slotinfo->dobj.dumpId);
> +
> + pfree(slotname);
> + destroyPQExpBuffer(query);
> +}
> 
> 14a.
> Wouldn't it be better to check the "slotinfo->dobj.dump &
> DUMP_COMPONENT_DEFINITION" condition first, before building the query?
> For example, see other function dumpIndexAttach().

That style was chosen because I had referred to dumpSubscription(). But I read the
PG manual and understood that COMMENT and SECURITY LABEL cannot be set on replication
slots. Therefore, I removed the comments and the dump for DUMP_COMPONENT_COMMENT, and
then followed the suggested style.

> 14b.
> Typo: /REPICATION SLOT/REPLICATION SLOT/ in the ARCHIVE_OPTS
> description.
> 
> ~
> 
> 14c.
> Typo: /REPICATION SLOT/REPLICATION SLOT/ in the dumpComment parameter.

Both of them were fixed.

> src/bin/pg_dump/pg_dump.h
> 
> 15. DumpableObjectType
> 
> @@ -82,7 +82,8 @@ typedef enum
>   DO_PUBLICATION,
>   DO_PUBLICATION_REL,
>   DO_PUBLICATION_TABLE_IN_SCHEMA,
> - DO_SUBSCRIPTION
> + DO_SUBSCRIPTION,
> + DO_REPICATION_SLOT
>  } DumpableObjectType;
> 
> Typo /DO_REPICATION_SLOT/DO_REPLICATION_SLOT/

Fixed.

> src/bin/pg_upgrade/dump.c
> 
> 16. generate_old_dump
> 
> + /*
> + * Dump replicaiton slots if needed.
> + *
> + * XXX We cannot dump replication slots at the same time as the schema
> + * dump because we need to separate the timing of restoring replication
> + * slots and other objects. Replication slots, in particular, should
> + * not be restored before executing the pg_resetwal command because it
> + * will remove WALs that are required by the slots.
> + */
> 
> Typo: /replicaiton/replication/

Fixed.

> src/bin/pg_upgrade/pg_upgrade.c
> 
> 17. main
> 
> + /*
> + * Create replication slots if requested.
> + *
> + * XXX This must be done after doing pg_resetwal command because the
> + * command will remove required WALs.
> + */
> + if (user_opts.include_slots)
> + {
> + start_postmaster(&new_cluster, true);
> + create_replicaiton_slots();
> + stop_postmaster(false);
> + }
> +
> 
> I don't think that warrants a "XXX" style comment. It is just a "Note:".

Fixed. Could you please tell me how these comment styles are classified, if you can?

> 18. create_replicaiton_slots
> +
> +/*
> + * create_replicaiton_slots()
> + *
> + * Similar to create_new_objects() but only restores replication slots.
> + */
> +static void
> +create_replicaiton_slots(void)
> 
> Typo: /create_replicaiton_slots/create_replication_slots/
> 
> (Function name and comment)

All of them were replaced.

> 19. create_replicaiton_slots
> 
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + char slots_file_name[MAXPGPATH],
> + log_file_name[MAXPGPATH];
> + DbInfo    *old_db = &old_cluster.dbarr.dbs[dbnum];
> + char    *opts;
> +
> + pg_log(PG_STATUS, "%s", old_db->db_name);
> +
> + snprintf(slots_file_name, sizeof(slots_file_name),
> + DB_DUMP_FILE_MASK_FOR_SLOTS, old_db->db_oid);
> + snprintf(log_file_name, sizeof(log_file_name),
> + DB_DUMP_LOG_FILE_MASK, old_db->db_oid);
> +
> + opts = "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";
> +
> + parallel_exec_prog(log_file_name,
> +    NULL,
> +    "\"%s/psql\" %s %s --dbname %s -f \"%s/%s\"",
> +    new_cluster.bindir,
> +    cluster_conn_opts(&new_cluster),
> +    opts,
> +    old_db->db_name,
> +    log_opts.dumpdir,
> +    slots_file_name);
> + }
> 
> That 'opts' variable seems unnecessary. Why not just pass the string
> literal directly when invoking parallel_exec_prog()?
> 
> Or if not removed, then at make it const char psql_opts =
> "--echo-queries --set ON_ERROR_STOP=on --no-psqlrc";

I had tried to follow the prepare_new_globals() style, but
I prefer your suggestion. Fixed.

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 20.
> +#define DB_DUMP_FILE_MASK_FOR_SLOTS
> "pg_upgrade_dump_%u_slots.custom"
> 
> 20a.
> For consistency with other mask names (e.g. DB_DUMP_LOG_FILE_MASK)
> probably this should be called DB_DUMP_SLOTS_FILE_MASK.

Fixed.

> 20b.
> Because the content of this dump/restore file is SQL (not custom
> binary) wouldn't a filename suffix ".sql" be better?

Right, fixed.

> .../pg_upgrade/t/003_logical_replication.pl
> 
> 21.
> Some parts (formatting, comments, etc) in this file are inconsistent.
> 
> 21a
> ");" is sometimes alone on a line, sometimes not

I ran pgperltidy, and the lone ");" lines were removed.

> 21b.
> "Init" versus "Create" nodes.

"Initialize" was chosen.

> 21c.
> # Check whether changes on new publisher are shipped to subscriber
> 
> SUGGESTION
> Check whether changes on the new publisher get replicated to the subscriber

Fixed.

> 21d.
> $result =
>   $subscriber->safe_psql('postgres', "SELECT count(*) FROM tbl");
> is($result, qq(20),
>     'check changes are shipped to subscriber');
> 
> For symmetry with before/after, I think it would be better to do this
> same command before the upgrade to confirm q(10) rows.

Added.

[1]: https://www.postgresql.org/message-id/20230407024823.3j2s4doslsjemvis%40jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v2-0001-pg_upgrade-Add-include-replication-slots-option.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

April 7, 2023, 18:29:44

On Fri, Apr 07, 2023 at 09:40:14AM +0000, Hayato Kuroda (Fujitsu) wrote:
>
> > As I mentioned in my original thread, I'm not very familiar with that code, but
> > I'm a bit worried about "all the changes generated on publisher must be sent
> > and applied".  Is that a hard requirement for the feature to work reliably?
>
> I think the requirement is necessary because the existing WAL on the old node cannot
> be transported to the new instance. The WAL hole from confirmed_flush to the current
> position cannot be filled by the new instance.

I see, that was also the first blocker I could think of when Amit mentioned
that feature weeks ago, and I don't see how that hole could be filled
either.

> > If
> > yes, how does this work if some subscriber node isn't connected when the
> > publisher node is stopped?  I guess you could add a check in pg_upgrade to make
> > sure that all logical slot are indeed caught up and fail if that's not the case
> > rather than assuming that a clean shutdown implies it.  It would be good to
> > cover that in the TAP test, and also cover some corner cases, like any new row
> > added on the publisher node after the pg_upgrade but before the subscriber is
> > reconnected is also replicated as expected.
>
> Hmm, good point. Current patch could not be handled the case because walsenders
> for the such slots do not exist. I have tested your approach, however, I found that
> CHECKPOINT_SHUTDOWN record were generated twice when publisher was
> shutted down and started. It led that the confirmed_lsn of slots always was behind
> from WAL insert location and failed to upgrade every time.
> Now I do not have good idea to solve it... Do anyone have for this?

I'm wondering if we could just check that each slot's LSN is exactly
sizeof(CHECKPOINT_SHUTDOWN) ago or something like that?  That's hackish, but if
pg_upgrade can run it means it was a clean shutdown, so it should be safe to
assume that's what the last record in the WAL was.  For the double
shutdown checkpoint, I'm not sure that I get the problem.  The check should
only be done at the very beginning of pg_upgrade, so there should have been
only one shutdown checkpoint done, right?

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

April 7, 2023, 18:39:02

On Fri, Apr 07, 2023 at 12:51:51PM +0000, Hayato Kuroda (Fujitsu) wrote:
> Dear Julien,
> 
> > > Agreed, but then shouldn't the option be named "--logical-slots-only" or
> > > something like that, same for all internal function names?
> > 
> > Seems right. Will be fixed in next version. Maybe
> > "--logical-replication-slots-only"
> > will be used, per Peter's suggestion [1].
> 
> After considering more, I decided not to include the word "logical" in the option
> at this point. This is because we have not decided yet whether we dumps physical
> replication slots or not. Current restriction has been occurred because of just
> lack of analysis and considerations, If we decide not to do that, then they will
> be renamed accordingly.

Well, even if physical replication slots were eventually preserved during
pg_upgrade, maybe users would like to keep only one kind or the other, so
having both options could make sense.

That being said, I have a hard time believing that we could actually preserve
physical replication slots.  I don't think that pg_upgrade final state is fully
reproducible:  not all object oids are preserved, and the various pg_restore
are run in parallel so you're very likely to end up with small physical
differences that would be incompatible with physical replication.  Even if we
could make it totally reproducible, it would probably be at the cost of making
pg_upgrade orders of magnitude slower.  And since many people are already
complaining that it's too slow, that doesn't seem like something we would want.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 10, 2023, 12:16:09

Dear Julien,

> Well, even if physical replication slots were eventually preserved during
> pg_upgrade, maybe users would like to only keep one kind of the others so
> having both options could make sense.

You mean that we can rename the options to "logical-*" and later add a new
option for physical slots if needed, right? PSA the new patch, which addresses the comment.

> That being said, I have a hard time believing that we could actually preserve
> physical replication slots.  I don't think that pg_upgrade final state is fully
> reproducible:  not all object oids are preserved, and the various pg_restore
> are run in parallel so you're very likely to end up with small physical
> differences that would be incompatible with physical replication.  Even if we
> could make it totally reproducible, it would probably be at the cost of making
> pg_upgrade orders of magnitude slower.  And since many people are already
> complaining that it's too slow, that doesn't seem like something we would want.

Your point made sense to me. Thank you for giving your opinion.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v3-0001-pg_upgrade-Add-include-logical-replication-slots-.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 10, 2023, 12:18:46

Dear Julien,

Thank you for the idea! I have analyzed it.

> > > If
> > > yes, how does this work if some subscriber node isn't connected when the
> > > publisher node is stopped?  I guess you could add a check in pg_upgrade to make
> > > sure that all logical slot are indeed caught up and fail if that's not the case
> > > rather than assuming that a clean shutdown implies it.  It would be good to
> > > cover that in the TAP test, and also cover some corner cases, like any new row
> > > added on the publisher node after the pg_upgrade but before the subscriber is
> > > reconnected is also replicated as expected.
> >
> > Hmm, good point. Current patch could not be handled the case because walsenders
> > for the such slots do not exist. I have tested your approach, however, I found that
> > CHECKPOINT_SHUTDOWN record were generated twice when publisher was
> > shutted down and started. It led that the confirmed_lsn of slots always was behind
> > from WAL insert location and failed to upgrade every time.
> > Now I do not have good idea to solve it... Do anyone have for this?
>
> I'm wondering if we could just check that each slot's LSN is exactly
> sizeof(CHECKPOINT_SHUTDOWN) ago or something like that?  That's hackish, but if
> pg_upgrade can run it means it was a clean shutdown so it should be safe to
> assume that what's the last record in the WAL was.  For the double
> shutdown checkpoint, I'm not sure that I get the problem.  The check should
> only be done at the very beginning of pg_upgrade, so there should have been
> only one shutdown checkpoint done right?

I have analyzed this point, but it seems difficult. This is because some
additional records, like the following, may be inserted. PSA the script that I
used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue might be wrong,
so I would like to withdraw it for now. Sorry for the noise.

* HEAP/HEAP2 records. These records may be inserted by the checkpointer.

IIUC, if there are tuples which have not been flushed yet when shutdown is requested,
the checkpointer writes all of them back into the heap files. Many WAL records are
generated at that time. I think we cannot predict the number of records beforehand.

* INVALIDATION(S) records. These records may be inserted by VACUUM.

There is a possibility that autovacuum runs and generates WAL records. I think we
cannot predict the number of records beforehand because it depends on the number
of objects.

* RUNNING_XACTS record

It might be a timing issue, but I found that the background writer sometimes generated
an XLOG_RUNNING_XACTS record. According to BackgroundWriterMain(), it is
generated when 15 seconds have passed since the last logging and there are
important records. I think it is difficult to predict whether it will appear or not.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

test.sh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

April 11, 2023, 10:54:03

Here are a few more review comments for patch v3-0001.

======
doc/src/sgml/ref/pgupgrade.sgml

1.
+     <varlistentry>
+      <term><option>--include-logical-replication-slots</option></term>
+      <listitem>
+       <para>
+        Upgrade logical replication slots. Only permanent replication slots
+        included. Note that pg_upgrade does not check the installation of
+        plugins.
+       </para>
+      </listitem>
+     </varlistentry>

Missing word.

"Only permanent replication slots included." --> "Only permanent
replication slots are included."

======
src/bin/pg_dump/pg_dump.c

2. help

@@ -1119,6 +1145,8 @@ help(const char *progname)
  printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
  printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
  printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
+ printf(_("  --logical-replication-slots-only\n"
+ "                               dump only logical replication slots, no schema or data\n"));
  printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));

A previous review comment ([1] #11b) seems to have been missed. This
help is misplaced. It should be in alphabetical order consistent with
all the other help.

======
src/bin/pg_dump/pg_dump.h

3. _LogicalReplicationSlotInfo

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */
+typedef struct _LogicalReplicationSlotInfo
+{
+ DumpableObject dobj;
+ char    *plugin;
+ char    *slottype;
+ char    *twophase;
+} LogicalReplicationSlotInfo;
+

4a.
The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
unlike others in this file. Is it OK?

~

4b.
There was no typedefs.list file in this patch. Maybe the above
whitespace problem is a result of that omission.

======
.../pg_upgrade/t/003_logical_replication.pl

5.

+# Run pg_upgrade. pg_upgrade_output.d is removed at the end
+command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,        '--include-logical-replication-slot'
+ ],
+ 'run of pg_upgrade for new publisher');

5a.
How can this test even be working as-expected with those options?

Here it is passing option '--include-logical-replication-slot' but
AFAIK the proper option name everywhere else in this patch is
'--include-logical-replication-slots' (with the 's')

~

5b.
I'm not sure that "pg_upgrade for new publisher" makes sense.

It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
publisher"

------
[1]
https://www.postgresql.org/message-id/TYCPR01MB5870E212F5012FD6272CE1E3F5969%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

April 11, 2023, 11:20:43

On Sat, Apr 8, 2023 at 12:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
...
> > 17. main
> >
> > + /*
> > + * Create replication slots if requested.
> > + *
> > + * XXX This must be done after doing pg_resetwal command because the
> > + * command will remove required WALs.
> > + */
> > + if (user_opts.include_slots)
> > + {
> > + start_postmaster(&new_cluster, true);
> > + create_replicaiton_slots();
> > + stop_postmaster(false);
> > + }
> > +
> >
> > I don't think that warrants a "XXX" style comment. It is just a "Note:".
>
> Fixed. Could you please tell me the classification of them if you can?

Hopefully, someone will correct me if this explanation is wrong, but
my understanding of the different prefixes is like this --

"XXX" is used as a marker for future developers to consider maybe
revisiting/improving something that the comment refers to
e.g.
/* XXX - it would be better to code this using blah but for now we did
not.... */
/* XXX - option 'foo' is not currently supported but... */
/* XXX - it might be worth considering adding more checks or an assert
here because... */

OTOH, "Note" is just for highlighting why something is the way it is,
but with no implication that it should be revisited/changed in the
future.
e.g.
/* Note: We deliberately do not test the state here because... */
/* Note: This memory must be zeroed because... */
/* Note: This string has no '\0' terminator so... */

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 11, 2023, 13:27:08

Dear Peter,

Thank you for the comments! PSA the new version.

> ======
> doc/src/sgml/ref/pgupgrade.sgml
> 
> 1.
> +     <varlistentry>
> +      <term><option>--include-logical-replication-slots</option></term>
> +      <listitem>
> +       <para>
> +        Upgrade logical replication slots. Only permanent replication slots
> +        included. Note that pg_upgrade does not check the installation of
> +        plugins.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> 
> Missing word.
> 
> "Only permanent replication slots included." --> "Only permanent
> replication slots are included."

Fixed.

> ======
> src/bin/pg_dump/pg_dump.c
> 
> 2. help
> 
> @@ -1119,6 +1145,8 @@ help(const char *progname)
>   printf(_("  --no-unlogged-table-data     do not dump unlogged table data\n"));
>   printf(_("  --on-conflict-do-nothing     add ON CONFLICT DO NOTHING to INSERT commands\n"));
>   printf(_("  --quote-all-identifiers      quote all identifiers, even if not key words\n"));
> + printf(_("  --logical-replication-slots-only\n"
> + "                               dump only logical replication slots, no schema or data\n"));
>   printf(_("  --rows-per-insert=NROWS      number of rows per INSERT; implies --inserts\n"));
>
> A previous review comment ([1] #11b) seems to have been missed. This
> help is misplaced. It should be in alphabetical order consistent with
> all the other help.

Sorry, fixed.

> src/bin/pg_dump/pg_dump.h
> 
> 3. _LogicalReplicationSlotInfo
> 
> +/*
> + * The LogicalReplicationSlotInfo struct is used to represent replication
> + * slots.
> + * XXX: add more attrbutes if needed
> + */
> +typedef struct _LogicalReplicationSlotInfo
> +{
> + DumpableObject dobj;
> + char    *plugin;
> + char    *slottype;
> + char    *twophase;
> +} LogicalReplicationSlotInfo;
> +
> 
> 4a.
> The indent of the 'LogicalReplicationSlotInfo' looks a bit strange,
> unlike others in this file. Is it OK?

pgindent misled me, for the reason you pointed out.
Fixed.

> 4b.
> There was no typedefs.list file in this patch. Maybe the above
> whitespace problem is a result of that omission.

Your analysis is correct. Added.

> .../pg_upgrade/t/003_logical_replication.pl
> 
> 5.
> 
> +# Run pg_upgrade. pg_upgrade_output.d is removed at the end
> +command_ok(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d',         $old_publisher->data_dir,
> + '-D',         $new_publisher->data_dir,
> + '-b',         $bindir,
> + '-B',         $bindir,
> + '-s',         $new_publisher->host,
> + '-p',         $old_publisher->port,
> + '-P',         $new_publisher->port,
> + $mode,        '--include-logical-replication-slot'
> + ],
> + 'run of pg_upgrade for new publisher');
> 
> 5a.
> How can this test even be working as-expected with those options?
> 
> Here it is passing option '--include-logical-replication-slot' but
> AFAIK the proper option name everywhere else in this patch is
> '--include-logical-replication-slots' (with the 's')

This is because the GNU implementation of getopt_long accepts an abbreviated
option if it unambiguously identifies the correct one. E.g. pg_upgrade on Linux accepts
`--ve` as `--verbose`, whereas the binary built on Windows does not.

Anyway, the difference was not what I intended. Fixed.
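
The prefix matching described above is not specific to pg_upgrade. As a rough illustration only, Python's standard `getopt` module applies the same rule as GNU getopt_long: a long option may be abbreviated as long as the prefix matches exactly one known option. The option list and the `resolve` helper below are assumptions made up for this demo; they are not pg_upgrade's real option table.

```python
import getopt

# Hypothetical subset of long options, chosen for illustration only.
LONG_OPTS = ["check", "clone", "verbose", "include-logical-replication-slots"]


def resolve(arg):
    """Return the full option name that `arg` resolves to, or None if the
    abbreviation is unknown or matches more than one option."""
    try:
        opts, _ = getopt.getopt([arg], "", LONG_OPTS)
    except getopt.GetoptError:
        return None
    return opts[0][0]


# The misspelled option from the TAP test is a unique prefix, so it is accepted.
print(resolve("--include-logical-replication-slot"))
# "--ve" matches only "verbose" in this list.
print(resolve("--ve"))
# "--c" matches both "check" and "clone", so it is rejected.
print(resolve("--c"))
```

This is why the test passed with the misspelled option on Linux: the parser silently expanded it to the full name. A platform whose getopt_long does not do prefix matching would reject the same command line.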

> 5b.
> I'm not sure that "pg_upgrade for new publisher" makes sense.
> 
> It's more like "pg_upgrade of old publisher", or simply "pg_upgrade of
> publisher"
>

Fixed.

Additionally, I fixed two bugs which were detected by AddressSanitizer.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v4-0001-pg_upgrade-Add-include-logical-replication-slots-.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 11, 2023, 13:30:35

Dear Peter,

Thank you for the explanation.

> 
> Hopefully, someone will correct me if this explanation is wrong, but
> my understanding of the different prefixes is like this --
> 
> "XXX" is used as a marker for future developers to consider maybe
> revisiting/improving something that the comment refers to
> e.g.
> /* XXX - it would be better to code this using blah but for now we did
> not.... */
> /* XXX - option 'foo' is not currently supported but... */
> /* XXX - it might be worth considering adding more checks or an assert
> here because... */
> 
> OTOH, "Note" is just for highlighting why something is the way it is,
> but with no implication that it should be revisited/changed in the
> future.
> e.g.
> /* Note: We deliberately do not test the state here because... */
> /* Note: This memory must be zeroed because... */
> /* Note: This string has no '\0' terminator so... */

I confirmed that the current "XXX" comments should eventually be resolved, either by
improving the code or by making a decision. Therefore I kept the current annotations.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 11, 2023, 13:40:54

Dear hackers,

My PoC does not read and copy the logical mapping files to the new node, but I
had not analyzed in detail whether that is correct. I have now done so and
concluded that they do not have to be copied, because transactions that were executed
at the same time as the rewrite are no longer decoded. What do you think?
My analysis follows.

## What are logical mapping files?

Logical mapping files are used to keep track of system catalogs during logical decoding
even when their heap files are rewritten. Catalog heap files are sometimes modified, or
completely replaced with new files via VACUUM FULL or CLUSTER, and the reorder buffer
cannot track the new files as-is. Mapping files make that possible.

Each file contains key-value relations mapping old tuples to new ones. Also, the file
name contains the LSN at which the triggering event happened.

Mapping files are needed when transactions that modify catalogs are decoded.
If the LSN of a file is older than restart_lsn, the file is no longer needed and is
removed. Please see CheckPointLogicalRewriteHeap().

## Are they needed?

I think they are not needed.
Currently pg_upgrade dumps the important information from the old publisher and then
executes pg_create_logical_replication_slot() on the new one. Unlike with
pg_copy_logical_replication_slot(), the restart_lsn and confirmed_flush_lsn of the old
slot are not carried over to the new slot. They are recalculated on the new node when
the slot is created. This means that transactions which modified catalog heaps on the
old publisher are no longer decoded on the new publisher.

Therefore, the mapping files on the old publisher are not needed on the new one.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

April 12, 2023, 08:25:31

FYI, here are some minor review comments for v4-0001

======
src/bin/pg_dump/pg_backup.h

1.
+ int logical_slot_only;

The field should be plural - "logical_slots_only"

======
src/bin/pg_dump/pg_dump.c

2.
+ appendPQExpBufferStr(query,
+ "SELECT r.slot_name, r.plugin, r.two_phase "
+ "FROM pg_replication_slots r "
+ "WHERE r.database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

The alias 'r' may not be needed at all here, but since you already
have it IMO it looks a bit strange that you used it for only some of
the columns but not others.

~~~

3.
+
+ /* FIXME: force dumping */
+ slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;

Why the "FIXME" here? Are you intending to replace this code with
something else?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 12, 2023, 10:55:28

Dear Peter,

Thank you for the comments. PSA the new version.

> src/bin/pg_dump/pg_backup.h
> 
> 1.
> + int logical_slot_only;
> 
> The field should be plural - "logical_slots_only"

Fixed.

> src/bin/pg_dump/pg_dump.c
> 
> 2.
> + appendPQExpBufferStr(query,
> + "SELECT r.slot_name, r.plugin, r.two_phase "
> + "FROM pg_replication_slots r "
> + "WHERE r.database = current_database() AND temporary = false "
> + "AND wal_status IN ('reserved', 'extended');");
> 
> The alias 'r' may not be needed at all here, but since you already
> have it IMO it looks a bit strange that you used it for only some of
> the columns but not others.

Right, I removed the alias. Moreover, the 'pg_catalog' namespace is now specified explicitly.

> 3.
> +
> + /* FIXME: force dumping */
> + slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
> 
> Why the "FIXME" here? Are you intending to replace this code with
> something else?

I added the FIXME because I was not sure whether a selectDumpable...()
function was needed. I now think that such a function is not
needed, so I replaced the comment. For details, please see the following:

Replication slots cannot be members of an extension because pg_create_logical_replication_slot()
cannot be called within an install script. This means that checkExtensionMembership()
is not needed. Moreover, we do not have any options to include/exclude slots
when dumping, so checking their names as selectDumpableExtension() does is not needed.
Based on these points, I think the function is not needed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v5-0001-pg_upgrade-Add-include-logical-replication-slots-.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

April 13, 2023, 12:52:33

Hi Kuroda-san.

I do not have any more review comments for the v5 patch, but here are
a few remaining nitpick items.

======
General

1.
There were a couple of comments that I thought would appear less
squished (aka more readable) if there was a blank line preceding the
XXX.

1a. This one is in getLogicalReplicationSlots

+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ * XXX: Do we have to support physical slots?
+ */

~

1b. This one is for the LogicalReplicationSlotInfo typedef

+/*
+ * The LogicalReplicationSlotInfo struct is used to represent replication
+ * slots.
+ * XXX: add more attrbutes if needed
+ */

BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/

======
src/bin/pg_dump/pg_dump_sort.c

2. describeDumpableObject

+ case DO_LOGICAL_REPLICATION_SLOT:
+ snprintf(buf, bufsize,
+ "REPLICATION SLOT (ID %d NAME %s)",
+ obj->dumpId, obj->name);
+ return;

Since everything else was changed to say logical replication slot,
should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
%s)"?

======
.../pg_upgrade/t/003_logical_replication.pl

3.
Should the name of this TAP test file really be 03_logical_replication_slots.pl?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 14, 2023, 08:53:37

Dear Peter,

Thank you for checking. We can now wait for comments from others.
PSA the modified version.

> 1.
> There were a couple of comments that I thought would appear less
> squished (aka more readable) if there was a blank line preceding the
> XXX.
> 
> 1a. This one is in getLogicalReplicationSlots
> 
> + /*
> + * Get replication slots.
> + *
> + * XXX: Which information must be extracted from old node? Currently three
> + * attributes are extracted because they are used by
> + * pg_create_logical_replication_slot().
> + * XXX: Do we have to support physical slots?
> + */

Added.

> 1b. This one is for the LogicalReplicationSlotInfo typedef
> 
> +/*
> + * The LogicalReplicationSlotInfo struct is used to represent replication
> + * slots.
> + * XXX: add more attrbutes if needed
> + */

Added.

> BTW -- I just noticed there is a typo in that comment. /attrbutes/attributes/

Good finding, replaced.

> src/bin/pg_dump/pg_dump_sort.c
> 
> 2. describeDumpableObject
> 
> + case DO_LOGICAL_REPLICATION_SLOT:
> + snprintf(buf, bufsize,
> + "REPLICATION SLOT (ID %d NAME %s)",
> + obj->dumpId, obj->name);
> + return;
> 
> Since everything else was changed to say logical replication slot,
> should this string be changed to "LOGICAL REPLICATION SLOT (ID %d NAME
> %s)"?

I missed that one; changed.

> .../pg_upgrade/t/003_logical_replication.pl
> 
> 3.
> Should the name of this TAP test file really be 03_logical_replication_slots.pl?
>

Hmm, not sure. For now I renamed it as you advised, but personally I think
another feature, which allows upgrading subscribers [1], should be tested in the same
Perl file. That's why I had named it "003_logical_replication.pl".

[1]: https://www.postgresql.org/message-id/20230217075433.u5mjly4d5cr4hcfe%40jrouhaud


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v6-0001-pg_upgrade-Add-include-logical-replication-slots-.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

April 14, 2023, 09:12:48

Hi,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +0000, Hayato Kuroda (Fujitsu) wrote:
>
> I have analyzed about the point but it seemed to be difficult. This is because
> some additional records like followings may be inserted. PSA the script which is
> used for testing. Note that "double CHECKPOINT_SHUTDOWN" issue might be wrong,
> so I wanted to withdraw it once. Sorry for noise.
>
> * HEAP/HEAP2 records. These records may be inserted by checkpointer.
>
> IIUC, if there are tuples which have not been flushed yet when shutdown is requested,
> the checkpointer writes back all of them into heap file. At that time many WAL
> records are generated. I think we cannot predict the number of records beforehand.
>
> * INVALIDATION(S) records. These records may be inserted by VACUUM.
>
> There is a possibility that autovacuum runs and generate WAL records. I think we
> cannot predict the number of records beforehand because it depends on the number
> of objects.
>
> * RUNNING_XACTS record
>
> It might be a timing issue, but I found that sometimes background writer generated
> a XLOG_RUNNING record. According to the function BackgroundWriterMain(), it will be
> generated when the process spends 15 seconds since last logging and there are
> important records. I think it is difficult to predict whether this will be appeared or not.

I don't think that your analysis is correct.  Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown.  But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.

It's probably totally harmless to throw away that value for now (and probably
also doesn't lead to a crazy amount of work after restart, I really don't know
much about the logical slot code), but it clearly becomes problematic with your
use case.  One easy way to fix this is to teach the checkpoint code to force
saving the logical slots to disk even if they're not marked as dirty during a
shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
interfere with the cfbot).  With this patch applied I reliably only see a final
shutdown checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.
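
To make the sequence above concrete, here is a toy model in Python. It is not PostgreSQL code: the `Slot` class, its fields, and the `force` parameter are hypothetical stand-ins for the behavior described in this message, namely an in-memory confirmed_flush that advances without the slot being marked dirty, and a shutdown checkpoint that can optionally save every slot regardless of the dirty flag.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Toy stand-in for a logical slot; not the real ReplicationSlot struct."""
    confirmed_flush: int   # value held in memory
    on_disk: int           # last value written to the slot file
    dirty: bool = False

    def advance(self, lsn, mark_dirty):
        # Moving past records that are not decoded updates memory only.
        self.confirmed_flush = lsn
        self.dirty = self.dirty or mark_dirty

    def checkpoint(self, force=False):
        # Without `force`, only slots marked dirty are written out.
        if self.dirty or force:
            self.on_disk = self.confirmed_flush
            self.dirty = False

    def restart(self):
        # After a restart the slot is rebuilt from its on-disk state.
        self.confirmed_flush = self.on_disk


def shutdown_and_restart(force):
    slot = Slot(confirmed_flush=100, on_disk=100)
    slot.advance(150, mark_dirty=False)  # caught up, but not marked dirty
    slot.checkpoint(force=force)         # the shutdown checkpoint
    slot.restart()
    return slot.confirmed_flush


print(shutdown_and_restart(force=False))  # 100: the position went backward
print(shutdown_and_restart(force=True))   # 150: the position survived
```

The `force=True` path corresponds to the idea in the attached patch: pay for one extra slot write at shutdown so that the saved position matches what was confirmed in memory.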

Attachments

v1-0001-Always-persist-to-disk-logical-slots-during-a-shu.patch.txt

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

April 14, 2023, 13:30:27

Dear Julien,

> Sorry for the delay, I didn't had time to come back to it until this afternoon.

No issues, everyone is busy:-).

> I don't think that your analysis is correct.  Slots are guaranteed to be
> stopped after all the normal backends have been stopped, exactly to avoid such
> extraneous records.
>
> What is happening here is that the slot's confirmed_flush_lsn is properly
> updated in memory and ends up being the same as the current LSN before the
> shutdown.  But as it's a logical slot and those records aren't decoded, the
> slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
> that behavior when doing a manual checkpoint before (per your script comment),
> as in that case the checkpoint also tries to save the slot to disk but then
> finds a slot that was marked as dirty and therefore saves it.
>
> In your script's scenario, when you restart the server the previous slot data
> is restored and the confirmed_flush_lsn goes backward, which explains those
> extraneous records.

So you mean that the key point is that records which are not sent
to the subscriber do not mark slots as dirty, and hence the updated confirmed_flush is
not written to the slot file. Is that right? LogicalConfirmReceivedLocation() is called
by the walsender when it gets a reply from the worker process, so your analysis
seems correct.

> It's probably totally harmless to throw away that value for now (and probably
> also doesn't lead to crazy amount of work after restart, I really don't know
> much about the logical slot code), but clearly becomes problematic with your
> usecase.  One easy way to fix this is to teach the checkpoint code to force
> saving the logical slots to disk even if they're not marked as dirty during a
> shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> interfere with the cfbot).  With this patch applied I reliably only see a final
> shutdown checkpoint record with your scenario.
>
> Now such a change will make shutdown a bit more expensive when using logical
> replication, even if in 99% of cases you will not need to save the
> confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case, the slots must be advanced past these records. IIUC, such records are currently
read after a restart but ignored, and this patch just skips them. I have not measured it,
but it is possible that this is not additional overhead, just a trade-off.

I have not come up with another solution so far, so I have included your patch.
Please see 0002.

Additionally, I added a checking function in 0003.
According to pg_resetwal and other functions, the length of a CHECKPOINT_SHUTDOWN
record seems to be (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between the current insert position
and confirmed_flush_lsn is less than (the above + page header).
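
The check described above can be sketched as follows. This is an illustration in Python rather than the patch's C code: the function names are hypothetical, and the byte sizes are assumed values for a typical 64-bit build, which real code would take from the PostgreSQL headers instead. The sketch also ignores alignment padding. The only firm input is the LSN text format, two hexadecimal halves separated by a slash.

```python
# Assumed sizes in bytes; illustrative only, not read from PostgreSQL headers.
SIZE_OF_XLOG_RECORD = 24
SIZE_OF_DATA_HEADER_SHORT = 2
SIZE_OF_CHECKPOINT = 88
SIZE_OF_PAGE_HEADER = 24


def parse_lsn(text):
    """Convert an LSN such as '0/15D68A0' to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_is_caught_up(confirmed_flush, insert_pos):
    """True if the gap between the slot's confirmed_flush_lsn and the insert
    position is smaller than one shutdown checkpoint record plus a page header."""
    limit = (SIZE_OF_XLOG_RECORD + SIZE_OF_DATA_HEADER_SHORT
             + SIZE_OF_CHECKPOINT + SIZE_OF_PAGE_HEADER)
    gap = parse_lsn(insert_pos) - parse_lsn(confirmed_flush)
    return 0 <= gap < limit


# A gap of 0x78 bytes is within the limit; a gap of 0x760 bytes is not.
print(slot_is_caught_up("0/15D68A0", "0/15D6918"))
print(slot_is_caught_up("0/15D68A0", "0/15D7000"))
```

A slot that fails this test still has WAL it has not confirmed, so an upgrade that recreates the slot on the new node would lose those changes.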

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Vignesh,

Thank you for reviewing! PSA new patchset.

> > Additionally, I added a checking functions in 0003
> > According to pg_resetwal and other functions, the length of
> CHECKPOINT_SHUTDOWN
> > record seems (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort +
> sizeof(CheckPoint)).
> > Therefore, the function ensures that the difference between current insert
> position
> > and confirmed_flush_lsn is less than (above + page header).
> 
> Logical replication slots can be created only if wal_level >= logical,
> currently we do not have any check to see if wal_level >= logical if
> "--include-logical-replication-slots" option is specified. If
> include-logical-replication-slots is specified with pg_upgrade, we
> will be creating replication slots after a lot of steps like
> performing prechecks, analyzing, freezing, deleting, restoring,
> copying, setting related objects and then create replication slot,
> where we will be erroring out after a lot of time(Many cases
> pg_upgrade takes a lot of hours to perform these operations). I feel
> it would be better to add a check in the beginning itself somewhere in
> check_new_cluster to see if wal_level is set appropriately in case of
> include-logical-replication-slot option to detect and throw an error
> early itself.

I see your point. Moreover, I think max_replication_slots != 0 must also be checked.
I added a checking function and related test in 0001.
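
As a rough illustration of such an early check (a sketch only, not the code in
0001): check_slot_parameters() is a hypothetical helper that takes the two
settings as already fetched from the new cluster and returns a message instead
of calling pg_fatal().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical pre-check: returns NULL if the new cluster's settings allow
 * logical replication slots to be created, otherwise a short description of
 * the first problem found.
 */
static const char *
check_slot_parameters(const char *wal_level, int max_replication_slots)
{
	assert(wal_level != NULL);

	if (strcmp(wal_level, "logical") != 0)
		return "wal_level must be \"logical\"";

	if (max_replication_slots == 0)
		return "max_replication_slots must be greater than zero";

	return NULL;
}
```

Running this before any dump or restore work means a misconfigured new cluster
is rejected within seconds rather than after hours of upgrade steps.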

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Julien,

Thank you for the comments! PSA new version.

> I think that this test should be different when just checking for the
> prerequirements (live_check / --check) compared to actually doing the upgrade,
> as it's almost guaranteed that the slots won't have sent everything when the
> source server is up and running.

Hmm, you assumed that the user application is still running and data is coming
in continuously when doing --check, right? Personally, I had thought that the
--check operation is executed just before the actual upgrade, so I'm not
sure your assumption is a real problem. Also, I could not find any existing checks
whose behavior changes based on the --check option.

Anyway, I incorporated your suggestion in the 0004 patch. We can ask some other
reviewers whether it is necessary.

> Maybe simply check that all logical slots are currently active when running the
> live check,

Yeah, if we support that case, checking pg_replication_slots.active may be sufficient.
Actually, this cannot handle the case where pg_create_logical_replication_slot()
is executed just before upgrading, but I'm not sure it should.

> so at least there's a good chance that they will still be at
> shutdown, and will therefore send all the data to the subscribers? Having a
> regression tests for that scenario would also be a good idea.  Having an
> uncommitted write transaction should be enough to cover it.

I think background_psql() can be used for this purpose. Before running pg_upgrade
--check, a transaction is opened and kept open. This means that confirmed_flush has
not yet reached the current WAL position, but the check reports OK
because all slots are active.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

> A suggestion: You could write some/most tests against test_decoding
> rather than the publication/subscription system.  That way, you can
> avoid many timing issues in the tests and you can check more exactly
> that the slots produce the output you want.  This would also help ensure
> that this new facility works for other logical decoding output plugins
> besides the built-in one.

Good point. I think almost all tests except the --check part can be rewritten.
PSA new patchset.

Additionally, I fixed the following:

- Added initialization for slot_arr.*. This is needed to check whether
  the entry has already been allocated, in get_logical_slot_infos().
  Previously a double free occurred on some platforms.
- Fixed the condition in get_logical_slot_infos().
- Changed the expected size of the page header to the longer one (SizeOfXLogLongPHD).
  If the WAL page is the first one in the WAL segment file, the long header seems
  to be used.
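
The last point can be sketched as below. This is only an illustration under
stated assumptions: the two sizes are stand-ins for SizeOfXLogShortPHD and
SizeOfXLogLongPHD as found on a typical 64-bit build, and
page_header_size_at() is a hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* WAL position, same width as in PostgreSQL */

#define ASSUMED_SHORT_PHD	24	/* stand-in for SizeOfXLogShortPHD */
#define ASSUMED_LONG_PHD	40	/* stand-in for SizeOfXLogLongPHD */

/*
 * Hypothetical helper: header size of the WAL page starting at page_start.
 * Only the first page of a segment carries the long header.
 */
static int
page_header_size_at(XLogRecPtr page_start, uint64_t wal_segment_size)
{
	assert(wal_segment_size > 0);
	return (page_start % wal_segment_size == 0) ?
		ASSUMED_LONG_PHD : ASSUMED_SHORT_PHD;
}
```

Budgeting for the long header in every case keeps the check simple and only
loosens the bound by the difference between the two header sizes.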

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Tue, 2 May 2023, 19:43 Julien Rouhaud, <rjuju123@gmail.com> wrote:

Hi,

On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:
> On 2023-Apr-07, Julien Rouhaud wrote:
>
> > That being said, I have a hard time believing that we could actually preserve
> > physical replication slots. I don't think that pg_upgrade final state is fully
> > reproducible: not all object oids are preserved, and the various pg_restore
> > are run in parallel so you're very likely to end up with small physical
> > differences that would be incompatible with physical replication. Even if we
> > could make it totally reproducible, it would probably be at the cost of making
> > pg_upgrade orders of magnitude slower. And since many people are already
> > complaining that it's too slow, that doesn't seem like something we would want.
>
> A point on preserving physical replication slots: because we change WAL
> format from one major version to the next (adding new messages or
> changing format for other messages), we can't currently rely on physical
> slots working across different major versions.

I don't think anyone suggested to do physical replication over different major
versions. My understanding was that it would be used to pg_upgrade a
"physical cluster" (e.g. a primary and physical standby server) at the same
time, and then simply starting them up again would lead to a working physical
replication on the new version.

I guess one could try to keep using the slots for other needs (PITR backup with
pg_receivewal or something similar), and then you would indeed have to be aware
that you won't be able to do anything with the new WAL records until you do a
fresh base backup, but that's a problem that you can already face after a
normal pg_upgrade (although in most cases it's probably quite obvious for now
as the timeline isn't preserved).

If what you meant is that the slot may have to send a record generated by an
older major version, then unless I'm missing something the same restriction
could be added to such a feature as what's being discussed in this thread for
the logical replication slots, so only a final shutdown checkpoint record would
be present after the flushed WAL position. It may be possible to work around
that, if there weren't all the other problems I mentioned.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Alvaro Herrera

Date:

03 May 2023, 13:40:34

On 2023-May-02, Julien Rouhaud wrote:

> On Tue, May 02, 2023 at 12:55:18PM +0200, Alvaro Herrera wrote:

> > A point on preserving physical replication slots: because we change WAL
> > format from one major version to the next (adding new messages or
> > changing format for other messages), we can't currently rely on physical
> > slots working across different major versions.
> 
> I don't think anyone suggested to do physical replication over different major
> versions.

They didn't, but a man can dream.  (Anyway, we agree on it not working
for various reasons.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

04 May 2023, 07:03:55

Dear Alvaro,

Thanks for the suggestion!

> A point on preserving physical replication slots: because we change WAL
> format from one major version to the next (adding new messages or
> changing format for other messages), we can't currently rely on physical
> slots working across different major versions.
> 
> So IMO, for now don't bother with physical replication slot
> preservation, but do keep the option name as specific to logical slots.

Based on Julien's advice, we have already decided not to include physical
slots in this patch, and the option name has been changed accordingly.
I take your comment as explicit confirmation that we are going the right way. Thanks!

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

09 May 2023, 06:09:49

Hi Kuroda-san. Here are some review comments for the v10-0001 patch.

======

General.

1. pg_dump option is documented to the user.

I'm not sure about exposing the new pg_dump
--logical-replication-slots-only option to the user.

I thought this pg_dump option was intended only to be called
*internally* by the pg_upgrade.
But, this patch is also documenting the new option for the user (in
case they want to call it independently?)

Maybe exposing it is OK, but if you do that then I thought perhaps
there should also be some additional pg_dump tests just for this
option (i.e. tested independently of the pg_upgrade)

======
Commit message

2.
For pg_upgrade, when '--include-logical-replication-slots' is
specified, it executes
pg_dump with the new "--logical-replication-slots-only" option and
restores from the
dump. Apart from restoring schema, pg_resetwal must not be called
after restoring
replication slots. This is because the command discards WAL files and
starts from a
new segment, even if they are required by replication slots. This
leads to an ERROR:
"requested WAL segment XXX has already been removed". To avoid this,
replication slots
are restored at a different time than other objects, after running pg_resetwal.

~~

The "Apart from" sentence maybe could do with some rewording. I
noticed there is a code comment (below fragment) that says the same as
this, but more clearly. Maybe it is better to use that code-comment
wording in the commit message.

+ * XXX We cannot dump replication slots at the same time as the schema
+ * dump because we need to separate the timing of restoring
+ * replication slots and other objects. Replication slots, in
+ * particular, should not be restored before executing the pg_resetwal
+ * command because it will remove WALs that are required by the slots.

======
src/bin/pg_dump/pg_dump.c

3. main

+ if (dopt.logical_slots_only && !dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+ if (dopt.logical_slots_only && dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+ if (dopt.logical_slots_only && dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
+

Consider if it might be simpler to combine together all those
dopt.logical_slots_only checks.

SUGGESTION

if (dopt.logical_slots_only)
{
    if (!dopt.binary_upgrade)
        pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");

    if (dopt.dataOnly)
        pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
    if (dopt.schemaOnly)
        pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
}

~~~

4. getLogicalReplicationSlots

+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
+ return;

I'm not sure if this check is necessary. Given the way this function
is called, is it possible for this check to fail? Maybe that quick
exit would be better coded as an Assert?

~~~

5. dumpLogicalReplicationSlot

+dumpLogicalReplicationSlot(Archive *fout,
+    const LogicalReplicationSlotInfo *slotinfo)
+{
+ DumpOptions *dopt = fout->dopt;
+
+ if (!dopt->logical_slots_only)
+ return;

(Similar to the previous comment). Is it even possible to arrive here
when dopt->logical_slots_only is false? Maybe that quick exit would be
better coded as an Assert?
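
To illustrate what I mean, here is a minimal standalone sketch. It uses the
standard assert() as a stand-in for PostgreSQL's Assert(), and
DumpOptionsSketch and dump_slot_sketch() are hypothetical, trimmed-down names
rather than the real pg_dump types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for pg_dump's DumpOptions. */
typedef struct
{
	bool		logical_slots_only;
} DumpOptionsSketch;

/* Pretends to dump nslots slots and returns how many were handled. */
static int
dump_slot_sketch(const DumpOptionsSketch *dopt, int nslots)
{
	/*
	 * Callers are expected to get here only in slots-only mode, so state
	 * that invariant instead of silently returning.
	 */
	assert(dopt->logical_slots_only);

	return nslots;
}
```

A silent early return would hide a caller bug, whereas the assertion makes an
unexpected call fail loudly in assert-enabled builds.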

~

6.
+ PQExpBuffer query = createPQExpBuffer();
+ char    *slotname = pg_strdup(slotinfo->dobj.name);

I wondered if it was really necessary to strdup/free this slotname.
e.g. And if it is, then why don't you do this for the slotinfo->plugin
field?

======
src/bin/pg_upgrade/check.c

7. check_and_dump_old_cluster

  /* Extract a list of databases and tables from the old cluster */
  get_db_and_rel_infos(&old_cluster);
+ get_logical_slot_infos(&old_cluster);

Is it correct to associate this new call with that existing comment
about "databases and tables"?

~~~

8. check_new_cluster

@@ -188,6 +190,7 @@ void
 check_new_cluster(void)
 {
  get_db_and_rel_infos(&new_cluster);
+ get_logical_slot_infos(&new_cluster);

  check_new_cluster_is_empty();

@@ -210,6 +213,9 @@ check_new_cluster(void)
  check_for_prepared_transactions(&new_cluster);

  check_for_new_tablespace_dir(&new_cluster);
+
+ if (user_opts.include_logical_slots)
+ check_for_parameter_settings(&new_cluster);

Can the get_logical_slot_infos() be done later, guarded by the
same condition if (user_opts.include_logical_slots)?

~~~

9. check_new_cluster_is_empty

+ * If --include-logical-replication-slots is required, check the
+ * existing of slots
+ */

Did you mean to say "check the existence of slots"?

~~~

10. check_for_parameter_settings

+ if (strcmp(wal_level, "logical") != 0)
+ pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
+ wal_level);

/but set to/but is set to/


======
src/bin/pg_upgrade/info.c

11. get_db_and_rel_infos

+ {
  get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);

+ /*
+ * Additionally, slot_arr must be initialized because they will be
+ * checked later.
+ */
+ cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
+ cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
+ }

11a.
I think probably it would have been easier to just use 'pg_malloc0'
instead of 'pg_malloc' in the get_db_infos, then this code would not
be necessary.

~

11b.
BTW, shouldn't this function also be calling free_logical_slot_infos()
too? That will also have the same effect (initializing the slot_arr)
but without having to change anything else.
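
For 11a, the idea in standalone form looks like this. It is a sketch only:
calloc() stands in for pg_malloc0(), and SlotArrSketch, DbInfoSketch and
alloc_dbinfos_zeroed() are hypothetical reduced versions of the real structs
and code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced versions of LogicalSlotInfoArr and DbInfo. */
typedef struct
{
	int			nslots;
	void	   *slots;
} SlotArrSketch;

typedef struct
{
	SlotArrSketch slot_arr;
} DbInfoSketch;

/*
 * Allocate ndbs zero-filled entries, so slot_arr.nslots is 0 and
 * slot_arr.slots is NULL without any per-entry initialization loop.
 * Returns NULL if the allocation fails.
 */
static DbInfoSketch *
alloc_dbinfos_zeroed(size_t ndbs)
{
	assert(ndbs > 0);
	return calloc(ndbs, sizeof(DbInfoSketch));
}
```

This relies on a null pointer being represented as all-zero bytes, which the C
standard does not strictly guarantee but which holds on the usual platforms.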

~~~

12. get_logical_slot_infos
+/*
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)

To be consistent with the other nearby function headers it should have
another line saying just get_logical_slot_infos().

~~~

13. get_logical_slot_infos

+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
+ free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+
+ get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
+ }
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ if (log_opts.verbose)
+ {
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
+ print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
+ }
+ }
+}

I didn't see why there are 2 loops exactly the same. I think with some
minor refactoring these can both be done in the same loop can't they?

SUGGESTION 1:

if (cluster == &old_cluster)
    pg_log(PG_VERBOSE, "\nsource databases:");
else
    pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
    if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
        free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);

    get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);

    if (log_opts.verbose)
    {
        pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
        print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
    }
}

~

I expected it could be simplified further still by using some variables

SUGGESTION 2:

if (cluster == &old_cluster)
    pg_log(PG_VERBOSE, "\nsource databases:");
else
    pg_log(PG_VERBOSE, "\ntarget databases:");

for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
{
    DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];

    if (pDbInfo->slot_arr.slots)
        free_logical_slot_infos(&pDbInfo->slot_arr);

    get_logical_slot_infos_per_db(cluster, pDbInfo);

    if (log_opts.verbose)
    {
        pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
        print_slot_infos(&pDbInfo->slot_arr);
    }
}

~~~

14. get_logical_slot_infos_per_db

+ char query[QUERY_ALLOC];
+
+ query[0] = '\0'; /* initialize query string to empty */
+
+ snprintf(query + strlen(query), sizeof(query) - strlen(query),
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

I didn't understand the purpose of those calls to 'strlen(query)'
since the string was initialised to empty-string immediately above.
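
In other words, something like the following should be equivalent. This is a
standalone sketch: build_slot_query() is a hypothetical wrapper added only so
the snippet compiles on its own, and the query text is copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: writes the slot query at the start of the buffer.
 * No strlen() offset is needed because nothing was written before it.
 * Returns what snprintf() returns (the length the query needs).
 */
static int
build_slot_query(char *query, size_t size)
{
	assert(query != NULL && size > 0);

	return snprintf(query, size,
					"SELECT slot_name, plugin, two_phase "
					"FROM pg_catalog.pg_replication_slots "
					"WHERE database = current_database() AND temporary = false "
					"AND wal_status IN ('reserved', 'extended');");
}
```

The explicit query[0] = '\0' initialization also becomes unnecessary, since
snprintf() always NUL-terminates when the size is non-zero.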

~~~

15.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

IMO those colons don't make sense.

BEFORE
"slotname: %s: plugin: %s: two_phase %d"

SUGGESTION
"slotname: %s, plugin: %s, two_phase: %d"

======
src/bin/pg_upgrade/pg_upgrade.h

16. LogicalSlotInfo

+typedef struct
+{
+ char    *slotname; /* slot name */
+ char    *plugin; /* plugin */
+ bool two_phase; /* Can the slot decode 2PC? */
+} LogicalSlotInfo;

The RelInfo had a comment for the typedef struct, so I think the
LogicalSlotInfo struct also should have a comment.

~~~

17. DbInfo

  RelInfoArr rel_arr; /* array of all user relinfos */
+ LogicalSlotInfoArr slot_arr; /* array of all logicalslotinfos */
 } DbInfo;

Should the comment say "LogicalSlotInfo" instead of "logicalslotinfos"?

======
.../t/003_logical_replication_slots.pl

18. RESULTS

I ran this with 'make check' in the src/bin/pg_upgrade folder.

For some reason, the test does not work for me. The results I get are:

# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl ...................... ok
t/002_pg_upgrade.pl ................. ok
t/003_logical_replication_slots.pl .. 3/? # Tests were run but no plan was declared and done_testing() was not seen.
t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
All 4 subtests passed

Test Summary Report
-------------------
t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 4 Failed: 0)
  Non-zero exit status: 29
  Parse errors: No plan found in TAP output
Files=3, Tests=27, 128 wallclock secs ( 0.04 usr  0.01 sys + 18.02 cusr  6.06 csys = 24.13 CPU)
Result: FAIL
make: *** [check] Error 1

~

And the log file
(tmp_check/log/003_logical_replication_slots_old_node.log) shows the
following ERROR:

2023-05-09 12:19:25.330 AEST [32572] 003_logical_replication_slots.pl LOG:  statement: SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl ERROR:  could not access file "test_decoding": No such file or directory
2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl STATEMENT:  SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
2023-05-09 12:19:25.335 AEST [32564] LOG:  received immediate shutdown request
2023-05-09 12:19:25.337 AEST [32564] LOG:  database system is shut down

~

Is it a bug? Or, if I am doing something wrong please let me know how
to run the test.

~~~

19.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");

I think the last 2 lines are not "clean up". They are preparations for
the subsequent test, so maybe they should be commented as such.

~~~

20.
+# Clean up
+rmtree($new_node->data_dir . "/pg_upgrade_output.d");
+$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");

I think the last line is not "clean up". It is preparation for the
subsequent test, so maybe it should be commented as such.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

09 May 2023, 12:43:35

Dear Peter,

Thank you for reviewing! PSA new version.

> 
> General.
> 
> 1. pg_dump option is documented to the user.
> 
> I'm not sure about exposing the new pg_dump
> --logical-replication-slots-only option to the user.
> 
> I thought this pg_dump option was intended only to be called
> *internally* by the pg_upgrade.
> But, this patch is also documenting the new option for the user (in
> case they want to call it independently?)
> 
> Maybe exposing it  is OK, but if you do that then I thought perhaps
> there should also be some additional pg_dump tests just for this
> option (i.e. tested independently of the pg_upgrade)

Right, I had documented it for the time being, but it should not be documented
if it is not exposed. Removed from the doc.

> Commit message
> 
> 2.
> For pg_upgrade, when '--include-logical-replication-slots' is
> specified, it executes
> pg_dump with the new "--logical-replication-slots-only" option and
> restores from the
> dump. Apart from restoring schema, pg_resetwal must not be called
> after restoring
> replication slots. This is because the command discards WAL files and
> starts from a
> new segment, even if they are required by replication slots. This
> leads to an ERROR:
> "requested WAL segment XXX has already been removed". To avoid this,
> replication slots
> are restored at a different time than other objects, after running pg_resetwal.
> 
> ~~
> 
> The "Apart from" sentence maybe could do with some rewording. I
> noticed there is a code comment (below fragment) that says the same as
> this, but more clearly. Maybe it is better to use that code-comment
> wording in the comment message.
> 
> + * XXX We cannot dump replication slots at the same time as the schema
> + * dump because we need to separate the timing of restoring
> + * replication slots and other objects. Replication slots, in
> + * particular, should not be restored before executing the pg_resetwal
> + * command because it will remove WALs that are required by the slots.

Changed.

> src/bin/pg_dump/pg_dump.c
> 
> 3. main
> 
> + if (dopt.logical_slots_only && !dopt.binary_upgrade)
> + pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
> +
> + if (dopt.logical_slots_only && dopt.dataOnly)
> + pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
> + if (dopt.logical_slots_only && dopt.schemaOnly)
> + pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
> +
> 
> Consider if it might be simpler to combine together all those
> dopt.logical_slots_only checks.
> 
> SUGGESTION
> 
> if (dopt.logical_slots_only)
> {
>     if (!dopt.binary_upgrade)
>         pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
> 
>     if (dopt.dataOnly)
>         pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
>     if (dopt.schemaOnly)
>         pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");
> }

Right, fixed.

> 4. getLogicalReplicationSlots
> 
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 160000 || !dopt->logical_slots_only)
> + return;
> 
> I'm not sure if this check is necessary. Given the way this function
> is called, is it possible for this check to fail? Maybe that quick
> exit would be better code as an Assert?

I think the version check is needed because it is not done anywhere else yet.
(Actually, I'm not sure the restriction is needed, but I will keep it for now.)
As for dopt->logical_slots_only, I agree and removed that check.

> 5. dumpLogicalReplicationSlot
> 
> +dumpLogicalReplicationSlot(Archive *fout,
> +    const LogicalReplicationSlotInfo *slotinfo)
> +{
> + DumpOptions *dopt = fout->dopt;
> +
> + if (!dopt->logical_slots_only)
> + return;
> 
> (Similar to the previous comment). Is it even possible to arrive here
> when dopt->logical_slots_only is false. Maybe that quick exit would be
> better coded as an Assert?

I think it is not possible, so changed to Assert().

> 6.
> + PQExpBuffer query = createPQExpBuffer();
> + char    *slotname = pg_strdup(slotinfo->dobj.name);
> 
> I wondered if it was really necessary to strdup/free this slotname.
> e.g. And if it is, then why don't you do this for the slotinfo->plugin
> field?

This was debris left over from my testing. Removed.

> src/bin/pg_upgrade/check.c
> 
> 7. check_and_dump_old_cluster
> 
>   /* Extract a list of databases and tables from the old cluster */
>   get_db_and_rel_infos(&old_cluster);
> + get_logical_slot_infos(&old_cluster);
> 
> Is it correct to associate this new call with that existing comment
> about "databases and tables"?

Added a comment.

> 8. check_new_cluster
> 
> @@ -188,6 +190,7 @@ void
>  check_new_cluster(void)
>  {
>   get_db_and_rel_infos(&new_cluster);
> + get_logical_slot_infos(&new_cluster);
> 
>   check_new_cluster_is_empty();
> 
> @@ -210,6 +213,9 @@ check_new_cluster(void)
>   check_for_prepared_transactions(&new_cluster);
> 
>   check_for_new_tablespace_dir(&new_cluster);
> +
> + if (user_opts.include_logical_slots)
> + check_for_parameter_settings(&new_cluster);
> 
> Can the get_logical_slot_infos() be done later, guarded by that the
> same condition if (user_opts.include_logical_slots)?

Added.

> 9. check_new_cluster_is_empty
> 
> + * If --include-logical-replication-slots is required, check the
> + * existing of slots
> + */
> 
> Did you mean to say "check the existence of slots"?

Yes, it is my typo. Fixed.

> 10. check_for_parameter_settings
> 
> + if (strcmp(wal_level, "logical") != 0)
> + pg_fatal("wal_level must be \"logical\", but set to \"%s\"",
> + wal_level);
> 
> /but set to/but is set to/

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 11. get_db_and_rel_infos
> 
> + {
>   get_rel_infos(cluster, &cluster->dbarr.dbs[dbnum]);
> 
> + /*
> + * Additionally, slot_arr must be initialized because they will be
> + * checked later.
> + */
> + cluster->dbarr.dbs[dbnum].slot_arr.nslots = 0;
> + cluster->dbarr.dbs[dbnum].slot_arr.slots = NULL;
> + }
> 
> 11a.
> I think probably it would have been easier to just use 'pg_malloc0'
> instead of 'pg_malloc' in the get_db_infos, then this code would not
> be necessary.

I was not sure whether it is OK to change it like that for
performance reasons. But OK, fixed.

> 11b.
> BTW, shouldn't this function also be calling free_logical_slot_infos()
> too? That will also have the same effect (initializing the slot_arr)
> but without having to change anything else.
> 
> ~~~
> 
> 12. get_logical_slot_infos
> +/*
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster)
> 
> To be consistent with the other nearby function headers it should have
> another line saying just get_logical_slot_infos().

Added.

> 13. get_logical_slot_infos
> 
> +void
> +get_logical_slot_infos(ClusterInfo *cluster)
> +{
> + int dbnum;
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
> + if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
> + free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
> +
> + get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
> + }
> +
> + if (cluster == &old_cluster)
> + pg_log(PG_VERBOSE, "\nsource databases:");
> + else
> + pg_log(PG_VERBOSE, "\ntarget databases:");
> +
> + if (log_opts.verbose)
> + {
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
> + pg_log(PG_VERBOSE, "Database: %s", cluster->dbarr.dbs[dbnum].db_name);
> + print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
> + }
> + }
> +}
> 
> I didn't see why there are 2 loops exactly the same. I think with some
> minor refactoring these can both be done in the same loop can't they?

The style follows get_db_and_rel_infos(), but... 

> SUGGESTION 1:
> 
> if (cluster == &old_cluster)
>     pg_log(PG_VERBOSE, "\nsource databases:");
> else
>     pg_log(PG_VERBOSE, "\ntarget databases:");
> 
> for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> {
>     if (cluster->dbarr.dbs[dbnum].slot_arr.slots)
>         free_logical_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
> 
>     get_logical_slot_infos_per_db(cluster, &cluster->dbarr.dbs[dbnum]);
> 
>     if (log_opts.verbose)
>     {
>         pg_log(PG_VERBOSE, "Database: %s",
> cluster->dbarr.dbs[dbnum].db_name);
>         print_slot_infos(&cluster->dbarr.dbs[dbnum].slot_arr);
>     }
> }
> 
> ~
> 
> I expected it could be simplified further still by using some variables
> 
> SUGGESTION 2:
> 
> if (cluster == &old_cluster)
>     pg_log(PG_VERBOSE, "\nsource databases:");
> else
>     pg_log(PG_VERBOSE, "\ntarget databases:");
> 
> for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> {
> DbInfo *pDbInfo = &cluster->dbarr.dbs[dbnum];
>     if (pDbInfo->slot_arr.slots)
>         free_logical_slot_infos(&pDbInfo->slot_arr);
> 
>     get_logical_slot_infos_per_db(cluster, pDbInfo);
> 
>     if (log_opts.verbose)
>     {
>         pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
>         print_slot_infos(&pDbInfo->slot_arr);
>     }
> }

I chose SUGGESTION 2.

> 14. get_logical_slot_infos_per_db
> 
> + char query[QUERY_ALLOC];
> +
> + query[0] = '\0'; /* initialize query string to empty */
> +
> + snprintf(query + strlen(query), sizeof(query) - strlen(query),
> + "SELECT slot_name, plugin, two_phase "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE database = current_database() AND temporary = false "
> + "AND wal_status IN ('reserved', 'extended');");
> 
> I didn't understand the purpose of those calls to 'strlen(query)'
> since the string was initialised to empty-string immediately above.

Removed.

> 15.
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> + int slotnum;
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + pg_log(PG_VERBOSE, "slotname: %s: plugin: %s: two_phase %d",
> +    slot_arr->slots[slotnum].slotname,
> +    slot_arr->slots[slotnum].plugin,
> +    slot_arr->slots[slotnum].two_phase);
> +}
> 
> IMO those colons don't make sense.
> 
> BEFORE
> "slotname: %s: plugin: %s: two_phase %d"
> 
> SUGGESTION
> "slotname: %s, plugin: %s, two_phase: %d"

Fixed. I followed print_rel_infos() style, but I prefer yours.

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 16. LogicalSlotInfo
> 
> +typedef struct
> +{
> + char    *slotname; /* slot name */
> + char    *plugin; /* plugin */
> + bool two_phase; /* Can the slot decode 2PC? */
> +} LogicalSlotInfo;
> 
> The RelInfo had a comment for the typedef struct, so I think the
> LogicalSlotInfo struct also should have a comment.

Added.

> 17. DbInfo
> 
>   RelInfoArr rel_arr; /* array of all user relinfos */
> + LogicalSlotInfoArr slot_arr; /* array of all logicalslotinfos */
>  } DbInfo;
> 
> Should the comment say "LogicalSlotInfo" instead of "logicalslotinfos"?

Right, fixed.

> .../t/003_logical_replication_slots.pl
> 
> 18. RESULTS
> 
> I run this by 'make check' in the src/bin/pg_upgrade folder.
> 
> For some reason, the test does not work for me. The results I get are:
> 
> # +++ tap check in src/bin/pg_upgrade +++
> t/001_basic.pl ...................... ok
> t/002_pg_upgrade.pl ................. ok
> t/003_logical_replication_slots.pl .. 3/? # Tests were run but no plan was declared and done_testing() was not seen.
> t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 4 subtests passed
> 
> Test Summary Report
> -------------------
> t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 4 Failed: 0)
>   Non-zero exit status: 29
>   Parse errors: No plan found in TAP output
> Files=3, Tests=27, 128 wallclock secs ( 0.04 usr  0.01 sys + 18.02 cusr  6.06 csys = 24.13 CPU)
> Result: FAIL
> make: *** [check] Error 1
> 
> ~
> 
> And the log file
> (tmp_check/log/003_logical_replication_slots_old_node.log) shows the
> following ERROR:
> 
> 2023-05-09 12:19:25.330 AEST [32572] 003_logical_replication_slots.pl LOG:  statement: SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
> 2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl ERROR:  could not access file "test_decoding": No such file or directory
> 2023-05-09 12:19:25.331 AEST [32572] 003_logical_replication_slots.pl STATEMENT:  SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding', false, true);
> 2023-05-09 12:19:25.335 AEST [32564] LOG:  received immediate shutdown request
> 2023-05-09 12:19:25.337 AEST [32564] LOG:  database system is shut down
> 
> ~
> 
> Is it a bug? Or, if I am doing something wrong please let me know how
> to run the test.

Good point. I could not find the problem because I used the meson build system.
When I used the traditional make, the ERROR could be reproduced.
IIUC the problem occurred because the dependency between pg_upgrade and test_decoding
was not set in the Makefile. Hence, I added the variable EXTRA_INSTALL to the Makefile
to declare the dependency. This follows other directories such as pg_basebackup.

> 19.
> +# Clean up
> +rmtree($new_node->data_dir . "/pg_upgrade_output.d");
> +$new_node->append_conf('postgresql.conf', "wal_level = 'logical'");
> +$new_node->append_conf('postgresql.conf', "max_replication_slots = 0");
> 
> I think the last 2 lines are not "clean up". They are preparations for
> the subsequent test, so maybe they should be commented as such.

Right, it is preparation for the next test. Added a comment.

> 20.
> +# Clean up
> +rmtree($new_node->data_dir . "/pg_upgrade_output.d");
> +$new_node->append_conf('postgresql.conf', "max_replication_slots = 10");
> 
> I think the last line is not "clean up". It is preparation for the
> subsequent test, so maybe it should be commented as such.

Added a comment.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Wang,

Thank you for reviewing! PSA new version.

> 1. In the function getLogicalReplicationSlots
> ```
> +        /*
> +         * Note: Currently we do not have any options to include/exclude slots
> +         * in dumping, so all the slots must be selected.
> +         */
> +        slotinfo[i].dobj.dump = DUMP_COMPONENT_ALL;
> ```
> I think currently we are only dumping the definition of logical replication
> slots. It seems better to set it as DUMP_COMPONENT_DEFINITION here.

Right. Actually it was harmless because other flags like DUMP_COMPONENT_DEFINITION
are not checked in dumpLogicalReplicationSlot(), but I changed it.

> 2. In the function dumpLogicalReplicationSlot
> ```
> +        ArchiveEntry(fout, slotinfo->dobj.catId, slotinfo->dobj.dumpId,
> +                     ARCHIVE_OPTS(.tag = slotname,
> +                                  .description = "REPLICATION SLOT",
> +                                  .section = SECTION_POST_DATA,
> +                                  .createStmt = query->data));
> ```
> I think if we do not set the member dropStmt in macro ARCHIVE_OPTS here, when
> we specify the option "--logical-replication-slots-only" and option "-c/--clean"
> together, the "-c/--clean" will not work.
> 
> I think that we could use the function pg_drop_replication_slot to set this
> member. Then, in the main function in the pg_dump.c file, we should add a check
> to prevent specifying option "--logical-replication-slots-only" and
> option "--if-exists" together.
> Or, we could simply add a check to prevent specifying option
> "--logical-replication-slots-only" and option "-c/--clean" together.
> What do you think?

I chose not to allow combining it with -c. Assuming that this option is used only
by pg_upgrade, it is ensured that the new node does not have any logical replication
slots, so the drop statement is not needed.
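
A minimal sketch of how such an option-conflict check could look; it is not the
patch's actual code. The struct DumpFlags and the function find_option_conflict()
are hypothetical, and the -c/--clean message text is an assumption modeled on
the other pg_fatal() messages quoted later in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of pg_dump's option flags, for illustration only. */
typedef struct
{
    bool logical_slots_only;   /* --logical-replication-slots-only */
    bool binary_upgrade;       /* --binary-upgrade */
    bool clean;                /* -c/--clean */
    bool data_only;            /* -a/--data-only */
    bool schema_only;          /* -s/--schema-only */
} DumpFlags;

/*
 * Return a message describing the first conflicting combination, or NULL
 * when the flags are compatible. The real code would call pg_fatal().
 */
const char *
find_option_conflict(const DumpFlags *f)
{
    if (!f->logical_slots_only)
        return NULL;            /* no restriction without the new option */

    if (!f->binary_upgrade)
        return "options --logical-replication-slots-only requires option --binary-upgrade";

    /* Assumed wording: slots never pre-exist on the new node, so no DROP. */
    if (f->clean)
        return "options --logical-replication-slots-only and -c/--clean cannot be used together";

    if (f->data_only)
        return "options --logical-replication-slots-only and -a/--data-only cannot be used together";

    if (f->schema_only)
        return "options --logical-replication-slots-only and -s/--schema-only cannot be used together";

    return NULL;
}
```

Rejecting the combination up front avoids having to generate a
pg_drop_replication_slot() statement that pg_upgrade would never need.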

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version.

> 1. check_new_cluster
> 
> + if (user_opts.include_logical_slots)
> + {
> + get_logical_slot_infos(&new_cluster);
> + check_for_parameter_settings(&new_cluster);
> + }
> +
>   check_new_cluster_is_empty();
> ~
> 
> The code is OK, but maybe your reply/explanation (see [2] #2) saying
> get_logical_slot_infos() needs to be called before
> check_new_cluster_is_empty() would be good to have in a comment here?

Indeed, added.

> src/bin/pg_upgrade/info.c
> 
> 2. get_logical_slot_infos
> 
> + if (ntups)
> + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
> + else
> + {
> + slotinfos = NULL;
> + goto cleanup;
> + }
> +
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> +
> + for (slotnum = 0; slotnum < ntups; slotnum++)
> + {
> + LogicalSlotInfo *curr = &slotinfos[num_slots++];
> +
> + curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
> + curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
> + curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
> + }
> +
> +cleanup:
> + PQfinish(conn);
> 
> IMO the goto/label coding is not warranted here - a simple if/else can
> do the same thing.

Yeah, I simplified it with an if-statement. Additionally, some variable definitions
were moved into the code block.
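
The goto-free shape can be sketched as below. This is an illustration under
stated assumptions, not the code in the patch: SlotRow, SlotInfo, dup_or_die()
and collect_slots() are hypothetical stand-ins for the PGresult rows,
LogicalSlotInfo, pg_strdup() and get_logical_slot_infos() respectively.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one result row (PQgetvalue() output in the patch). */
typedef struct
{
    const char *slot_name;
    const char *plugin;
    const char *two_phase;      /* "t" or "f" */
} SlotRow;

/* Simplified copy of the LogicalSlotInfo fields quoted in this thread. */
typedef struct
{
    char *slotname;
    char *plugin;
    bool  two_phase;
} SlotInfo;

/* Plain-C replacement for pg_strdup(): abort on allocation failure. */
static char *
dup_or_die(const char *s)
{
    size_t len = strlen(s) + 1;
    char  *copy = malloc(len);

    if (copy == NULL)
        abort();
    memcpy(copy, s, len);
    return copy;
}

/* Returns a malloc'd array of num_slots entries, or NULL when there are none. */
SlotInfo *
collect_slots(const SlotRow *rows, int num_slots)
{
    SlotInfo *slotinfos = NULL;

    if (num_slots > 0)
    {
        slotinfos = malloc(sizeof(SlotInfo) * num_slots);
        if (slotinfos == NULL)
            abort();

        for (int slotnum = 0; slotnum < num_slots; slotnum++)
        {
            SlotInfo *curr = &slotinfos[slotnum];

            curr->slotname = dup_or_die(rows[slotnum].slot_name);
            curr->plugin = dup_or_die(rows[slotnum].plugin);
            curr->two_phase = (strcmp(rows[slotnum].two_phase, "t") == 0);
        }
    }

    /* Common cleanup (PQfinish(conn) in the patch) runs here on both paths. */
    return slotinfos;
}
```

Because the empty case simply skips the block, both paths fall through to the
same cleanup and no label is needed.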

> 3. free_db_and_rel_infos, free_logical_slot_infos
> 
> static void
> free_db_and_rel_infos(DbInfoArr *db_arr)
> {
> int dbnum;
> 
> for (dbnum = 0; dbnum < db_arr->ndbs; dbnum++)
> {
> free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
> pg_free(db_arr->dbs[dbnum].db_name);
> }
> pg_free(db_arr->dbs);
> db_arr->dbs = NULL;
> db_arr->ndbs = 0;
> }
> 
> ~
> 
> In v12 now you removed the free_logical_slot_infos(). But isn't it
> better to still call free_logical_slot_infos() from the above
> free_db_and_rel_infos() still so the slot memory
> (dbinfo->slot_arr.slots) won't stay lying around?

free_db_and_rel_infos() is called in the restore phase, and slot_arr has malloc'd
members only when logical slots are defined on new_cluster. In that case a FATAL
error occurs in the checking phase, so the restore phase is never reached.

> 4. get_logical_slot_infos, print_slot_infos
> 
> In another thread [1] I am posting some minor patch changes to the
> VERBOSE logging (changes to double-quotes and commas etc.). Please
> keep a watch on that thread because if gets pushed then this one will
> be impacted. e.g. your logging here ought also to include the same
> suggested double quotes.

I thought it would be pushed soon, so the suggestion was included.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

15 May 2023, 10:39:57

Hi Kuroda-san.

I looked at the latest patch v13-0001. Here are some minor comments.

======
src/bin/pg_upgrade/info.c

1. get_logical_slot_infos_per_db

I noticed that the way this is coded, 'ntups' and 'num_slots' seems to
have exactly the same meaning. IMO you can simplify this by removing
'ntups'.

BEFORE
+ int ntups;
+ int num_slots = 0;

SUGGESTION
+ int num_slots;

~

BEFORE
+ ntups = PQntuples(res);
+
+ if (ntups)
+ {

SUGGESTION
+ num_slots = PQntuples(res);
+
+ if (num_slots)
+ {

~

BEFORE
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);

SUGGESTION
+ slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
num_slots);

~

BEFORE
+ for (slotnum = 0; slotnum < ntups; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[num_slots++];

SUGGESTION
+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];

======

2. get_logical_slot_infos, print_slot_infos

> >
> > In another thread [1] I am posting some minor patch changes to the
> > VERBOSE logging (changes to double-quotes and commas etc.). Please
> > keep a watch on that thread because if gets pushed then this one will
> > be impacted. e.g. your logging here ought also to include the same
> > suggested double quotes.
>
> I thought it would be pushed soon, so the suggestion was included.

OK, but I think you have accidentally missed adding similar new double
quotes to all other VERBOSE logging in your patch.

e.g. see get_logical_slot_infos:
pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

16 May 2023, 09:15:00

Dear Peter,

Thanks for reviewing! PSA new version patchset.

> 1. get_logical_slot_infos_per_db
> 
> I noticed that the way this is coded, 'ntups' and 'num_slots' seems to
> have exactly the same meaning. IMO you can simplify this by removing
> 'ntups'.
> 
> BEFORE
> + int ntups;
> + int num_slots = 0;
> 
> SUGGESTION
> + int num_slots;
> 
> ~
> 
> BEFORE
> + ntups = PQntuples(res);
> +
> + if (ntups)
> + {
> 
> SUGGESTION
> + num_slots = PQntuples(res);
> +
> + if (num_slots)
> + {
> 
> ~
> 
> BEFORE
> + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) * ntups);
> 
> SUGGESTION
> + slotinfos = (LogicalSlotInfo *) pg_malloc(sizeof(LogicalSlotInfo) *
> num_slots);
> 
> ~
> 
> BEFORE
> + for (slotnum = 0; slotnum < ntups; slotnum++)
> + {
> + LogicalSlotInfo *curr = &slotinfos[num_slots++];
> 
> SUGGESTION
> + for (slotnum = 0; slotnum < num_slots; slotnum++)
> + {
> + LogicalSlotInfo *curr = &slotinfos[slotnum];

Right, fixed.

> 2. get_logical_slot_infos, print_slot_infos
> 
> > >
> > > In another thread [1] I am posting some minor patch changes to the
> > > VERBOSE logging (changes to double-quotes and commas etc.). Please
> > > keep a watch on that thread because if gets pushed then this one will
> > > be impacted. e.g. your logging here ought also to include the same
> > > suggested double quotes.
> >
> > I thought it would be pushed soon, so the suggestion was included.
> 
> OK, but I think you have accidentally missed adding similar new double
> quotes to all other VERBOSE logging in your patch.
> 
> e.g. see get_logical_slot_infos:
> pg_log(PG_VERBOSE, "Database: %s", pDbInfo->db_name);
> 

Oh, I missed it. Fixed. I grepped the patches and could not find other lines
that should be double-quoted.

In addition, I ran pgindent again for 0001.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Wang,

Thank you for reviewing! PSA new version.

> For patches 0001
> 
> 1. The latest patch set fails to apply because of the new commit (0245f8d) in HEAD.

I didn't notice that. Thanks, fixed.

> 2. In file pg_dump.h.
> ```
> +/*
> + * The LogicalReplicationSlotInfo struct is used to represent replication
> + * slots.
> + *
> + * XXX: add more attributes if needed
> + */
> +typedef struct _LogicalReplicationSlotInfo
> +{
> +    DumpableObject dobj;
> +    char       *plugin;
> +    char       *slottype;
> +    bool        twophase;
> +} LogicalReplicationSlotInfo;
> ```
> 
> Do we need the structure member "slottype"? It seems we do not use "slottype"
> because we only dump logical replication slot.

As you said, this attribute is not needed. It is a leftover from previous efforts.
Removed.

> For patch 0002
> 
> 3. In the function SaveSlotToPath
> ```
> -    /* and don't do anything if there's nothing to write */
> -    if (!was_dirty)
> +    /*
> +     * and don't do anything if there's nothing to write, unless this is
> +     * called for a logical slot during a shutdown checkpoint, as we want to
> +     * persist the confirmed_flush_lsn in that case, even if that's the only
> +     * modification.
> +     */
> +    if (!was_dirty && !is_shutdown && !SlotIsLogical(slot))
> ```
> It seems that the code isn't consistent with our expectation.
> If this is called for a physical slot during a shutdown checkpoint and there's
> nothing to write, I think it will also persist physical slots to disk.

You mean that we should not change the handling for the physical slot case, right?
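
To make the mismatch concrete, here is a small sketch comparing the condition
as posted with one possible reading of what the comment describes. The function
names are hypothetical, and the second condition is an assumption about the
intent; the thread does not show the final code.

```c
#include <stdbool.h>

/* Condition as written in the quoted hunk: true means "skip the write". */
bool
skip_write_as_posted(bool was_dirty, bool is_shutdown, bool is_logical)
{
    return !was_dirty && !is_shutdown && !is_logical;
}

/*
 * Condition matching the comment (assumed intent): skip a clean slot unless
 * it is a logical slot being saved during a shutdown checkpoint.
 */
bool
skip_write_as_described(bool was_dirty, bool is_shutdown, bool is_logical)
{
    return !was_dirty && !(is_shutdown && is_logical);
}
```

For a clean physical slot during a shutdown checkpoint, the posted condition
evaluates to false, so the slot is written even though nothing changed. The
second form skips that write and still forces the write for a clean logical
slot at shutdown.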

> For patch 0003
> 
> 4. In the function check_for_parameter_settings
> ```
> +    /* --include-logical-replication-slots can be used since PG 16. */
> +    if (GET_MAJOR_VERSION(new_cluster->major_version < 1600))
> +        return;
> ```
> It seems that there is a slight mistake (the input of GET_MAJOR_VERSION) in the
> if-condition:
> GET_MAJOR_VERSION(new_cluster->major_version < 1600)
> ->
> GET_MAJOR_VERSION(new_cluster->major_version) <= 1500
> 
> Please also check the similar if-conditions in the below two functions
> check_for_confirmed_flush_lsn (in 0003 patch)
> check_are_logical_slots_active (in 0004 patch)

Done. I grepped with GET_MAJOR_VERSION, and confirmed they were fixed.
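
The misplaced parenthesis is easy to overlook, so here is a short sketch of its
effect. It assumes GET_MAJOR_VERSION() divides the version number by 100, as
pg_upgrade.h defines it to my knowledge; the two wrapper functions are
hypothetical and exist only for the comparison.

```c
/* Assumed to match pg_upgrade.h: 150003 -> 1500, 160000 -> 1600. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/*
 * As posted: the comparison happens inside the macro argument, so the macro
 * divides 0 or 1 by 100 and the result is always 0 (never "skip").
 */
int
skip_check_as_posted(int major_version)
{
    return GET_MAJOR_VERSION(major_version < 1600);
}

/* As suggested in the review: compare the extracted major version. */
int
skip_check_as_fixed(int major_version)
{
    return GET_MAJOR_VERSION(major_version) <= 1500;
}
```

With the posted form the early return is dead code for every cluster version,
which is why the compiler gives no warning and the mistake only shows up in
review.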

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Vignesh,

Thank you for reviewing! PSA new version patch set.

> Few minor comments:
> 1) we could remove the variable slotname from the below code by using
> PQgetvalue directly in pg_log:
> +       for (i = 0; i < ntups; i++)
> +       {
> +               char       *slotname;
> +
> +               is_error = true;
> +
> +               slotname = PQgetvalue(res, i, i_slotname);
> +
> +               pg_log(PG_WARNING,
> +                          "\nWARNING: logical replication slot \"%s\" is not active",
> +                          slotname);
> +       }

Removed. Such code was in two functions, and both were fixed.

> 2) This include "catalog/pg_control.h" should be after inclusion pg_collation.h
>  #include "catalog/pg_authid_d.h"
> +#include "catalog/pg_control.h"
>  #include "catalog/pg_collation.h"

Moved.

> 3) This spurious addition line change might not be required in this patch:
>  --- a/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
> +++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
> @@ -85,11 +85,39 @@ $old_node->safe_psql(
>  ]);
> 
>  my $result = $old_node->safe_psql('postgres',
> -       "SELECT count (*) FROM pg_logical_slot_get_changes('test_slot', NULL, NULL)"
> +       "SELECT count(*) FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL)"
>  );
> +
>  is($result, qq(12), 'ensure WALs are not consumed yet');
>  $old_node->stop;

I removed the line.
In the first place, what I wanted to check here was that pg_upgrade fails because
the WAL has not been consumed. If pg_logical_slot_get_changes() were called here, all
of the WAL would be consumed and the subsequent command would succeed. That was not
what we wanted, which is why I changed it to pg_logical_slot_peek_changes().
But after considering more, I thought that calling the function was not mandatory
because no one needed the output. So I removed it.

> 4) This inclusion "#include "access/xlogrecord.h" is not required:
>  #include "postgres_fe.h"
> 
> +#include "access/xlogrecord.h"
> +#include "access/xlog_internal.h"
>  #include "catalog/pg_authid_d.h"

Removed.

> 5)"thepublisher's" should be "the publisher's"
>  When a live check is requested, there is a possibility of additional changes
>  When a live check is requested, there is a possibility of additional changes
> occurring, which may cause the current WAL position to exceed the confirmed_flush_lsn
> of the slot. As a result, we check the confirmed_flush_lsn of each logical slot
> instead. This is sufficient as all the WAL records will be sent during thepublisher's
> shutdown.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version patchset.

> ======
> Commit message
> 
> 1.
> For pg_dump this commit includes a new option called
> "--logical-replication-slots-only".
> This option can be used to dump logical replication slots. When this option is
> specified, the slot_name, plugin, and two_phase parameters are extracted from
> pg_replication_slots. An SQL file is then generated which executes
> pg_create_logical_replication_slot() with the extracted parameters.
> 
> ~
> 
> This part doesn't do the actual execution, so maybe slightly reword this.
> 
> BEFORE
> An SQL file is then generated which executes
> pg_create_logical_replication_slot() with the extracted parameters.
> 
> SUGGESTION
> An SQL file that executes pg_create_logical_replication_slot() with
> the extracted parameters is generated.

Changed.

> 2.
> For pg_upgrade, when '--include-logical-replication-slots' is specified, it executes
> pg_dump with the new "--logical-replication-slots-only" option and restores from the
> dump. Note that we cannot dump replication slots at the same time as the schema
> dump because we need to separate the timing of restoring replication slots and
> other objects. Replication slots, in particular, should not be restored before
> executing the pg_resetwal command because it will remove WALs that are required
> by the slots.
> 
> ~~~
> 
> Maybe "restores from the dump" can be described more?
> 
> BEFORE
> ...and restores from the dump.
> 
> SUGGESTION
> ...and restores the slots using the
> pg_create_logical_replication_slot() statements that the dump
> generated (see above).

Fixed.

> src/bin/pg_dump/pg_dump.c
> 
> 3. help
> 
> +
> + /*
> + * The option --logical-replication-slots-only is used only by pg_upgrade
> + * and should not be called by users, which is why it is not listed.
> + */
>   printf(_("  --no-comments                do not dump comments\n"));
> ~
> 
> /not listed./not exposed by the help./

Fixed.

> 4. getLogicalReplicationSlots
> 
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 160000)
> + return;
> 
> PG16 is already in beta. I think this should now be changed to 170000, right?

That's right, fixed.

> src/bin/pg_upgrade/check.c
> 
> 5. check_new_cluster
> 
> + /*
> + * Do additional works if --include-logical-replication-slots is required.
> + * These must be done before check_new_cluster_is_empty() because the
> + * slot_arr attribute of the new_cluster will be checked in the function.
> + */
> 
> SUGGESTION (minor rewording/grammar)
> Do additional work if --include-logical-replication-slots was
> specified. This must be done before check_new_cluster_is_empty()
> because the slot_arr attribute of the new_cluster will be checked in
> that function.

Fixed.

> 6. check_new_cluster_is_empty
> 
> + /*
> + * If --include-logical-replication-slots is required, check the
> + * existence of slots.
> + */
> + if (user_opts.include_logical_slots)
> + {
> + LogicalSlotInfoArr *slot_arr = &new_cluster.dbarr.dbs[dbnum].slot_arr;
> +
> + /* if nslots > 0, report just first entry and exit */
> + if (slot_arr->nslots)
> + pg_fatal("New cluster database \"%s\" is not empty: found logical replication slot \"%s\"",
> + new_cluster.dbarr.dbs[dbnum].db_name,
> + slot_arr->slots[0].slotname);
> + }
> +
> 
> 6a.
> There are a number of places in this function using
> "new_cluster.dbarr.dbs[dbnum].XXX"
> 
> It is OK but maybe it would be tidier to up-front assign a local
> variable for this?
> 
> DbInfo *pDbInfo = &new_cluster.dbarr.dbs[dbnum];

Seems better, fixed.

> 6b.
> The above code adds an unnecessary blank line in the loop that was not
> there previously.

Removed.

> 7. check_for_parameter_settings
> 
> +/*
> + * Verify parameter settings for creating logical replication slots
> + */
> +static void
> +check_for_parameter_settings(ClusterInfo *new_cluster)
> 
> 7a.
> I felt this might have some missing words so it was meant to say:
> 
> SUGGESTION
> Verify the parameter settings necessary for creating logical replication slots.

Changed.

> 7b.
> Maybe you can give this function a better name because there is no
> hint in this generic name that it has anything to do with replication
> slots.

Renamed to check_for_logical_replication_slots(). What do you think?

> 8.
> + /* --include-logical-replication-slots can be used since PG16. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1500)
> + return;
> 
> PG16 is already in beta, so the version number (1500) and the comment
> mentioning PG16 are outdated aren't they?

Right, fixed.

> src/bin/pg_upgrade/info.c
> 
> 9.
>  static void print_rel_infos(RelInfoArr *rel_arr);
> -
> +static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
> 
> The removal of the existing blank line seems not a necessary part of this patch.

Restored the blank line.

> 10. get_logical_slot_infos_per_db
> 
> + char query[QUERY_ALLOC];
> +
> + query[0] = '\0'; /* initialize query string to empty */
> +
> + snprintf(query, sizeof(query),
> + "SELECT slot_name, plugin, two_phase "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE database = current_database() AND temporary = false "
> + "AND wal_status IN ('reserved', 'extended');");
> 
> Does the initial assignment query[0] = '\0'; achieve anything? IIUC,
> the next statement is simply going to overwrite that anyway.

This was a leftover from previous versions. Removed.

> 11. free_db_and_rel_infos
> 
> +
> + /*
> + * db_arr has an additional attribute, LogicalSlotInfoArr slot_arr,
> + * but there is no need to free it. It has a valid member only when
> + * the cluster had logical replication slots in the previous call.
> + * However, in this case, a FATAL error is thrown, and we cannot reach
> + * this point.
> + */
> 
> Maybe this comment can be reworded? For example, the meaning of "in
> the previous call" is not very clear. What previous call?

After considering more, I thought it should be simpler. What I wanted to say
was that slot_arr.slots does not have malloc'd memory. So I added an Assert() to
confirm that and changed the comments. For that purpose pg_malloc0() is now
used in get_db_infos(). What do you think?
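
The idea can be sketched in plain C as follows. This is only an illustration:
calloc() stands in for pg_malloc0(), assert() for Assert(), and the types
SlotArr and DbEntry and both functions are hypothetical simplifications of
LogicalSlotInfoArr, DbInfo, get_db_infos() and free_db_and_rel_infos().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for LogicalSlotInfoArr and DbInfo. */
typedef struct
{
    int   nslots;
    void *slots;
} SlotArr;

typedef struct
{
    char   *db_name;
    SlotArr slot_arr;
} DbEntry;

/* Zero-filled allocation, so every slot_arr starts as {0, NULL}. */
DbEntry *
alloc_db_entries(int ndbs)
{
    DbEntry *dbs = calloc((size_t) ndbs, sizeof(DbEntry));

    if (dbs == NULL && ndbs > 0)
        abort();
    return dbs;
}

/* Free the entries; the assertion documents that no slot memory can exist. */
void
free_db_entries(DbEntry *dbs, int ndbs)
{
    for (int dbnum = 0; dbnum < ndbs; dbnum++)
    {
        /* Slots on the new cluster would have been a FATAL error earlier. */
        assert(dbs[dbnum].slot_arr.slots == NULL);
        free(dbs[dbnum].db_name);
    }
    free(dbs);
}
```

Zero-initializing the array is what makes the assertion meaningful: without it,
slot_arr.slots would hold an indeterminate value for databases that were never
scanned for slots.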

> src/bin/pg_upgrade/pg_upgrade.c
> 
> 12. main
> 
> + /*
> + * Create logical replication slots if requested.
> + *
> + * Note: This must be done after doing pg_resetwal command because the
> + * command will remove required WALs.
> + */
> + if (user_opts.include_logical_slots)
> + {
> + start_postmaster(&new_cluster, true);
> + create_logical_replication_slots();
> + stop_postmaster(false);
> + }
> 
> IMO "the command" is a bit vague. It might be better to be explicit
> and say "... because pg_resetwal would remove XXXXX..."

Changed.

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 13.
> +typedef struct
> +{
> + LogicalSlotInfo *slots;
> + int nslots;
> +} LogicalSlotInfoArr;
> +
> 
> I assume you mimicked the RelInfoArr struct, but IMO it makes more
> sense for the field 'nslots' to come before the 'slots'.

Yeah, I followed that, but no strong opinion. Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear hackers,

> Based on the above, we are considering that we delay the timing of shutdown for
> logical walsenders. The preliminary workflow is:
> 
> 1. When logical walsenders receive the signal from the checkpointer, they consume
>    all of the WAL records, change their state to WALSNDSTATE_STOPPING, and stop
>    doing anything.
> 2. Then the checkpointer does the shutdown checkpoint.
> 3. After that the postmaster sends a signal to the walsenders, same as the current
>    implementation.
> 4. Finally logical walsenders process the shutdown checkpoint record and update
>    the confirmed_lsn after the acknowledgement from the subscriber.
>    Note that logical walsenders don't have to send a shutdown checkpoint record
>    to the subscriber, but the following keep_alive will help us to increment the
>    confirmed_lsn.
> 5. All tasks are done, they exit.
> 
> This mechanism ensures that the confirmed_lsn of active slots is the same as the
> current WAL location of the old publisher, so that the 0003 patch would become simpler.
> We would not have to calculate the acceptable difference anymore.
> 
> One thing we must consider is that no WAL must be generated while decoding
> the shutdown checkpoint record. It causes a PANIC. IIUC the record leads to
> SnapBuildSerializationPoint(), which just serializes snapbuild or restores from
> it, so the change may be acceptable. Thoughts?

I've implemented the ideas from my previous proposal, PSA another patch set.
Patch 0001 introduces the state WALSNDSTATE_STOPPING to logical walsenders. The
workflow remains largely the same as described in my previous post, with the
following additions:

* A flag has been added to track whether all the WALs have been flushed. The
  logical walsender can only exit after the flag is set. This ensures that all
  WALs are flushed before the termination of the walsender.
* Cumulative statistics are now forcibly written before changing the state.
  While the previous approach reported stats upon process exit, the current approach
  must report earlier due to the checkpointer's termination timing. See comments
  in CheckpointerMain() and atop pgstat_before_server_shutdown().
* At the end of processes, slots are now saved to disk.


Patch 0002 adds --include-logical-replication-slots option to pg_upgrade,
not changed from previous set.

Patch 0003 adds a check function, which becomes simpler. 
The previous version calculated the "acceptable" difference between confirmed_lsn
and the current WAL position. This was necessary because shutdown records could
not be sent to subscribers, creating a disparity in these values. However, this
approach had drawbacks, such as needing adjustments if record sizes changed.

Now, the record can be sent to subscribers, so the hack is not needed anymore,
at least in the context of logical replication. The consistency is now maintained
by the logical walsenders, so it cannot be ensured for slots created by a backend.
We must consider what should be...

What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

28 July 2023, 14:59:06

On Fri, 21 Jul 2023 at 13:00, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> > Based on the above, we are considering that we delay the timing of shutdown for
> > logical walsenders. The preliminary workflow is:
> >
> > 1. When logical walsenders receive the signal from the checkpointer, they consume
> >    all of the WAL records, change their state to WALSNDSTATE_STOPPING, and stop
> >    doing anything.
> > 2. Then the checkpointer does the shutdown checkpoint.
> > 3. After that the postmaster sends a signal to the walsenders, same as the current
> >    implementation.
> > 4. Finally logical walsenders process the shutdown checkpoint record and update
> >    the confirmed_lsn after the acknowledgement from the subscriber.
> >    Note that logical walsenders don't have to send a shutdown checkpoint record
> >    to the subscriber, but the following keep_alive will help us to increment the
> >    confirmed_lsn.
> > 5. All tasks are done, they exit.
> >
> > This mechanism ensures that the confirmed_lsn of active slots is the same as the
> > current WAL location of the old publisher, so that the 0003 patch would become simpler.
> > We would not have to calculate the acceptable difference anymore.
> >
> > One thing we must consider is that no WAL must be generated while decoding
> > the shutdown checkpoint record. It causes a PANIC. IIUC the record leads to
> > SnapBuildSerializationPoint(), which just serializes snapbuild or restores from
> > it, so the change may be acceptable. Thoughts?
>
> I've implemented the ideas from my previous proposal, PSA another patch set.
> Patch 0001 introduces the state WALSNDSTATE_STOPPING to logical walsenders. The
> workflow remains largely the same as described in my previous post, with the
> following additions:
>
> * A flag has been added to track whether all the WALs have been flushed. The
>   logical walsender can only exit after the flag is set. This ensures that all
>   WALs are flushed before the termination of the walsender.
> * Cumulative statistics are now forcibly written before changing the state.
>   While the previous approach reported stats upon process exit, the current approach
>   must report earlier due to the checkpointer's termination timing. See comments
>   in CheckpointerMain() and atop pgstat_before_server_shutdown().
> * At the end of processes, slots are now saved to disk.
>
>
> Patch 0002 adds --include-logical-replication-slots option to pg_upgrade,
> not changed from previous set.
>
> Patch 0003 adds a check function, which becomes simpler.
> The previous version calculated the "acceptable" difference between confirmed_lsn
> and the current WAL position. This was necessary because shutdown records could
> not be sent to subscribers, creating a disparity in these values. However, this
> approach had drawbacks, such as needing adjustments if record sizes changed.
>
> Now, the record can be sent to subscribers, so the hack is not needed anymore,
> at least in the context of logical replication. The consistency is now maintained
> by the logical walsenders, so it cannot be ensured for slots created by a backend.
> We must consider what should be...
>
> What do you think?

Here is a patch which checks that there are no WAL records other than the
CHECKPOINT_SHUTDOWN WAL record to be consumed, based on the discussion
from [1].
Patches 0001 and 0002 are the same as the patches posted by Kuroda-san. Patch
0003 exposes pg_get_wal_records_content to get the WAL records along
with the WAL record type between a start and end LSN. The pg_walinspect
contrib module already exposes a function for this requirement; I have
moved this functionality so that it is exposed from the backend. Patch 0004
has a slight change in the check function to verify that there are no
records other than CHECKPOINT_SHUTDOWN to be consumed. The attached
patch has the changes for the same.
Thoughts?

[1] - https://www.postgresql.org/message-id/CAA4eK1Kem-J5NM7GJCgyKP84pEN6RsG6JWo%3D6pSn1E%2BiexL1Fw%40mail.gmail.com

Regards,
Vignesh

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

01 August 2023, 12:39:01

On Fri, Jul 28, 2023 at 5:48 PM vignesh C <vignesh21@gmail.com> wrote:
>
> Here is a patch which checks that there are no WAL records other than
> CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
> from [1].
>

Few comments:
=============
1. Do we really need 0001 patch after the latest change proposed by
Vignesh in the 0004 patch?

2.
+ if (dopt.logical_slots_only)
+ {
+ if (!dopt.binary_upgrade)
+ pg_fatal("options --logical-replication-slots-only requires option --binary-upgrade");
+
+ if (dopt.dataOnly)
+ pg_fatal("options --logical-replication-slots-only and -a/--data-only cannot be used together");
+
+ if (dopt.schemaOnly)
+ pg_fatal("options --logical-replication-slots-only and -s/--schema-only cannot be used together");

Can you please explain why the patch imposes these restrictions? I
guess the binary_upgrade is because you want this option to be used
for the upgrade. Do we want to avoid giving any other option with
logical_slots, if so, are the above checks sufficient and why?

3.
+ /*
+ * Get replication slots.
+ *
+ * XXX: Which information must be extracted from old node? Currently three
+ * attributes are extracted because they are used by
+ * pg_create_logical_replication_slot().
+ */
+ appendPQExpBufferStr(query,
+ "SELECT slot_name, plugin, two_phase "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE database = current_database() AND temporary = false "
+ "AND wal_status IN ('reserved', 'extended');");

Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
or WALAVAIL_UNRESERVED? I think the slots where wal status is
WALAVAIL_REMOVED, the corresponding slots are invalidated at some
point. I think such slots can't be used for decoding but these will be
dropped along with the subscription or when a user does it manually.
So, if we don't copy such slots after the upgrade then there could be
a problem in dropping the corresponding subscription. If we don't want
to copy over such slots then we need to provide instructions on what
users should do in such cases. OTOH, if we want to copy over such
slots then we need to find a way to invalidate such slots after copy.
Either way, this needs more analysis.

4.
+ /*
+ * Check that all logical replication slots have reached the current WAL
+ * position.
+ */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE (SELECT count(record_type) "
+ " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn, pg_catalog.pg_current_wal_insert_lsn()) "
+ " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
+ "AND temporary = false AND wal_status IN ('reserved', 'extended');");

I think this can unnecessarily lead to reading a lot of WAL data if
the confirmed_flush_lsn for a slot is too much behind. Can we think of
improving this by passing the number of records to read which in this
case should be 1?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Jonathan S. Katz"

Date:

02 August 2023, 05:16:35

On 8/1/23 5:39 AM, Amit Kapila wrote:
> On Fri, Jul 28, 2023 at 5:48 PM vignesh C <vignesh21@gmail.com> wrote:
>>
>> Here is a patch which checks that there are no WAL records other than
>> CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
>> from [1].
>>
> 
> Few comments:
> =============

> 2.
> + if (dopt.logical_slots_only)
> + {
> + if (!dopt.binary_upgrade)
> + pg_fatal("options --logical-replication-slots-only requires option
> --binary-upgrade");
> +
> + if (dopt.dataOnly)
> + pg_fatal("options --logical-replication-slots-only and
> -a/--data-only cannot be used together");
> +
> + if (dopt.schemaOnly)
> + pg_fatal("options --logical-replication-slots-only and
> -s/--schema-only cannot be used together");
> 
> Can you please explain why the patch imposes these restrictions? I
> guess the binary_upgrade is because you want this option to be used
> for the upgrade. Do we want to avoid giving any other option with
> logical_slots, if so, are the above checks sufficient and why?

Can I take this a step further on the user interface and ask why the 
flag would be "--include-logical-replication-slots" vs. being enabled by 
default?

Are there reasons why we wouldn't enable this feature by default on 
pg_upgrade, and instead (if need be) have a flag that would be 
"--exclude-logical-replication-slots"? Right now, not having the ability 
to run pg_upgrade with logical replication slots enabled on the 
publisher is a very big pain point for users, so I would strongly 
recommend against adding friction unless there is a very large challenge 
with such an implementation.

Thanks,

Jonathan

Attachments

OpenPGP_signature

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

02 August 2023, 06:31:27

Dear Jonathan,

Thank you for reading the thread!

> Can I take this a step further on the user interface and ask why the
> flag would be "--include-logical-replication-slots" vs. being enabled by
> default?
> 
> Are there reasons why we wouldn't enable this feature by default on
> pg_upgrade, and instead (if need be) have a flag that would be
> "--exclude-logical-replication-slots"? Right now, not having the ability
> to run pg_upgrade with logical replication slots enabled on the
> publisher is a very big pain point for users, so I would strongly
> recommend against adding friction unless there is a very large challenge
> with such an implementation.

The main reason was that there have been no major complaints so far. This decision
followed the related discussion about upgrading the subscriber [1]. As mentioned
there, the current style might offer more flexibility. Of course we could change that
if more people share this opinion here.
(I believe this feature is useful for everyone, but changing the default may
affect others...)

As for the implementation, I have not checked deeply, but I do not expect a challenge.
We cannot change the style of the pg_dump option because of the pg_resetwal ordering issue [2],
but that option is not visible to users. I will look deeper if we decide to do this.

What do you think?

[1]: https://www.postgresql.org/message-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/TYAPR01MB58668C61A3C6EE82AE436C07F539A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

02 August 2023, 11:13:14

Dear Vignesh,

Thank you for making the PoC!

> Here is a patch which checks that there are no WAL records other than
> CHECKPOINT_SHUTDOWN WAL record to be consumed based on the discussion
> from [1].

Basically I agree with your approach. Thanks!

> Patch 0001 and 0002 is same as the patch posted by Kuroda-san, Patch
> 0003 exposes pg_get_wal_records_content to get the WAL records along
> with the WAL record type between start and end lsn. pg_walinspect
> contrib module already exposes a function for this requirement, I have
> moved this functionality to be exposed from the backend. Patch 0004
> has slight change in check function to check that there are no other
> records other than CHECKPOINT_SHUTDOWN to be consumed. The attached
> patch has the changes for the same.
> Thoughts?
> 
> [1] - https://www.postgresql.org/message-id/CAA4eK1Kem-J5NM7GJCgyKP84pEN6RsG6JWo%3D6pSn1E%2BiexL1Fw%40mail.gmail.com

Few comments:

* Per the comment from Amit [1], I used pg_get_wal_record_info() instead of pg_get_wal_records_info().
  This function extracts the next available WAL record, which avoids a huge scan when
  confirmed_flush is far behind.
* According to cfbot and my analysis, 0001 cannot pass the test on macOS.
  So I revived Julien's patch [2] as 0002 for now. AFAIS 0001 is not so welcome.

The next patch will be available soon.

[1]: https://www.postgresql.org/message-id/CAA4eK1LWKkoyy-p-SAT0JTWa%3D6kXiMd%3Da6ZcArY9eU4a3g4TZg%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/20230414061248.vdsxz2febjo3re6h%40jrouhaud

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

02 August 2023, 11:13:38

Dear Amit,

Thank you for giving comments! PSA new version patchset.

> 1. Do we really need 0001 patch after the latest change proposed by
> Vignesh in the 0004 patch?

I removed the 0001 patch and revived the old patch that serializes slots at shutdown.
This is because the problem that slots are not serialized to disk still remains [1],
so confirmed_flush falls behind even if we implement that approach.

> 2.
> + if (dopt.logical_slots_only)
> + {
> + if (!dopt.binary_upgrade)
> + pg_fatal("options --logical-replication-slots-only requires option
> --binary-upgrade");
> +
> + if (dopt.dataOnly)
> + pg_fatal("options --logical-replication-slots-only and
> -a/--data-only cannot be used together");
> +
> + if (dopt.schemaOnly)
> + pg_fatal("options --logical-replication-slots-only and
> -s/--schema-only cannot be used together");
> 
> Can you please explain why the patch imposes these restrictions? I
> guess the binary_upgrade is because you want this option to be used
> for the upgrade. Do we want to avoid giving any other option with
> logical_slots, if so, are the above checks sufficient and why?

Regarding --binary-upgrade, the motivation is as you expected. I hid
the --logical-replication-slots-only option from users, so it should not be
used for anything other than an upgrade. Additionally, this option is not shown in
the help output or the documentation.

As for the -{data|schema}-only options, I removed the restrictions.
At first I made them mutually exclusive because the combination may be confusing: as
discussed at [2], slots must be dumped after all the pg_resetwal runs are done, and by
that time all the definitions are already dumped. To avoid duplicated definitions, we
must ensure that only slots are written to the output file. I thought this requirement
contradicted the descriptions of these options (dump only A, not B).
But after more consideration, I thought the restriction might not be needed because the
option is not exposed to users, so no one would be confused by using both.
(The restriction for -c is also removed for the same reason.)
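
As a rough illustration of what remains once those restrictions are dropped, the check could reduce to the sketch below. This is a simplified stand-in, not the patch itself: `SlotDumpOptions` and `check_slot_only_options()` are hypothetical names, and the real code calls pg_fatal() rather than returning a message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for pg_dump's option struct. */
typedef struct SlotDumpOptions
{
    bool logical_slots_only;   /* --logical-replication-slots-only */
    bool binary_upgrade;       /* --binary-upgrade */
    bool data_only;            /* -a/--data-only */
    bool schema_only;          /* -s/--schema-only */
} SlotDumpOptions;

/*
 * Return NULL when the combination is accepted, otherwise a message that
 * describes the rejected combination.  Only the --binary-upgrade requirement
 * is kept; -a and -s are no longer rejected.
 */
const char *
check_slot_only_options(const SlotDumpOptions *opt)
{
    if (opt->logical_slots_only && !opt->binary_upgrade)
        return "option --logical-replication-slots-only requires option --binary-upgrade";

    return NULL;
}
```

With this shape, combining the slots-only option with -a or -s is accepted silently, so the behavior of each combination still has to be defined separately.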

> 3.
> + /*
> + * Get replication slots.
> + *
> + * XXX: Which information must be extracted from old node? Currently three
> + * attributes are extracted because they are used by
> + * pg_create_logical_replication_slot().
> + */
> + appendPQExpBufferStr(query,
> + "SELECT slot_name, plugin, two_phase "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE database = current_database() AND temporary = false "
> + "AND wal_status IN ('reserved', 'extended');");
> 
> Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
> or WALAVAIL_UNRESERVED? I think the slots where wal status is
> WALAVAIL_REMOVED, the corresponding slots are invalidated at some
> point. I think such slots can't be used for decoding but these will be
> dropped along with the subscription or when a user does it manually.
> So, if we don't copy such slots after the upgrade then there could be
> a problem in dropping the corresponding subscription. If we don't want
> to copy over such slots then we need to provide instructions on what
> users should do in such cases. OTOH, if we want to copy over such
> slots then we need to find a way to invalidate such slots after copy.
> Either way, this needs more analysis.

I considered this again. At least WALAVAIL_UNRESERVED should be supported because
the slot is still usable. It can return to reserved or extended.

As for WALAVAIL_REMOVED, I don't think it should be supported, so I added a description
to the documentation.

This feature re-creates slots that have the same name/plugin as the old ones; it does not
replicate their state. So if we copy them as-is, the slots become usable again. If
subscribers refer to the slot and then connect again at that time, changes made while the
slot was 'WALAVAIL_REMOVED' may be lost.

Based on the above, such slots must be copied as WALAVAIL_REMOVED, but as you said, we do
not have a way to control that. The status is calculated from restart_lsn,
but there is no function to modify it directly.

One approach is adding an SQL function that sets restart_lsn to an arbitrary value
(or 0/0, invalid), but it seems dangerous.

> 4.
> + /*
> + * Check that all logical replication slots have reached the current WAL
> + * position.
> + */
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE (SELECT count(record_type) "
> + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> pg_catalog.pg_current_wal_insert_lsn()) "
> + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> 
> I think this can unnecessarily lead to reading a lot of WAL data if
> the confirmed_flush_lsn for a slot is too much behind. Can we think of
> improving this by passing the number of records to read which in this
> case should be 1?

I checked, and pg_wal_record_info() seemed usable for this purpose. I tried to
move the functionality to core.

But this function raises an ERROR when there is no valid record after the specified
LSN. This means that pg_upgrade fails if the logical slots have caught up to the current
WAL location. IIUC the DBA must do the following steps:

1. shut down the old publisher
2. disable the subscription once <- this is mandatory, otherwise the walsender may
   send the record during the upgrade and confirmed_lsn may point to the SHUTDOWN_CHECKPOINT
3. run pg_upgrade  <- pg_get_wal_record_content() may raise an ERROR if 2. was skipped
4. change the connection string of the subscription
5. enable the subscription again

If we think this is not robust, we must implement a similar function that does not raise an ERROR instead.
What do you think?

[1]: https://www.postgresql.org/message-id/20230414061248.vdsxz2febjo3re6h%40jrouhaud
[2]: https://www.postgresql.org/message-id/CAA4eK1KD-hZ3syruxJA6fK-JtSBzL6etkwToPuTmVkrCvT6ASw@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

03 August 2023, 09:38:31

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thank you for giving comments! PSA new version patchset.
>
> > 3.
> > + /*
> > + * Get replication slots.
> > + *
> > + * XXX: Which information must be extracted from old node? Currently three
> > + * attributes are extracted because they are used by
> > + * pg_create_logical_replication_slot().
> > + */
> > + appendPQExpBufferStr(query,
> > + "SELECT slot_name, plugin, two_phase "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE database = current_database() AND temporary = false "
> > + "AND wal_status IN ('reserved', 'extended');");
> >
> > Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
> > or WALAVAIL_UNRESERVED? I think the slots where wal status is
> > WALAVAIL_REMOVED, the corresponding slots are invalidated at some
> > point. I think such slots can't be used for decoding but these will be
> > dropped along with the subscription or when a user does it manually.
> > So, if we don't copy such slots after the upgrade then there could be
> > a problem in dropping the corresponding subscription. If we don't want
> > to copy over such slots then we need to provide instructions on what
> > users should do in such cases. OTOH, if we want to copy over such
> > slots then we need to find a way to invalidate such slots after copy.
> > Either way, this needs more analysis.
>
> I considered again here. At least WALAVAIL_UNRESERVED should be supported because
> the slot is still usable. It can return reserved or extended.
>
> As for WALAVAIL_REMOVED, I don't think it should be so that I added a description
> to the document.
>
> This feature re-create slots which have same name/plugins as old ones, not replicate
> its state. So if we copy them as-is slots become usable again. If subscribers refer
> the slot and then connect again at that time, changes between 'WALAVAIL_REMOVED'
> may be lost.
>
> Based on above slots must be copied as WALAVAIL_REMOVED, but as you said, we do
> not have a way to control that. the status is calculated by using restart_lsn,
> but there are no function to modify directly.
>
> One approach is adding an SQL funciton which set restart_lsn to aritrary value
> (or 0/0, invalid), but it seems dangerous.
>

I see your point related to WALAVAIL_REMOVED status of the slot but
did you test the scenario I have explained in my comment? Basically, I
want to know whether it can impact the user in some way. So, please
check whether the corresponding subscriptions will be allowed to drop.
You can test it both before and after the upgrade.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

03 August 2023, 12:28:33

Dear Amit,

> I see your point related to WALAVAIL_REMOVED status of the slot but
> did you test the scenario I have explained in my comment? Basically, I
> want to know whether it can impact the user in some way. So, please
> check whether the corresponding subscriptions will be allowed to drop.
> You can test it both before and after the upgrade.

Yeah, this is a real issue. I have tested it and confirmed the expected behavior.
Even if the status of the slot is 'lost', it may be needed for dropping
subscriptions properly.

* before upgrading, the subscription that refers to the lost slot could be dropped
* after upgrading, the subscription could not be dropped as-is;
  users must run ALTER SUBSCRIPTION sub SET (slot_name = NONE);

The following are the steps I took:

## Setup

1. constructed a logical replication system
2. disabled the subscriber once
3. generated a lot of WAL so that the status of the slot became 'lost'

```
publisher=# SELECT slot_name, wal_status FROM pg_replication_slots ;
 slot_name | wal_status
-----------+------------
 sub       | lost
(1 row)
```

# testcase a - try to drop sub. before upgrading

a-1. enabled the subscriber again.
At that time the following messages were shown in the subscriber log:
```
ERROR: could not start WAL streaming: ERROR: can no longer get changes from replication slot "sub"
DETAIL: This slot has been invalidated because it exceeded the maximum reserved size.
```

a-2. did DROP SUBSCRIPTION ...
a-3. succeeded.

```
subscriber=# DROP SUBSCRIPTION sub;
NOTICE: dropped replication slot "sub" on publisher
DROP SUBSCRIPTION
```

# testcase b - try to drop sub. after upgrading

b-1. did pg_upgrade command
b-2. enabled the subscriber. From that point an apply worker connected to new node...
b-3. did DROP SUBSCRIPTION ...
b-4. failed with the message:

```
subscriber=# DROP SUBSCRIPTION sub;
ERROR: could not drop replication slot "sub" on publisher: ERROR: replication slot "sub" does not exist
```

The workaround was to disassociate the slot, as described in the documentation.

```
subscriber =# ALTER SUBSCRIPTION sub DISABLE;
ALTER SUBSCRIPTION
subscriber =# ALTER SUBSCRIPTION sub SET (slot_name = NONE);
ALTER SUBSCRIPTION
subscriber =# DROP SUBSCRIPTION sub;
DROP SUBSCRIPTION
```

PSA the script for reproducing the above tests.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

test_0803.sh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

03 August 2023, 13:26:58

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thank you for giving comments! PSA new version patchset.
>
> > 1. Do we really need 0001 patch after the latest change proposed by
> > Vignesh in the 0004 patch?
>
> I removed 0001 patch and revived old patch which serializes slots at shutdown.
> This is because the problem which slots are not serialized to disk still remain [1]
> and then confirmed_flush becomes behind, even if we implement the approach.
>

So, IIUC, you are talking about a patch with the below commit message.
[PATCH v18 2/4] Always persist to disk logical slots during a
 shutdown checkpoint.

It's entirely possible for a logical slot to have a confirmed_flush_lsn higher
than the last value saved on disk while not being marked as dirty.  It's
currently not a problem to lose that value during a clean shutdown / restart
cycle, but a later patch adding support for pg_upgrade of publications and
logical slots will rely on that value being properly persisted to disk.


As per this commit message, this patch should be numbered as 1 but you
have placed it as 2 after the main upgrade patch?


> > 2.
> > + if (dopt.logical_slots_only)
> > + {
> > + if (!dopt.binary_upgrade)
> > + pg_fatal("options --logical-replication-slots-only requires option
> > --binary-upgrade");
> > +
> > + if (dopt.dataOnly)
> > + pg_fatal("options --logical-replication-slots-only and
> > -a/--data-only cannot be used together");
> > +
> > + if (dopt.schemaOnly)
> > + pg_fatal("options --logical-replication-slots-only and
> > -s/--schema-only cannot be used together");
> >
> > Can you please explain why the patch imposes these restrictions? I
> > guess the binary_upgrade is because you want this option to be used
> > for the upgrade. Do we want to avoid giving any other option with
> > logical_slots, if so, are the above checks sufficient and why?
>
> Regarding the --binary-upgrade, the motivation is same as you expected. I covered
> up the --logical-replication-slots-only option from users, so it should not be
> used not for upgrade. Additionaly, this option is not shown in help and document.
>
> As for -{data|schema}-only options, I removed restrictions.
> Firstly I set as excluded because it may be confused - as discussed at [2], slots
> must be dumped after all the pg_resetwal is done and at that time all the definitions
> are already dumped. to avoid duplicated definitions, we must ensure only slots are
> written in the output file. I thought this requirement contradict descirptions of
> these options (Dump only the A, not B).
> But after considering more, I thought this might not be needed because it was not
> opened to users - no one would be confused by using both them.
> (Restriction for -c is also removed for the same motivation)
>

I see inconsistent behavior here with the patch. If I use "pg_dump.exe
--schema-only --logical-replication-slots-only --binary-upgrade
postgres" then I get only a dump of slots without any schema. When I
use "pg_dump.exe --data-only --logical-replication-slots-only
--binary-upgrade postgres" then neither table data nor slots. When I
use "pg_dump.exe --create --logical-replication-slots-only
--binary-upgrade postgres" then it returns the error "pg_dump: error:
role with OID 10 does not exist".

Now, I tried using --binary-upgrade with some other option like
"pg_dump.exe --create --binary-upgrade postgres" and then I got a dump
with all required objects with support for binary-upgrade.

I think your thought here is that this new option won't be usable
directly with pg_dump, but we should study whether other options need
to be supported together with --binary-upgrade for in-place upgrade
utilities other than pg_upgrade.

>
> > 4.
> > + /*
> > + * Check that all logical replication slots have reached the current WAL
> > + * position.
> > + */
> > + res = executeQueryOrDie(conn,
> > + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> > + "WHERE (SELECT count(record_type) "
> > + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> > pg_catalog.pg_current_wal_insert_lsn()) "
> > + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> > + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> >
> > I think this can unnecessarily lead to reading a lot of WAL data if
> > the confirmed_flush_lsn for a slot is too much behind. Can we think of
> > improving this by passing the number of records to read which in this
> > case should be 1?
>
> I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
> move the functionality to core.
>

But I don't see how it addresses my concern about reading too many
records. If the confirmed_flush_lsn is too much behind, it will also
try to read all the remaining WAL for such slots.

> But this function raise an ERROR when there is no valid record after the specified
> lsn. This means that the pg_upgrade fails if logical slots has caught up the current
> WAL location. IIUC DBA must do following steps:
>
> 1. shutdown old publisher
> 2. disable the subscription once <- this is mandatory, otherwise the walsender may
>    send the record during the upgrade and confirmed_lsn may point the SHUTDOWN_CHECKPOINT
> 3. do pg_upgrade  <- pg_get_wal_record_content() may raise an ERROR if 2. was skipped
>

But we have already seen that we write shutdown_checkpoint record only
after logical walsender is shut down. So, how above is possible?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

04 August 2023, 13:59:45

On Wed, Aug 2, 2023 at 1:43 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 3.
> > + /*
> > + * Get replication slots.
> > + *
> > + * XXX: Which information must be extracted from old node? Currently three
> > + * attributes are extracted because they are used by
> > + * pg_create_logical_replication_slot().
> > + */
> > + appendPQExpBufferStr(query,
> > + "SELECT slot_name, plugin, two_phase "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE database = current_database() AND temporary = false "
> > + "AND wal_status IN ('reserved', 'extended');");
> >
> > Why are we ignoring the slots that have wal status as WALAVAIL_REMOVED
> > or WALAVAIL_UNRESERVED? I think the slots where wal status is
> > WALAVAIL_REMOVED, the corresponding slots are invalidated at some
> > point. I think such slots can't be used for decoding but these will be
> > dropped along with the subscription or when a user does it manually.
> > So, if we don't copy such slots after the upgrade then there could be
> > a problem in dropping the corresponding subscription. If we don't want
> > to copy over such slots then we need to provide instructions on what
> > users should do in such cases. OTOH, if we want to copy over such
> > slots then we need to find a way to invalidate such slots after copy.
> > Either way, this needs more analysis.
>
> I considered again here. At least WALAVAIL_UNRESERVED should be supported because
> the slot is still usable. It can return reserved or extended.
>
> As for WALAVAIL_REMOVED, I don't think it should be so that I added a description
> to the document.
>
> This feature re-create slots which have same name/plugins as old ones, not replicate
> its state. So if we copy them as-is slots become usable again. If subscribers refer
> the slot and then connect again at that time, changes between 'WALAVAIL_REMOVED'
> may be lost.
>
> Based on above slots must be copied as WALAVAIL_REMOVED, but as you said, we do
> not have a way to control that. the status is calculated by using restart_lsn,
> but there are no function to modify directly.
>
> One approach is adding an SQL funciton which set restart_lsn to aritrary value
> (or 0/0, invalid), but it seems dangerous.
>

So, we have three options here (a) As you have done in the patch,
document this limitation and request user to perform some manual steps
to drop the subscription; (b) don't allow upgrade to proceed if there
are invalid slots in the old cluster; (c) provide a new function like
pg_copy_logical_replication_slot_contents() where we copy the required
contents like invalid status(ReplicationSlotInvalidationCause), etc.

Personally, I would prefer (b) because it will minimize the steps
the user has to perform after the upgrade and looks like a cleaner
solution.
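
To make option (b) concrete, here is a minimal sketch under stated assumptions: `SlotRow`, `count_lost_slots()` and `upgrade_may_proceed()` are hypothetical names, and the rows are assumed to have been fetched already from pg_replication_slots on the old cluster. It only models the decision, not the query or the error reporting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical row of pg_replication_slots, reduced to two columns. */
typedef struct SlotRow
{
    const char *slot_name;
    const char *wal_status;    /* "reserved", "extended", "unreserved", "lost" or NULL */
} SlotRow;

/* Count slots whose required WAL has been removed (wal_status = 'lost'). */
size_t
count_lost_slots(const SlotRow *slots, size_t nslots)
{
    size_t lost = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].wal_status != NULL &&
            strcmp(slots[i].wal_status, "lost") == 0)
            lost++;
    }
    return lost;
}

/* Option (b): the upgrade may proceed only when no slot is invalidated. */
int
upgrade_may_proceed(const SlotRow *slots, size_t nslots)
{
    return count_lost_slots(slots, nslots) == 0;
}
```

With a check of this kind the user drops or repairs the invalid slot on the old cluster before the upgrade, instead of fixing the subscription afterwards.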

Thoughts?

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

04 August 2023, 15:54:51

Dear Amit,

> So, we have three options here (a) As you have done in the patch,
> document this limitation and request user to perform some manual steps
> to drop the subscription; (b) don't allow upgrade to proceed if there
> are invalid slots in the old cluster; (c) provide a new function like
> pg_copy_logical_replication_slot_contents() where we copy the required
> contents like invalid status(ReplicationSlotInvalidationCause), etc.
> 
> Personally, I would prefer (b) because it will minimize the steps
> required to perform by the user after the upgrade and looks cleaner
> solution.
> 
> Thoughts?

Thanks for the suggestion. I agree that (b) is better because it does not expose users
to data loss. I implemented it locally and it worked well, so I'm planning to adopt
the idea in the next version, if there are no objections.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

06 August 2023, 15:31:36

On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 4.
> > + /*
> > + * Check that all logical replication slots have reached the current WAL
> > + * position.
> > + */
> > + res = executeQueryOrDie(conn,
> > + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> > + "WHERE (SELECT count(record_type) "
> > + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> > pg_catalog.pg_current_wal_insert_lsn()) "
> > + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> > + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> >
> > I think this can unnecessarily lead to reading a lot of WAL data if
> > the confirmed_flush_lsn for a slot is too much behind. Can we think of
> > improving this by passing the number of records to read which in this
> > case should be 1?
>
> I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
> move the functionality to core.

IIUC the above query checks if the WAL record written at the slot's
confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
this WAL record is the latest record. Therefore, I think it's quite
possible that slot's confirmed_flush_lsn points to previous
CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
disabled after the publisher shut down and then some changes are made
on the publisher. We might want to add that check too, but it would not
work because some WAL records could be written (e.g., by autovacuum)
during pg_upgrade before the slot's confirmed_flush_lsn is checked.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 August 2023, 06:54:02

On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > 4.
> > > + /*
> > > + * Check that all logical replication slots have reached the current WAL
> > > + * position.
> > > + */
> > > + res = executeQueryOrDie(conn,
> > > + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> > > + "WHERE (SELECT count(record_type) "
> > > + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> > > pg_catalog.pg_current_wal_insert_lsn()) "
> > > + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> > > + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> > >
> > > I think this can unnecessarily lead to reading a lot of WAL data if
> > > the confirmed_flush_lsn for a slot is too much behind. Can we think of
> > > improving this by passing the number of records to read which in this
> > > case should be 1?
> >
> > I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
> > move the functionality to core.
>
> IIUC the above query checks if the WAL record written at the slot's
> confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
> this WAL record is the latest record.
>

Yeah, I also think there should be some way to ensure this. How about
passing the number of records to read to this API? Actually, that will
address my other concern as well where the current API can lead to
reading an unbounded number of records if the confirmed_flush_lsn
location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
ideas to address it?
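
A minimal sketch of the bounded check, assuming a simplified model rather than any existing API: `records` is a hypothetical NULL-terminated list of the record types found at and after a slot's confirmed_flush_lsn, and `slot_has_caught_up()` is an illustrative name. The function touches at most two entries, which is the point of passing a record limit. Treating an empty list as "not caught up" is purely a conservative assumption here; the thread leaves that case open.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true only if the record at confirmed_flush_lsn is a
 * CHECKPOINT_SHUTDOWN and nothing follows it.  At most two entries of
 * records[] are examined, however far behind the slot is.
 */
bool
slot_has_caught_up(const char *const *records)
{
    /* No record at confirmed_flush_lsn: nothing to verify against. */
    if (records[0] == NULL)
        return false;

    /* The first record must be the shutdown checkpoint. */
    if (strcmp(records[0], "CHECKPOINT_SHUTDOWN") != 0)
        return false;

    /* A second record means the checkpoint is not the latest record. */
    return records[1] == NULL;
}
```

A slot that points at an older CHECKPOINT_SHUTDOWN is rejected by the second test, and a slot that is far behind is rejected by the first without scanning the remaining WAL.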

> Therefore, I think it's quite
> possible that slot's confirmed_flush_lsn points to previous
> CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
> disabled after the publisher shut down and then some changes are made
> on the publisher. We might want to add that check too but it would not
> work. Because some WAL records could be written (e.g., by autovacuums)
> during pg_upgrade before checking the slot's confirmed_flush_lsn.
>

I think autovacuum is not enabled during the upgrade. See comment "Use
-b to disable autovacuum." in start_postmaster(). However, I am not
sure whether there can be additional WAL from the checkpointer or
bgwriter. The checkpointer has code that skips a checkpoint if there is no
important WAL activity. Similarly, bgwriter
also doesn't log xl_running_xacts unless there is important
activity. I feel if there is a chance of any WAL activity during the
upgrade, we need to either change the check to ensure such WAL records
are expected or document the same in some way.
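
One way to picture "ensure such WAL records are expected" is an allowlist, sketched below under loud assumptions: the set of record types in `expected_types` is a guess for illustration only (which types are really harmless is exactly the open question), `records` is a hypothetical NULL-terminated list of record types after confirmed_flush_lsn, and `max_records` is an illustrative cap to keep the scan bounded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed allowlist; the real set of harmless record types is undecided. */
static const char *const expected_types[] = {
    "CHECKPOINT_SHUTDOWN",
    "CHECKPOINT_ONLINE",
    "RUNNING_XACTS",
};

/* Return true if record_type appears in the allowlist above. */
bool
is_expected_record(const char *record_type)
{
    size_t n = sizeof(expected_types) / sizeof(expected_types[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(record_type, expected_types[i]) == 0)
            return true;
    }
    return false;
}

/*
 * Return true if every record in the NULL-terminated list is expected.
 * Give up (return false) once more than max_records entries are present,
 * so a slot that is far behind does not cause an unbounded scan.
 */
bool
only_expected_records(const char *const *records, size_t max_records)
{
    for (size_t i = 0; records[i] != NULL; i++)
    {
        if (i >= max_records || !is_expected_record(records[i]))
            return false;
    }
    return true;
}
```

The cap trades completeness for bounded work: a slot with more than `max_records` harmless records is rejected even though it lost nothing.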

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 August 2023, 08:30:13

On Wed, Aug 2, 2023 at 7:46 AM Jonathan S. Katz <jkatz@postgresql.org> wrote:
>
> Can I take this a step further on the user interface and ask why the
> flag would be "--include-logical-replication-slots" vs. being enabled by
> default?
>
> Are there reasons why we wouldn't enable this feature by default on
> pg_upgrade, and instead (if need be) have a flag that would be
> "--exclude-logical-replication-slots"? Right now, not having the ability
> to run pg_upgrade with logical replication slots enabled on the
> publisher is a very big pain point for users, so I would strongly
> recommend against adding friction unless there is a very large challenge
> with such an implementation.
>

Thanks for acknowledging the need/importance of this feature. I also
don't see a need to have such a flag for pg_upgrade. The only reason
why one might want to exclude slots is that they are not up to date
w.r.t WAL being consumed. For example, one has not consumed all the
WAL from manually created slots or say some subscription has been
disabled before shutdown. I guess in those cases we should give an
error to the user and ask to remove such slots before the upgrade
because anyway, those won't be usable after the upgrade.

Having said that, I think we need a flag for pg_dump to dump the slots.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

07 August 2023, 08:59:31

On Mon, Aug 07, 2023 at 09:24:02AM +0530, Amit Kapila wrote:
>
> I think autovacuum is not enabled during the upgrade. See comment "Use
> -b to disable autovacuum." in start_postmaster(). However, I am not
> sure if there can't be any additional WAL from checkpointer or
> bgwriter. Checkpointer has a code that ensures that if there is no
> important WAL activity then it would be skipped. Similarly, bgwriter
> also doesn't LOG xl_running_xacts unless there is an important
> activity. I feel if there is a chance of any WAL activity during the
> upgrade, we need to either change the check to ensure such WAL records
> are expected or document the same in some way.

Unless I'm missing something, I don't see what prevents something from
connecting using the replication protocol and issuing any query, or even
creating new replication slots.

Note also that, as was complained about a few years ago, nothing prevents a
bgworker from spawning during pg_upgrade and possibly corrupting the upgraded
cluster if multixids are assigned.  If publications are preserved, wouldn't it
mean that such bgworkers could also lead to data loss?

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 August 2023, 10:12:33

On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Aug 07, 2023 at 09:24:02AM +0530, Amit Kapila wrote:
> >
> > I think autovacuum is not enabled during the upgrade. See comment "Use
> > -b to disable autovacuum." in start_postmaster(). However, I am not
> > sure if there can't be any additional WAL from checkpointer or
> > bgwriter. Checkpointer has a code that ensures that if there is no
> > important WAL activity then it would be skipped. Similarly, bgwriter
> > also doesn't LOG xl_running_xacts unless there is an important
> > activity. I feel if there is a chance of any WAL activity during the
> > upgrade, we need to either change the check to ensure such WAL records
> > are expected or document the same in some way.
>
> Unless I'm missing something I don't see what prevents something to connect
> using the replication protocol and issue any query or even create new
> replication slots?
>

I think the point is that if we have any slots where we have not
consumed the pending WAL (other than the expected records like
SHUTDOWN_CHECKPOINT) or if there are invalid slots, then the upgrade
won't proceed and we will ask the user to remove such slots or ensure
that the WAL is consumed by the slots. So, I think in the case you
mentioned, the upgrade won't succeed.

> Note also that as complained a few years ago nothing prevents a bgworker from
> spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
> multixid are assigned.  If publications are preserved wouldn't it mean that
> such bgworkers could also lead to data loss?
>

Is it because such workers would write some WAL which slots may not
process? If so, I think it is as dangerous as the other problems that
can arise due to such a worker. Do you have any special handling in
mind here?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Julien Rouhaud

Date:

07 August 2023, 10:36:17

On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:
> On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Unless I'm missing something I don't see what prevents something to connect
> > using the replication protocol and issue any query or even create new
> > replication slots?
> >
>
> I think the point is that if we have any slots where we have not
> consumed the pending WAL (other than the expected like
> SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> won't proceed and we will request user to remove such slots or ensure
> that WAL is consumed by slots. So, I think in the case you mentioned,
> the upgrade won't succeed.

What if new slots are added while the old instance is started in the middle of
pg_upgrade, *after* the various checks are done?

> > Note also that as complained a few years ago nothing prevents a bgworker from
> > spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
> > multixid are assigned.  If publications are preserved wouldn't it mean that
> > such bgworkers could also lead to data loss?
> >
>
> Is it because such workers would write some WAL which slots may not
> process? If so, I think it is equally dangerous as other problems that
> can arise due to such a worker. Do you think of any special handling
> here?

Yes, and there were already multiple reports of multixact corruption due to
bgworker activity during pg_upgrade (see
https://www.postgresql.org/message-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
for instance).  I think we should fix this whole class of problems once and
for all, one way or another.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

07 August 2023, 11:31:50

On Mon, Aug 7, 2023 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Aug 2, 2023 at 5:13 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > > 4.
> > > > + /*
> > > > + * Check that all logical replication slots have reached the current WAL
> > > > + * position.
> > > > + */
> > > > + res = executeQueryOrDie(conn,
> > > > + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> > > > + "WHERE (SELECT count(record_type) "
> > > > + " FROM pg_catalog.pg_get_wal_records_content(confirmed_flush_lsn,
> > > > pg_catalog.pg_current_wal_insert_lsn()) "
> > > > + " WHERE record_type != 'CHECKPOINT_SHUTDOWN') <> 0 "
> > > > + "AND temporary = false AND wal_status IN ('reserved', 'extended');");
> > > >
> > > > I think this can unnecessarily lead to reading a lot of WAL data if
> > > > the confirmed_flush_lsn for a slot is too much behind. Can we think of
> > > > improving this by passing the number of records to read which in this
> > > > case should be 1?
> > >
> > > I checked and pg_wal_record_info() seemed to be used for the purpose. I tried to
> > > move the functionality to core.
> >
> > IIUC the above query checks if the WAL record written at the slot's
> > confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
> > this WAL record is the latest record.
> >
>
> Yeah, I also think there should be some way to ensure this. How about
> passing the number of records to read to this API? Actually, that will
> address my other concern as well where the current API can lead to
> reading an unbounded number of records if the confirmed_flush_lsn
> location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
> ideas to address it?

It makes sense to me to limit the number of WAL records to read. But
as I mentioned below, if there is a chance of any WAL activity during
the upgrade, I'm not sure what limit to set.

>
> > Therefore, I think it's quite
> > possible that slot's confirmed_flush_lsn points to previous
> > CHECKPOINT_SHUTDOWN, for example, in cases where the subscription was
> > disabled after the publisher shut down and then some changes are made
> > on the publisher. We might want to add that check too but it would not
> > work. Because some WAL records could be written (e.g., by autovacuums)
> > during pg_upgrade before checking the slot's confirmed_flush_lsn.
> >
>
> I think autovacuum is not enabled during the upgrade. See comment "Use
> -b to disable autovacuum." in start_postmaster().

Right, thanks.

> However, I am not
> sure if there can't be any additional WAL from checkpointer or
> bgwriter. Checkpointer has a code that ensures that if there is no
> important WAL activity then it would be skipped. Similarly, bgwriter
> also doesn't LOG xl_running_xacts unless there is an important
> activity.

Could WAL records for hint bit updates be generated even in upgrade mode?

> I feel if there is a chance of any WAL activity during the
> upgrade, we need to either change the check to ensure such WAL records
> are expected or document the same in some way.

Yes, but how does it work with the above idea of limiting the number
of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
the upgrade mode, we cannot predict how many such records are
generated after the latest CHECKPOINT_SHUTDOWN.

I'm not really sure we should always perform the slot's
confirmed_flush_lsn check by default in the first place. With this
check, the upgrade won't be able to proceed if there is any logical
slot that is not used by logical replication (or by something streaming
the changes using a walsender), right? For example, if a user uses a
program that periodically consumes the changes from the logical slot,
the slot would not be able to pass the check even if the user executed
pg_logical_slot_get_changes() just before shutdown. The backend
process that consumes the changes is always terminated before the
shutdown checkpoint. On the other hand, I think there are cases where
the user can ensure that no meaningful WAL records are generated after
the last pg_logical_slot_get_changes(). I'm concerned that this check
might make upgrading in such cases unnecessarily cumbersome.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 August 2023, 12:02:32

On Mon, Aug 7, 2023 at 2:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Aug 7, 2023 at 12:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Aug 6, 2023 at 6:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > IIUC the above query checks if the WAL record written at the slot's
> > > confirmed_flush_lsn is a CHECKPOINT_SHUTDOWN, but there is no check if
> > > this WAL record is the latest record.
> > >
> >
> > Yeah, I also think there should be some way to ensure this. How about
> > passing the number of records to read to this API? Actually, that will
> > address my other concern as well where the current API can lead to
> > reading an unbounded number of records if the confirmed_flush_lsn
> > location is far behind the CHECKPOINT_SHUTDOWN. Do you have any better
> > ideas to address it?
>
> It makes sense to me to limit the number of WAL records to read. But
> as I mentioned below, if there is a chance of any WAL activity during
> the upgrade, I'm not sure what limit to set.
>

In that case, we won't be able to pass the number of records. We need
to check based on the type of records.

>
> > However, I am not
> > sure if there can't be any additional WAL from checkpointer or
> > bgwriter. Checkpointer has a code that ensures that if there is no
> > important WAL activity then it would be skipped. Similarly, bgwriter
> > also doesn't LOG xl_running_xacts unless there is an important
> > activity.
>
> WAL records for hint bit updates could be generated even in upgrading mode?
>

Do you mean these records can be generated during reading catalog tables?

> > I feel if there is a chance of any WAL activity during the
> > upgrade, we need to either change the check to ensure such WAL records
> > are expected or document the same in some way.
>
> Yes, but how does it work with the above idea of limiting the number
> of WAL records to read? If XLOG_FPI_FOR_HINT can still be generated in
> the upgrade mode, we cannot predict how many such records are
> generated after the latest CHECKPOINT_SHUTDOWN.
>

Right, as said earlier, in that case, we need to rely on the type of records.

> I'm not really sure we should always perform the slot's
> confirmed_flush_lsn check by default in the first place. With this
> check, the upgrade won't be able to proceed if there is any logical
> slot that is not used by logical replication (or something streaming
> the changes using walsender), right? For example, if a user uses a
> program that periodically consumes the changes from the logical slot,
> the slot would not be able to pass the check even if the user executed
> pg_logical_slot_get_changes() just before shutdown. The backend
> process who consumes the changes is always terminated before the
> shutdown checkpoint. On the other hand, I think there are cases where
> the user can ensure that no meaningful WAL records are generated after
> the last pg_logical_slot_get_changes(). I'm concerned that this check
> might make upgrading such cases cumbersome unnecessarily.
>

You are right and I have mentioned the same case today in my response
to Jonathan but do you have better ideas to deal with such slots than
to give an ERROR?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 August 2023, 13:16:13

On Mon, Aug 7, 2023 at 1:06 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:
> > On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > Unless I'm missing something I don't see what prevents something to connect
> > > using the replication protocol and issue any query or even create new
> > > replication slots?
> > >
> >
> > I think the point is that if we have any slots where we have not
> > consumed the pending WAL (other than the expected like
> > SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> > won't proceed and we will request user to remove such slots or ensure
> > that WAL is consumed by slots. So, I think in the case you mentioned,
> > the upgrade won't succeed.
>
> What if new slots are added while the old instance is started in the middle of
> pg_upgrade, *after* the various checks are done?
>

They won't be copied but I think that won't be any different than
other objects like tables. Anyway, I have another idea which is to not
allow creating slots during binary upgrade unless one specifically
requests it by having an API like binary_upgrade_allow_slot_create()
similar to existing APIs binary_upgrade_*.
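
The idea of refusing slot creation during a binary upgrade unless it is explicitly requested could be sketched roughly as follows. This is only an illustration under stated assumptions: the UpgradeGate struct and both function names are hypothetical, and binary_upgrade_allow_slot_create() does not exist; it is merely the name floated above.

```c
#include <stdbool.h>

/* Hypothetical state holder; not how PostgreSQL tracks upgrade mode. */
typedef struct UpgradeGate
{
    bool in_binary_upgrade;     /* server was started for pg_upgrade */
    bool slot_create_allowed;   /* set only by an explicit opt-in call */
} UpgradeGate;

/* Stand-in for the proposed binary_upgrade_allow_slot_create(). */
void
gate_allow_slot_create(UpgradeGate *gate)
{
    gate->slot_create_allowed = true;
}

/* Returns true if a slot-creation request should be accepted. */
bool
gate_slot_create_permitted(const UpgradeGate *gate)
{
    if (!gate->in_binary_upgrade)
        return true;            /* normal operation: no restriction */

    return gate->slot_create_allowed;
}
```

With such a gate, a slot created by a stray connection in the middle of pg_upgrade would be rejected, while pg_upgrade itself could opt in right before restoring slots.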

> > > Note also that as complained a few years ago nothing prevents a bgworker from
> > > spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
> > > multixid are assigned.  If publications are preserved wouldn't it mean that
> > > such bgworkers could also lead to data loss?
> > >
> >
> > Is it because such workers would write some WAL which slots may not
> > process? If so, I think it is equally dangerous as other problems that
> > can arise due to such a worker. Do you think of any special handling
> > here?
>
> Yes, and there were already multiple reports of multixact corruption due to
> bgworker activity during pg_upgrade (see
> https://www.postgresql.org/message-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
> for instance).  I think we should once and for all fix this whole class of
> problem one way or another.
>

I don't object to doing something like we discussed in the thread you
linked but don't see the link with this work. Surely, the extra
WAL/XIDs generated during the upgrade will cause data inconsistency
which is no different after this patch.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

07 August 2023, 13:53:19

Dear Amit, Julien,

> > > >
> > > > Unless I'm missing something I don't see what prevents something to
> connect
> > > > using the replication protocol and issue any query or even create new
> > > > replication slots?
> > > >
> > >
> > > I think the point is that if we have any slots where we have not
> > > consumed the pending WAL (other than the expected like
> > > SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> > > won't proceed and we will request user to remove such slots or ensure
> > > that WAL is consumed by slots. So, I think in the case you mentioned,
> > > the upgrade won't succeed.
> >
> > What if new slots are added while the old instance is started in the middle of
> > pg_upgrade, *after* the various checks are done?
> >
> 
> They won't be copied but I think that won't be any different than
> other objects like tables. Anyway, I have another idea which is to not
> allow creating slots during binary upgrade unless one specifically
> requests it by having an API like binary_upgrade_allow_slot_create()
> similar to existing APIs binary_upgrade_*.

I tested this part and confirmed that objects created after the dump
were not copied to the new node. PSA scripts to reproduce my test.

# tested steps

-1. applied v18 patch set
0. modified the source so that objects can be created during the upgrade, and installed it:

```
@@ -188,6 +188,9 @@ check_and_dump_old_cluster(bool live_check)
        if (!user_opts.check)
                generate_old_dump();
 
+       printf("XXX: start to sleep\n");
+       sleep(35);
+
```

1. prepared a node which had a replication slot
2. ran pg_upgrade; the process sleeps for 35 seconds during the run
3. connected to the node being upgraded with the command:

```
psql "host=`pwd` user=postgres port=50432 replication=database"
```

4. created a table and a replication slot. Note that in binary upgrade mode it was very
  hard to create tables manually. In my case, table "bar" and slot "test" were created.
5. waited until the upgrade finished and started the new node.
6. confirmed that the created table and slot were not found on the new node.

```
new_publisher=# \d
Did not find any relations.

new_publisher=# SELECT slot_name FROM pg_replication_slots WHERE slot_name = 'test';
 slot_name 
-----------
(0 rows)
```

You can execute test_01.sh first, and then execute test_02.sh while the first terminal is stuck.


Note that such creation can theoretically occur, but it is very rare.
Because of the following lines in start_postmaster(), TCP/IP connections are
refused and only the superuser can connect to the server.

```
#if !defined(WIN32)
    /* prevent TCP/IP connections, restrict socket access */
    strcat(socket_string,
           " -c listen_addresses='' -c unix_socket_permissions=0700");

    /* Have a sockdir?    Tell the postmaster. */
    if (cluster->sockdir)
        snprintf(socket_string + strlen(socket_string),
                 sizeof(socket_string) - strlen(socket_string),
                 " -c %s='%s'",
                 (GET_MAJOR_VERSION(cluster->major_version) <= 902) ?
                 "unix_socket_directory" : "unix_socket_directories",
                 cluster->sockdir);
#endif
```

Moreover, the socket directory is set to the caller's current directory, and the
port number also differs from the setting written in postgresql.conf.
I think there is little chance that replication slots are accidentally created
during the upgrade.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear hackers,

Based on recent discussions, I updated the patch set. I did not reply one by one
because there are many posts, but thank you for the many suggestions!

The following shows what I changed.

1.
This feature is now enabled by default. Instead "--exclude-logical-replication-slots"
was added. (Per suggestions like [1])

2.
Pg_upgrade raises ERROR when some slots are 'WALAVAIL_REMOVED'. (Per discussion[2])

3.
Slots which are 'WALAVAIL_UNRESERVED' are dumped and restored. (Per consideration[3])

4.
Combining --logical-replication-slots-only with other --only options is
prohibited again. (Per suggestion [4]) Currently --data-only and --schema-only
cannot be used together, so I followed the same style. Additionally, it is not
easy for users to predict the behavior when many --only options are specified.

5.
Fixed some bugs related to combinations of options. E.g., v18 did not allow
"--create" to be used, but now it can be used at the same time. This was because
role information was not retrieved from the node while dumping slots.

6.
The ordering of patches was changed. The patch "Always persist to disk..."
became 0001. (Per suggestion [4])

7.
Functions for checking were changed (per [5]). Currently the WAL records between
confirmed_lsn and the current location are scanned and checked. The requirements
are a little hacky:

* The first record after the confirmed_lsn must be SHUTDOWN_CHECKPOINT
* Other records up to the current position must be either RUNNING_XACT,
  CHECKPOINT_ONLINE or XLOG_FPI_FOR_HINT.

In the checking function (validate_wal_record_types_after), WAL records are read
in a loop and their types are checked. v18 required changing the version number
of pg_walinspect; that is no longer needed.
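
The two requirements above can be illustrated with a small standalone sketch. It is not the code of validate_wal_record_types_after(); it assumes the record types have already been read into an array of strings, and the spellings used here ("CHECKPOINT_SHUTDOWN", "RUNNING_XACTS", "FPI_FOR_HINT") are assumptions modeled on the record_type filter in the query quoted earlier in the thread. Treating an empty record list as a failure is also a choice made only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Record types tolerated after the shutdown checkpoint (names assumed). */
static const char *const tolerated_types[] = {
    "RUNNING_XACTS",
    "CHECKPOINT_ONLINE",
    "FPI_FOR_HINT",
};

static bool
type_is_tolerated(const char *type)
{
    size_t      n = sizeof(tolerated_types) / sizeof(tolerated_types[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(type, tolerated_types[i]) == 0)
            return true;
    }
    return false;
}

/*
 * types[] holds the record types found after the slot's confirmed LSN,
 * oldest first. The first must be the shutdown checkpoint; every later
 * one must be in the tolerated list.
 */
bool
wal_tail_is_acceptable(const char *const *types, size_t ntypes)
{
    if (ntypes == 0 || strcmp(types[0], "CHECKPOINT_SHUTDOWN") != 0)
        return false;

    for (size_t i = 1; i < ntypes; i++)
    {
        if (!type_is_tolerated(types[i]))
            return false;
    }
    return true;
}
```

Because the check is based on record types and not on a record count, it does not depend on how many hint-bit or running-xacts records happen to be written during the upgrade.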


[1]: https://www.postgresql.org/message-id/ad83b9f2-ced3-c51c-342a-cc281ff562fc%40postgresql.org
[2]: https://www.postgresql.org/message-id/CAA4eK1%2B8btsYhNQvw6QJ4iTw1wFhkFXXABT%3DED1eHFvtekRanQ%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[4]: https://www.postgresql.org/message-id/CAA4eK1%2BCD82Kssy%2BiqpETPKYUh9AmNORF%2B3iGfNXgxKxqL3T6g%40mail.gmail.com
[5]: https://www.postgresql.org/message-id/CAD21AoC4D4wYTcLM8T-rAv%3DpO5kS6ffcVD1e7h4eFERT4%2BfwQQ%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Fri, Aug 11, 2023 at 11:38 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Fri, Aug 11, 2023 at 10:46:31AM +0530, Amit Kapila wrote:
> > > On Thu, Aug 10, 2023 at 7:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > What I imagined is that we do this check before
> > > > check_and_dump_old_cluster() while the server is 'off'. Reading the
> > > > slot state file would be simple and I guess we would not need a tool
> > > > or cli program for that. We need to expose ReplicationSlotOnDisk,
> > > > though.
> > >
> > > Won't that require a lot of version-specific checks as across versions
> > > the file format could be different? For the case of the control file,
> > > we use version-specific pg_controldata (for the old cluster, the
> > > corresponding version's pg_controldata) utility to read the old
> > > version control file. I thought we need something similar here if we
> > > want to do what you are suggesting.
> >
> > You mean the slot file format?
>
> Yes.
>
> >
> > We will need that complexity somewhere,
> > so why not in pg_upgrade?
> >
>
> I don't think we need the complexity of version-specific checks if we
> do what we do in get_control_data(). Basically, invoke
> version-specific pg_replslotdata to get version-specific slot
> information. There has been a proposal for a tool like that [1]. Do
> you have something better in mind? If so, can you please explain the
> same a bit more?

Yeah, we need something like pg_replslotdata. If there are other useful use cases for this tool, it would be good to have it. But I'm not sure there are any other than the pg_upgrade use case.

Another idea (which might already have been discussed, though) is that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.

Regards,

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

14 August 2023, 08:07:05

On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I don't think we need the complexity of version-specific checks if we
>> do what we do in get_control_data(). Basically, invoke
>> version-specific pg_replslotdata to get version-specific slot
>> information. There has been a proposal for a tool like that [1]. Do
>> you have something better in mind? If so, can you please explain the
>> same a bit more?
>
>
> Yeah, we need something like pg_replslotdata. If there are other useful usecases for this tool, it would be good to
> have it. But I'm not sure other than pg_upgrade usecase.
>
> Another idea is (which might have already discussed though) that we check if the latest shutdown checkpoint LSN in
> the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has
> consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the
> old cluster during the upgrade, at least for logical replication slots.
>

Right, this is somewhat closer to what the patch is already doing. But
note that in this case we need to remember and use the latest
checkpoint from the control file before the old cluster is started,
because otherwise the latest checkpoint location could even be updated
during the upgrade. So, instead of reading from WAL, we need to change
the check so that it relies on the control file's latest LSN. I would
prefer this idea to inventing a new API/tool like pg_replslotdata.
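
A minimal sketch of that comparison, assuming both values are available in the usual textual LSN form "X/X" (two hexadecimal fields), might look like the following. The function names are hypothetical, and whether the control file value to compare against is the start or the end of the shutdown checkpoint record is left open here; the sketch simply tests for equality with whatever LSN was saved before the old cluster was started.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse an LSN written as two hex fields, e.g. "0/3000060". */
bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    char        trailing;

    /* Exactly two fields must match; a third match means trailing junk. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * saved_checkpoint_lsn is the value remembered from the control file
 * before the old cluster was started for the upgrade.
 */
bool
slot_matches_shutdown_checkpoint(const char *confirmed_flush_lsn,
                                 const char *saved_checkpoint_lsn)
{
    uint64_t    slot_lsn;
    uint64_t    ckpt_lsn;

    if (!parse_lsn(confirmed_flush_lsn, &slot_lsn) ||
        !parse_lsn(saved_checkpoint_lsn, &ckpt_lsn))
        return false;

    return slot_lsn == ckpt_lsn;
}
```

Since the saved value never changes during the upgrade, WAL written after the old cluster is restarted cannot affect the result.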

The other point you and Bruce seem to be favoring is that instead of
dumping/restoring slots via pg_dump, we remember the required
information of slots retrieved during their validation in pg_upgrade
itself and use that to create the slots in the new cluster. Though I
am not aware of similar treatment for other objects we restore,
in this case it seems reasonable, especially because slots are not
stored in the catalog and we anyway already need to retrieve the
required information to validate them, so trying to retrieve it again
via pg_dump doesn't seem useful unless I am missing something. Does
this match your understanding?

Yet another thing I am trying to consider is whether we can allow
upgrading slots from 16 or 15 to later versions. As of now, the patch
has the following check:
getLogicalReplicationSlots()
{
...
+ /* Check whether we should dump or not */
+ if (fout->remoteVersion < 170000)
+ return;
...
}

If we decide to use the existing view pg_replication_slots then can we
consider upgrading slots from the prior version to 17? Now, if we want
to invent any new API similar to pg_replslotdata then we can't do this
because it won't exist in prior versions but OTOH using existing view
pg_replication_slots can allow us to fetch slot info from older
versions as well. So, I think it is worth considering.
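
To make the version arithmetic concrete, here is a small sketch of the gate as a function of a configurable threshold. It relies on the fact that for PostgreSQL 10 and later the numeric server version is major * 10000 + minor; the function and macro names are hypothetical and are not taken from the patch.

```c
#include <stdbool.h>

/* Threshold used by the check quoted above. */
#define SLOT_DUMP_MIN_VERSION_NUM 170000

/* Major version for PostgreSQL 10 and later: 170000 -> 17, 150004 -> 15. */
int
major_of_version_num(int version_num)
{
    return version_num / 10000;
}

/* Slots are handled only if the old cluster is at least min_version_num. */
bool
slots_are_handled(int remote_version_num, int min_version_num)
{
    return remote_version_num >= min_version_num;
}
```

Lowering the threshold (for example to 150000) is all that would change in this gate; whether older slots can pass the remaining checks is a separate question.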

Thoughts?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

14 August 2023, 08:21:45

On Thu, Aug 10, 2023 at 8:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Based on recent discussions, I updated the patch set. I did not reply one by one
> because there are many posts, but thank you for giving many suggestion!
>
> Followings shows what I changed.
>
> 1.
> This feature is now enabled by default. Instead "--exclude-logical-replication-slots"
> was added. (Per suggestions like [1])
>

AFAICS, we don't have any concrete agreement on such an option, but my
vote is to not have it, as we don't have any similar option
for any other object. I understand that it could be convenient for
some use cases where some of the logical slots are not yet caught up
w.r.t. WAL and users want to upgrade without the slots, but I am not sure
if that is really the case. Does anyone else have an opinion on this
point?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

15 August 2023, 05:21:15

On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Aug 12, 2023, 15:20 Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> I don't think we need the complexity of version-specific checks if we
> >> do what we do in get_control_data(). Basically, invoke
> >> version-specific pg_replslotdata to get version-specific slot
> >> information. There has been a proposal for a tool like that [1]. Do
> >> you have something better in mind? If so, can you please explain the
> >> same a bit more?
> >
> >
> > Yeah, we need something like pg_replslotdata. If there are other useful usecases for this tool, it would be good to
> > have it. But I'm not sure other than pg_upgrade usecase.
> >
> > Another idea is (which might have already discussed though) that we check if the latest shutdown checkpoint LSN in
> > the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot has
> > consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the
> > old cluster during the upgrade, at least for logical replication slots.
> >
>
> Right, this is somewhat closer to what Patch is already doing. But
> remember in this case we need to remember and use the latest
> checkpoint from the control file before the old cluster is started
> because otherwise the latest checkpoint location could be even updated
> during the upgrade. So, instead of reading from WAL, we need to change
> so that we rely on the control file's latest LSN.

Yes, I was thinking of the same idea.

But it works only for replication slots used for logical replication. Do we
want to check that no meaningful WAL records are generated after the
latest shutdown checkpoint for manually created slots (or non-logical
replication slots)? If so, we would need to have something reading WAL
records in the end.

> I would prefer this
> idea than to invent a new API/tool like pg_replslotdata.

+1

>
> The other point you and Bruce seem to be favoring is that instead of
> dumping/restoring slots via pg_dump, we remember the required
> information of slots retrieved during their validation in pg_upgrade
> itself and use that to create the slots in the new cluster. Though I
> am not aware of doing similar treatment for other objects we restore
> in this case it seems reasonable especially because slots are not
> stored in the catalog and we anyway already need to retrieve the
> required information to validate them, so trying to again retrieve it
> via pg_dump doesn't seem useful unless I am missing something. Does
> this match your understanding?

If there are use cases for --logical-replication-slots-only option
other than pg_upgrade, it would be good to have it in pg_dump. I was
just not sure of other use cases.

>
> Yet another thing I am trying to consider is whether we can allow to
> upgrade slots from 16 or 15 to later versions. As of now, the patch
> has the following check:
> getLogicalReplicationSlots()
> {
> ...
> + /* Check whether we should dump or not */
> + if (fout->remoteVersion < 170000)
> + return;
> ...
> }
>
> If we decide to use the existing view pg_replication_slots then can we
> consider upgrading slots from the prior version to 17? Now, if we want
> to invent any new API similar to pg_replslotdata then we can't do this
> because it won't exist in prior versions but OTOH using existing view
> pg_replication_slots can allow us to fetch slot info from older
> versions as well. So, I think it is worth considering.

I think that without the 0001 patch the replication slots will not be able
to pass the confirmed_flush_lsn check.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

15 August 2023, 06:06:11

On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > Another idea is (which might have already discussed though) that we check if the latest shutdown checkpoint LSN
> > > in the control file matches the confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that the slot
> > > has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting
> > > the old cluster during the upgrade, at least for logical replication slots.
> > >
> >
> > Right, this is somewhat closer to what Patch is already doing. But
> > remember in this case we need to remember and use the latest
> > checkpoint from the control file before the old cluster is started
> > because otherwise the latest checkpoint location could be even updated
> > during the upgrade. So, instead of reading from WAL, we need to change
> > so that we rely on the control file's latest LSN.
>
> Yes, I was thinking the same idea.
>
> But it works for only replication slots for logical replication. Do we
> want to check if no meaningful WAL records are generated after the
> latest shutdown checkpoint, for manually created slots (or non-logical
> replication slots)? If so, we would need to have something reading WAL
> records in the end.
>

This feature only targets logical replication slots. I don't see a
reason to be different for manually created logical replication slots.
Is there something particular that you think we could be missing?

> > I would prefer this
> > idea than to invent a new API/tool like pg_replslotdata.
>
> +1
>
> >
> > The other point you and Bruce seem to be favoring is that instead of
> > dumping/restoring slots via pg_dump, we remember the required
> > information of slots retrieved during their validation in pg_upgrade
> > itself and use that to create the slots in the new cluster. Though I
> > am not aware of doing similar treatment for other objects we restore
> > in this case it seems reasonable especially because slots are not
> > stored in the catalog and we anyway already need to retrieve the
> > required information to validate them, so trying to again retrieve it
> > via pg_dump doesn't seem useful unless I am missing something. Does
> > this match your understanding?
>
> If there are use cases for --logical-replication-slots-only option
> other than pg_upgrade, it would be good to have it in pg_dump. I was
> just not sure of other use cases.
>

It was primarily for upgrade purposes only. So, as we can't see a good
reason to go via pg_dump let's do it in upgrade unless someone thinks
otherwise.

> >
> > Yet another thing I am trying to consider is whether we can allow to
> > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > has the following check:
> > getLogicalReplicationSlots()
> > {
> > ...
> > + /* Check whether we should dump or not */
> > + if (fout->remoteVersion < 170000)
> > + return;
> > ...
> > }
> >
> > If we decide to use the existing view pg_replication_slots then can we
> > consider upgrading slots from the prior version to 17? Now, if we want
> > to invent any new API similar to pg_replslotdata then we can't do this
> > because it won't exist in prior versions but OTOH using existing view
> > pg_replication_slots can allow us to fetch slot info from older
> > versions as well. So, I think it is worth considering.
>
> I think that without 0001 patch the replication slots will not be able
> to pass the confirmed_flush_lsn check.
>

Right, but we can think of backpatching the same. Anyway, we can do
that as a separate work by starting a new thread to see if there is a
broader agreement for backpatching such a change. For now, we can
focus on >=v17.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

15 August 2023, 07:13:49

On Tuesday, August 15, 2023 11:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > Another idea is (which might have already been discussed though) that we
> > > > check if the latest shutdown checkpoint LSN in the control file matches the
> > > > confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure that
> > > > the slot has consumed all WAL records before the last shutdown. We don't
> > > > need to worry about WAL records generated after starting the old cluster
> > > > during the upgrade, at least for logical replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what Patch is already doing. But
> > > remember in this case we need to remember and use the latest
> > > checkpoint from the control file before the old cluster is started
> > > because otherwise the latest checkpoint location could be even
> > > updated during the upgrade. So, instead of reading from WAL, we need
> > > to change so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
> 
> > > I would prefer this
> > > idea than to invent a new API/tool like pg_replslotdata.
> >
> > +1

Changed the check to compare the latest checkpoint LSN from pg_controldata
with the confirmed_flush_lsn in the pg_replication_slots view.

> >
> > >
> > > The other point you and Bruce seem to be favoring is that instead of
> > > dumping/restoring slots via pg_dump, we remember the required
> > > information of slots retrieved during their validation in pg_upgrade
> > > itself and use that to create the slots in the new cluster. Though I
> > > am not aware of doing similar treatment for other objects we restore
> > > in this case it seems reasonable especially because slots are not
> > > stored in the catalog and we anyway already need to retrieve the
> > > required information to validate them, so trying to again retrieve
> > > it via pg_dump doesn't seem useful unless I am missing something.
> > > Does this match your understanding?
> >
> > If there are use cases for --logical-replication-slots-only option
> > other than pg_upgrade, it would be good to have it in pg_dump. I was
> > just not sure of other use cases.
> >
> 
> It was primarily for upgrade purposes only. So, as we can't see a good reason to
> go via pg_dump let's do it in upgrade unless someone thinks otherwise.

Removed the new option in pg_dump and modified pg_upgrade to
directly use the slot info to restore the slots in the new cluster.

> 
> > >
> > > Yet another thing I am trying to consider is whether we can allow to
> > > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > > + /* Check whether we should dump or not */
> > > > + if (fout->remoteVersion < 170000)
> > > > + return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can
> > > we consider upgrading slots from the prior version to 17? Now, if we
> > > want to invent any new API similar to pg_replslotdata then we can't
> > > do this because it won't exist in prior versions but OTOH using
> > > existing view pg_replication_slots can allow us to fetch slot info
> > > from older versions as well. So, I think it is worth considering.
> >
> > I think that without 0001 patch the replication slots will not be able
> > to pass the confirmed_flush_lsn check.
> >
> 
> Right, but we can think of backpatching the same. Anyway, we can do that as a
> separate work by starting a new thread to see if there is a broader agreement
> for backpatching such a change. For now, we can focus on >=v17.
> 

Here is the new version of the patch, which addresses the above points.
It also removes the --exclude-logical-replication-slots
option, following a recent comment.
Thanks to Kuroda-san for addressing most of the points.

Best Regards,
Hou zj

Attachments

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

16 August 2023, 06:07:24

Dear Hou,

Thanks for posting the patch! I want to open a question to gather opinions from others.

> > It was primarily for upgrade purposes only. So, as we can't see a good reason to
> > go via pg_dump let's do it in upgrade unless someone thinks otherwise.
> 
> Removed the new option in pg_dump and modified the pg_upgrade
> directly use the slot info to restore the slot in new cluster.

In this version, the creation of logical slots is serialized, whereas in older versions it was
parallelized per database. Do you think it should be parallelized again? I tested that locally
and it seemed harmless. Also, this approach allows the executed SQL to be logged.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

16 August 2023, 13:25:03

Dear hackers,

> > > It was primarily for upgrade purposes only. So, as we can't see a good reason
> to
> > > go via pg_dump let's do it in upgrade unless someone thinks otherwise.
> >
> > Removed the new option in pg_dump and modified the pg_upgrade
> > directly use the slot info to restore the slot in new cluster.
> 
> In this version, the creation of logical slots is serialized, whereas in older versions it was
> parallelized per database. Do you think it should be parallelized again? I tested that locally
> and it seemed harmless. Also, this approach allows the executed SQL to be logged.

I updated the patch to allow parallel execution. Workers are launched per slot;
each one connects to the new node via psql and executes pg_create_logical_replication_slot().
Moreover, the following points were changed for 0002.

* Ensured that the SQL executed for creating slots is logged.
* Fixed an issue where 'unreserved' slots could not be upgraded. This change was
  not an intended one. The related discussion is [1].
* Added checks for output plugin libraries. pg_upgrade ensures that plugins
  referred to by old slots are installed in the new executable directory.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866FD3F7992A46D0457F0E6F50BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Amit,

> > I updated the patch to allow parallel executions. Workers are launched per slots,
> > each one connects to the new node via psql and executes
> pg_create_logical_replication_slot().
> >
> 
> Will it be beneficial for slots? Invoking a separate process each time
> could be more costlier than slot creation. The other thing is during
> slot creation, the snapbuild waits for parallel transactions to finish
> so that can also hurt the patch. I think we can test it by having 50,
> 100, or 500 slots on the old cluster and see if doing parallel
> execution for the creation of those on the new cluster has any benefit
> over serial execution.

Indeed. I tested this based on the comment and found that serial execution was
faster. Please see the attached graphs and tables. The x-axis shows the number of upgraded slots,
and the y-axis shows the execution time. The parallelism of pg_upgrade (-j) was also
varied during the test.

I plan to revert the change in upcoming versions.

# compared source code

For the parallel execution case, the v21 patch set was used.
For the serial execution case, the logic in create_logical_replication_slots() was changed
to be basically the same as in v20 (I can share it if needed).

Moreover, in both cases, debug logs for measuring time were added.

# method

Please see the attached script. A given number of slots were created and then pg_upgrade was executed.

# consideration

* Under all conditions, serial execution was faster than parallel. Maybe
  launching processes was more costly than I expected.
* Another reason I thought of was that with serial execution, the connection
  to the new node is established only once. In the parallel case, however, workers must
  establish a connection every time. IIUC this takes a long time.
* (very trivial) The number of workers did not affect serial execution. This
  suggests the coding is right.

> > * Added checks for output plugin libraries. pg_upgrade ensures that plugins
> >   referred by old slots were installed to the new executable directory.
> >
> 
> I think this is a good idea but did you test it with out-of-core
> plugins, if so, can you please share the results? Also, let's update
> this information in docs as well.

I have not used other plugins, but forcibly renamed the shared object file.
I can test with plugins like wal2json[1] if more cases are needed.

1. created logical replication slots on the old node
  SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding')
2. stopped the old node
3. forcibly renamed the .so file. I used the following command:
  sudo mv /path/to/test_decoding.so /path/to//test\"_decoding.so
4. executed pg_upgrade, which failed. The output I got was:

```
Checking for presence of required libraries                 fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
    data_N3/pg_upgrade_output.d/20230817T100926.979/loadable_libraries.txt
Failure, exiting
```

And contents of loadable_libraries.txt were below:

```
could not load library "test_decoding": ERROR:  could not access file "test_decoding": No such file or directory
In database: postgres
```

[1]: https://github.com/eulerto/wal2json

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

PSA new version patch set.

> Here are some review comments for the patch v21-0003
> 
> ======
> Commit message
> 
> 1.
> pg_upgrade fails if the old node has slots which status is 'lost' or they do not
> consume all WAL records. These are needed for prevent the data loss.
> 
> ~
> 
> Maybe some minor brush-up like:
> 
> SUGGESTION
> In order to prevent data loss, pg_upgrade will fail if the old node
> has slots with the status 'lost', or with unconsumed WAL records.

Improved.

> src/bin/pg_upgrade/check.c
> 
> 2. check_for_confirmed_flush_lsn
> 
> + /* Check that all logical slots are not in 'lost' state. */
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE temporary = false AND wal_status = 'lost';");
> +
> + ntups = PQntuples(res);
> + i_slotname = PQfnumber(res, "slot_name");
> +
> + for (i = 0; i < ntups; i++)
> + {
> + is_error = true;
> +
> + pg_log(PG_WARNING,
> +    "\nWARNING: logical replication slot \"%s\" is obsolete.",
> +    PQgetvalue(res, i, i_slotname));
> + }
> +
> + PQclear(res);
> +
> + if (is_error)
> + pg_fatal("logical replication slots not to be in 'lost' state.");
> +
> 
> 2a. (GENERAL)
> The above code for checking lost state seems out of place in this
> function which is meant for checking confirmed flush lsn.
> 
> Maybe you jammed both kinds of logic into one function to save on the
> extra PGconn or something but IMO two separate functions would be
> better. e.g.
> - check_for_lost_slots
> - check_for_confirmed_flush_lsn

Separated into check_for_lost_slots and check_for_confirmed_flush_lsn.

> 2b.
> + /* Check that all logical slots are not in 'lost' state. */
> 
> SUGGESTION
> /* Check there are no logical replication slots with a 'lost' state. */

Changed.

> 2c.
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE temporary = false AND wal_status = 'lost';");
> 
> This SQL fragment is very much like others in previous patches. Be
> sure to make all the cases and clauses consistent with all those
> similar SQL fragments.

Unified the order. Note that they could not be made completely the same.

> 2d.
> + is_error = true;
> 
> That doesn't need to be in the loop. Better to just say:
> is_error = (ntups > 0);

Removed the variable.

> 2e.
> There is a mix of terms in the WARNING and in the pg_fatal -- e.g.
> "obsolete" versus "lost". Is it OK?

Unified to 'lost'.

> 2f.
> + pg_fatal("logical replication slots not to be in 'lost' state.");
> 
> English? And maybe it should be much more verbose...
> 
> "Upgrade of this installation is not allowed because one or more
> logical replication slots with a state of 'lost' were detected."

I checked the other pg_fatal() calls and could not find a statement like "Upgrade of this installation is not allowed".
So I used only the latter part.

> 3. check_for_confirmed_flush_lsn
> 
> + /*
> + * Check that all logical replication slots have reached the latest
> + * checkpoint position (SHUTDOWN_CHECKPOINT record). This checks cannot
> be
> + * done in case of live_check because the server has not been written the
> + * SHUTDOWN_CHECKPOINT record yet.
> + */
> + if (!live_check)
> + {
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE confirmed_flush_lsn != '%X/%X' AND temporary = false;",
> + old_cluster.controldata.chkpnt_latest_upper,
> + old_cluster.controldata.chkpnt_latest_lower);
> +
> + ntups = PQntuples(res);
> + i_slotname = PQfnumber(res, "slot_name");
> +
> + for (i = 0; i < ntups; i++)
> + {
> + is_error = true;
> +
> + pg_log(PG_WARNING,
> +    "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
> +    PQgetvalue(res, i, i_slotname));
> + }
> +
> + PQclear(res);
> + PQfinish(conn);
> +
> + if (is_error)
> + pg_fatal("All logical replication slots consumed all the WALs.");
> 
> ~
> 
> 3a.
> /This checks/This check/

The comment is no longer needed, because the caller checks the live_check variable.
For more detail, please see my other post [1].

> 3b.
> I don't think the separation of
> chkpnt_latest_upper/chkpnt_latest_lower is needed like this. AFAIK
> there is an LSN_FORMAT_ARGS(lsn) macro designed for handling exactly
> this kind of parameter substitution.

Fixed to use the macro.

Previously I thought that the header "access/xlogdefs.h" could not be included
from pg_upgrade, which was why I did not use it. But that was my
misunderstanding - I could include the file.
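
For readers unfamiliar with the macro, here is a minimal standalone sketch of the idea. The real LSN_FORMAT_ARGS() lives in access/xlogdefs.h; the SketchLsn type, the SKETCH_LSN_FORMAT_ARGS macro and the format_lsn() helper below are stand-ins written only for illustration and are not the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for XLogRecPtr, which is a 64-bit unsigned integer. */
typedef uint64_t SketchLsn;

/*
 * Simplified stand-in for LSN_FORMAT_ARGS(): expands to the two arguments
 * (upper and lower 32 bits) consumed by a "%X/%X" format string.
 */
#define SKETCH_LSN_FORMAT_ARGS(lsn) \
	((unsigned int) ((lsn) >> 32)), ((unsigned int) ((lsn) & 0xFFFFFFFFu))

/* Hypothetical helper: write the textual LSN into buf, return snprintf()'s result. */
static int
format_lsn(SketchLsn lsn, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	return snprintf(buf, buflen, "%X/%X", SKETCH_LSN_FORMAT_ARGS(lsn));
}
```

With this, a single 64-bit value is enough on the caller's side, and the split into two halves happens only at the point where the string is formatted.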

> 3c.
> + is_error = true;
> 
> That doesn't need to be in the loop. Better to just say:
> is_error = (ntups > 0);

Removed.

> 3d.
> + pg_fatal("All logical replication slots consumed all the WALs.");
> 
> The message seems backward. shouldn't it say something like:
> "Upgrade of this installation is not allowed because one or more
> logical replication slots still have unconsumed WAL records."

I used only the latter part; see the reply above.

> src/bin/pg_upgrade/controldata.c
> 
> 4. get_control_data
> 
> + /*
> + * Upper and lower part of LSN must be read and stored
> + * separately because it is reported as %X/%X format.
> + */
> + cluster->controldata.chkpnt_latest_upper =
> + strtoul(p, &slash, 16);
> + cluster->controldata.chkpnt_latest_lower =
> + strtoul(++slash, NULL, 16);
> 
> I felt that this field separation code is maybe not necessary. Please
> refer to other review comments in this post.

Hmm. I thought they must be read separately even if we store the result as XLogRecPtr (uint64).
This is because pg_controldata reports the LSN in %X/%X style. Am I missing something?

```
$ pg_controldata -D data_N1/ | grep "Latest checkpoint location"
Latest checkpoint location:           0/153C8D0
```
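
To make the question concrete, here is a rough sketch of reading the two halves and immediately combining them into one 64-bit value. It uses sscanf() rather than the strtoul() calls in the patch, and parse_lsn() is a hypothetical helper name, so treat it as an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse an LSN printed as "%X/%X" (for example the
 * value shown by pg_controldata) into a single 64-bit integer.
 * Returns false if the text does not contain both hexadecimal halves.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	unsigned int hi;
	unsigned int lo;

	assert(text != NULL && lsn != NULL);

	/* %X skips leading whitespace, so padding after the colon is fine. */
	if (sscanf(text, "%X/%X", &hi, &lo) != 2)
		return false;

	*lsn = ((uint64_t) hi << 32) | lo;
	return true;
}
```

The halves still have to be read separately, but they only need to live in local variables; the struct can hold one combined field.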

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 5. ControlData
> 
> +
> + uint32 chkpnt_latest_upper;
> + uint32 chkpnt_latest_lower;
>  } ControlData;
> 
> ~
> 
> Actually, I did not recognise the reason why this cannot be stored
> properly as a single XLogRecPtr field. Please see other review
> comments in this post.

Changed to use XLogRecPtr. See above comment.

> .../t/003_logical_replication_slots.pl
> 
> 6. GENERAL
> 
> Many of the changes to this file are just renaming the
> 'old_node'/'new_node' to 'old_publisher'/'new_publisher'.
> 
> This seems a basic change not really associated with this patch 0003.
> To reduce the code churn, this change should be moved into the earlier
> patch where this test file (003_logical_replication_slots.pl) was
> first introduced,

Moved these renaming to 0002.

> 7.
> 
> # Cause a failure at the start of pg_upgrade because slot do not finish
> # consuming all the WALs
> 
> ~
> 
> Can you give a more detailed explanation in the comment of how this
> test case achieves what it says?

Slightly reworded the comment above and this one. What do you think?

> src/test/regress/sql/misc_functions.sql
> 
> 8.
> @@ -236,4 +236,4 @@ SELECT * FROM pg_split_walfile_name('invalid');
>  SELECT segment_number > 0 AS ok_segment_number, timeline_id
>    FROM pg_split_walfile_name('000000010000000100000000');
>  SELECT segment_number > 0 AS ok_segment_number, timeline_id
> -  FROM pg_split_walfile_name('ffffffFF00000001000000af');
> +  FROM pg_split_walfile_name('ffffffFF00000001000000af');
> \ No newline at end of file
> 
> ~
> 
> What is this change for? It looks like maybe some accidental
> whitespace change happened.

It was unintended; removed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866691219B9CB280B709600F51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

19 August 2023, 13:09:20

On Fri, Aug 18, 2023 at 7:21 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>

Few comments on new patches:
1.
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+     connection string, and then re-enable the subscription.

Why does one need to update the connection string?

2.
+ /*
+ * Checking for logical slots must be done before
+ * check_new_cluster_is_empty() because the slot_arr attribute of the
+ * new_cluster will be checked in that function.
+ */
+ if (count_logical_slots(&old_cluster))
+ {
+ get_logical_slot_infos(&new_cluster, false);
+ check_for_logical_replication_slots(&new_cluster);
+ }
+
  check_new_cluster_is_empty();

Can't we simplify this checking by simply querying
pg_replication_slots for any usable slot something similar to what we
are doing in check_for_prepared_transactions()? We can add this check
in the function check_for_logical_replication_slots(). Also, do we
need a count function, or instead can we have a simple function like
is_logical_slot_present() where we return even if there is one slot
present?
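
As a rough illustration of the difference (using a hypothetical SketchDbInfo struct and sketch_* function names, not the patch's data structures), a presence check can return at the first hit, while a count has to visit everything:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-database summary; not the patch's actual struct. */
typedef struct SketchDbInfo
{
	int			nslots;		/* logical slots found in this database */
} SketchDbInfo;

/* Counting visits every database even if the caller only tests for zero. */
static int
sketch_count_logical_slots(const SketchDbInfo *dbs, size_t ndbs)
{
	int			total = 0;

	assert(dbs != NULL || ndbs == 0);
	for (size_t i = 0; i < ndbs; i++)
		total += dbs[i].nslots;
	return total;
}

/* A presence check can stop at the first database that has a slot. */
static bool
sketch_is_logical_slot_present(const SketchDbInfo *dbs, size_t ndbs)
{
	assert(dbs != NULL || ndbs == 0);
	for (size_t i = 0; i < ndbs; i++)
	{
		if (dbs[i].nslots > 0)
			return true;
	}
	return false;
}
```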

Apart from this, (a) I have made a few changes (changed comments) in
patch 0001 as shared in the email [1]; (b) some modifications in the
docs as you can see in the attached. Please include those changes in
the next version if you think they are okay.

[1] - https://www.postgresql.org/message-id/CAA4eK1JzJagMmb_E8D4au%3DGYQkxox0AfNBm1FbP7sy7t4YWXPQ%40mail.gmail.com

--
With Regards,
Amit Kapila.

Attachments

mod_amit_1.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

20 August 2023, 16:18:42

On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 17, 2023 at 6:07 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > > > > Another idea is (which might have already been discussed though) that we
> > > > > > > check if the latest shutdown checkpoint LSN in the control file matches the
> > > > > > > confirmed_flush_lsn in pg_replication_slots view. That way, we can ensure
> > > > > > > that the slot has consumed all WAL records before the last shutdown. We
> > > > > > > don't need to worry about WAL records generated after starting the old
> > > > > > > cluster during the upgrade, at least for logical replication slots.
> > > > > >
> > > > >
> > > > > Right, this is somewhat closer to what Patch is already doing. But
> > > > > remember in this case we need to remember and use the latest
> > > > > checkpoint from the control file before the old cluster is started
> > > > > because otherwise the latest checkpoint location could be even updated
> > > > > during the upgrade. So, instead of reading from WAL, we need to change
> > > > > so that we rely on the control file's latest LSN.
> > > >
> > > > Yes, I was thinking the same idea.
> > > >
> > > > But it works for only replication slots for logical replication. Do we
> > > > want to check if no meaningful WAL records are generated after the
> > > > latest shutdown checkpoint, for manually created slots (or non-logical
> > > > replication slots)? If so, we would need to have something reading WAL
> > > > records in the end.
> > > >
> > >
> > > This feature only targets logical replication slots. I don't see a
> > > reason to be different for manually created logical replication slots.
> > > Is there something particular that you think we could be missing?
> >
> > Sorry I was not clear. I meant the logical replication slots that are
> > *not* used by logical replication, i.e., are created manually and used
> > by third party tools that periodically consume decoded changes. As we
> > discussed before, these slots will never be able to pass that
> > confirmed_flush_lsn check.
> >
>
> I think normally one would have a background process to periodically
> consume changes. Can't one use the walsender infrastructure for
> their plugins to consume changes, probably by using the replication
> protocol?

Not sure.

> Also, I feel it is the plugin author's responsibility to
> consume changes or advance slot to the required position before
> shutdown.

How does the plugin author ensure that the slot consumes all WAL
records including shutdown_checkpoint before shutdown?

>
> > After some thoughts, one thing we might
> > need to consider is that in practice, the upgrade project is performed
> > during the maintenance window and has a backup plan that revert the
> > upgrade process, in case something bad happens. If we require the
> > users to drop such logical replication slots, they cannot resume to
> > use the old cluster in that case, since they would need to create new
> > slots, missing some changes.
> >
>
> Can't one keep the backup before removing slots?

Yes, but restoring the backup could take time.

>
> > Other checks in pg_upgrade seem to be
> > compatibility checks that would eventually be required for the upgrade
> > anyway. Do we need to consider this case? For example, we do that
> > confirmed_flush_lsn check for only the slots with pgoutput plugin.
> >
>
> I think one is allowed to use pgoutput plugin even for manually
> created slots. So, such a check may not work.

Right, but I thought it's a very rare case.

Since the slot's confirmed_flush_lsn check is not a compatibility
check, unlike the existing checks, I wonder if we can make it optional.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

20 August 2023, 18:20:45

On Fri, Aug 18, 2023 at 10:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Peter,
>
> PSA new version patch set.
>

I've looked at the v22 patch set, and here are some comments:

0001:

Do we need regression tests to make sure that the slot's
confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint
record?

0002:

+   <step>
+    <title>Prepare for publisher upgrades</title>
+

Should this step be done before "8. Stop both servers", as it might
require disabling subscriptions and dropping 'lost' replication slots?

Why is there no explanation of the slots' confirmed_flush_lsn check
as a prerequisite?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

21 August 2023, 06:20:32

On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > Sorry I was not clear. I meant the logical replication slots that are
> > > *not* used by logical replication, i.e., are created manually and used
> > > by third party tools that periodically consume decoded changes. As we
> > > discussed before, these slots will never be able to pass that
> > > confirmed_flush_lsn check.
> > >
> >
> > I think normally one would have a background process to periodically
> > consume changes. Can't one use the walsender infrastructure for
> > their plugins to consume changes, probably by using the replication
> > protocol?
>
> Not sure.
>

I think one can use Streaming Replication Protocol to achieve it [1].

> > Also, I feel it is the plugin author's responsibility to
> > consume changes or advance slot to the required position before
> > shutdown.
>
> How does the plugin author ensure that the slot consumes all WAL
> records including shutdown_checkpoint before shutdown?
>

By using "Streaming Replication Protocol" so that walsender can take
care of it. If not, I think users should drop such slots before the
upgrade because anyway, they won't be usable after the upgrade.

> >
> > > After some thoughts, one thing we might
> > > need to consider is that in practice, the upgrade project is performed
> > > during the maintenance window and has a backup plan that revert the
> > > upgrade process, in case something bad happens. If we require the
> > > users to drop such logical replication slots, they cannot resume to
> > > use the old cluster in that case, since they would need to create new
> > > slots, missing some changes.
> > >
> >
> > Can't one keep the backup before removing slots?
>
> Yes, but restoring the backup could take time.
>
> >
> > > Other checks in pg_upgrade seem to be
> > > compatibility checks that would eventually be required for the upgrade
> > > anyway. Do we need to consider this case? For example, we do that
> > > confirmed_flush_lsn check for only the slots with pgoutput plugin.
> > >
> >
> > I think one is allowed to use pgoutput plugin even for manually
> > created slots. So, such a check may not work.
>
> Right, but I thought it's a very rare case.
>

Okay, but not sure that we can ignore it.

> Since the slot's confirmed_flush_lsn check is not a compatibility
> check, unlike the existing checks, I wonder if we can make it optional.
>

There are arguments both ways. Initially, the patch proposed to make
them optional by having an option like
--include-logical-replication-slots but Jonathan raised a point that
it will be more work for users and should be the default. Then we also
discussed having an option like --exclude-logical-replication-slots
but as we don't have any other similar option, it doesn't seem natural
to add such an option. Also, I am afraid, if there is no user of such
an option, it won't be worth it. BTW, how would you like to see it made
optional (via an --include or an --exclude switch)?

Personally, I am okay to make it optional if we have a broader
consensus. My preference would be to have an --exclude kind of option.
How about first getting the main patch reviewed and committed, then
based on consensus, we can decide whether to make it optional and if
so, what is the preferred way?

[1] - https://www.postgresql.org/docs/current/protocol-replication.html

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

21 August 2023, 10:16:46

Here are some review comments for v22-0002

======
Commit Message

1.
This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
clushter.

~

1a
/is newly extracted/is fetched/

~

1b.
/using the pg_create_logical_replication_slots()/executing
pg_create_logical_replication_slots()/

~

1c.
/clushter/cluster/

~~~

2.
Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.

~

2a.
/it must/slot restoration/

~

2b.
/the restriction/this restriction/

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slot on the new publisher.
+    </para>

/same replication slot/same replication slots/

~~~

4.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, execute the
+     <command>ALTER SUBSCRIPTION ... CONNECTION</command> command to update the
+     connection string, and then re-enable the subscription.
+    </para>

On the rendered page, it looks a bit strange that DISABLE has a link
but CONNECTION does not have a link.

~~~

5.
+    <para>
+     There are some prerequisites for <application>pg_upgrade</application> to
+     be able to upgrade the replication slots. If these are not met an error
+     will be reported.
+    </para>
+
+    <itemizedlist>

+1 to use all the itemizedlist changes that Amit suggested [1] in his
attachment.

======
src/bin/pg_upgrade/check.c

6.
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);

IMO the arg name should not shadow a global with the same name. See
other review comment for this function signature.

~~~

7.
+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos(&old_cluster, live_check);

But 'live_check' is never used?

~~~

8. check_for_logical_replication_slots
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO the arg name should not shadow a global with the same name. If
this is never going to be called with any param other than
&new_cluster then probably it is better not even to have that
argument at all. Just refer to the global new_cluster inside the
function.

You can't say that 'check_for_new_tablespace_dir' does it already so
it must be OK -- I think that the existing function has the same issue
and it also ought to be fixed to avoid shadowing!

~~~

9. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

======
src/bin/pg_upgrade/function.c

10. get_loadable_libraries
  /*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions or referred
+ * by logical replication slots in this DB.
  */
  ress[dbnum] = executeQueryOrDie(conn,
~

/referred by/referred to by/

======
src/bin/pg_upgrade/info.c

11.
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ if (cluster == &old_cluster)
+ pg_log(PG_VERBOSE, "\nsource databases:");
+ else
+ pg_log(PG_VERBOSE, "\ntarget databases:");
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ {
+ DbInfo    *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+ get_logical_slot_infos_per_db(cluster, pDbInfo);
+ slot_count += pDbInfo->slot_arr.nslots;
+
+ if (log_opts.verbose)
+ {
+ pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+ print_slot_infos(&pDbInfo->slot_arr);
+ }
+ }
+}
+

11a.
Now the variable 'slot_count' is no longer being returned so it seems redundant.

~

11b.
What is the 'live_check' parameter for? Nobody is using it.

~~~

12. count_logical_slots

+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slotnum = 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slotnum;
+}

IMO this variable should be called something like 'slot_count'. This
is the same review comment also made in a previous review. (See [2]
comment#12).

~~~

13. print_slot_infos

+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_arr->slots[slotnum].slotname,
+    slot_arr->slots[slotnum].plugin,
+    slot_arr->slots[slotnum].two_phase);
+}

It might be nicer to introduce a variable, instead of all those array
dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];

~~~

14.
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ /*
+ * Constructs query for creating logical replication slots.
+ *
+ * XXX: For simplification, pg_create_logical_replication_slot() is
+ * used. Is it sufficient?
+ */
+ appendPQExpBuffer(query, "SELECT pg_catalog.pg_create_logical_replication_slot(");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].slotname,
+ conn);
+ appendPQExpBuffer(query, ", ");
+ appendStringLiteralConn(query, slot_arr->slots[slotnum].plugin,
+ conn);
+ appendPQExpBuffer(query, ", false, %s);",
+   slot_arr->slots[slotnum].two_phase ? "true" : "false");
+
+ PQclear(executeQueryOrDie(conn, "%s", query->data));
+
+ resetPQExpBuffer(query);
+ }
+
+ PQfinish(conn);
+
+ destroyPQExpBuffer(query);
+ }
+
+ end_progress_output();
+ check_ok();

14a
Similar to the previous comment (#13). It might be nicer to introduce
a variable, instead of all those array dereferences:

LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
~

14b.
It was not clear to me why this command is not being built using
executeQueryOrDie directly instead of using the query buffer. Is there
some reason?

~

14c.
I think it would be cleaner to have a separate res variable like you
used elsewhere:
res = executeQueryOrDie(...)

instead of doing PQclear(executeQueryOrDie(conn, "%s", query->data));
in one line

======
src/bin/pg_upgrade/pg_upgrade.

15.
+void get_logical_slot_infos(ClusterInfo *cluster, bool live_check);

I didn't see a reason for that 'live_check' parameter.

======
.../pg_upgrade/t/003_logical_replication_slots.pl

16.
IMO this would be much easier to read if there were BIG comments
between the actual TEST parts

For example

# ------------------------------
# TEST: Confirm pg_upgrade fails if new node wal_level is not 'logical'
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Confirm pg_upgrade fails max_replication_slots on new node is too low
<preparation>
<test>
<cleanup>

# ------------------------------
# TEST: Successful upgrade
<preparation>
<test>
<cleanup>

~~~

17.
+# Cause a failure at the start of pg_upgrade because wal_level is replica
+command_fails(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d',         $old_publisher->data_dir,
+ '-D',         $new_publisher->data_dir,
+ '-b',         $bindir,
+ '-B',         $bindir,
+ '-s',         $new_publisher->host,
+ '-p',         $old_publisher->port,
+ '-P',         $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old node with wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ not removed after pg_upgrade failure");

The message is ambiguous

BEFORE
'run of pg_upgrade of old node with wrong wal_level'

SUGGESTION
'run of pg_upgrade where the new node has the wrong wal_level'

~~~

18.
+# Create an unnecessary slot on old node
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+]);
+
+$old_publisher->stop;
+
+# Preparations for the subsequent test. max_replication_slots is set to
+# smaller than existing slots on old node
+$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");
+$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");


IMO the comment is misleading. It is not an "unnecessary slot", it is
just a 2nd slot. And this is all part of the preparation for the next
test so it should be under the other comment.

For example SUGGESTION changes like this:

# Preparations for the subsequent test.
# 1. Create an unnecessary slot on the old node
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
]);
$old_publisher->stop;
# 2. max_replication_slots is set to smaller than the number of slots
# (2) present on the old node
$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
# 3. new node wal_level is set correctly
$new_publisher->append_conf('postgresql.conf', "wal_level = 'logical'");

~~~

19.
+# Remove an unnecessary slot and consume WAL records
+$old_publisher->start;
+$old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_drop_replication_slot('test_slot2');
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL)
+]);
+$old_publisher->stop;
+

This comment should say more like:

# Preparations for the subsequent test.

~~~

20.
+# Actual run, pg_upgrade_output.d is removed at the end

This comment should mention that "successful upgrade is expected"
because all the other prerequisites are now satisfied.

~~~

21.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new node');

Should there be a matching new_publisher->stop;?

------
[1] https://www.postgresql.org/message-id/CAA4eK1%2BdT2g8gmerguNd_TA%3DXMnm00nLzuEJ_Sddw6Pj-bvKVQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/TYAPR01MB586604802ABE42E11866762FF51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

21 August 2023, 12:45:44

On Monday, August 21, 2023 11:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> > > >
> > > > Sorry I was not clear. I meant the logical replication slots that
> > > > are
> > > > *not* used by logical replication, i.e., are created manually and
> > > > used by third party tools that periodically consume decoded
> > > > changes. As we discussed before, these slots will never be able to
> > > > pass that confirmed_flush_lsn check.
> > > >
> > >
> > > I think normally one would have a background process to periodically
> > > consume changes. Won't one can use the walsender infrastructure for
> > > their plugins to consume changes probably by using replication
> > > protocol?
> >
> > Not sure.
> >
> 
> I think one can use Streaming Replication Protocol to achieve it [1].
> 
> > > Also, I feel it is the plugin author's responsibility to consume
> > > changes or advance slot to the required position before shutdown.
> >
> > How does the plugin author ensure that the slot consumes all WAL
> > records including shutdown_checkpoint before shutdown?
> >
> 
> By using "Streaming Replication Protocol" so that walsender can take care of it.
> If not, I think users should drop such slots before the upgrade because anyway,
> they won't be usable after the upgrade.

Yes, I think pglogical is one example: it starts a bgworker (apply worker) on the client to
consume changes, and it also uses the Streaming Replication Protocol, IIRC. And
pg_recvlogical is another example, which connects to the walsender and consumes changes.

Best Regards,
Hou zj

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

21 August 2023, 13:12:03

On Friday, August 18, 2023 9:52 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> Dear Peter,
> 
> PSA new version patch set.

Thanks for updating the patch!
Here are a few comments about the 0003 patch.

1.

+check_for_lost_slots(ClusterInfo *cluster)
+{
+    int            i,
+                ntups,
+                i_slotname;
+    PGresult   *res;
+    DbInfo       *active_db = &cluster->dbarr.dbs[0];
+    PGconn       *conn = connectToServer(cluster, active_db->db_name);
+ 
+    /* logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+        return;

I think we should build connection after this check, otherwise the connection
may be left open after returning.
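
A small standalone sketch of the leak, using made-up fake_connect()/fake_finish() helpers and a counter instead of real libpq connections (all names here are hypothetical, and GET_MAJOR_VERSION is written as I recall it from pg_upgrade.h):

```c
/* Same shape as the macro in pg_upgrade.h: drops the two minor digits. */
#define GET_MAJOR_VERSION(v)	((v) / 100)

/* Hypothetical connection bookkeeping, standing in for PGconn handling. */
typedef struct FakeConn
{
	int			unused;
} FakeConn;

int			open_connections = 0;
static FakeConn the_conn;

static FakeConn *
fake_connect(void)
{
	open_connections++;
	return &the_conn;
}

static void
fake_finish(FakeConn *conn)
{
	(void) conn;
	open_connections--;
}

/* Connects in the declaration, so the early return leaks the connection. */
void
check_leaky(int major_version)
{
	FakeConn   *conn = fake_connect();

	if (GET_MAJOR_VERSION(major_version) <= 1600)
		return;					/* conn is never finished on this path */

	fake_finish(conn);
}

/* Connects only after the version check has passed. */
void
check_fixed(int major_version)
{
	FakeConn   *conn;

	if (GET_MAJOR_VERSION(major_version) <= 1600)
		return;

	conn = fake_connect();
	/* ... the actual slot checks would run here ... */
	fake_finish(conn);
}
```

Calling check_leaky(160000) leaves open_connections at 1, while check_fixed(160000) leaves it unchanged.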


2.
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+    int            i,
+                ntups,
+                i_slotname;
+    PGresult   *res;
+    DbInfo       *active_db = &cluster->dbarr.dbs[0];
+    PGconn       *conn = connectToServer(cluster, active_db->db_name);
+
+    /* logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+        return;

Same as above.

3.
+                if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+                {

I think you mean 1700 here.
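
For reference, a minimal sketch of why comparing against 17 cannot work. It assumes GET_MAJOR_VERSION divides the numeric server version by 100, as I recall from pg_upgrade.h; the four function names are invented for this illustration.

```c
/* Assumed to match pg_upgrade.h: 170000 becomes 1700, 90600 becomes 906. */
#define GET_MAJOR_VERSION(v)	((v) / 100)

/* Buggy: every supported version yields a value far above 17. */
int
buggy_is_17_or_later(int version_num)
{
	return GET_MAJOR_VERSION(version_num) >= 17;
}

/* Intended comparison. */
int
fixed_is_17_or_later(int version_num)
{
	return GET_MAJOR_VERSION(version_num) >= 1700;
}

/* The two spellings of the "older than PG17" early-exit test. */
int
is_pre_17_le(int version_num)
{
	return GET_MAJOR_VERSION(version_num) <= 1600;
}

int
is_pre_17_lt(int version_num)
{
	return GET_MAJOR_VERSION(version_num) < 1700;
}
```

Under that assumption the `<= 1600` and `< 1700` forms agree for real version numbers, so the choice between them is purely about readability and consistency.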


4.
+                    p = strpbrk(p, "01234567890ABCDEF");
+
+                    /*
+                     * Upper and lower part of LSN must be read separately
+                     * because it is reported as %X/%X format.
+                     */
+                    upper_lsn = strtoul(p, &slash, 16);
+                    lower_lsn = strtoul(++slash, NULL, 16);

Maybe we'd better add a sanity check after strpbrk like "if (p == NULL ||
strlen(p) <= 1)" to be consistent with other similar code.
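
Something along these lines is what I have in mind. This is only a standalone sketch: parse_checkpoint_lsn() is a hypothetical helper, it returns false where pg_upgrade would call pg_fatal(), and the extra checks on the '/' separator go a bit beyond what the existing code does.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the value of a pg_controldata line such as
 * "Latest checkpoint location:           0/15D6BA8".
 * Returns true and stores the combined 64-bit LSN on success.
 */
bool
parse_checkpoint_lsn(const char *line, uint64_t *lsn)
{
	const char *p;
	char	   *slash = NULL;
	char	   *end = NULL;
	unsigned long upper;
	unsigned long lower;

	p = strchr(line, ':');
	if (p == NULL || strlen(p) <= 1)
		return false;
	p++;						/* skip the ':' char */

	/* Move to the first hex digit; bail out if there is none. */
	p = strpbrk(p, "0123456789ABCDEF");
	if (p == NULL || strlen(p) <= 1)
		return false;

	/* The LSN is printed as %X/%X, so read both halves separately. */
	upper = strtoul(p, &slash, 16);
	if (slash == p || *slash != '/')
		return false;

	lower = strtoul(slash + 1, &end, 16);
	if (end == slash + 1)
		return false;

	*lsn = ((uint64_t) upper << 32) | (uint64_t) lower;
	return true;
}
```

Without the NULL check, a line with no hex digits after the colon would pass a NULL pointer to strtoul().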

Best Regards,
Hou zj

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

21 August 2023, 16:02:30

Dear Amit,

Thank you for giving comments! PSA new version patch set.

> 1.
> +     <link linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION ... DISABLE</command></link>.
> +     After the upgrade is complete, execute the
> +     <command>ALTER SUBSCRIPTION ... CONNECTION</command>
> command to update the
> +     connection string, and then re-enable the subscription.
> 
> Why does one need to update the connection string?

I wrote it like that because the old and new port numbers can be different. But you
are partially right - it is not always needed. Updated to clarify that.

> 2.
> + /*
> + * Checking for logical slots must be done before
> + * check_new_cluster_is_empty() because the slot_arr attribute of the
> + * new_cluster will be checked in that function.
> + */
> + if (count_logical_slots(&old_cluster))
> + {
> + get_logical_slot_infos(&new_cluster, false);
> + check_for_logical_replication_slots(&new_cluster);
> + }
> +
>   check_new_cluster_is_empty();
> 
> Can't we simplify this checking by simply querying
> pg_replication_slots for any usable slot something similar to what we
> are doing in check_for_prepared_transactions()? We can add this check
> in the function check_for_logical_replication_slots().

Some checks were moved into check_for_logical_replication_slots(), and the
get_logical_slot_infos() call for new_cluster was removed, as you suggested.

But get_logical_slot_infos() cannot be removed completely, because the old
cluster has already been shut down by the time the new cluster is checked. We
must keep the old cluster's slot information in memory.

Note that the existence of slots is now checked in all cases, because such slots
could not be used after the upgrade.

check_new_cluster_is_empty() no longer checks logical slots, so all changes to
this function were reverted.

> Also, do we
> need a count function, or instead can we have a simple function like
> is_logical_slot_present() where we return even if there is one slot
> 

I think this is still needed, because max_replication_slots and the number
of existing replication slots must be compared.

Of course we can add another simple function like
is_logical_slot_present_on_old_cluster() and use it in main(), but I am not sure
that defining several similar functions is a good idea.
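
As a rough sketch of the two alternatives, with heavily simplified stand-ins for the pg_upgrade structs (only the fields needed here; the real definitions carry much more), the counting function and a presence-only function could look like this. is_logical_slot_present() is the name suggested in the review, not an existing function.

```c
#include <stdbool.h>

/* Simplified, hypothetical versions of the pg_upgrade structs. */
typedef struct
{
	int			nslots;
} LogicalSlotInfoArr;

typedef struct
{
	LogicalSlotInfoArr slot_arr;
} DbInfo;

typedef struct
{
	DbInfo	   *dbs;
	int			ndbs;
} DbInfoArr;

typedef struct
{
	DbInfoArr	dbarr;
} ClusterInfo;

/* Total number of logical slots across all databases. */
int
count_logical_slots(const ClusterInfo *cluster)
{
	int			dbnum;
	int			slot_count = 0;

	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
		slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;

	return slot_count;
}

/* Presence-only variant: returns as soon as one slot is found. */
bool
is_logical_slot_present(const ClusterInfo *cluster)
{
	int			dbnum;

	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
	{
		if (cluster->dbarr.dbs[dbnum].slot_arr.nslots > 0)
			return true;
	}

	return false;
}
```

The presence check is just `count_logical_slots(cluster) > 0`, which is why keeping only the counting function covers both the main() test and the max_replication_slots comparison.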

> Apart from this, (a) I have made a few changes (changed comments) in
> patch 0001 as shared in the email [1]; (b) some modifications in the
> docs as you can see in the attached. Please include those changes in
> the next version if you think they are okay.

I checked, and your modifications look good.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Hi Kuroda-san,

Here are some review comments for v22-0003.

(FYI, I was already mid-way through this review before you posted new v23* patches, so I am posting it anyway in case some comments still apply.)

======
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.

~

1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.

~~~

2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+ PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore

~~~

3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.

~

3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.

~~~

4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore

======
src/bin/pg_upgrade/controldata.c

5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/

~~~

5b.
>= 17 should be >= 1700

~~~

6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a better mirror for LSN_FORMAT_ARGS.
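
A short sketch of what the uint32 variant implies. The LSN_FORMAT_ARGS definition below is written from memory of PostgreSQL's macro, and combine_lsn()/format_lsn() are hypothetical helpers. The point is that once the halves are 32 bits wide, the upper half has to be widened before the shift.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror PostgreSQL's macro: high and low 32 bits of an LSN. */
#define LSN_FORMAT_ARGS(lsn) ((uint32_t) ((lsn) >> 32)), ((uint32_t) (lsn))

/* Combine two 32-bit halves into a 64-bit LSN. */
uint64_t
combine_lsn(uint32_t upper_lsn, uint32_t lower_lsn)
{
	/* Widen first: shifting a 32-bit value by 32 bits is undefined. */
	return ((uint64_t) upper_lsn << 32) | lower_lsn;
}

/* Print an LSN in the usual %X/%X form. */
void
format_lsn(uint64_t lsn, char *buf, size_t buflen)
{
	snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32, LSN_FORMAT_ARGS(lsn));
}
```

With uint64 halves, as in the patch, the shift is already safe, so the type change would mainly be about matching how LSNs are printed elsewhere.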

======
src/bin/pg_upgrade/info.c

7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the check_and_dump_old_cluster() function -- see the "Check for various failure cases" comment -- IMO this new fragment belongs there with the other checks.

======
src/bin/pg_upgrade/pg_upgrade.h

8.
bool date_is_int;
bool float8_pass_by_value;
uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
} ControlData;

I don't think the new field is particularly different from all the others that it needs a blank line separator.

======
.../t/003_logical_replication_slots.pl

9.
# Initialize old node
my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
$old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;

# Initialize new node
my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
$new_publisher->init(allows_streaming => 'replica');

-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Should the removal of those old_publisher start/stop calls actually be done in the 0002 patch?

~~~

10.
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
]);

~

What is the purpose of the added SELECT? It doesn't seem covered by the comment.

~~~

11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop;

Minor rewording of comment sentence.

SUGGESTION
Because these WAL records do not get consumed it will cause the upcoming pg_upgrade test to fail.

~~~

12.
# Cause a failure at the start of pg_upgrade because the slot still have
# unconsumed WAL records

~

/still have/still has/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

22 August 2023, 04:49:09

Here are some review comments for v23-0001

======
1. GENERAL -- git apply

The patch fails to apply cleanly. There are whitespace warnings.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102: trailing whitespace.
# SHUTDOWN_CHECKPOINT record.
warning: 1 line adds whitespace errors.

~~~

2. GENERAL -- which patch is the real one and which is the copy?

IMO this patch has become muddled.

Amit recently created a new thread [1] "persist logical slots to disk during shutdown checkpoint", which I thought was dedicated to the discussion/implementation of this 0001 patch. Therefore, I expected any 0001 patch changes would be made only in that new thread from now on (and maybe you would mirror them here in this thread).

But now I see there are v23-0001 patch changes here again. So, now the same patch is in 2 places and they are different. It is no longer clear to me which 0001 ("Always persist...") patch is the definitive one, and which one is the copy.

??

======
contrib/test_decoding/t/002_always_persist.pl

3.
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persist to disk during a shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;

/always persist/always persisted/

~~~

4.
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test (id int)");

Maybe it is better to call the table something different instead of the same name as the cluster. e.g. 'test_tbl' would be better.

~~~

5.
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+

SUGGESTION
# Stop the node to cause a shutdown checkpoint

~~~

6.
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{
+ if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+ {
+ $latest_checkpoint = $1;
+ last;
+ }
+}
+die "No checkPoint in control file found\n"
+ unless defined($latest_checkpoint);
+

6a.
/checkPoint/checkpoint/ (2x)

~

6b.
+die "No checkPoint in control file found\n"

SUGGESTION
"No checkpoint found in control file\n"

------
[1] https://www.postgresql.org/message-id/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

22 August 2023, 07:45:07

On Tue, Aug 22, 2023 at 7:19 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v23-0001
>
> ======
> 1. GENERAL -- git apply
>
> The patch fails to apply cleanly. There are whitespace warnings.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
> ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102: trailing whitespace.
> # SHUTDOWN_CHECKPOINT record.
> warning: 1 line adds whitespace errors.
>
> ~~~
>
> 2. GENERAL -- which patch is the real one and which is the copy?
>
> IMO this patch has become muddled.
>
> Amit recently created a new thread [1] "persist logical slots to disk during shutdown checkpoint", which I thought
> was dedicated to the discussion/implementation of this 0001 patch.
>

Right, I feel it would be good to discuss 0001 on the new thread.
Here, we can just include it for the sake of completeness and testing
purposes.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

22 August 2023, 09:01:12

On Mon, Aug 21, 2023 at 6:35 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 9. check_for_logical_replication_slots
> >
> > + /* logical replication slots can be migrated since PG17. */
> > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> > + return;
> >
> > IMO the code matches the comment better if you say < 1700 instead of <= 1600.
>
> Changed.
>

I think it is better to be consistent with the existing code. There
are a few other checks in pg_upgrade.c that use <=, so it is better
to use it in the same way here.

Another minor comment:
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER
SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.

I think this is true in general as well and not specific to
pg_upgrade. So, we can avoid adding anything about connection change
here.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

22 August 2023, 09:01:41

Hi Kuroda-san,

Here are some review comments for patch v23-0002

======
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message and docs changes before posting (e.g. cut/paste the rendered text into some tool like MSWord or Grammarly or ChatGPT or whatever tool you like and cross-check). There are lots of small typos etc but one up-front check could avoid long cycles of reviewing/reporting/fixing/re-posting/confirming...

======
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication slots is
different from other objects.

~

/ths/this/

======
doc/src/sgml/ref/pgupgrade.sgml

3.
+ <para>
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+ After the upgrade is complete, then re-enable the subscription. Note that
+ if the new cluser uses different port number from old one,
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+ command must be also executed on subscriber.
+ </para>

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.

~

3b.
/cluser/cluster/

~

3c.
Note that
+ if the new cluser uses different port number from old one,
+ <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+ command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.

~~~

4.
+ <listitem>
+ <para>
+ <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+ of all slots on old cluster must be same as latest checkpoint location.
+ </para>
+ </listitem>

4a.

/on old cluster/on the old cluster/

~

4b.

/as latest/as the latest/
~~

5.
+ <listitem>
+ <para>
+ The output plugins referenced by the slots on the old cluster must be
+ installed on the new PostgreSQL executable directory.
+ </para>
+ </listitem>

/installed on/installed in/ ??

~~

6.
+ <listitem>
+ <para>
+ The new cluster must have
+ <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+ configured to value larger than the existing slots on the old cluster.
+ </para>
+ </listitem>

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the old cluster.

======
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no Assert and no checking inside this function to ensure that is the case or not. It seems strange that the *cluster is passed as an argument but then the whole function body and messages assume it can only be a new cluster anyway.

IMO it would be better to rename this function to something like check_new_cluster_logical_replication_slots() and DO NOT pass any parameter but just use the global new_cluster within the function body.

~~~

8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.

~~~

9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/

~

10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old cluster.

~~~

11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of logical replication slots (%d) on the old cluster.

~

11b.
I think it would be helpful for the current values to be displayed in the fatal message so the user will know more about what value to set. Notice that my above suggestion has some substitution markers.
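
For example, a standalone sketch of the check with the values substituted into the message. check_max_replication_slots() and its error-buffer interface are hypothetical; the real code would call pg_fatal() directly.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true when the new cluster has room for all old slots.
 * Otherwise writes a message that includes both values into errbuf.
 */
bool
check_max_replication_slots(int max_replication_slots, int nslots,
							char *errbuf, size_t errlen)
{
	if (nslots > max_replication_slots)
	{
		snprintf(errbuf, errlen,
				 "max_replication_slots (%d) must be greater than or equal to "
				 "the number of logical replication slots (%d) on the old cluster.",
				 max_replication_slots, nslots);
		return false;
	}

	if (errlen > 0)
		errbuf[0] = '\0';
	return true;
}
```

With max_replication_slots = 1 and two slots on the old cluster, the user sees both numbers and knows the setting must be raised to at least 2.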

======
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+ slot_info->slotname,
+ slot_info->plugin,
+ slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.

======
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;

13a.
It would be nicer if all the test parts have identical formats. So here it should also say

# Preparations for the subsequent test:
# 1. Create a slot on the old cluster

~

13b.
Notice the colon (:) at the end of that comment "Preparations for the subsequent test:". All the other preparation comments in this file should also have a colon.

~

14.
+# Cause a failure at the start of pg_upgrade because wal_level is replica

SUGGESTION
# pg_upgrade will fail because the new cluster wal_level is 'replica'

~~~

15.
+# 1. Create an unnecessary slot on the old cluster

(but it is not unnecessary -- it is necessary for this test!)

SUGGESTION
+# 1. Create a second slot on the old cluster

~~~

16.
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots

SUGGESTION
# pg_upgrade will fail because the new cluster has insufficient max_replication_slots

~~~

17.
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot

SUGGESTION
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old cluster, so the new cluster config max_replication_slots=1 will now be enough.

~~~

18.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();

Maybe should be some added comments like:
# Check that the slot 'test_slot1' has migrated to the new cluster.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

22 August 2023, 09:48:15

On Mon, Aug 21, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 2.
> > + /*
> > + * Checking for logical slots must be done before
> > + * check_new_cluster_is_empty() because the slot_arr attribute of the
> > + * new_cluster will be checked in that function.
> > + */
> > + if (count_logical_slots(&old_cluster))
> > + {
> > + get_logical_slot_infos(&new_cluster, false);
> > + check_for_logical_replication_slots(&new_cluster);
> > + }
> > +
> >   check_new_cluster_is_empty();
> >
> > Can't we simplify this checking by simply querying
> > pg_replication_slots for any usable slot something similar to what we
> > are doing in check_for_prepared_transactions()? We can add this check
> > in the function check_for_logical_replication_slots().
>
> Some checks were included to check_for_logical_replication_slots(), and
> get_logical_slot_infos() for new_cluster was removed as you said.
>

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+   "FROM pg_catalog.pg_replication_slots "
+   "WHERE slot_type = 'logical' AND "
+   "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));
+
+ PQclear(res);
+
+ nslots = count_logical_slots(&old_cluster);
+
+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

Shouldn't these checks be reversed? I mean it would be better to test
the presence of slots on the new cluster if there is any slot present
on the old cluster.
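
The ordering proposed here can be sketched as follows. This is only an illustration under stated assumptions: the function name, the result enum, and the plain integer inputs are hypothetical, and the real code would gather the counts from the clusters and call pg_fatal() on failure.

```c
#include <assert.h>

/* Hypothetical result codes for this sketch; pg_upgrade itself calls pg_fatal(). */
typedef enum
{
	SLOT_CHECK_OK,
	SLOT_CHECK_NEW_CLUSTER_HAS_SLOTS,
	SLOT_CHECK_MAX_SLOTS_TOO_SMALL
} SlotCheckResult;

static SlotCheckResult
check_new_cluster_slots(int old_nslots, int new_nslots,
						int max_replication_slots)
{
	assert(old_nslots >= 0 && new_nslots >= 0);

	/* No slots to migrate: whatever exists on the new cluster is irrelevant. */
	if (old_nslots == 0)
		return SLOT_CHECK_OK;

	/* Slots will be migrated, so the new cluster must not already have any. */
	if (new_nslots > 0)
		return SLOT_CHECK_NEW_CLUSTER_HAS_SLOTS;

	/* The new cluster must have room for every migrated slot. */
	if (old_nslots > max_replication_slots)
		return SLOT_CHECK_MAX_SLOTS_TOO_SMALL;

	return SLOT_CHECK_OK;
}
```

With this order, a new cluster that happens to contain slots is rejected only when there is something to migrate from the old cluster.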

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

22 August 2023, 11:52:45

Hi Kuroda-san.

I already posted a review for v22-0003 earlier today, but v23-0003 was already posted so those are not yet addressed.

Here are a few more review comments I noticed when looking at the latest v23-0003.

======
src/bin/pg_upgrade/check.c

1.
+#include "access/xlogdefs.h"
#include "catalog/pg_authid_d.h"

Was this #include needed here? I noticed you've already included the same in the "pg_upgrade.h".

~~~

2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit strange. Didn't we already iterate every single slot in all DBs in the earlier function get_logical_slot_infos_per_db()? There we were only looking for wal_status <> 'lost', but we could have got *every* wal_status and also detected these 'lost' ones at the same time up-front, instead of having this extra function with more SQL to do pretty much the same SELECT.

Perhaps coding the current way there is a clear separation of the fetching code and the checking code, and that might be the best approach, but it somehow seems a shame/waste to be executing almost the same slots data with the same SQL 2x, so I wondered if there is a better way to arrange this.

======
src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But AFAICT they are not guarded here so they will be executed by both sides. Previously (in my review of v22-0003) I suggested these calls maybe belonged in the calling function check_and_dump_old_cluster(). I think that.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 05:43:32

Dear Peter,

Thanks for giving comments! New version will be available
in the upcoming post.

>
1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a
Maybe the comment should start uppercase for consistency with others.
>

Seems right, but I revisited check_and_dump_old_cluster() and found that
some version-specific checks are done outside the checking functions.
So I followed that style and moved this part to
check_and_dump_old_cluster(). The version check for the new cluster was also
moved to check_new_cluster(). Is that OK with you?

>
1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.
>

Per suggestion from Amit, I used < 1700. Some other changes in 0002 were reverted.

>
2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+   PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore
>

Fixed. 

>
3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.
>

Per reply for comment 1, the part was no longer needed.

>
3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with the comment.
>

Per suggestion from Amit, I used < 1700.

>
4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore
>

Fixed.

>
5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/
>

Fixed.

>
5b.
>= 17 should be >= 1700
>

Per suggestion from Amit, I used < 1700.
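
To make the point of 5b concrete, here is a minimal sketch of the comparison. It assumes that GET_MAJOR_VERSION() divides the version number by 100, as the macro in pg_upgrade.h is understood to do, so PG17 (170000) yields 1700 and 16.4 (160004) yields 1600. The helper name `is_pg17_or_later` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the macro in pg_upgrade.h: drop the minor-version digits. */
#define GET_MAJOR_VERSION(v)	((v) / 100)

/*
 * Hypothetical helper: true if a cluster with this version number
 * (e.g. 170000 for PG17, 160004 for 16.4) is PG17 or later.
 */
static bool
is_pg17_or_later(unsigned int version_num)
{
	/* A zero version number would mean the cluster version was never read. */
	assert(version_num > 0);

	return GET_MAJOR_VERSION(version_num) >= 1700;
}
```

Under that assumption, comparing against 17 instead of 1700 would accept any cluster, because even PG16 yields 1600. For versions 10 and later the results are multiples of 100, so `< 1700` and `<= 1600` select the same clusters and the choice between them is a matter of style.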

>
6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a better mirror for LSN_FORMAT_ARGS.
>

Changed the definition to uint32, and a cast was added.
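
For reference, the parsing discussed in this comment can be sketched as a standalone function. This is a minimal sketch rather than the patch code: the helper name `parse_lsn` is hypothetical, it uses sscanf() instead of the strtoul() calls quoted above, and it assumes the text is in the %X/%X form. The cast matters because shifting a 32-bit value by 32 bits is undefined in C.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of pg_upgrade): parse an LSN printed as
 * "%X/%X" into a single 64-bit value. Returns false if the text does not
 * match that format.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
	uint32_t	upper;
	uint32_t	lower;

	assert(text != NULL && lsn != NULL);

	if (sscanf(text, "%" SCNx32 "/%" SCNx32, &upper, &lower) != 2)
		return false;

	/* Widen to 64 bits before shifting; shifting a uint32 by 32 is undefined. */
	*lsn = ((uint64_t) upper << 32) | lower;
	return true;
}
```

Declaring both halves as 32-bit values mirrors LSN_FORMAT_ARGS, and the explicit widening keeps the upper half from being lost.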

>
7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the check_and_dump_old_cluster() function -- see the "Check for various failure cases" comment -- IMO this new fragment belongs there with the other checks.
>

All the checks were moved to check_and_dump_old_cluster(), and a check for its major version was added.

>
8.
  bool date_is_int;
  bool float8_pass_by_value;
  uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
 } ControlData;

I don't think the new field is particularly different from all the others that it needs a blank line separator.
 >

I removed the blank line. Actually, I wondered where the attribute should go, but kept it at the end.

>
9.
 # Initialize old node
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;
 
 # Initialize new node
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');
 
-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');
 
-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removal of the old_publisher start/stop changes that actually should be done in the 0002 patch?
>

Yes, it should be removed from 0002.

>
10.
 $old_publisher->safe_psql(
  'postgres', qq[
  SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);
 ]);
 
~

What is the purpose of the added SELECT? It doesn't seem covered by the comment.
>

The SELECT statement is needed to trigger the failure caused by the insufficient
max_replication_slots. Checks on the new cluster start only after the old cluster
has been verified, so if this step is omitted, a different error is reported:

```
Checking confirmed_flush_lsn for logical replication slots  
WARNING: logical replication slot "test_slot1" has not consumed WALs yet

One or more logical replication slots still have unconsumed WAL records.
```

I added a comment about it.

>
11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop;

Minor rewording of comment sentence.

SUGGESTION
Because these WAL records do not get consumed it will cause the upcoming pg_upgrade test to fail.
>

Added.


>
12.
# Cause a failure at the start of pg_upgrade because the slot still have
# unconsumed WAL records

~

/still have/still has/
>

Fixed.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 05:44:44

Dear Peter,

> Here are some review comments for v23-0001

Thanks for the comment! But I did not update 0001 patch in this thread.
It will be managed in the forked one...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 05:45:40

Dear Amit,

Thanks for the comment! Next version will be available in upcoming post.

> > > + /* logical replication slots can be migrated since PG17. */
> > > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> > > + return;
> > >
> > > IMO the code matches the comment better if you say < 1700 instead of <=
> 1600.
> >
> > Changed.
> >
> 
> I think it is better to be consistent with the existing code. There
> are a few other checks in pg_upgrade.c that uses <=, so it is better
> to use it in the same way here.

OK, reverted.

> Another minor comment:
> Note that
> +     if the new cluser uses different port number from old one,
> +     <link linkend="sql-altersubscription"><command>ALTER
> SUBSCRIPTION ... CONNECTION</command></link>
> +     command must be also executed on subscriber.
> 
> I think this is true in general as well and not specific to
> pg_upgrade. So, we can avoid adding anything about connection change
> here.

Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 05:58:23

Dear Amit,

Thanks for giving comment. New version will be available in the upcoming post.

> + res = executeQueryOrDie(conn, "SELECT slot_name "
> +   "FROM pg_catalog.pg_replication_slots "
> +   "WHERE slot_type = 'logical' AND "
> +   "temporary IS FALSE;");
> +
> + if (PQntuples(res))
> + pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
> + PQgetvalue(res, 0, 0));
> +
> + PQclear(res);
> +
> + nslots = count_logical_slots(&old_cluster);
> +
> + /*
> + * Do additional checks when the logical replication slots have on the old
> + * cluster.
> + */
> + if (nslots)
>
> Shouldn't these checks be reversed? I mean it would be better to test
> the presence of slots on the new cluster if there is any slot present
> on the old cluster.

Hmm, I think the latter part is meaningful only when the old cluster has logical
slots. To sum up, all of these checks should be done only when
count_logical_slots(&old_cluster) > 0, right? Fixed like that.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 05:59:00

Dear Peter,

Thanks for giving comments! PSA the new version.

>
======
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message and docs changes before posting (e.g. cut/paste the rendered text into some tool like MSWord or Grammarly or ChatGPT or whatever tool you like and cross-check). There are lots of small typos etc but one up-front check could avoid long cycles of reviewing/reporting/fixing/re-posting/confirming...
>

I checked all of the sentences with Grammarly. Sorry for my poor English.

>
======
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication slots is
different from other objects.

~

/ths/this/
>

Fixed.

>
doc/src/sgml/ref/pgupgrade.sgml

3.
+    <para>
+     Before you start upgrading the publisher cluster, ensure that the
+     subscription is temporarily disabled, by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... DISABLE</command></link>.
+     After the upgrade is complete, then re-enable the subscription. Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.
+    </para>

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.
>

Fixed.


>
3b.
/cluser/cluster/

~

3c.
Note that
+     if the new cluser uses different port number from old one,
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... CONNECTION</command></link>
+     command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.
>

The part was removed.

>
4.
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on old cluster must be same as latest checkpoint location.
+      </para>
+     </listitem>

4a.
/on old cluster/on the old cluster/
>

Fixed.

>
4b.
/as latest/as the latest/
>
Fixed.

>
5.
+     <listitem>
+      <para>
+       The output plugins referenced by the slots on the old cluster must be
+       installed on the new PostgreSQL executable directory.
+      </para>
+     </listitem>

/installed on/installed in/ ??
>

"installed in" is better, fixed.

>
6.
+     <listitem>
+      <para>
+       The new cluster must have
+       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
+       configured to value larger than the existing slots on the old cluster.
+      </para>
+     </listitem>

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the old cluster.
>

Fixed.

>
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no Assert and no checking inside this function to ensure that is the case or not. It seems strange that the *cluster is passed as an argument but then the whole function body and messages assume it can only be a new cluster anyway.

IMO it would be better to rename this function to something like check_new_cluster_logical_replication_slots() and DO NOT pass any parameter but just use the global new_cluster within the function body.
>

Hmm, I followed other functions, e.g., check_for_composite_data_type_usage() is
called only for old one but it has an argument *cluster. What is the difference
between them? Moreover, how about check_for_lost_slots() and
check_for_confirmed_flush_lsn()? Fixed for the moment.

>
8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.
>

The part was removed.

>
9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/
>

Fixed.

>
10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old cluster.
>

Per suggestion from Amit, the part was removed.

>
11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of logical replication slots (%d) on the old
cluster.

11b.
I think it would be helpful for the current values to be displayed in the fatal message so the user will know more about what value to set. Notice that my above suggestion has some substitution markers.
>

Changed.

>
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+   slot_info->slotname,
+   slot_info->plugin,
+   slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.
>

Added.

>
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# ------------------------------
+# TEST: Confirm pg_upgrade fails when new cluster wal_level is not 'logical'
+
+# Create a slot on old cluster
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);"
+);
+$old_publisher->stop;

13a.
It would be nicer if all the test parts have identical formats. So here it should also say

# Preparations for the subsequent test:
# 1. Create a slot on the old cluster
>

I had not used that format because there was only one step, but I have now followed the style.

>
13b.
Notice the colon (:) at the end of that comment "Preparations for the subsequent test:". All the other preparation comments in this file should also have a colon.
>

Added.

>
14.
+# Cause a failure at the start of pg_upgrade because wal_level is replica

SUGGESTION
# pg_upgrade will fail because the new cluster wal_level is 'replica'
>

Fixed.

>
15.
+# 1. Create an unnecessary slot on the old cluster

(but it is not unnecessary -- it is necessary for this test!)

SUGGESTION
+# 1. Create a second slot on the old cluster
>

Fixed.

>
16.
+# Cause a failure at the start of pg_upgrade because the new cluster has
+# insufficient max_replication_slots

SUGGESTION
# pg_upgrade will fail because the new cluster has insufficient max_replication_slots
>

Fixed.

>
17.
+# Preparations for the subsequent test.
+# 1. Remove an unnecessary slot

SUGGESTION
+# 1. Remove the slot 'test_slot2', leaving only 1 slot remaining on the old cluster, so the new cluster config max_replication_slots=1 will now be enough.
>

Fixed.

>
18.
+$new_publisher->start;
+my $result = $new_publisher->safe_psql('postgres',
+ "SELECT slot_name, two_phase FROM pg_replication_slots");
+is($result, qq(test_slot1|t), 'check the slot exists on new cluster');
+$new_publisher->stop;
+
+done_testing();

Maybe should be some added comments like:
# Check that the slot 'test_slot1' has migrated to the new cluster.
>

Added.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 August 2023, 06:00:22

Dear Peter,

Thanks for giving comments! The new version is available in [1].

>
1.
+#include "access/xlogdefs.h"
 #include "catalog/pg_authid_d.h"
 
Was this #include needed here? I noticed you've already included the same in the "pg_upgrade.h".
>

It was needed because the macro LSN_FORMAT_ARGS() was used in the file.
I prefer to include every needed file explicitly, even if a header has already
included it, so the #include was written here.

>
2. check_for_lost_slots

+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

I can't quite describe my doubts about this, but something seems a bit strange. Didn't we already iterate every single slot in all DBs in the earlier function get_logical_slot_infos_per_db()? There we were only looking for wal_status <> 'lost', but we could have got *every* wal_status and also detected these 'lost' ones at the same time up-front, instead of having this extra function with more SQL to do pretty much the same SELECT.

Perhaps coding the current way there is a clear separation of the fetching code and the checking code, and that might be the best approach, but it somehow seems a shame/waste to be executing almost the same slots data with the same SQL 2x, so I wondered if there is a better way to arrange this.
 >

Hmm, but you did not like to do additional checks in the get_logical_slot_infos(),
right? They cannot go together. In case of check_new_cluster(), information for
relations is extracted in get_db_and_rel_infos() and then checked whether it is
empty or not in check_new_cluster_is_empty(). The phase is also separated.

>
src/bin/pg_upgrade/info.c

3. get_logical_slot_infos

+
+ /* Do additional checks if slots are found */
+ if (slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

Aren't these checks only intended for checking the 'old_cluster'? But AFAICT they are not guarded here so they will be executed by both sides. Previously (in my review of v22-0003) I suggested these calls maybe belonged in the calling function check_and_dump_old_cluster(). I think that.
>

Moved to check_and_dump_old_cluster().

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866DD3348B5224E0A1BFC3EF51CA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

24 August 2023, 05:24:48

Thanks for the updated patches.

Here are some review comments for the patch v24-0002

======
doc/src/sgml/ref/pgupgrade.sgml

1.
+ <listitem>
+ <para>
+ All slots on the old cluster must be usable, i.e., there are no slots
+ whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
+ <xref linkend="view-pg-replication-slots"/>).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+ of all slots on the old cluster must be the same as the latest
+ checkpoint location.
+ </para>
+ </listitem>

It might be more tidy to change the way those links (e.g. "See section 54.19") are presented:

1a.
SUGGESTION
All slots on the old cluster must be usable, i.e., there are no slots whose <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield> is <literal>lost</literal>.

~

1b.
SUGGESTION
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield> of all slots on the old cluster must be the same as the latest checkpoint location.

======
src/bin/pg_upgrade/check.c

2.
+ /* Logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+ check_new_cluster_logical_replication_slots();
+

Does it even make sense to check the new_cluster version? IIUC pg_upgrade *always* updates to the current PG version, which must be 1700 by definition, because this only is a PG17 patch, right?

For example, see check_cluster_versions() function where it does this check:

/* Only current PG version is supported as a target */
if (GET_MAJOR_VERSION(new_cluster.major_version) != GET_MAJOR_VERSION(PG_VERSION_NUM))
pg_fatal("This utility can only upgrade to PostgreSQL version %s.",
PG_MAJORVERSION);

======
src/bin/pg_upgrade/function.c

3.
os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
totaltups = 0;

for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
PGresult *res = ress[dbnum];
int ntups;
int rowno;

ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
{
char *lib = PQgetvalue(res, rowno, 0);

os_info.libraries[totaltups].name = pg_strdup(lib);
os_info.libraries[totaltups].dbnum = dbnum;

totaltups++;
}
PQclear(res);
}

~

Although this was not introduced by your patch, I do not understand why the 'totaltups' variable gets reset to zero and then re-incremented in these loops.

In other words, how is it possible for the end result of 'totaltups' to be any different from what was already calculated earlier in this function?

IMO totaltups = 0; and totaltups++; is just redundant code.
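
The two-pass pattern in question can be reduced to the following sketch. It is a simplified, hypothetical stand-in and not the function.c code: `counts[i]` plays the role of PQntuples(ress[i]), and a separate fill index is used so that its final value can be compared with the precomputed total.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified version of the two-pass pattern. Pass 1 sizes
 * the array; pass 2 fills it using a separate index, which must end up
 * equal to the pass-1 total.
 */
static int *
flatten_counts(const int *counts, int ndbs, int *total_out)
{
	int			total = 0;
	int			next = 0;
	int		   *dbnum_of_row;

	/* Pass 1: count rows across all databases to size the array. */
	for (int dbnum = 0; dbnum < ndbs; dbnum++)
		total += counts[dbnum];

	dbnum_of_row = malloc((total > 0 ? total : 1) * sizeof(int));
	if (dbnum_of_row == NULL)
		return NULL;

	/* Pass 2: fill the array, recording which database each row came from. */
	for (int dbnum = 0; dbnum < ndbs; dbnum++)
		for (int rowno = 0; rowno < counts[dbnum]; rowno++)
			dbnum_of_row[next++] = dbnum;

	/* The fill index necessarily ends at the precomputed total. */
	assert(next == total);

	*total_out = total;
	return dbnum_of_row;
}
```

The sketch shows that the final value cannot differ from the first count. Note that in the quoted code the reset counter also serves as the write index into os_info.libraries, so some index is still needed during the fill loop.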

======
src/bin/pg_upgrade/info.c

4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the new_cluster >= 1700, because this patch is for PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17 restriction on the 'old_cluster' version. Such a restriction seems to cripple the usefulness of this patch (eg. cannot even upgrade slots from PG16 to PG17), and there is no explanation given for it. If there is some valid incompatibility reason why only PG17 old_cluster slots can be upgraded then it ought to be described in detail and probably also mentioned in the PG DOCS.

~~~

5. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

Same as the previous comment #4. I had doubts about the intent/need for this cluster version checking.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

24 August 2023, 05:51:17

On Thu, Aug 24, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> ======
> src/bin/pg_upgrade/info.c
>
> 4. get_logical_slot_infos
>
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
> + return;
>
> It is no longer clear to me what is the purpose of these version checks.
>
> As mentioned in comment #2 above, I don't think we need to check the new_cluster >= 1700, because this patch is for PG17 by definition.
>
> OTOH, I also don't recognise the reason why there has to be a PG17 restriction on the 'old_cluster' version. Such a restriction seems to cripple the usefulness of this patch (eg. cannot even upgrade slots from PG16 to PG17), and there is no explanation given for it. If there is some valid incompatibility reason why only PG17 old_cluster slots can be upgraded then it ought to be described in detail and probably also mentioned in the PG DOCS.
>

One of the main reasons is that slots prior to v17 won't persist
confirm_flush_lsn as discussed in the email thread [1] which means it
will always fail even if we allow to upgrade from versions prior to
v17. Now, there is an argument that let's backpatch what's being
discussed in [1] and then we will be able to upgrade slots from the
prior version. Normally, we don't backpatch new enhancements, so even
if we want to do that in this case, a separate argument has to be made
for it. We have already discussed this point in this thread. We can
probably add a comment in the patch where we do version checks so that
it will be a bit easier to understand the reason.
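
A minimal sketch of what such a commented version check could look like. The helper name old_cluster_slots_can_be_migrated() is hypothetical and is not part of the patch; the macro is redefined here only so the fragment compiles on its own, on the assumption that pg_upgrade's GET_MAJOR_VERSION divides the numeric version by 100.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror pg_upgrade's macro: 170002 -> 1700, 160004 -> 1600. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/*
 * Hypothetical helper, not taken from the patch.
 *
 * Logical slots can be migrated only when the old cluster is PG17 or later.
 * Older servers do not persist confirmed_flush_lsn at shutdown, so the
 * check that compares it with the latest checkpoint location would always
 * fail for them.
 */
static bool
old_cluster_slots_can_be_migrated(int major_version)
{
    assert(major_version > 0);
    return GET_MAJOR_VERSION(major_version) >= 1700;
}
```

Keeping the reason next to the comparison means a reader does not have to find this thread to learn why PG16 and earlier are excluded.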

[1] - https://www.postgresql.org/message-id/CAA4eK1JzJagMmb_E8D4au%3DGYQkxox0AfNBm1FbP7sy7t4YWXPQ%40mail.gmail.com

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

August 24, 2023, 06:24:04

Hi Kuroda-san

FYI, the v24-0003 tests for pg_upgrade did not work for me:

~~~

# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl ...................... ok
t/002_pg_upgrade.pl ................. ok
t/003_logical_replication_slots.pl .. 7/?
#   Failed test 'run of pg_upgrade of old cluster'
#   at t/003_logical_replication_slots.pl line 174.

#   Failed test 'pg_upgrade_output.d/ removed after pg_upgrade success'
#   at t/003_logical_replication_slots.pl line 187.

#   Failed test 'check the slot exists on new cluster'
#   at t/003_logical_replication_slots.pl line 194.
#          got: ''
#     expected: 'sub|t'
# Tests were run but no plan was declared and done_testing() was not seen.
t/003_logical_replication_slots.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/9 subtests

Test Summary Report
-------------------
t/003_logical_replication_slots.pl (Wstat: 7424 Tests: 9 Failed: 3)
  Failed tests:  7-9
  Non-zero exit status: 29
  Parse errors: No plan found in TAP output
Files=3, Tests=35, 116 wallclock secs ( 0.06 usr  0.01 sys + 18.02 cusr  6.40 csys = 24.49 CPU)
Result: FAIL
make: *** [check] Error 1

~~~

I can provide the log files with more details about the errors if you
cannot reproduce this

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

August 24, 2023, 12:50:39

Notwithstanding the test errors I am getting for v24-0003, here are
some code review comments for this patch anyway.

======
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+
+/*
+ * Verify that all logical replication slots are usable.
+ */
+void
+check_for_lost_slots(ClusterInfo *cluster)

1a.
AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

~

1b.
Can't this be a static function now?

~

2.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+    "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+    PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

~~~

3. check_for_confirmed_flush_lsn

+/*
+ * Verify that all logical replication slots consumed all WALs, except a
+ * CHECKPOINT_SHUTDOWN record.
+ */
+static void
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)

AFAIK we don't ever need to call this also for 'new_cluster'. So the
function should have no parameter and just access 'old_cluster'
directly.

~

4.
+ for (i = 0; i < ntups; i++)
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));

Is it correct that this message also includes the word "WARNING"?
Other PG_WARNING messages don't do that.

======
src/bin/pg_upgrade/controldata.c

5. get_control_data

+ else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL)
+ {
+ /*
+ * Gather the latest checkpoint location if the cluster is PG17
+ * or later. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)

But we are not "gathering" anything. It's just one LSN. I think this
ought to just say "Read the latest..."

~

6.
+ /*
+ * The upper and lower part of LSN must be read separately
+ * because it is reported in %X/%X format.
+ */

/reported/stored as/
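
To illustrate why the two halves are read separately, here is a minimal sketch of turning the %X/%X text into a single 64-bit LSN. The function name parse_lsn() is hypothetical and this is not the patch's code; it only assumes that the text consists of two hexadecimal numbers separated by a slash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "hi/lo" (both hexadecimal) into one 64-bit
 * value. Returns 1 on success and 0 if the text does not match.
 */
static int
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;

    assert(text != NULL && lsn != NULL);

    /* Each half is a 32-bit hex number, so both are scanned with %X. */
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return 0;

    /* The upper half occupies the high 32 bits of the LSN. */
    *lsn = ((uint64_t) hi << 32) | lo;
    return 1;
}
```

In get_control_data() the input would be the text following "Latest checkpoint location:"; how the patch skips that prefix is not shown here.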

======
src/bin/pg_upgrade/pg_upgrade.h

7.
+void check_for_lost_slots(ClusterInfo *cluster);\

Why is this needed here? Can't this be a static function?

======
.../t/003_logical_replication_slots.pl

8.
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+# tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);

I wondered if that step is really needed. Why would there be WAL records to consume?

IIUC we haven't published anything yet.

~~~

9.
+# ------------------------------
+# TEST: Successful upgrade
+
+# Preparations for the subsequent test:
+# 1. Remove the remained slot
+$old_publisher->start;
+$old_publisher->safe_psql('postgres',
+ "SELECT * FROM pg_drop_replication_slot('test_slot1');"
+);

Should removal of the slot be done as part of the cleanup of the
previous test, instead of preparing for this one?

~~~

10.
# 3. Disable the subscription once
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE");
$old_publisher->stop;

10a.
What do you mean by "once"?

~

10b.
That old_publisher->stop; seems strangely placed. Why is it here?

~~~

11.
# Check that the slot 'test_slot1' has migrated to the new cluster
$new_publisher->start;
my $result = $new_publisher->safe_psql('postgres',
"SELECT slot_name, two_phase FROM pg_replication_slots");
is($result, qq(sub|t), 'check the slot exists on new cluster');

~

That comment now seems wrong. That slot was previously removed, right?

~~~


12.
# Update the connection
my $new_connstr = $new_publisher->connstr . ' dbname=postgres';
$subscriber->safe_psql('postgres',
"ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'");
$subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE");

~

Maybe better to combine both SQL.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

August 25, 2023, 05:09:38

Dear Peter,

> FYI, the v24-0003 tests for pg_upgrade did not work for me:

Hmm, I ran the tests for more than an hour but could not reproduce the failure.
cfbot also reported OK multiple times...

Could you please check your source code again and send the log files
if it is still a problem?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

August 25, 2023, 05:10:12

Dear Peter,

Thanks for reviewing! PSA new version patch set.
Note again that the 0001 patch was replaced with a new one [1], but you do not have to
discuss it here - that should be done in the forked thread.

>
1.
+     <listitem>
+      <para>
+       All slots on the old cluster must be usable, i.e., there are no slots
+       whose <structfield>wal_status</structfield> is <literal>lost</literal> (see
+       <xref linkend="view-pg-replication-slots"/>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       <structfield>confirmed_flush_lsn</structfield> (see <xref linkend="view-pg-replication-slots"/>)
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.
+      </para>
+     </listitem>

It might be more tidy to change the way those links (e.g. "See section 54.19") are presented:

1a.
SUGGESTION
All slots on the old cluster must be usable, i.e., there are no slots whose <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield> is
<literal>lost</literal>.
>

Fixed.

>
1b.
SUGGESTION
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield> of
all slots on the old cluster must be the same as the latest checkpoint location.
>

Fixed.

>
2.
+ /* Logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster.major_version) >= 1700)
+ check_new_cluster_logical_replication_slots();
+

Does it even make sense to check the new_cluster version? IIUC pg_upgrade *always* updates to the current PG version,
which must be 1700 by definition, because this only is a PG17 patch, right?

For example, see check_cluster_versions() function where it does this check:

/* Only current PG version is supported as a target */
if (GET_MAJOR_VERSION(new_cluster.major_version) != GET_MAJOR_VERSION(PG_VERSION_NUM))
pg_fatal("This utility can only upgrade to PostgreSQL version %s.",
PG_MAJORVERSION);
>

You are right, the new_cluster always has the same version as pg_upgrade.
Removed.

>
os_info.libraries = (LibraryInfo *) pg_malloc(totaltups * sizeof(LibraryInfo));
totaltups = 0;

for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
{
PGresult   *res = ress[dbnum];
int ntups;
int rowno;

ntups = PQntuples(res);
for (rowno = 0; rowno < ntups; rowno++)
{
char   *lib = PQgetvalue(res, rowno, 0);

os_info.libraries[totaltups].name = pg_strdup(lib);
os_info.libraries[totaltups].dbnum = dbnum;

totaltups++;
}
PQclear(res);
}

~

Although this was not introduced by your patch, I do not understand why the 'totaltups' variable gets reset to zero and
then re-incremented in these loops.

In other words, how is it possible for the end result of 'totaltups' to be any different from what was already
calculated earlier in this function?

IMO totaltups = 0; and totaltups++; is just redundant code.
>

First of all, I will not fix that in this thread, it should be done in another
place. I do not want to expand the thread anymore. Personally, it seemed that
totaltups was just reused as index for the array.
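
The pattern under discussion can be sketched as below. This is a simplified stand-in, not the get_loadable_libraries() code: the Entry type and flatten_rows() are hypothetical, and plain integers replace the library names. The counter is computed in a first pass, then reset and reused as the write index in the second pass, so it ends at the same value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for LibraryInfo. */
typedef struct
{
    int value;      /* stands in for the library name */
    int dbnum;      /* database the row came from */
} Entry;

static Entry *
flatten_rows(const int *const rows[], const int nrows[], int ndbs,
             int *nentries)
{
    int    count = 0;
    int    total;
    int    dbnum;
    Entry *entries;

    /* Pass 1: count the rows so the array can be allocated once. */
    for (dbnum = 0; dbnum < ndbs; dbnum++)
        count += nrows[dbnum];

    entries = malloc((count > 0 ? count : 1) * sizeof(Entry));
    if (entries == NULL)
        return NULL;

    /* Pass 2: start again from zero and use the counter as the index. */
    total = 0;
    for (dbnum = 0; dbnum < ndbs; dbnum++)
    {
        int rowno;

        for (rowno = 0; rowno < nrows[dbnum]; rowno++)
        {
            entries[total].value = rows[dbnum][rowno];
            entries[total].dbnum = dbnum;
            total++;
        }
    }

    /* The index ends where the first pass's count already was. */
    assert(total == count);

    *nentries = total;
    return entries;
}
```

The sketch uses a separate index variable for clarity; the original reuses a single variable for both roles, which is what prompted the question.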


>
4. get_logical_slot_infos

+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

It is no longer clear to me what is the purpose of these version checks.

As mentioned in comment #2 above, I don't think we need to check the new_cluster >= 1700, because this patch is for
PG17 by definition.

OTOH, I also don't recognise the reason why there has to be a PG17 restriction on the 'old_cluster' version. Such a
restriction seems to cripple the usefulness of this patch (eg. cannot even upgrade slots from PG16 to PG17), and there
is no explanation given for it. If there is some valid incompatibility reason why only PG17 old_cluster slots can be
upgraded then it ought to be described in detail and probably also mentioned in the PG DOCS.
>

Upgrading logical slots with verification requires that they are reliably saved to
disk at shutdown (the 0001 patch). Currently we have no plan to
backpatch it, so I think the check is needed. Instead, I added
descriptions to the doc and code comments.

>
5. count_logical_slots

+/*
+ * count_logical_slots()
+ *
+ * Sum up and return the number of logical replication slots for all databases.
+ */
+int
+count_logical_slots(ClusterInfo *cluster)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ /* Quick exit if the version is prior to PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return 0;
+
+ for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+ slot_count += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

Same as the previous comment #4. I had doubts about the intent/need for this cluster version checking.
>

As I said above, this is needed.

[1]: https://www.postgresql.org/message-id/CALDaNm0VrAt24e2FxbOX6eJQ-G_tZ0gVpsFBjzQM99NxG0hZfg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version patch set.
 
> ======
> Commit message.
> 
> 1.
> I felt this should mention the limitation that the slot upgrade
> feature is only supported from PG17 slots upwards.

Added. The same sentence as doc was used.

> doc/src/sgml/ref/pgupgrade.sgml
> 
> 2.
> +    <para>
> +     <application>pg_upgrade</application> attempts to migrate logical
> +     replication slots. This helps avoid the need for manually defining the
> +     same replication slots on the new publisher. Currently,
> +     <application>pg_upgrade</application> supports migrate logical replication
> +     slots when the old cluster is 17.X and later.
> +    </para>
> 
> Currently, <application>pg_upgrade</application> supports migrate
> logical replication slots when the old cluster is 17.X and later.
> 
> SUGGESTION
> Migration of logical replication slots is only supported when the old
> cluster is version 17.0 or later.

Fixed.

> src/bin/pg_upgrade/check.c
> 
> 3. GENERAL
> 
> IMO all version checking for this feature should only be done within
> this "check.c" file as much as possible.
> 
> The detailed reason for this PG17 limitation can be in the file header
> comment of "pg_upgrade.c", and then all the version checks can simply
> say something like:
> "Logical slot migration is only support for slots in PostgreSQL 17.0
> and later. See atop file pg_upgrade.c for an explanation of this
> limitation "

Hmm, I'm not sure it should be and Amit disagreed [1].
I did not address this one.

> 4. check_and_dump_old_cluster
> 
> + /* Extract a list of logical replication slots */
> + get_logical_slot_infos();
> +
> 
> IMO the version checking should only be done in the "checking"
> functions, so it should be removed from within
> get_logical_slot_infos() and put here in the caller.
> 
> SUGGESTION
> 
> /* Logical slots can be migrated since PG17. */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> {
> /* Extract a list of logical replication slots */
> get_logical_slot_infos();
> }

Per discussion [1], I did not address the comment.

> 5. check_new_cluster_logical_replication_slots
> 
> +check_new_cluster_logical_replication_slots(void)
> +{
> + PGresult   *res;
> + PGconn    *conn;
> + int nslots = count_logical_slots();
> + int max_replication_slots;
> + char    *wal_level;
> +
> + /* Quick exit if there are no logical slots on the old cluster */
> + if (nslots == 0)
> + return;
> 
> IMO the version checking should only be done in the "checking"
> functions, so it should be removed from the count_logical_slots() and
> then this code should be written more like this:
> 
> SUGGESTION (notice the quick return comment change too)
> 
> int nslots = 0;
> 
> /* Logical slots can be migrated since PG17. */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
>     nslots = count_logical_slots();
> 
> /* Quick return if there are no logical slots to be migrated. */
> if (nslots == 0)
>     return;

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 6. GENERAL
> 
> For the sake of readability it might be better to make the function
> names more explicit:
> 
> get_logical_slot_infos() -> get_old_cluster_logical_slot_infos()
> count_logical_slots() -> count_old_cluster_logical_slots()

Fixed. Moreover, get_logical_slot_infos_per_db() also followed the style.

> 7. get_logical_slot_infos
> 
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + *
> + * Note: This function will not do anything if the old cluster is pre-PG 17.
> + * The logical slots are not saved at shutdown, and the confirmed_flush_lsn is
> + * always behind the SHUTDOWN_CHECKPOINT record. Subsequent checks done in
> + * check_for_confirmed_flush_lsn() would raise a FATAL error if such slots are
> + * included.
> + */
> +void
> +get_logical_slot_infos(void)
> 
> Move all this detailed explanation about the limitation to the
> file-level comment in "pg_upgrade.c". See also review comment #3.

Per discussion [1], I did not address the comment.

> 8. get_logical_slot_infos
> 
> +void
> +get_logical_slot_infos(void)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> 
> IMO the version checking is best done in the "checking" functions. See
> previous review comments about the caller of this. If you want to put
> something here, then just have an Assert:
> 
> Assert(GET_MAJOR_VERSION(old_cluster.major_version) >= 1700);

As I said above, check_and_dump_old_cluster() still does not check major version
before calling get_old_cluster_logical_slot_infos(). So I kept current style.

> 9. count_logical_slots
> 
> +/*
> + * count_logical_slots()
> + *
> + * Sum up and return the number of logical replication slots for all databases.
> + */
> +int
> +count_logical_slots(void)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + /* Quick exit if the version is prior to PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return 0;
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> +
> + return slot_count;
> +}
> 
> IMO it is better to remove the version-checking side-effect here. Do
> the version checks from the "check" functions where this is called
> from. Also removing the check from here gives the ability to output
> more useful messages -- e.g. review comment #11

Apart from this, count_old_cluster_logical_slots() is called after the major
version has been checked, so an Assert() was added instead.

> src/bin/pg_upgrade/pg_upgrade.c
> 
> 10. File-level comment
> 
> Add a detailed explanation about the limitation in the file-level
> comment. See review comment #3 for details.

Per discussion [1], I did not address the comment.

> 11.
> + /*
> + * Create logical replication slots.
> + *
> + * Note: This must be done after doing the pg_resetwal command because
> + * pg_resetwal would remove required WALs.
> + */
> + if (count_logical_slots())
> + {
> + start_postmaster(&new_cluster, true);
> + create_logical_replication_slots();
> + stop_postmaster(false);
> + }
> +
> 
> IMO it is better to do the explicit version checking here, instead of
> relying on a side-effect within the count_logical_slots() function.
> 
> SUGGESTION #1
> 
> /* Logical replication slot upgrade only supported for old_cluster >= PG17 */
> if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> {
> if (count_logical_slots())
> {
> start_postmaster(&new_cluster, true);
> create_logical_replication_slots();
> stop_postmaster(false);
> }
> }
> 
> AND...
> 
> By doing this, you will be able to provide more useful output here like this:
> 
> SUGGESTION #2 (my preferred)
> 
> if (count_logical_slots())
> {
>     if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
>     {
>         pg_log(PG_WARNING,
>             "\nWARNING: This utility can only upgrade logical replication slots present in PostgreSQL version %s and later.",
>             "17.0");
>     }
>     else
>     {
>         start_postmaster(&new_cluster, true);
>         create_logical_replication_slots();
>         stop_postmaster(false);
>     }
> }
>

Per discussion [1], SUGGESTION #1 was chosen.

[1]: https://www.postgresql.org/message-id/CAA4eK1Jfk6eQSpasg+GoJVjtkQ3tFSihurbCFwnL3oV75BoUgQ@mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version patch set.

> ======
> 1. About the PG17 limitation
>
> In my previous review of v25-0002, I suggested that the PG17
> limitation should be documented atop one of the source files. See
> [1]#3, [1]#7, [1]#10
>
> I just wanted to explain the reason for that suggestion.
>
> Currently, all the new version checks have a comment like "/* Logical
> slots can be migrated since PG17. */". I felt that it would be better
> if those comments said something more like "/* Logical slots can be
> migrated since PG17. See XYZ for details. */". I don't really care
> *where* the main explanation lives, but I thought since it is
> referenced from multiple places it might be easier to find if it was
> atop some file instead of just in a function comment. YMMV.
>
> ======
> 2. Do version checking in check_and_dump_old_cluster instead of inside
> get_old_cluster_logical_slot_infos
>
> check_and_dump_old_cluster - Should check version before calling
> get_old_cluster_logical_slot_infos
> get_old_cluster_logical_slot_infos - Keep a sanity check Assert if you
> wish (or do nothing -- e.g. see #3 below)
>
> Refer to [1]#4, [1]#8
>
> Isn't it self-evident from the file/function names what kind of logic
> they are intended to have in them? Sure, there may be some exceptions
> but unless it is difficult to implement I think most people would
> reasonably assume:
>
> - checking code should be in file "check.c"
> -- e.g. a function called 'check_and_dump_old_cluster' ought to be
> *checking* stuff
>
> - info fetching code should be in file "info.c"
>
> ~~
>
> Another motivation for this suggestion becomes more obvious later with
> patch 0003. By checking at the "higher" level (in check.c) it means
> multiple related functions can all be called under one version check.
> Less checking means less code and/or simpler code. For example,
> multiple redundant calls to get_old_cluster_count_slots() can be
> avoided in patch 0003 by writing *less* code, than v26* currently has.

IIUC Amit disagreed with these points, so I will keep my code as it is until he
posts his opinion.

> 3. count_old_cluster_logical_slots
>
> I think there is nothing special in this logic that will crash if PG
> version <= 1600. Keep the Assert for sanity checking if you wish, but
> this is already guarded by the call in pg_upgrade.c so perhaps it is
> overkill.

Your point is right.
I have checked some version-specific functions like check_for_aclitem_data_type_usage()
and check_for_user_defined_encoding_conversions(), they do not have assert(). So
removed from it. As for free_db_and_rel_infos(), the Assert() ensures that new
cluster does not have logical slots, so I kept it.

Also, I found that get_loadable_libraries() always read pg_replication_slots,
even if the old cluster is older than PG17. This triggered additional checks for logical
decoding output plugins. Moreover, clusters prior to PG12 could not be upgraded because
they do not have the wal_status attribute.

I think the check should be done only when old_cluster is >= PG17, so I fixed it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Amit,

Thank you for giving comments! PSA new version. I ran the pgindent.

> > 1. check_and_dump_old_cluster
> >
> > CURRENT CODE (with v26-0003 patch applied)
> >
> > /* Extract a list of logical replication slots */
> > get_old_cluster_logical_slot_infos();
> >
> > ...
> >
> > /*
> > * Logical replication slots can be migrated since PG17. See comments atop
> > * get_old_cluster_logical_slot_infos().
> > */
> > if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> > {
> > check_old_cluster_for_lost_slots();
> >
> > /*
> > * Do additional checks if a live check is not required. This requires
> > * that confirmed_flush_lsn of all the slots is the same as the latest
> > * checkpoint location, but it would be satisfied only when the server
> > * has been shut down.
> > */
> > if (!live_check)
> > check_old_cluster_for_confirmed_flush_lsn();
> > }
> >
> >
> > SUGGESTION
> >
> > /*
> >  * Logical replication slots can be migrated since PG17. See comments atop
> >  * get_old_cluster_logical_slot_infos().
> >  */
> > if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) // NOTE 1a.
> > {
> >   /* Extract a list of logical replication slots */
> >   get_old_cluster_logical_slot_infos();
> >
> >   if (count_old_cluster_slots()) // NOTE 1b.
> >   {
> >     check_old_cluster_for_lost_slots();
> >
> >     /*
> >      * Do additional checks if a live check is not required. This requires
> >      * that confirmed_flush_lsn of all the slots is the same as the latest
> >      * checkpoint location, but it would be satisfied only when the server
> >      * has been shut down.
> >      */
> >     if (!live_check)
> >       check_old_cluster_for_confirmed_flush_lsn();
> >   }
> > }
> >
> 
> I think a slightly better way to achieve this is to combine the code
> from check_old_cluster_for_lost_slots() and
> check_old_cluster_for_confirmed_flush_lsn() into
> check_old_cluster_for_valid_slots(). That will even save us a new
> connection for the second check.

They are combined into one function.

> Also, I think we can simplify another check in the patch:
> @@ -1446,8 +1446,10 @@ check_new_cluster_logical_replication_slots(void)
>         char       *wal_level;
> 
>         /* Logical slots can be migrated since PG17. */
> -       if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> -               nslots = count_old_cluster_logical_slots();
> +       if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> +               return;
> +
> +       nslots = count_old_cluster_logical_slots();
>

Fixed.

Also, I have tested the combination of this patch and the physical standby.

1. Logical slots defined on old physical standby *cannot be upgraded*
2. Logical slots defined on physical primary *are migrated* to new physical standby

The main reason is that pg_upgrade cannot be used on a physical standby. If
users want to upgrade a standby, the rsync command is used instead. The command
creates the cluster based on the new primary, hence the slots are
replicated to the new standby. In contrast, the old cluster is basically ignored, so
slots on the old cluster are not upgraded.  I updated the doc accordingly.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Wed, Aug 30, 2023 at 7:55 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some minor review comments for patch v28-0002
>
> ======
> src/sgml/ref/pgupgrade.sgml
>
> 1.
> -       with the primary.)  Replication slots are not copied and must
> -       be recreated.
> +       with the primary.)  Replication slots on old standby are not copied.
> +       Only logical slots on the primary are migrated to the new standby,
> +       and other slots must be recreated.
>        </para>
>
> /on old standby/on the old standby/
>

Fixed.

> ======
> src/bin/pg_upgrade/info.c
>
> 2. get_old_cluster_logical_slot_infos
>
> +void
> +get_old_cluster_logical_slot_infos(void)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> +
> + pg_log(PG_VERBOSE, "\nsource databases:");
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + DbInfo    *pDbInfo = &old_cluster.dbarr.dbs[dbnum];
> +
> + get_old_cluster_logical_slot_infos_per_db(pDbInfo);
> +
> + if (log_opts.verbose)
> + {
> + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
> + print_slot_infos(&pDbInfo->slot_arr);
> + }
> + }
> +}
>
> It might be worth putting an Assert before calling the
> get_old_cluster_logical_slot_infos_per_db(...) just as a sanity check:
> Assert(pDbInfo->slot_arr.nslots == 0);
>
> This also helps to better document the "Note" of the
> count_old_cluster_logical_slots() function comment.
>

I have changed the comments atop count_old_cluster_logical_slots() and
also I don't see the need for this Assert.

> ~~~
>
> 3. count_old_cluster_logical_slots
>
> +/*
> + * count_old_cluster_logical_slots()
> + *
> + * Sum up and return the number of logical replication slots for all databases.
> + *
> + * Note: this function always returns 0 if the old_cluster is PG16 and prior
> + * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
> + * later.
> + */
> +int
> +count_old_cluster_logical_slots(void)
>
> Maybe that "Note" should be expanded a bit to say who does this:
>
> SUGGESTION
>
> Note: This function always returns 0 if the old_cluster is PG16 and
> prior because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for
> PG17 and later. See where get_old_cluster_logical_slot_infos_per_db()
> is called.
>

Changed, but written differently because saying in terms of variable
name doesn't sound good to me.

> ======
> src/bin/pg_upgrade/pg_upgrade.c
>
> 4.
> + /*
> + * Logical replication slot upgrade only supported for old_cluster >=
> + * PG17.
> + *
> + * Note: This must be done after doing the pg_resetwal command because
> + * pg_resetwal would remove required WALs.
> + */
> + if (count_old_cluster_logical_slots())
> + {
> + start_postmaster(&new_cluster, true);
> + create_logical_replication_slots();
> + stop_postmaster(false);
> + }
> +
>
> 4a.
> I felt this comment needs a bit more detail otherwise you can't tell
> how the >= PG17 version check works.
>
> 4b.
> /slot upgrade only supported/slot upgrade is only supported/
>
> ~
>
> SUGGESTION
>
> Logical replication slot upgrade is only supported for old_cluster >=
> PG17. An explicit version check is not necessary here because function
> count_old_cluster_logical_slots() will always return 0 for old_cluster
> <= PG16.
>

I don't see the need to explain anything about version check here, so
removed that part of the comment.

Apart from this, I have addressed some of the comments raised by you
for the 0003 patch. Please find the diff patch attached. I think we
should combine 0002 and 0003 patches.

I have another comment on the patch:
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

In this place, shouldn't we explicitly check for slot_type as logical?
I think we should consistently check for slot_type in all the queries
used in this patch.

--
With Regards,
Amit Kapila.

Attachments

changes_amit.1.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

August 31, 2023, 13:47:41

On Wed, Aug 30, 2023 at 10:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some review comments for v28-0003.
>
> ======
> src/bin/pg_upgrade/check.c
>
> 1. check_and_dump_old_cluster
> + /*
> + * Logical replication slots can be migrated since PG17. See comments atop
> + * get_old_cluster_logical_slot_infos().
> + */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> + check_old_cluster_for_valid_slots(live_check);
> +
>
> IIUC we are preferring to use the <= 1600 style of version check
> instead of >= 1700 where possible.
>

Yeah, but in this case, following the nearby code style, I think it is
okay to keep it as it is.

> ~
>
> 3b.
> /Quick exit/Quick return/
>

Hmm, either way should be okay.

> ~
>
> 4.
> + prep_status("Checking for logical replication slots");
>
> I felt that should add the word "valid" like:
> "Checking for valid logical replication slots"
>

Agreed and fixed.

> ~~~
>
> 5.
> + /* Check there are no logical replication slots with a 'lost' state. */
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status = 'lost' AND "
> + "temporary IS FALSE;");
>
> Since the SQL is checking if there *are* lost slots I felt it would be
> more natural to reverse that comment.
>
> SUGGESTION
> /* Check and reject if there are any logical replication slots with a
> 'lost' state. */
>

I changed the comments but differently.

> ~~~
>
> 6.
> + /*
> + * Do additional checks if a live check is not required. This requires
> + * that confirmed_flush_lsn of all the slots is the same as the latest
> + * checkpoint location, but it would be satisfied only when the server has
> + * been shut down.
> + */
> + if (!live_check)
>
> I think the comment can be rearranged slightly:
>
> SUGGESTION
> Do additional checks to ensure that 'confirmed_flush_lsn' of all the
> slots is the same as the latest checkpoint location.
> Note: This can be satisfied only when the old_cluster has been shut
> down, so we skip this for "live" checks.
>

Changed as per suggestion.

> ======
> src/bin/pg_upgrade/controldata.c
>
> 7.
> + /*
> + * Read the latest checkpoint location if the cluster is PG17
> + * or later. This is used for upgrading logical replication
> + * slots.
> + */
> + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> + {
>
> Fetching this "Latest checkpoint location:" value is only needed for
> the check_old_cluster_for_valid_slots validation check, isn't it? But
> AFAICT this code is common for both old_cluster and new_cluster.
>
> I am not sure what is best to do:
> - Do only the minimal logic needed?
> - Read the value redundantly even for new_cluster just to keep code simpler?
>
> Either way, maybe the comment should say something about this.
>

Added the comment.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

August 31, 2023, 16:22:59

On Tue, Aug 29, 2023 at 5:28 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

Some comments in 0002

1.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

What is the reason we are ignoring temporary slots here? I think we
had better explain it in the comments.

2.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slots but found \"%s\"",
+ PQgetvalue(res, 0, 0));

It looks a bit odd to me that it first fetches all the logical
slots from the new cluster and then prints the name of only one of
them. If it prints slot names, shouldn't it print all of them?
Otherwise it should just say that there are existing slots on the new
cluster without giving any names. And if we go with option 2, i.e. not
printing the name, then it is better to put LIMIT 1 at the end of the
query.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

August 31, 2023, 17:26:21

Dear Dilip,

Thanks for giving comments!

> Some comments in 0002
> 
> 1.
> + res = executeQueryOrDie(conn, "SELECT slot_name "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE slot_type = 'logical' AND "
> + "temporary IS FALSE;");
> 
> What is the reason we are ignoring temporary slots here?  I think we
> better explain in the comments.

Temporary slots are expressly ignored in this check because such slots
cannot exist after the upgrade. Before running pg_upgrade, both the old and
new clusters must be shut down, and they are started and stopped several
times during the upgrade.

What do you think?

> 2.
> + res = executeQueryOrDie(conn, "SELECT slot_name "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE slot_type = 'logical' AND "
> + "temporary IS FALSE;");
> +
> + if (PQntuples(res))
> + pg_fatal("New cluster must not have logical replication slots but
> found \"%s\"",
> + PQgetvalue(res, 0, 0));
> 
> It looks a bit odd to me that first it is fetching all the logical
> slots from the new cluster and then printing the name of one of the
> slots.  If it is printing the name of the slots then shouldn't it be
> printing all the slots' names or it should just say that there
> existing slots on the new cluster without giving any names?  And if we
> are planning for option 2 i.e. not printing the name then better to
> put LIMIT 1 at the end of the query.

I'm planning to change it so that the number of slots is reported by using count(*).

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 1, 2023, 07:17:53

On Thu, Aug 31, 2023 at 7:56 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:

> Thanks for giving comments!

Thanks

> > Some comments in 0002
> >
> > 1.
> > + res = executeQueryOrDie(conn, "SELECT slot_name "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE slot_type = 'logical' AND "
> > + "temporary IS FALSE;");
> >
> > What is the reason we are ignoring temporary slots here?  I think we
> > better explain in the comments.
>
> The temporary slots were expressly ignored while checking because such slots
> cannot exist after the upgrade. Before doing pg_upgrade, both old and new cluster
> must be turned off, and they start/stop several times during the upgrade.
>
> How do you think?

LGTM

>
> > 2.
> > + res = executeQueryOrDie(conn, "SELECT slot_name "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE slot_type = 'logical' AND "
> > + "temporary IS FALSE;");
> > +
> > + if (PQntuples(res))
> > + pg_fatal("New cluster must not have logical replication slots but
> > found \"%s\"",
> > + PQgetvalue(res, 0, 0));
> >
> > It looks a bit odd to me that first it is fetching all the logical
> > slots from the new cluster and then printing the name of one of the
> > slots.  If it is printing the name of the slots then shouldn't it be
> > printing all the slots' names or it should just say that there
> > existing slots on the new cluster without giving any names?  And if we
> > are planning for option 2 i.e. not printing the name then better to
> > put LIMIT 1 at the end of the query.
>
> I'm planning to change that the number of slots are reported by using count(*).

Yeah, that seems a better option.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 1, 2023, 07:46:18

Dear Peter,

Thanks for giving comments! PSA new version.
I replied only to comment 8 because Amit has already replied to the others.

> .../t/003_logical_replication_slots.pl
> 
> 8. Consider adding one more test
> 
> Maybe there should also be some "live check" test performed (e.g.
> using --check, and a running old_cluster).
> 
> This would demonstrate pg_upgrade working successfully even when the
> WAL records are not consumed (because LSN checks would be skipped in
> check_old_cluster_for_valid_slots function).

I had ignored this case because it did not improve code coverage, but
indeed, no one has checked the feature. I'm still not sure what the test
should look like, but I added one. I would like to hear your opinions.



Furthermore, based on the comments from Dilip [1], I added the comment and
modified check_new_cluster_logical_replication_slots(). IIUC pg_upgrade
does not have a method to handle plural forms, so an if-statement was used.
If you have better options, please tell me.
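
To illustrate the if-statement approach, here is a minimal standalone sketch.
It is not the patch code: format_slot_error() and the exact message wording
are made up for this example, and the real check would pass the text to
pg_fatal() rather than fill a buffer.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of pg_upgrade): build the error text for
 * nslots leftover logical slots, picking the singular or plural wording
 * with a plain if-statement. Returns what snprintf() returns.
 */
int
format_slot_error(char *buf, size_t buflen, int nslots)
{
    if (nslots == 1)
        return snprintf(buf, buflen,
                        "New cluster must not have logical replication slots but found a slot");

    return snprintf(buf, buflen,
                    "New cluster must not have logical replication slots but found %d slots",
                    nslots);
}
```

The caller would only reach this after count(*) returned a non-zero value.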

[1]: https://www.postgresql.org/message-id/CAFiTN-tgm9wCTyG4co%2BVZhyFTnzh-KoPtYbuH9bRFmxroJ34EQ%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Dilip,

Thank you for reviewing! 

> 
> 1.
> + conn = connectToServer(&new_cluster, "template1");
> +
> + prep_status("Checking for logical replication slots");
> +
> + res = executeQueryOrDie(conn, "SELECT slot_name "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE slot_type = 'logical' AND "
> + "temporary IS FALSE;");
> 
> 
> I think we should add some comment saying this query will only fetch
> logical slots because the database name will always be NULL in the
> physical slots.  Otherwise looking at the query it is very confusing
> how it is avoiding the physical slots.

Hmm, the query you pointed out does not check the database of the slot...
We are fetching only logical slots via the condition "slot_type = 'logical'",
and I think that is too trivial to describe in a comment.
Just to confirm - pg_replication_slots shows all the slots, even those whose
database is not the current one.

```
tmp=# SELECT slot_name, slot_type, database FROM pg_replication_slots where database != current_database();
 slot_name | slot_type | database 
-----------+-----------+----------
 test      | logical   | postgres
(1 row)
```

If I misunderstood something, please tell me...

> 2.
> +void
> +get_old_cluster_logical_slot_infos(void)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> +
> + pg_log(PG_VERBOSE, "\nsource databases:");
> 
> I think we need to change some headings like "slot info source
> databases:"  Or add an extra message saying printing slot information.
> 
> Before this patch, we were printing all the relation information so
> message ordering was quite clear e.g.
> 
> source databases:
> Database: "template1"
> relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
> relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
> reltblspace: ""
> Database: "postgres"
> relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
> relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
> reltblspace: ""
> 
> But after this patch slot information is also getting printed in a
> similar fashion so it's very confusing now.  Refer
> get_db_and_rel_infos() for how it is fetching all the relation
> information first and then printing them.
> 
> 
> 
> 
> 3. One more problem is that the slot information and the execute query
> messages are intermingled so it becomes more confusing, see the below
> example of the latest messaging.  I think ideally we should execute
> these queries first
> and then print all slot information together instead of intermingling
> the messages.
> 
> source databases:
> executing: SELECT pg_catalog.set_config('search_path', '', false);
> executing: SELECT slot_name, plugin, two_phase FROM
> pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
> database = current_database() AND temporary IS FALSE;
> Database: "template1"
> executing: SELECT pg_catalog.set_config('search_path', '', false);
> executing: SELECT slot_name, plugin, two_phase FROM
> pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
> database = current_database() AND temporary IS FALSE;
> Database: "postgres"
> slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0
> 
> 4.  Looking at the above two comments I feel that now the order should be like
> - Fetch all the db infos
> get_db_infos()
> - loop
>    get_rel_infos()
>    get_old_cluster_logical_slot_infos()
> 
> -- and now print relation and slot information per database
>  print_db_infos()

Fixed like that. It seems that we have gone back to the old style...
Now the debug output looks like this:

```
source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683, reltblspace: ""
Logical replication slots within the database:
slotname: "old1", plugin: "test_decoding", two_phase: 0
slotname: "old2", plugin: "test_decoding", two_phase: 0
slotname: "old3", plugin: "test_decoding", two_phase: 0
```


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Friday, September 1, 2023 9:05 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
>

Hi,

Thanks for updating the patch.
I have a comment about the check related to the wal_status.

Currently, there are a few places where we check the wal_status of slots, e.g.
check_old_cluster_for_valid_slots(), get_loadable_libraries(), and
get_old_cluster_logical_slot_infos().

But as discussed in another thread [1], some kinds of WAL records are
written while pg_upgrade is checking the old cluster, which could cause the WAL
size to exceed max_slot_wal_keep_size. In this case, a checkpoint will remove
the WAL required by the slots and invalidate them (their wal_status changes
as well).

Based on this, it's possible that the slots we get each time we check
wal_status are different, because they may change in between these checks.
This may not cause serious problems for now, because we will either copy all
the slots, including ones invalidated during the upgrade, or report an ERROR.
But I feel it's better to get a consistent result each time we check the
slots, to close off the possibility of problems in the future. So I feel we
could centralize the wal_status check and the slot fetch, so that even if
some slot's status changes afterwards, it cannot affect our checks. What do
you think?

[1]
https://www.postgresql.org/message-id/flat/CAA4eK1LLik2818uzYqS73O%2BHe5LK_%2B%3DkthyZ6hwT6oe9TuxycA%40mail.gmail.com#16efea0a76d623b1335e73fc1e28f5ef

Best Regards,
Hou zj

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 5, 2023, 10:34:48

Dear Hou-san,

> Based on this, it’s possible that the slots we get each time when checking
> wal_status are different, because they may get changed in between these checks.
> This may not cause serious problems for now, because we will either copy all
> the slots including ones invalidated when upgrading or we report ERROR. But I
> feel it's better to get consistent result each time we check the slots to close
> the possibility for problems in the future. So, I feel we could centralize the
> check for wal_status and slots fetch, so that even if some slots status changed
> after that, it won't have a risk to affect our check. What do you think ?

Thank you for the suggestion! I agree that we should centralize the checks,
and I had already started modifying the code. Here is the updated patch.

In this patch all slot infos are extracted in get_old_cluster_logical_slot_infos(),
and the subsequent functions use them. Based on this change, two attributes,
confirmed_flush and wal_status, were added to LogicalSlotInfo.
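
As a rough illustration of the "fetch once, check many times" idea, here is a
standalone sketch. SlotSnapshot is a simplified stand-in for LogicalSlotInfo,
and both helper names are hypothetical; wal_status is kept as the text
returned by the view purely to keep the example short.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, illustrative stand-in for the patch's LogicalSlotInfo. */
typedef struct SlotSnapshot
{
    const char *slotname;
    const char *confirmed_flush;    /* LSN text as fetched, e.g. "0/16B3748" */
    const char *wal_status;         /* e.g. "reserved" or "lost" */
} SlotSnapshot;

/*
 * Return the first slot whose status was 'lost' when the snapshot was
 * taken, or NULL if there is none. Only the array is consulted; the
 * server is never queried again.
 */
const SlotSnapshot *
first_invalid_slot(const SlotSnapshot *slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
    {
        if (strcmp(slots[i].wal_status, "lost") == 0)
            return &slots[i];
    }
    return NULL;
}

/* Count usable slots from the same snapshot, so every check agrees. */
int
count_valid_slots(const SlotSnapshot *slots, int nslots)
{
    int         nvalid = 0;

    for (int i = 0; i < nslots; i++)
    {
        if (strcmp(slots[i].wal_status, "lost") != 0)
            nvalid++;
    }
    return nvalid;
}
```

Because both helpers read the same array, a slot invalidated after the fetch
cannot make one check disagree with another.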

IIUC we cannot use struct List in client code, so structures and related
functions were added in function.c. These are used for extracting unique
plugins, but that may be overkill because check_loadable_libraries() handles
duplicated entries. If we can ignore duplicated entries, these functions can be
removed.

Also, to simplify the code, only the first invalidated slot found is output in
check_old_cluster_for_valid_slots(). The warning messages in the function were
removed. I think this may be enough because check_new_cluster_is_empty() does a
similar thing. Please tell me if it should be reverted...


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 6, 2023, 06:17:52

On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> Dear Hou-san,
> 
> > Based on this, it’s possible that the slots we get each time when
> > checking wal_status are different, because they may get changed in
> > between these checks.
> > This may not cause serious problems for now, because we will either
> > copy all the slots including ones invalidated when upgrading or we
> > report ERROR. But I feel it's better to get consistent result each
> > time we check the slots to close the possibility for problems in the
> > future. So, I feel we could centralize the check for wal_status and
> > slots fetch, so that even if some slots status changed after that, it
> > won't have a risk to affect our check. What do you think ?
> 
> Thank you for giving the suggestion! I agreed that to centralize checks, and I
> had already started to modify. Here is the updated patch.
> 
> In this patch all slot infos are extracted in the
> get_old_cluster_logical_slot_infos(),
> upcoming functions uses them. Based on the change, two attributes
> confirmed_flush and wal_status were added in LogicalSlotInfo.
> 
> IIUC we cannot use strcut List in the client codes, so structures and related
> functions are added in the function.c. These are used for extracting unique
> plugins, but it may be overkill because check_loadable_libraries() handle
> duplicated entries. If we can ignore duplicated entries, these functions can be
> removed.
> 
> Also, for simplifying codes, only a first-met invalidated slot is output in the
> check_old_cluster_for_valid_slots(). Warning messages int the function were
> removed. I think it may be enough because check_new_cluster_is_empty() do
> similar thing. Please tell me if it should be reverted...

Thanks for updating the patch! Here are a few comments.

1.

+    res = executeQueryOrDie(conn, "SHOW wal_level;");
+    wal_level = PQgetvalue(res, 0, 0);

+    res = executeQueryOrDie(conn, "SHOW wal_level;");
+    wal_level = PQgetvalue(res, 0, 0);

I think it would be better to do a sanity check using PQntuples() before
calling PQgetvalue() in above places.

2.

+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)
+{
+    WALAvailability status = WALAVAIL_INVALID_LSN;
+
+    if (strcmp(str, "reserved") == 0)
+        status = WALAVAIL_RESERVED;

Not a comment, but I am wondering if we could use the conflicting field to do
this check, so that we could avoid the new conversion function and the
structure movement. What do you think?


3.

+            curr->confirmed_flush = strtoLSN(
+                                             PQgetvalue(res,
+                                                        slotnum,
+                                                        i_confirmed_flush),
+                                             &have_error);

The indentation looks a bit unusual.

4.
+     * XXX: As mentioned in comments atop get_output_plugins(), we may not
+     * have to consider the uniqueness of entries. If so, we can use
+     * count_old_cluster_logical_slots() instead of plugin_list_length().
+     */

I think check_loadable_libraries() will avoid loading the same library, so it
seems fine to skip duplicating the plugins and we can save some codes.

----
        /* Did the library name change?  Probe it. */
        if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 1].name) != 0)
----

But if we think duplicating them would be better, I feel we could use the
SimpleStringList to store and duplicate the plugin name. get_output_plugins can
return an array of the stringlist, each stringlist includes the plugins names
in one db. I shared a rough POC patch to show how it works, the intention is to
avoid introducing our new plugin list API.
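
To show why duplicates are harmless once the names are sorted, here is a
small standalone sketch of the "did the name change?" pattern quoted above.
count_probes() and cmp_names() are hypothetical names for this example, and
the sort is done inline with qsort() only to keep the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int
cmp_names(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Sort the library names in place, then count how many distinct names
 * would actually be probed when each entry is compared with its
 * predecessor.
 */
int
count_probes(const char **names, int n)
{
    int         probes = 0;

    qsort(names, (size_t) n, sizeof(const char *), cmp_names);

    for (int i = 0; i < n; i++)
    {
        /* Did the library name change?  Only then would it be probed. */
        if (i == 0 || strcmp(names[i], names[i - 1]) != 0)
            probes++;
    }
    return probes;
}
```

With this pattern, adding the same plugin name once per slot only costs a few
extra array entries, not extra library loads.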

5.

+    os_info.libraries = (LibraryInfo *) pg_malloc(
+                                                  (totaltups + plugin_list_length(output_plugins)) * sizeof(LibraryInfo));

If we think this looks too long, maybe using pg_malloc_array can help.

Best Regards,
Hou zj

Attachments

0001-use-simple-ptr-list_topup_patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

September 6, 2023, 08:25:08

Hi, here are some comments for patch v31-0002.

======
src/bin/pg_upgrade/controldata.c

1. get_control_data

+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+ {
+ bool have_error = false;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ cluster->controldata.chkpnt_latest =
+ strtoLSN(p, &have_error);

1a.
The declaration assignment of 'have_error' is redundant because it
gets overwritten before it is checked anyhow.

~

1b.
IMO that first check logic should also be shifted to be *inside* the
strtoLSN and it would just return have_error true. This eliminates
having 2x pg_fatal that have the same purpose.

~~~

2. strtoLSN

+/*
+ * Convert String to XLogRecPtr.
+ *
+ * This function is ported from pg_lsn_in_internal(). The function cannot be
+ * called from client binaries.
+ */
+XLogRecPtr
+strtoLSN(const char *str, bool *have_error)

SUGGESTION (comment wording)
This function is ported from pg_lsn_in_internal() which cannot be
called from client binaries.
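
For reference, here is a minimal standalone sketch of what comments 1b and 2
could add up to: all validation lives inside the conversion function, and the
caller only looks at have_error. The name parse_lsn_text() is hypothetical,
uint64_t stands in for XLogRecPtr, and the parsing only loosely follows the
general shape of pg_lsn_in_internal() (hex digits, '/', hex digits).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEX_DIGITS "0123456789abcdefABCDEF"
#define MAX_LSN_COMPONENT 8     /* at most 8 hex digits per half */

/*
 * Convert "XXXXXXXX/XXXXXXXX" text to a 64-bit LSN value. On any
 * malformed input (including NULL), set *have_error and return 0.
 */
uint64_t
parse_lsn_text(const char *str, bool *have_error)
{
    size_t      len1;
    size_t      len2;
    uint32_t    hi;
    uint32_t    lo;

    *have_error = true;

    if (str == NULL)
        return 0;

    /* High half: 1 to 8 hex digits followed by '/'. */
    len1 = strspn(str, HEX_DIGITS);
    if (len1 < 1 || len1 > MAX_LSN_COMPONENT || str[len1] != '/')
        return 0;

    /* Low half: 1 to 8 hex digits, then end of string. */
    len2 = strspn(str + len1 + 1, HEX_DIGITS);
    if (len2 < 1 || len2 > MAX_LSN_COMPONENT || str[len1 + 1 + len2] != '\0')
        return 0;

    hi = (uint32_t) strtoul(str, NULL, 16);
    lo = (uint32_t) strtoul(str + len1 + 1, NULL, 16);

    *have_error = false;
    return ((uint64_t) hi << 32) | lo;
}
```

With this shape, get_control_data would need a single pg_fatal() after the
call instead of two identical ones before it.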

======
src/bin/pg_upgrade/function.c

3. struct plugin_list

+typedef struct plugin_list
+{
+ int dbnum;
+ char    *plugin;
+ struct plugin_list *next;
+} plugin_list;

I found that name confusing. IMO should be like 'plugin_list_elem'.

e.g. it gets too strange in subsequent code:
+ plugin_list *newentry = (plugin_list *) pg_malloc(sizeof(plugin_list));

~~~

4. is_plugin_unique

+/* Has the given plugin already been listed? */
+static bool
+is_plugin_unique(plugin_list_head *listhead, const char *plugin)
+{
+ plugin_list *point;
+
+ /* Quick return if the head is NULL */
+ if (listhead == NULL)
+ return true;
+
+ /* Seek the plugin list */
+ for (point = listhead->head; point; point = point->next)
+ {
+ if (strcmp(point->plugin, plugin) == 0)
+ return false;
+ }
+
+ return true;
+}

What's the meaning of the name 'point'? Maybe something generic like
'cur' or similar is better?

~~~

5. get_output_plugins

+/*
+ * Load the list of unique output plugins.
+ *
+ * XXX: Currently, we extract the list of unique output plugins, but this may
+ * be overkill. The list is used for two purposes - 1) to allocate the minimal
+ * memory for the library list and 2) to skip storing duplicated plugin names.
+ * However, the consumer check_loadable_libraries() can avoid double checks for
+ * the same library. The above means that we can arrange output plugins without
+ * considering their uniqueness, so that we can remove this function.
+ */
+static plugin_list_head *
+get_output_plugins(void)
+{
+ plugin_list_head *head = NULL;
+ int dbnum;
+
+ /* Quick return if there are no logical slots to be migrated. */
+ if (count_old_cluster_logical_slots() == 0)
+ return NULL;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ {
+ LogicalSlotInfoArr *slot_arr = &old_cluster.dbarr.dbs[dbnum].slot_arr;
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot = &slot_arr->slots[slotnum];
+
+ /* Add to the list if the plugin has not been listed yet */
+ if (is_plugin_unique(head, slot->plugin))
+ add_plugin_list_item(&head, dbnum, slot->plugin);
+ }
+ }
+
+ return head;
+}

About the XXX. Yeah, since the uniqueness seems checked later anyway
all this extra code seems overkill. Instead of all the extra code you
just need a comment to mention how it will be sorted and checked
later.

But even if you prefer to keep it, I thought those 2 functions
'is_plugin_unique()' and 'add_plugin_list_item()' could have been
combined to just have 'add_plugin_list_unique_item()'. Since order
does not matter, such a function would just add items to the end of
the list (after finding uniqueness) instead of to the head.
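
A minimal standalone sketch of such a combined function, assuming a simplified
element type (no dbnum, no separate list head struct) and plain malloc()
instead of pg_malloc():

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified element type, for illustration only. */
typedef struct plugin_list_elem
{
    char       *plugin;
    struct plugin_list_elem *next;
} plugin_list_elem;

/*
 * Append plugin to the end of the list unless an equal name is already
 * present. Returns true if a new element was added.
 */
bool
add_plugin_list_unique_item(plugin_list_elem **head, const char *plugin)
{
    plugin_list_elem **cur = head;
    plugin_list_elem *elem;

    /* Walk to the end, returning early if the name is already listed. */
    while (*cur != NULL)
    {
        if (strcmp((*cur)->plugin, plugin) == 0)
            return false;
        cur = &(*cur)->next;
    }

    elem = malloc(sizeof(plugin_list_elem));
    if (elem == NULL)
        abort();

    elem->plugin = malloc(strlen(plugin) + 1);
    if (elem->plugin == NULL)
        abort();
    strcpy(elem->plugin, plugin);
    elem->next = NULL;

    /* cur points at the terminating NULL link, so this appends. */
    *cur = elem;
    return true;
}
```

The uniqueness scan and the search for the tail are the same walk, which is
why appending costs nothing extra here.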

~~~

6. get_loadable_libraries

  FirstNormalObjectId);
+
  totaltups += PQntuples(ress[dbnum]);
~

The extra blank line in the existing code is not needed in this patch.

~~~

7. get_loadable_libraries

  int rowno;
+ plugin_list *point;

~

Same as a prior comment #4. What's the meaning of the name 'point'?

~~~

8. get_loadable_libraries
+
+ /*
+ * If the old cluster has logical replication slots, plugins used by
+ * them must be also stored. It must be done only once, so do it at
+ * dbnum == 0 case.
+ */
+ if (output_plugins == NULL)
+ continue;
+
+ if (dbnum != 0)
+ continue;

This logic seems misplaced. If this "must be done only once" then why
is it within the db loop in the first place? Shouldn't this be done
separately outside the loop?

======
src/bin/pg_upgrade/info.c

9.
+/*
+ * Helper function for get_old_cluster_logical_slot_infos()
+ */
+static WALAvailability
+GetWALAvailabilityByString(const char *str)

Should this be forward declared like the other static functions are?

~~~

10. get_old_cluster_logical_slot_infos

+ for (slotnum = 0; slotnum < num_slots; slotnum++)
+ {
+ LogicalSlotInfo *curr = &slotinfos[slotnum];
+ bool have_error = false;

Here seems an unnecessary assignment to 'have_error' because it will
always be assigned again before it is checked.

~~~

11. get_old_cluster_logical_slot_infos

+ curr->confirmed_flush = strtoLSN(
+ PQgetvalue(res,
+ slotnum,
+ i_confirmed_flush),
+ &have_error);
+ curr->wal_status = GetWALAvailabilityByString(
+   PQgetvalue(res,
+ slotnum,
+ i_wal_status));

Can this excessive wrapping be improved? Maybe new vars are needed.

~~~

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+ if (slotnum == 0)
+ pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+    slot_info->slotname,
+    slot_info->plugin,
+    slot_info->two_phase);
+ }
+}

This seems an odd way to output the heading. Isn't it better to put
this outside the loop?

SUGGESTION
if (slot_arr->nslots > 0)
  pg_log(PG_VERBOSE, "Logical replication slots within the database:");

======
src/bin/pg_upgrade/pg_upgrade.c

13.
+/*
+ * setup_new_cluster()
+ *
+ * Starts a new cluster for updating the wal_level in the control fine, then
+ * does final setups. Logical slots are also created here.
+ */
+static void
+setup_new_cluster(void)

typo

/control fine/control file/

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

September 6, 2023, 08:30:39

On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Also, for simplifying codes, only a first-met invalidated slot is output in the
> check_old_cluster_for_valid_slots(). Warning messages int the function were
> removed. I think it may be enough because check_new_cluster_is_empty() do
> similar thing. Please tell me if it should be reverted...
>

Another possible idea is to show all the WARNINGS but only when in verbose mode.

-------
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 6, 2023, 08:39:26

On Wednesday, September 6, 2023 11:18 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人
> <kuroda.hayato@fujitsu.com> wrote:
> 
> 4.
> +     * XXX: As mentioned in comments atop get_output_plugins(), we may not
> +     * have to consider the uniqueness of entries. If so, we can use
> +     * count_old_cluster_logical_slots() instead of plugin_list_length().
> +     */
> 
> I think check_loadable_libraries() will avoid loading the same library, so it seems
> fine to skip duplicating the plugins and we can save some codes.

Sorry, there is a typo; I meant "deduplicating" instead of "duplicating".

> 
> ----
>         /* Did the library name change?  Probe it. */
>         if (libnum == 0 || strcmp(lib, os_info.libraries[libnum - 1].name) != 0)
> ----
> 
> But if we think duplicating them would be better, I feel we could use the

Here also, "duplicating" should be "deduplicating".

Best Regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 6, 2023, 11:55:47

On Wed, Sep 6, 2023 at 8:47 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, September 5, 2023 3:35 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Hou-san,
> >
> > > Based on this, it’s possible that the slots we get each time when
> > > checking wal_status are different, because they may get changed in
> > > between these checks.
> > > This may not cause serious problems for now, because we will either
> > > copy all the slots including ones invalidated when upgrading or we
> > > report ERROR. But I feel it's better to get consistent result each
> > > time we check the slots to close the possibility for problems in the
> > > future. So, I feel we could centralize the check for wal_status and
> > > slots fetch, so that even if some slots status changed after that, it
> > > won't have a risk to affect our check. What do you think ?
> >
> > Thank you for giving the suggestion! I agreed that to centralize checks, and I
> > had already started to modify. Here is the updated patch.
> >
> > In this patch all slot infos are extracted in the
> > get_old_cluster_logical_slot_infos(),
> > upcoming functions uses them. Based on the change, two attributes
> > confirmed_flush and wal_status were added in LogicalSlotInfo.
> >
> > IIUC we cannot use strcut List in the client codes, so structures and related
> > functions are added in the function.c. These are used for extracting unique
> > plugins, but it may be overkill because check_loadable_libraries() handle
> > duplicated entries. If we can ignore duplicated entries, these functions can be
> > removed.
> >
> > Also, for simplifying codes, only a first-met invalidated slot is output in the
> > check_old_cluster_for_valid_slots(). Warning messages int the function were
> > removed. I think it may be enough because check_new_cluster_is_empty() do
> > similar thing. Please tell me if it should be reverted...
>
> Thank for updating the patch ! here are few comments.
>
> 1.
>
> +       res = executeQueryOrDie(conn, "SHOW wal_level;");
> +       wal_level = PQgetvalue(res, 0, 0);
>
> +       res = executeQueryOrDie(conn, "SHOW wal_level;");
> +       wal_level = PQgetvalue(res, 0, 0);
>
> I think it would be better to do a sanity check using PQntuples() before
> calling PQgetvalue() in above places.
>
> 2.
>
> +/*
> + * Helper function for get_old_cluster_logical_slot_infos()
> + */
> +static WALAvailability
> +GetWALAvailabilityByString(const char *str)
> +{
> +       WALAvailability status = WALAVAIL_INVALID_LSN;
> +
> +       if (strcmp(str, "reserved") == 0)
> +               status = WALAVAIL_RESERVED;
>
> Not a comment, but I am wondering if we could use conflicting field to do this
> check, so that we could avoid the new conversion function and structure
> movement. What do you think ?
>

I also think referring to the conflicting field would be better not
only for the purpose of avoiding extra code but also to give accurate
information about invalidated slots for which we want to give an
error.

Additionally, I think we should try to avoid writing a new function
strtoLSN as that adds a maintainability burden. We can probably send
the value fetched from pg_controldata in the query for comparison with
confirmed_flush LSN.
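
To make the second point concrete, here is a rough standalone sketch of
passing the checkpoint text straight into a query, so that the server does
the pg_lsn comparison and no client-side parser is needed. The helper name
build_unflushed_slots_query() and the exact query text are assumptions made
for this example, not what the patch does.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: embed the "Latest checkpoint location" text read
 * from pg_controldata output into a query that lists slots whose
 * confirmed_flush_lsn differs from it. Returns false if the text holds
 * anything other than hex digits and '/', or if the buffer is too small.
 */
bool
build_unflushed_slots_query(char *buf, size_t buflen, const char *chkpnt_text)
{
    int         len;

    /* The text ends up inside a quoted literal, so be strict about it. */
    if (chkpnt_text == NULL || chkpnt_text[0] == '\0' ||
        strspn(chkpnt_text, "0123456789abcdefABCDEF/") != strlen(chkpnt_text))
        return false;

    len = snprintf(buf, buflen,
                   "SELECT slot_name FROM pg_catalog.pg_replication_slots "
                   "WHERE slot_type = 'logical' AND temporary IS FALSE AND "
                   "confirmed_flush_lsn <> '%s'::pg_catalog.pg_lsn;",
                   chkpnt_text);

    return len >= 0 && (size_t) len < buflen;
}
```

Any slot returned by such a query would have unconsumed changes, and a
malformed LSN would be rejected by the server's own pg_lsn input routine.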

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 6, 2023, 12:26:44

On Wed, Sep 6, 2023 at 11:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Sep 5, 2023 at 7:34 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Also, for simplifying codes, only a first-met invalidated slot is output in the
> > check_old_cluster_for_valid_slots(). Warning messages int the function were
> > removed. I think it may be enough because check_new_cluster_is_empty() do
> > similar thing. Please tell me if it should be reverted...
> >
>
> Another possible idea is to show all the WARNINGS but only when in verbose mode.
>

I think it would be better to write problematic slots in the script
file like we are doing in the function
check_for_composite_data_type_usage()->check_for_data_types_usage()
and give a message suggesting what the user can do as we are doing in
check_for_composite_data_type_usage(). That will be helpful for the
user to take necessary action.
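
A minimal standalone sketch of that reporting pattern, under stated
assumptions: report_invalid_slots(), the message wording and the int-array
flags are invented for this example, and real code would use pg_upgrade's own
file and logging helpers rather than a bare FILE pointer.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write one line per problematic slot to an
 * already-open report stream. Returns the number of lines written, so
 * the caller can decide whether to fail and point the user at the file.
 */
int
report_invalid_slots(FILE *report, const char *const *slotnames,
                     const int *invalid, int nslots)
{
    int         written = 0;

    for (int i = 0; i < nslots; i++)
    {
        if (!invalid[i])
            continue;

        fprintf(report, "The slot \"%s\" is invalid\n", slotnames[i]);
        written++;
    }
    return written;
}
```

If the return value is non-zero, the caller would raise a fatal error that
names the report file and suggests dropping or fixing the listed slots.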

A few other comments:
=================
1.
@@ -189,6 +199,8 @@ check_new_cluster(void)
 {
  get_db_and_rel_infos(&new_cluster);

+ check_new_cluster_logical_replication_slots();
+
  check_new_cluster_is_empty();

  check_loadable_libraries();

Why is check_new_cluster_logical_replication_slots() done before
check_new_cluster_is_empty()? At least check_new_cluster_is_empty()
would be much quicker to return an error, if any. I think if we don't
have a specific reason to position this new check there, we can do it at
the end, after check_for_new_tablespace_dir(), to avoid breaking the order
of existing checks.

2. Shall we rename get_db_and_rel_infos() to
get_db_rel_and_slot_infos() or something like that as that function
now fetches the slot information as well?

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

06 September 2023, 16:35:02

Dear Hou,

Thank you for giving comments! PSA new version.
0001 is updated based on the forked thread.

> 
> 1.
> 
> +    res = executeQueryOrDie(conn, "SHOW wal_level;");
> +    wal_level = PQgetvalue(res, 0, 0);
> 
> +    res = executeQueryOrDie(conn, "SHOW wal_level;");
> +    wal_level = PQgetvalue(res, 0, 0);
> 
> I think it would be better to do a sanity check using PQntuples() before
> calling PQgetvalue() in above places.

Added.

> 2.
> 
> +/*
> + * Helper function for get_old_cluster_logical_slot_infos()
> + */
> +static WALAvailability
> +GetWALAvailabilityByString(const char *str)
> +{
> +    WALAvailability status = WALAVAIL_INVALID_LSN;
> +
> +    if (strcmp(str, "reserved") == 0)
> +        status = WALAVAIL_RESERVED;
> 
> Not a comment, but I am wondering if we could use conflicting field to do this
> check, so that we could avoid the new conversion function and structure
> movement. What do you think ?

I checked pg_get_replication_slots() and agreed that pg_replication_slots.conflicting
indicates whether the slot is usable or not. I can use the attribute instead of porting
WALAvailability. Fixed.

> 3.
> 
> +            curr->confirmed_flush = strtoLSN(
> +
>          PQgetvalue(res,
> +
>                     slotnum,
> +
>                     i_confirmed_flush),
> +
>          &have_error);
> 
> The indention looks a bit unusual.

The part is not needed anymore.

> 4.
> +     * XXX: As mentioned in comments atop get_output_plugins(), we may
> not
> +     * have to consider the uniqueness of entries. If so, we can use
> +     * count_old_cluster_logical_slots() instead of plugin_list_length().
> +     */
> 
> I think check_loadable_libraries() will avoid loading the same library, so it
> seems fine to skip duplicating the plugins and we can save some codes.
> 
> ----
>         /* Did the library name change?  Probe it. */
>         if (libnum == 0 || strcmp(lib, os_info.libraries[libnum -
> 1].name) != 0)
> ----
> 
> But if we think duplicating them would be better, I feel we could use the
> SimpleStringList to store and duplicate the plugin name. get_output_plugins can
> return an array of the stringlist, each stringlist includes the plugins names
> in one db. I shared a rough POC patch to show how it works, the intention is to
> avoid introducing our new plugin list API.

Actually I do not like the style either. Peter also said that we can skip checking the
uniqueness, so I removed it.

> 5.
> 
> +    os_info.libraries = (LibraryInfo *) pg_malloc(
> +
>               (totaltups + plugin_list_length(output_plugins)) *
> sizeof(LibraryInfo));
> 
> If we think this looks too long, maybe using pg_malloc_array can help.
>

I checked the whole patch and used these shorter macros wherever a line exceeded
80 columns.
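
The point in comment 4 above, that duplicated plugin names are harmless because the consumer skips repeats, can be illustrated with a small stand-alone sketch. It is not the pg_upgrade code: count_probed_libraries() and name_compare() are hypothetical names, and the only part taken from the thread is the "compare with the previous entry" test quoted from check_loadable_libraries().

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings (hypothetical helper). */
static int
name_compare(const void *p1, const void *p2)
{
    return strcmp(*(const char *const *) p1, *(const char *const *) p2);
}

/*
 * Sort names[] in place, then count how many entries would actually be
 * probed when every entry equal to its predecessor is skipped.
 */
static int
count_probed_libraries(const char **names, int n)
{
    int probed = 0;

    qsort(names, n, sizeof(names[0]), name_compare);

    for (int i = 0; i < n; i++)
    {
        /* Did the library name change?  Probe it. */
        if (i == 0 || strcmp(names[i], names[i - 1]) != 0)
            probed++;
    }

    return probed;
}
```

Because the list is sorted before it is walked, the producer can append one entry per slot without deduplicating, at the cost of a slightly larger array.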

Also, I found a cfbot failure [1] but I could not find the cause.
I will keep investigating it.

[1]: https://cirrus-ci.com/task/4634769732927488

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version.

> ======
> src/bin/pg_upgrade/check.c
> 
> 1. check_new_cluster_logical_replication_slots
> 
> + res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
> + max_replication_slots = atoi(PQgetvalue(res, 0, 0));
> +
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine max_replication_slots");
> 
> Shouldn't the PQntuples check be *before* the PQgetvalue and
> assignment to max_replication_slots?

Right, fixed. Also, the checking was added at the first query.

> 2. check_new_cluster_logical_replication_slots
> 
> + res = executeQueryOrDie(conn, "SHOW wal_level;");
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine wal_level");
> 
> Shouldn't the PQntuples check be *before* the PQgetvalue and
> assignment to wal_level?

Fixed.

> 3. check_old_cluster_for_valid_slots
> 
> I saw that similar code with scripts like this is doing PG_REPORT:
> 
> pg_log(PG_REPORT, "fatal");
> 
> but that PG_REPORT is missing from this function.

Added.

> src/bin/pg_upgrade/function.c
> 
> 4. get_loadable_libraries
> 
> @@ -42,11 +43,12 @@ library_name_compare(const void *p1, const void *p2)
>   ((const LibraryInfo *) p2)->dbnum;
>  }
> 
> -
>  /*
>   * get_loadable_libraries()
> 
> ~
> 
> Removing that blank line (above this function) should not be included
> in the patch.

Restored the blank.

> 5. get_loadable_libraries
> 
> + /*
> + * Allocate a memory for extensions and logical replication output
> + * plugins.
> + */
> + os_info.libraries = pg_malloc_array(LibraryInfo,
> + totaltups + count_old_cluster_logical_slots());
> 
> /Allocate a memory/Allocate memory/

Fixed.

> 6. get_loadable_libraries
> + /*
> + * Store the name of output plugins as well. There is a possibility
> + * that duplicated plugins are set, but the consumer function
> + * check_loadable_libraries() will avoid checking the same library, so
> + * we do not have to consider their uniqueness here.
> + */
> + for (slotno = 0; slotno < slot_arr->nslots; slotno++)
> 
> /Store the name/Store the names/

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 7. get_old_cluster_logical_slot_infos
> 
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> + i_caughtup = PQfnumber(res, "caughtup");
> + i_conflicting = PQfnumber(res, "conflicting");
> +
> + for (slotnum = 0; slotnum < num_slots; slotnum++)
> + {
> + LogicalSlotInfo *curr = &slotinfos[slotnum];
> +
> + curr->slotname = pg_strdup(PQgetvalue(res, slotnum, i_slotname));
> + curr->plugin = pg_strdup(PQgetvalue(res, slotnum, i_plugin));
> + curr->two_phase = (strcmp(PQgetvalue(res, slotnum, i_twophase), "t") == 0);
> + curr->caughtup = (strcmp(PQgetvalue(res, slotnum, i_caughtup), "t") == 0);
> + curr->conflicting = (strcmp(PQgetvalue(res, slotnum, i_conflicting),
> "t") == 0);
> + }
> 
> Saying "tup" always looks like it should be something tuple-related.
> IMO it will be better to call all these "caught_up" instead of
> "caughtup":
> 
> "caughtup" ==> "caught_up"
> i_caughtup ==> i_caught_up
> curr->caughtup ==> curr->caught_up

Fixed. The alias was also fixed.

> 8. print_slot_infos
> 
> +static void
> +print_slot_infos(LogicalSlotInfoArr *slot_arr)
> +{
> + int slotnum;
> +
> + if (slot_arr->nslots > 1)
> + pg_log(PG_VERBOSE, "Logical replication slots within the database:");
> +
> + for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
> + {
> + LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
> +
> + pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
> +    slot_info->slotname,
> +    slot_info->plugin,
> +    slot_info->two_phase);
> + }
> +}
> 
> Although it makes no functional difference, it might be neater if the
> for loop is also within that "if (slot_arr->nslots > 1)" condition.

Hmm, but that would increase the differences between print_rel_infos() and
print_slot_infos(), and I think they should stay similar. Instead, I added a quick
return. Thoughts?

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 9.
> +/*
> + * Structure to store logical replication slot information
> + */
> +typedef struct
> +{
> + char    *slotname; /* slot name */
> + char    *plugin; /* plugin */
> + bool two_phase; /* can the slot decode 2PC? */
> + bool caughtup; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> + bool conflicting; /* Is the slot usable? */
> +} LogicalSlotInfo;
> 
> 9a.
> + bool caughtup; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> 
> caughtup ==> caught_up

Fixed.

> 9b.
> + bool conflicting; /* Is the slot usable? */
> 
> The field name has the opposite meaning of the wording of the comment.
> (e.g. it is usable when it is NOT conflicting, right?).
> 
> Maybe there needs a better field name, or a better comment, or both.
> AFAICT from other code pg_fatal message 'conflicting' is always
> interpreted as 'lost' so maybe the field should be called that?

Changed to "is_lost", whose meaning is easy to understand.

Also, I fixed following points:

* Added a period to messages in check_new_cluster_logical_replication_slots(),
  except the final line. According to other functions like check_new_cluster_is_empty(),
  the period is ignored if the escape sequence is at the end.
* Removed the --check test because sometimes it failed on the windows machine.
  I reported in another thread [1].
* Set max_slot_wal_keep_size to -1 when the old cluster is started. According to the
  discussion [2], the setting is sufficient to suppress the WAL removal.

[1]:
https://www.postgresql.org/message-id/flat/TYAPR01MB586654E2D74B838021BE77CAF5EEA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/ZPl659a5hPDHPq9w%40paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Hou,

Thank you for reviewing! PSA new version.

> Here are some comments:
> 
> 1.
> 
>  bool        reap_child(bool wait_for_child);
> +
> +XLogRecPtr    strtoLSN(const char *str, bool *have_error);
> 
> This function has been removed.

Removed.

> 2.
> 
> +    if (nslots_on_new)
> +    {
> +        if (nslots_on_new == 1)
> +            pg_fatal("New cluster must not have logical replication
> slots but found a slot.");
> +        else
> +            pg_fatal("New cluster must not have logical replication
> slots but found %d slots.",
> +                     nslots_on_new);
> 
> We could try ngettext() here:
>         pg_log_warning(ngettext("New cluster must not have logical
> replication slots but found %d slot.",
>                                 "New
> cluster must not have logical replication slots but found %d slots",
> 
>     nslots_on_new)

I agree with using ngettext(), but I disagree with changing it to a warning.
Changed to use ngettext().

> 3.
> -    create_script_for_old_cluster_deletion(&deletion_script_file_name);
> -
> 
> Is there a reason for reordering this function ? Sorry If I missed some
> previous discussions.

We discussed moving create_logical_replication_slots(), but not
create_script_for_old_cluster_deletion(). Restored.

> 4.
> 
> @@ -610,6 +724,12 @@ free_db_and_rel_infos(DbInfoArr *db_arr)
>      {
>          free_rel_infos(&db_arr->dbs[dbnum].rel_arr);
>          pg_free(db_arr->dbs[dbnum].db_name);
> +
> +        /*
> +         * Logical replication slots must not exist on the new cluster
> before
> +         * create_logical_replication_slots().
> +         */
> +        Assert(db_arr->dbs[dbnum].slot_arr.nslots == 0);
> 
> 
> I think the assert is not necessary, as the patch will check the new cluster's
> slots in another function. Besides, this function is not only used for new
> cluster, but the comment only mentioned the new cluster which seems a bit
> inconsistent. So, how about removing it ?

Amit also pointed this out, so I removed the assertion and the comment.

> 5.
>               (cluster == &new_cluster) ?
> -             " -c synchronous_commit=off -c fsync=off -c
> full_page_writes=off" : "",
> +             " -c synchronous_commit=off -c fsync=off -c
> full_page_writes=off" :
> +             " -c max_slot_wal_keep_size=-1",
> 
> I think we need to set max_slot_wal_keep_size on new cluster as well, otherwise
> it's possible that the new created slots get invalidated during upgrade, what
> do you think ?

Added.

> 6.
> 
> +    bool        is_lost;        /* Is the slot in 'lost'? */
> +} LogicalSlotInfo;
> 
> Would it be better to use 'invalidated', as the same is used in error message
> of ReportSlotInvalidation() and logicaldecoding.sgml.

Per suggestion from Amit, changed to 'invalid'.

> 7.
> +    for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> +    {
>     ...
> +        if (script)
> +        {
> +            fclose(script);
> +
> +            pg_log(PG_REPORT, "fatal");
> +            pg_fatal("The source cluster contains one or more
> problematic logical replication slots.\n"
> 
> I think we should do this pg_fatal out of the for() loop, otherwise we cannot
> collect all the problematic slots.

Yeah, agreed. Fixed.
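
For comment 2 above, the plural selection that ngettext() performs can be sketched in isolation. This assumes GNU gettext's <libintl.h> is available and that no message catalog is bound, in which case ngettext() returns the first string for n == 1 and the second otherwise. format_slot_count_error() is a hypothetical helper that writes into a buffer instead of calling pg_fatal().

```c
#include <libintl.h>
#include <stdio.h>

/*
 * Format the "unexpected slots" message into buf, letting ngettext() pick
 * the singular or plural form (hypothetical helper, not part of the patch).
 */
static void
format_slot_count_error(char *buf, size_t buflen, int nslots_on_new)
{
    snprintf(buf, buflen,
             ngettext("New cluster must not have logical replication slots but found %d slot.",
                      "New cluster must not have logical replication slots but found %d slots.",
                      nslots_on_new),
             nslots_on_new);
}
```

The count is passed twice on purpose: once to choose the form, and once as the argument for %d.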

Also, based on the discussion [1], I added an elog(ERROR) in InvalidatePossiblyObsoleteSlot().

[1]: https://www.postgresql.org/message-id/CAA4eK1%2BWBphnmvMpjrxceymzuoMuyV2_pMGaJq-zNODiJqAa7Q%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thank you for reviewing! PSA new version.
>

Few comments:
==============
1.
+       <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
+       of all slots on the old cluster must be the same as the latest
+       checkpoint location.

We can add something like: "This ensures that all the data has been
replicated before the upgrade." to make it clear why this test is
important.

2. Move the wal_level related restriction before max_replication_slots.

3.
+ /* Is the slot still usable? */
+ if (slot->invalid)
+ {
+ if (script == NULL &&
+ (script = fopen_priv(output_path, "w")) == NULL)
+ pg_fatal("could not open file \"%s\": %s",
+ output_path, strerror(errno));
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);
+ }
+
+ /*
+ * Do additional checks to ensure that confirmed_flush LSN of all
+ * the slots is the same as the latest checkpoint location.
+ *
+ * Note: This can be satisfied only when the old cluster has been
+ * shut down, so we skip this for live checks.
+ */
+ if (!live_check && !slot->caught_up)

Isn't it better to continue for the next slot once we find that slot
is invalid instead of checking other conditions?

4.
+
+ fprintf(script,
+ "slotname :%s\tproblem: The slot is unusable\n",
+ slot->slotname);

Let's keep it as one string and change the message to: "The slot
"\"%s\" is invalid"

+ fprintf(script,
+ "slotname :%s\tproblem: The slot has not consumed WALs yet\n",
+ slot->slotname);
+ }

On a similar line, we can change this to: "The slot "\"%s\" has not
consumed the WAL yet"

5.
+ snprintf(output_path, sizeof(output_path), "%s/%s",
+ log_opts.basedir,
+ "problematic_logical_relication_slots.txt");

I think we can name this file as "invalid_logical_replication_slots"
or simply "logical_replication_slots"

6.
+ pg_fatal("The source cluster contains one or more problematic
logical replication slots.\n"
+ "The needed workaround depends on the problem.\n"
+ "1) If the problem is \"The slot is unusable,\" You can drop such
replication slots.\n"
+ "2) If the problem is \"The slot has not consumed WALs yet,\" you
can consume all remaining WALs.\n"
+ "Then, you can restart the upgrade.\n"
+ "A list of problematic logical replication slots is in the file:\n"
+ "    %s", output_path);

This doesn't match the similar existing comments. So, let's change it
to something like:

"Your installation contains invalid logical replication slots.  These
slots can't be copied so this cluster cannot currently be upgraded.
Consider either removing such slots or consuming the pending WAL if
any and then restart the upgrade.  A list of invalid logical
replication slots is in the file:"

Apart from the above, I have edited a few other comments in the patch.
See attached.
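
Comments 3 and 4 above (skip the remaining checks once a slot is invalid, report each problem as a single string, and fail only after every slot has been examined) might look roughly like the following stand-alone sketch. SlotInfo and report_problem_slots() are hypothetical simplifications of the patch's LogicalSlotInfo and check function. The caller would raise the fatal error after the loop if the returned count is non-zero.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the patch's LogicalSlotInfo. */
typedef struct
{
    const char *slotname;
    bool        caught_up;  /* confirmed_flush_lsn == latest checkpoint? */
    bool        invalid;    /* if true, the slot is unusable */
} SlotInfo;

/*
 * Write one line per problematic slot to "out" and return how many were
 * found.  Nothing is fatal here, so all problematic slots are collected.
 */
static int
report_problem_slots(const SlotInfo *slots, int nslots, bool live_check,
                     FILE *out)
{
    int nproblems = 0;

    for (int i = 0; i < nslots; i++)
    {
        const SlotInfo *slot = &slots[i];

        if (slot->invalid)
        {
            fprintf(out, "The slot \"%s\" is invalid\n", slot->slotname);
            nproblems++;
            continue;       /* no need to check anything else */
        }

        /* Only meaningful once the old cluster has been shut down. */
        if (!live_check && !slot->caught_up)
        {
            fprintf(out, "The slot \"%s\" has not consumed the WAL yet\n",
                    slot->slotname);
            nproblems++;
        }
    }

    return nproblems;
}
```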

--
With Regards,
Amit Kapila.

Attachments

cosmetic_improvements_amit.1.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

11 September 2023, 16:21:39

Dear Dilip,

Thank you for reviewing! PSA new version.

> 
> 1.
> Note that slot restoration must be done after the final pg_resetwal command
> during the upgrade because pg_resetwal will remove WALs that are required by
> the slots. Due to this restriction, the timing of restoring replication slots is
> different from other objects.
> 
> This comment in the commit message is confusing.  I understand the
> reason but from this, it is not very clear that if resetwal removes
> the WAL we needed then why it is good to create after the resetwal.  I
> think we should make it clear that creating new slot will set the
> restart lsn to current WAL location and after that resetwal can remove
> those WAL where slot restart lsn is pointing....

Just to confirm - WAL records must not be removed at any time if they are referenced
by restart_lsn. The slot creation is done after pg_resetwal so that the required WAL
is not removed by that command. See [1].
I also clarified this further in the commit message.

> 2.
> 
> +    <itemizedlist>
> +     <listitem>
> +      <para>
> +       All slots on the old cluster must be usable, i.e., there are no slots
> +       whose
> +       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield>
> +       is <literal>lost</literal>.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield>
> +       of all slots on the old cluster must be the same as the latest
> +       checkpoint location.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The output plugins referenced by the slots on the old cluster must be
> +       installed in the new PostgreSQL executable directory.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The new cluster must have
> +       <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
> +       configured to a value greater than or equal to the number of slots
> +       present in the old cluster.
> +      </para>
> +     </listitem>
> +     <listitem>
> +      <para>
> +       The new cluster must have
> +       <link linkend="guc-wal-level"><varname>wal_level</varname></link> as
> +       <literal>logical</literal>.
> +      </para>
> +     </listitem>
> +    </itemizedlist>
> 
> I think we should also add that the new cluster should not have any
> existing permanent logical replication slots.

Hmm, I wonder whether it is really needed. The new cluster is also required not to
contain tables, but that is not documented. It might be a trivial thing. Anyway, added.

FYI - the restriction was not introduced by the patch. I reported it independently [2],
but no one has responded so far...

> 3.
> -       with the primary.)  Replication slots are not copied and must
> -       be recreated.
> +       with the primary.)  Replication slots on the old standby are not copied.
> +       Only logical slots on the primary are migrated to the new standby,
> +       and other slots must be recreated.
> 
> This paragraph should be rephrased.  I mean first stating that
> "Replication slots on the old standby are not copied" and then saying
> Only logical slots are migrated doesn't seem like the best way.  Maybe
> we can just say "Only logical slots on the primary are migrated to the
> new standby, and other slots must be recreated."

Per the discussion in [3], I used different wording. Thanks for the suggestion.

> 4.
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
> 
> Rephrase the first line as ->  Raise an ERROR if the logical
> replication slot is invalidating during an upgrade.

Per the discussion in [3], I used different wording. Thanks for the suggestion.

> 5.
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> 
> 
> For readability change this to if
> (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most
> of the checks related to this, we are using 1700 so better be
> consistent in this.

Per discussion on [3], I did not change here.

> 6.
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> +   "New cluster must not have logical replication slots but found %d slots.",
> +   nslots_on_new),
> + nslots_on_new);
> ...
> + if (PQntuples(res) != 1)
> + pg_fatal("could not determine wal_level.");
> +
> + wal_level = PQgetvalue(res, 0, 0);
> +
> + if (strcmp(wal_level, "logical") != 0)
> + pg_fatal("wal_level must be \"logical\", but is set to \"%s\"",
> + wal_level);
> 
> 
> I have noticed that the case of the first letter in the pg_fatal
> message is not consistent.

Actually there are some inconsistencies even within check.c, so I devised the
rules below. What do you think?

* An incomplete sentence starts with a lower-case letter
  (e.g., "could not open", "could not determine").
* Proper nouns such as object and parameter names are always written in lower case
  (e.g., "template0 must not allow...", "wal_level must be...").
* Otherwise, the sentence starts with an upper-case letter.

> 7.
> +
> + /* Is the slot still usable? */
> + if (slot->invalid)
> + {
> 
> Why comment says "Is the slot still usable?" I think it should be "Is
> the slot usable?" otherwise it appears that we have first fetched the
> slots and now we are refetching it and checking whether it is still
> usable.

Changed.
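
Regarding comment 5 above, the two spellings of the version test should give the same answer for any real release, because GET_MAJOR_VERSION() divides the numeric version by 100 and, as far as I know, no release maps to a value strictly between 1600 and 1700. A minimal sketch follows. The macro is reproduced as I understand it from pg_upgrade.h, and the two helper functions are hypothetical.

```c
#include <stdbool.h>

/* Reproduced from pg_upgrade.h as I understand it: 160004 -> 1600. */
#define GET_MAJOR_VERSION(v)    ((v) / 100)

/* Condition as written in the patch (hypothetical wrapper). */
static bool
skip_slots_le_1600(int major_version)
{
    return GET_MAJOR_VERSION(major_version) <= 1600;
}

/* Condition as suggested in the review (hypothetical wrapper). */
static bool
skip_slots_lt_1700(int major_version)
{
    return GET_MAJOR_VERSION(major_version) < 1700;
}
```

The choice is therefore about readability and consistency with the surrounding checks rather than behaviour.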

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYAPR01MB5866D277F6BEDEA4223B3559F5E6A@TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/CAFiTN-vs53SqZiZN1GcSuKLmMY%3D0d14wJDDm1aKmoBONwnqaGg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

On Tue, Sep 12, 2023 at 02:33:25AM +0000, Zhijie Hou (Fujitsu) wrote:
> 2.
> +        if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> +            elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
>
> I think normally the first letter is lowercase, and we can avoid the period.

Documentation is your friend:
https://www.postgresql.org/docs/current/error-style-guide.html
--
Michael

Attachments

signature.asc

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

12 September 2023, 10:04:23

Dear Peter,

Thank you for reviewing! Before posting a new patch set, I want to respond to some
comments.

> 
> ======
> 1. GENERAL -- Cluster Terminology
> 
> This is not really a problem of your patch, but during message review,
> I noticed the terms old/new cluster VERSUS source/target cluster and
> both were used many times:
> 
> For example.
> ".*new cluster --> 44 occurrences
> ".*old cluster --> 21 occurrences
> ".*source cluster --> 6 occurrences
> ".*target cluster --> 12 occurrences
> 
> Perhaps there should be a new thread/patch to use consistent terms.
> 
> Thoughts?

I preferred the terms new/old because I could not find the terms source/target
in the documentation for pg_upgrade. (IIUC I used new/old in my patch.)
Anyway, it should be discussed in another thread.

> 2. GENERAL - Error message cases
> 
> Just FYI, there is a lot of inconsistent capitalisation in these patch
> messages, but then the same is also true for the HEAD code. It's a bit
> messy, but generally, I think your capitalisation was aligned with
> what I saw in HEAD, so I didn't comment anywhere about it.

Yeah, the rule is broken even in HEAD. I determined a rule in [1], which seems
consistent with other parts of the file.
Michael kindly pointed out the error message style guide [2], and the rule basically
follows it. (IIUC pg_fatal("Your installation...") follows the
"Detail and hint messages" rule.)

> ======
> src/bin/pg_upgrade/info.c
> 
> 7. get_db_rel_and_slot_infos
> 
> void
> get_db_rel_and_slot_infos(ClusterInfo *cluster)
> {
> int dbnum;
> 
> if (cluster->dbarr.dbs != NULL)
> free_db_and_rel_infos(&cluster->dbarr);
> 
> ~
> 
> Judging from the HEAD code this function was intended to be reentrant
> -- e.g. it does cleanup code free_db_and_rel_infos in case there was
> something there from before.
> 
> IIUC there is no such cleanup for the slot_arr. I forget why this was
> removed. Sure, you might be able to survive the memory leaks, but
> choosing NOT to clean up the slot_arr seems to contradict the
> intention of HEAD calling free_db_and_rel_infos.

free_db_and_rel_infos() is called if get_db_rel_and_slot_infos() is called
several times for the same cluster. The callers are:

* check_and_dump_old_cluster(), target is old_cluster
* check_new_cluster(), target is new_cluster
* create_new_objects(), target is new_cluster

We require that new_cluster must not have logical slots, and this restriction
cannot be eased. Therefore, there is no case in which slot_arr must be free()'d,
so I removed the cleanup (see the similar discussion [3]). I think we should not add no-op code.
In an older version there was an Assert() instead, but it was removed based on the comment [4].

> 8. get_db_infos
> 
> I noticed the pg_malloc0 is reverted in this function.
> 
> - dbinfos = (DbInfo *) pg_malloc(sizeof(DbInfo) * ntups);
> + dbinfos = (DbInfo *) pg_malloc0(sizeof(DbInfo) * ntups);
> 
> IMO it is better to do pg_malloc0 here.
> 
> Sure, everything probably works OK for the current code,

Yes, it works well. No one checks slot_arr before
get_old_cluster_logical_slot_infos(). In the old version, it was checked like
(slot_arr == NULL) in free_db_and_rel_infos(), but that was removed.

> but it seems
> unnecessarily risky to assume that functions will forever be called in
> a specific order. AFAICT if someone (e.g. for debugging) calls
> count_old_cluster_logical_slots() or calls print_slot_infos() then the
> behaviour is undefined because slot_arr.nslots remains uninitialized.


Hmm, I do not think such an assumption is needed. In the current code pg_malloc() is
used in get_db_infos(), so it is already possible that print_rel_infos() is
executed for debugging. The behavior is undefined there too, as you said,
yet the code has survived. Based on that, I think we can accept the risk and
save the extra work instead. If you know of other examples, please share them here...
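
The trade-off discussed in comment 8 above can be shown with a stand-alone sketch that uses calloc() in place of pg_malloc0(). SlotArr, DbInfoSketch and both functions are hypothetical simplifications, not the pg_upgrade structures. With zero-filled memory, a counting helper called before the slot information is fetched simply sees nslots == 0. With plain malloc() the same read would be of indeterminate values.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the pg_upgrade structures. */
typedef struct
{
    int   nslots;
    void *slots;
} SlotArr;

typedef struct
{
    char   *db_name;
    SlotArr slot_arr;
} DbInfoSketch;

/* Zero-filled allocation: every slot_arr starts with nslots == 0. */
static DbInfoSketch *
alloc_dbinfos_zeroed(size_t ntups)
{
    return calloc(ntups, sizeof(DbInfoSketch));
}

/* Safe to call at any time only because the array was zero-filled. */
static int
count_logical_slots(const DbInfoSketch *dbs, size_t ndbs)
{
    int total = 0;

    for (size_t i = 0; i < ndbs; i++)
        total += dbs[i].slot_arr.nslots;

    return total;
}
```

The cost of the zero-fill is one pass over the array, which is the "extra work" weighed against the ordering assumption in the reply above.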

[1]:
https://www.postgresql.org/message-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/docs/devel/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION
[3]:
https://www.postgresql.org/message-id/TYAPR01MB5866732D30ABB976992BDECCF5789%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[4]:
https://www.postgresql.org/message-id/OS0PR01MB5716670FE547BA87FDEF895E94EDA%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

12 September 2023, 11:10:13

Dear Michael,

> On Tue, Sep 12, 2023 at 02:33:25AM +0000, Zhijie Hou (Fujitsu) wrote:
> > 2.
> > +        if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> > +            elog(ERROR, "Replication slots must not be invalidated
> during the upgrade.");
> >
> > I think normally the first letter is lowercase, and we can avoid the period.
>
> Documentation is your friend:
> https://www.postgresql.org/docs/current/error-style-guide.html

Thank you for the information! It is quite helpful for me.
(Some fatal errors start with a capital letter, like "Your installation contains...",
but I regarded them as detail or hint messages.)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

12 September 2023, 14:50:22

Dear Peter,

Thank you for reviewing! PSA new version.

> src/backend/replication/slot.c
> 
> 3. InvalidatePossiblyObsoleteSlot
> 
> + /*
> + * Raise an ERROR if the logical replication slot is invalidating. It
> + * would not happen because max_slot_wal_keep_size is set to -1 during
> + * the upgrade, but it stays safe.
> + */
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + elog(ERROR, "Replication slots must not be invalidated during the upgrade.");
> 
> 3a.
> That comment didn't seem good. I think you mean like in the suggestion below.
> 
> SUGGESTION
> It should not be possible for logical replication slots to be
> invalidated because max_slot_wal_keep_size is set to -1 during the
> upgrade. The following is just for sanity-checking.

This part was updated in v35. Please tell me if the current version is still bad...

> 3b.
> I wasn't sure if 'max_slot_wal_keep_size' GUC is accessible in this
> scope, but if it is available then maybe
> Assert(max_slot_wal_keep_size_mb == -1); should also be included in
> this sanity check.

IIUC, guc parameters are visible from all the postgres processes.
Added.

> src/bin/pg_upgrade/check.c
> 
> 4. check_new_cluster_logical_replication_slots
> 
> + conn = connectToServer(&new_cluster, "template1");
> +
> + prep_status("Checking for logical replication slots");
> 
> There is some inconsistency with all the subsequent pg_fatals within
> this function -- some of them mention "New cluster" but most of them
> do not.
> 
> Meanwhile, Kuroda-san showed me sample output like:
> 
> Checking for presence of required libraries                   ok
> Checking database user is the install user                    ok
> Checking for prepared transactions                            ok
> Checking for new cluster tablespace directories               ok
> Checking for logical replication slots
> New cluster must not have logical replication slots but found 1 slot.
> Failure, exiting
> 
> So, I felt the log message title ("Checking...") should be changed to
> include the words "new cluster" just like the log preceding it:
> 
> "Checking for logical replication slots" ==> "Checking for new cluster
> logical replication slots"
> 
> Now all the subsequent pg_fatals clearly are for "new cluster"

Changed.

> 5. check_new_cluster_logical_replication_slots
> 
> + if (nslots_on_new)
> + pg_fatal(ngettext("New cluster must not have logical replication
> slots but found %d slot.",
> +   "New cluster must not have logical replication slots but found %d slots.",
> +   nslots_on_new),
> + nslots_on_new);
> 
> 5a.
> TBH, I didn't see why you go to unnecessary trouble to have a plural
> message here. The message could just be like:
> "New cluster must have 0 logical replication slots but found %d."
> 
> ~
> 
> 5b.
> However, now (from the previous review comment #4) if "New cluster" is
> already explicit in the log, the pg_fatal message can become just:
> "New cluster must have ..." ==> "Expected 0 logical replication slots
> but found %d."

Basically it's better. But the initial character should be lower case and a period
is not needed. Modified accordingly.

> 9. get_old_cluster_logical_slot_infos
> 
> + i_slotname = PQfnumber(res, "slot_name");
> + i_plugin = PQfnumber(res, "plugin");
> + i_twophase = PQfnumber(res, "two_phase");
> + i_caught_up = PQfnumber(res, "caught_up");
> + i_invalid = PQfnumber(res, "conflicting");
> 
> IMO SQL should be using an alias for this column, so you can say:
> i_invalid = PQfnumber(res, "invalid")
> 
> which seems better than switching the wording in code.

Modified. The argument of PQfnumber() must be the same as the column name, so the
alias "as invalid" was added to the SQL.

> src/bin/pg_upgrade/pg_upgrade.h
> 
> 10. LogicalSlotInfo
> 
> +typedef struct
> +{
> + char    *slotname; /* slot name */
> + char    *plugin; /* plugin */
> + bool two_phase; /* can the slot decode 2PC? */
> + bool caught_up; /* Is confirmed_flush_lsn the same as latest
> + * checkpoint LSN? */
> + bool invalid; /* Is the slot usable? */
> +} LogicalSlotInfo;
> 
> ~
> 
> + bool invalid; /* Is the slot usable? */
> This field name and comment have opposite meanings. Invalid means NOT usable.
> 
> SUGGESTION
> /* If true, the slot is unusable. */

Fixed.

> src/bin/pg_upgrade/server.c
> 
> 11. start_postmaster
> 
>   * we only modify the new cluster, so only use it there.  If there is a
>   * crash, the new cluster has to be recreated anyway.  fsync=off is a big
>   * win on ext4.
> + *
> + * Also, the max_slot_wal_keep_size is set to -1 to prevent the WAL removal
> + * required by logical slots. The setting could avoid the invalidation of
> + * slots during the upgrade.
>   */
> ~
> 
> IMO this comment "to prevent the WAL removal required by logical
> slots" is ambiguous about how it could be interpreted.  Needs
> rearranging for clarity.

The description was changed. What do you think?

> 12. start_postmaster
> 
>   (cluster == &new_cluster) ?
> - " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" : "",
> + " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c
> max_slot_wal_keep_size=-1 " :
> + " -c max_slot_wal_keep_size=-1",
> 
> Instead of putting the same option on both sides of the ternary, I was
> wondering if it might be better to hardwire the max_slot_wal_keep_size
> just 1 time in the format string?

Fixed.
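
For readers following comment 12, here is a minimal sketch of the idea, assuming a simplified stand-in for the option-building code in start_postmaster(). The helper name build_server_options() and its signature are hypothetical; only the option strings come from the quoted diff. The shared option is written once in the format string, and only the new-cluster-only options stay in the ternary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not the real start_postmaster() code: builds the
 * extra "-c" options passed to the server. Returns what snprintf() returns.
 */
static int
build_server_options(char *buf, size_t buflen, bool is_new_cluster)
{
    assert(buf != NULL && buflen > 0);

    /*
     * max_slot_wal_keep_size appears once in the format string; only the
     * options that apply to the new cluster alone are chosen conditionally.
     */
    return snprintf(buf, buflen,
                    "%s -c max_slot_wal_keep_size=-1",
                    is_new_cluster ?
                    " -c synchronous_commit=off -c fsync=off -c full_page_writes=off" :
                    "");
}
```

With this shape, adding another option that applies to both clusters touches the format string only, not both arms of the ternary.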

> .../pg_upgrade/t/003_logical_replication_slots.pl
> 
> 13.
> # Remove the remained slot
> 
> /remained/remaining/

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear hackers,

> > So basically, while scanning from confirmed_flush we must ensure that
> > we find a first record as SHUTDOWN CHECKPOINT record at the same LSN,
> > and after that, we should not get any other WAL other than like you
> > said shutdown checkpoint, running_xacts.  That way we will ensure both
> > aspect that the confirmed flush LSN is at the shutdown checkpoint and
> > after that there is no real activity in the system.
> >
> 
> Right.
> 
> >  I think to me,
> > this seems like the best available option so far.
> >
> 
> Yeah, let's see if someone else has a different opinion or has a better idea.

Based on the recent discussion, I made a prototype which reads all WAL records
and verifies their types. A new upgrade function, binary_upgrade_validate_wal_record_types_after_lsn(),
does that. This function reads WAL from start_lsn (confirmed_flush) and returns
true if all the records can be ignored. The types of ignorable records are listed in [1].

Hou kindly found that XLOG_HEAP2_PRUNE may be generated during pg_upgrade
--check, so it was added to the acceptable types.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA@TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Amit,

Thank you for reviewing! PSA new version patch set.

> Few comments:
> 1. Why is the FPI record (XLOG_FPI_FOR_HINT) not considered a record
> to be ignored? This can be generated during reading system tables.

Oh, I just missed it. It was mentioned in the comments atop the function but not added here.
Added to the list of allowed types.

> 2.
> +binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
> {
> ...
> + if (initial_record)
> + {
> + /* Initial record must be XLOG_CHECKPOINT_SHUTDOWN */
> + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> +   XLOG_CHECKPOINT_SHUTDOWN))
> + result = false;
> ...
> + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_SHUTDOWN) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_ONLINE) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID,
> XLOG_RUNNING_XACTS) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
> + result = false;
> ...
> }
> 
> Isn't it better to immediately return false if any unexpected WAL is
> found? This will avoid reading unnecessary WAL

IIUC we exit the loop if result == false, so we do not read
unnecessary WAL. See the condition below. I used this approach because
private_data and xlogreader should be pfree()'d as cleanup.

```
    /* Loop until all WALs are read, or unexpected record is found */
    while (result && ReadNextXLogRecord(xlogreader))
    {
```
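
The control flow described above can be sketched in isolation, assuming a toy in-memory record array in place of the real xlogreader. All names here (ToyReader, toy_read_next, records_all_allowed, TOY_UNEXPECTED) are hypothetical stand-ins. The point is only that keeping `result` in the loop condition stops reading at the first unexpected record while still reaching the single cleanup path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_UNEXPECTED (-1)     /* toy marker for a record type we reject */

/* Toy stand-in for the xlogreader state: walks an array of record types. */
typedef struct ToyReader
{
    const int  *records;
    size_t      nrecords;
    size_t      next;           /* index of the next record to return */
} ToyReader;

/* Returns false at end of input, otherwise stores the next record type. */
static bool
toy_read_next(ToyReader *reader, int *rectype)
{
    if (reader->next >= reader->nrecords)
        return false;
    *rectype = reader->records[reader->next++];
    return true;
}

/*
 * Returns true if no record is TOY_UNEXPECTED. *nread reports how many
 * records were actually read before the loop stopped.
 */
static bool
records_all_allowed(const int *records, size_t nrecords, size_t *nread)
{
    ToyReader  *reader;
    bool        result = true;
    int         rectype;

    assert(nread != NULL);

    reader = malloc(sizeof(ToyReader));
    if (reader == NULL)
        return false;
    reader->records = records;
    reader->nrecords = nrecords;
    reader->next = 0;

    /* Loop until all records are read, or an unexpected record is found */
    while (result && toy_read_next(reader, &rectype))
    {
        if (rectype == TOY_UNEXPECTED)
            result = false;
    }

    /* Single cleanup path, reached on both success and failure */
    *nread = reader->next;
    free(reader);

    return result;
}
```

An early `return false` inside the loop would read the same number of records, but it would need its own copy of the cleanup code.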

> 3.
> +Datum
> +binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
> +{
> ...
> +
> + CHECK_IS_BINARY_UPGRADE;
> +
> + /* Quick exit if the given lsn is larger than current one */
> + if (start_lsn >= curr_lsn)
> + PG_RETURN_BOOL(true);
> 
> Why do you return true here? My understanding was if the first record
> is not a shutdown checkpoint record then it should fail, if that is
> not true then I think we need to explain the same in comments.

I wondered what the right behavior should be, because this is unexpected input for us (note that this
function can be used only for upgrade purposes). But yes, the first WAL record read must
be XLOG_CHECKPOINT_SHUTDOWN, so I changed it as you said.

Also, I did a self-review again and reworded comments.

BTW, 0002 ports some functions from pg_walinspect, which may not be elegant.
The degree of coupling between core and extensions should also be kept low. So I made another
patch which does not port anything and implements similar functionality instead.
I called the patch 0003, but it can be applied atop 0001 (not 0002). To keep cfbot
happy, it is attached as a txt file.
Could you please tell me which you prefer, 0002 or 0003?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 15, 2023, 16:02:02

> Thank you for reviewing! PSA new version patch set.

Sorry, wrong patch attached. PSA the correct ones.
There is a possibility that XLOG_PARAMETER_CHANGE may be generated, when GUC
parameters are changed just before doing the upgrade. Added to list.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 18, 2023, 10:03:59

On Friday, September 15, 2023 8:33 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> 
> Also, I did a self-reviewing again and reworded comments.
> 
> BTW, the 0002 ports some functions from pg_walinspect, it may be not
> elegant.
> Coupling degree between core/extensions should be also lower. So I made
> another patch which does not port anything and implements similar
> functionalities instead.
> I called the patch 0003, but can be applied atop 0001 (not 0002). To make cfbot
> happy, attached as txt file.
> Could you please tell me which do you like 0002 or 0003?

I think basically it's OK that we follow the same method as pg_walinspect to
read the WAL. The reasons are as follows:

There are currently two sets of APIs that are used to read WAL.
a) XLogReaderAllocate()/XLogReadRecord() -- used by pg_walinspect and the current patch
b) XLogReaderAllocate()/WALRead()

The first set of APIs is easier to use and is used in most WAL-reading
code, while the second set is used more in low-level places and is not
very easy to use. So I think it's better to use the first set of APIs.

Besides, our function needs to distinguish the failure and end-of-WAL cases
when XLogReadRecord() returns NULL, and to read the WAL without waiting. The
WAL reader callbacks in pg_walinspect meet these requirements, which is the reason
I think we can follow the same approach. I also checked the other public WAL reader
callbacks, but they either report ERRORs if XLogReadRecord() returns NULL or wait while
reading WAL.

If we agree to follow the same method as pg_walinspect, I think the remaining
question is whether to port some functions as 0002 does. I personally
think it's fine to make common functions to save code.
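
To illustrate the end-of-WAL versus failure distinction mentioned above, here is a rough sketch that assumes a toy reader over an in-memory array. None of these names (ToyWal, ToyPrivate, toy_read_record, toy_count_records) exist in PostgreSQL. The sketch only shows the pattern of a reader that never waits and that records in its private state why it returned NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CORRUPT (-1)        /* toy marker for an unreadable record */

/* Private state the reader fills in; checked by the caller after NULL. */
typedef struct ToyPrivate
{
    bool        end_of_wal;     /* set when the input is exhausted */
} ToyPrivate;

typedef struct ToyWal
{
    const int  *records;
    size_t      nrecords;
    size_t      next;
    ToyPrivate  priv;
} ToyWal;

/*
 * Returns the next record, or NULL. NULL means either "end of WAL"
 * (priv.end_of_wal is set; the reader does not wait for more data) or a
 * read failure (priv.end_of_wal stays false).
 */
static const int *
toy_read_record(ToyWal *wal)
{
    if (wal->next >= wal->nrecords)
    {
        wal->priv.end_of_wal = true;
        return NULL;
    }
    if (wal->records[wal->next] == TOY_CORRUPT)
        return NULL;
    return &wal->records[wal->next++];
}

/* Counts readable records; returns false only for a genuine read failure. */
static bool
toy_count_records(const int *records, size_t nrecords, size_t *count)
{
    ToyWal      wal = {records, nrecords, 0, {false}};

    assert(count != NULL);

    *count = 0;
    while (toy_read_record(&wal) != NULL)
        (*count)++;

    return wal.priv.end_of_wal;
}
```

A callback that raises an ERROR on NULL, or one that blocks until more WAL arrives, could not support this caller, which matches the reasoning given for following pg_walinspect.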

Best Regards,
Hou zj

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 18, 2023, 14:16:41

On Friday, September 15, 2023 9:02 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> Sorry, wrong patch attached. PSA the correct ones.
> There is a possibility that XLOG_PARAMETER_CHANGE may be generated,
> when GUC parameters are changed just before doing the upgrade. Added to
> list.

I did some simple performance tests for the patch, just to make sure it doesn't
introduce obvious overhead; the results look good to me. I tested two cases:

1) The time for upgrade when the old db has 0, 10, 50, 100 slots
0 slots(HEAD) : 0m5.585s
0 slots : 0m5.591s
10 slots : 0m5.602s
50 slots : 0m5.636s
100 slots : 0m5.778s

2) The time for upgrade after doing "pg_upgrade --check" in advance, when
the old db has 0, 10, 50, 100 slots.

0 slots(HEAD) : 0m5.588s
0 slots : 0m5.596s
10 slots : 0m5.605s
50 slots : 0m5.737s
100 slots : 0m5.783s

The data of the local machine I used is:
CPU(s):    40
Model name:    Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Core(s) per socket:    10
Socket(s):    2
memory:    125GB
disk:    6T HDD

The old database is empty except for the slots in both tests.

The test script is also attached for reference (run perf.sh after
adjusting the other settings).

Best Regards,
Hou zj

Attachments

perf_script.zip

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 18, 2023, 14:49:05

On Fri, Sep 15, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > Thank you for reviewing! PSA new version patch set.
>
> Sorry, wrong patch attached. PSA the correct ones.
> There is a possibility that XLOG_PARAMETER_CHANGE may be generated, when GUC
> parameters are changed just before doing the upgrade. Added to list.
>

You forgot to update the 0002 patch for XLOG_PARAMETER_CHANGE. I think it
is okay to move walinspect's functionality into a common place so that
it can be used by this patch, as suggested by Hou-San. The only reason
to keep it specific to walinspect would be if we want to enhance
those functions for walinspect, but I think if that happens then we can
evaluate whether to enhance them by having additional parameters or
by creating something specific for walinspect.

* +Datum
+binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)

How about naming it as binary_upgrade_validate_wal_records()? I don't
see it is helpful to make it too long.

Apart from this, I have made minor cosmetic changes in the attached.
If these look okay to you, then you can include them in the next version.

--
With Regards,
Amit Kapila.

Attachments

changes_amit_1.txt

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 19, 2023, 09:17:45

Dear Amit,

Thank you for reviewing! PSA new version!

> > Sorry, wrong patch attached. PSA the correct ones.
> > There is a possibility that XLOG_PARAMETER_CHANGE may be generated,
> when GUC
> > parameters are changed just before doing the upgrade. Added to list.
> >
> 
> You forgot to update 0002 patch for XLOG_PARAMETER_CHANGE.

Oh, I made a mistake in my local git operations. Sorry for the inconvenience.

> I think it
> is okay to move walinspect's functionality into common place so that
> it can be used by this patch as suggested by Hou-San. The only reason
> it is okay to keep it specific to walinspect is if we want to enhance
> that functions for walinspect but I think if that happens then we can
> evaluate whether to enhance it by having additional parameters or
> creating something specific for walinspect.

OK, merged 0001 + 0002 into one.

> * +Datum
> +binary_upgrade_validate_wal_record_types_after_lsn(PG_FUNCTION_ARGS)
> 
> How about naming it as binary_upgrade_validate_wal_records()? I don't
> see it is helpful to make it too long.

Agreed, fixed.

> Apart from this, I have made minor cosmetic changes in the attached.
> If these looks okay to you then you can include them in next version.

Seems better, included.

Apart from the above, I changed the patch so that binary_upgrade_validate_wal_records()
is not called during the live check, because it raises an ERROR if the server is not
in binary upgrade mode. The result would be used only when not in live check mode,
so it's OK to skip. Also, some comments were slightly reworded.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v39-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 19, 2023, 15:27:31

On Tue, Sep 19, 2023 at 11:47 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> Thank you for reviewing! PSA new version!
>

*
+#include "access/xlogdefs.h"
 #include "common/relpath.h"
 #include "libpq-fe.h"

The above include is not required. I have removed that and made a few
cosmetic changes in the attached.

--
With Regards,
Amit Kapila.

Attachments

v40-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 20, 2023, 08:30:11

Dear Amit,

Thank you for reviewing! PSA new version. In this version I ran pgindent again.

> +#include "access/xlogdefs.h"
>  #include "common/relpath.h"
>  #include "libpq-fe.h"
> 
> The above include is not required. I have removed that and made a few
> cosmetic changes in the attached.

Yes, it is not needed anymore. It was first introduced to use the datatype
XLogRecPtr, but that use was removed in a recent version.

Moreover, my colleague Hou found several problems in v40. Here is a fixed
version. The issues found are listed below.

* Fixed to allow XLOG_SWITCH when reading records, including the initial one.
  XLOG_SWITCH may be inserted after the walsender exits. This occurs when
  archive_mode is set to on (or always).
* Fixed to set max_slot_wal_keep_size to -1 only when the cluster is PG17+.
  max_slot_wal_keep_size was introduced in PG13, so the previous patch could not
  upgrade from PG12 and prior.
  The setting is only needed to upgrade logical slots, so it should be set only
  for PG17 and later.
* Avoid calling binary_upgrade_validate_wal_records() when the slot is invalidated.
  The function raises an ERROR if the WAL segment containing the given LSN has
  already been removed. The output is like:

```
ERROR:  requested WAL segment pg_wal/000000010000000000000001 has already been removed
```

  This is the usual behavior, but we do not want to error out here, so the call is skipped.
  The upgrade would still fail correctly if there are invalid slots.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v41-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 20, 2023, 09:21:35

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,
>
> Thank you for reviewing! PSA new version. In this version I ran pgindent again.
>

+ /*
+ * There is a possibility that following records may be generated
+ * during the upgrade.
+ */
+ if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
+ !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
+ is_valid = false;
+
+ CHECK_FOR_INTERRUPTS();

Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
can not occur during the upgrade?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 20, 2023, 09:42:26

On Wed, Sep 20, 2023 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Amit,
> >
> > Thank you for reviewing! PSA new version. In this version I ran pgindent again.
> >
>
> + /*
> + * There is a possibility that following records may be generated
> + * during the upgrade.
> + */
> + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
> + !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
> + is_valid = false;
> +
> + CHECK_FOR_INTERRUPTS();
>
> Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
> can not occur during the upgrade?
>

Because autovacuum is disabled during upgrade. See comment: "Use -b to
disable autovacuum" in start_postmaster().


--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 20, 2023, 09:46:53

On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit,

+int
+count_old_cluster_logical_slots(void)
+{
+ int dbnum;
+ int slot_count = 0;
+
+ for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
+ slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
+
+ return slot_count;
+}

In this code, aren't we assuming that 'slot_arr.nslots' will be zero
for versions <=PG16? On my Windows machine, this value is not zero but
rather some uninitialized negative value which makes its caller try to
allocate some undefined memory and fail. I think you need to
initialize this in get_old_cluster_logical_slot_infos() for lower
versions.
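
A minimal sketch of the fix being asked for, assuming toy versions of the pg_upgrade structures. The Toy* types, the found_per_db parameter, and both function names are hypothetical; only the slot_arr.nslots field name and the counting loop mirror the quoted patch. The idea is that the "get infos" step always assigns nslots, even for old clusters where no slots are fetched.

```c
#include <assert.h>
#include <stddef.h>

/* Toy versions of the pg_upgrade structures; field names mirror the patch. */
typedef struct ToySlotArr
{
    int         nslots;
} ToySlotArr;

typedef struct ToyDbInfo
{
    ToySlotArr  slot_arr;
} ToyDbInfo;

typedef struct ToyCluster
{
    int         major_version;  /* e.g. 1600 for PG16, 1700 for PG17 */
    int         ndbs;
    ToyDbInfo  *dbs;
} ToyCluster;

/*
 * Fills in slot counts. For old clusters (<= PG16) no slots are fetched,
 * but nslots is still set to zero so that later readers never see garbage.
 * found_per_db is a hypothetical stand-in for the per-database query result.
 */
static void
toy_get_slot_infos(ToyCluster *cluster, const int *found_per_db)
{
    assert(cluster != NULL && (cluster->dbs != NULL || cluster->ndbs == 0));

    for (int dbnum = 0; dbnum < cluster->ndbs; dbnum++)
    {
        if (cluster->major_version <= 1600)
            cluster->dbs[dbnum].slot_arr.nslots = 0;
        else
            cluster->dbs[dbnum].slot_arr.nslots = found_per_db[dbnum];
    }
}

/* Same shape as count_old_cluster_logical_slots() in the quoted patch. */
static int
toy_count_slots(const ToyCluster *cluster)
{
    int         slot_count = 0;

    for (int dbnum = 0; dbnum < cluster->ndbs; dbnum++)
        slot_count += cluster->dbs[dbnum].slot_arr.nslots;

    return slot_count;
}
```

Without the explicit zero assignment, the count depends on whatever the allocator left in memory, which would explain why the problem showed up on one platform and not another.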

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 20, 2023, 11:35:18

On Wed, Sep 20, 2023 at 12:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 11:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear Amit,
> > >
> > > Thank you for reviewing! PSA new version. In this version I ran pgindent again.
> > >
> >
> > + /*
> > + * There is a possibility that following records may be generated
> > + * during the upgrade.
> > + */
> > + if (!CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_SWITCH) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) &&
> > + !CHECK_WAL_RECORD(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE))
> > + is_valid = false;
> > +
> > + CHECK_FOR_INTERRUPTS();
> >
> > Just wondering why XLOG_HEAP2_VACUUM or other vacuum-related commands
> > can not occur during the upgrade?
> >
>
> Because autovacuum is disabled during upgrade. See comment: "Use -b to
> disable autovacuum" in start_postmaster().

Okay got it, thanks.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 20, 2023, 13:11:29

On Wed, Sep 20, 2023 at 12:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 11:00 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Amit,
>
> +int
> +count_old_cluster_logical_slots(void)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> +
> + return slot_count;
> +}
>
> In this code, aren't we assuming that 'slot_arr.nslots' will be zero
> for versions <=PG16? On my Windows machine, this value is not zero but
> rather some uninitialized negative value which makes its caller try to
> allocate some undefined memory and fail. I think you need to
> initialize this in get_old_cluster_logical_slot_infos() for lower
> versions.
>

+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records',
+  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
+  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
+  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
+  prosrc => 'binary_upgrade_validate_wal_records' },

In this many of the fields seem bogus. For example, we don't need
prorows => '10', proretset => 't' for this function. Similarly
proargmodes also look incorrect as we don't have any out parameter.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 20, 2023, 14:28:33

Dear Amit,

> +int
> +count_old_cluster_logical_slots(void)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> +
> + return slot_count;
> +}
> 
> In this code, aren't we assuming that 'slot_arr.nslots' will be zero
> for versions <=PG16? On my Windows machine, this value is not zero but
> rather some uninitialized negative value which makes its caller try to
> allocate some undefined memory and fail. I think you need to
> initialize this in get_old_cluster_logical_slot_infos() for lower
> versions.

Good catch, I could not notice because it worked well in my RHEL. Here is the
updated version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v42-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 20, 2023, 14:30:44

Dear Amit,

Thank you for reviewing! The new version is available in [1].

> 
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_records',
> +  prorows => '10', proretset => 't', provolatile => 's', prorettype => 'bool',
> +  proargtypes => 'pg_lsn', proallargtypes => '{pg_lsn,bool}',
> +  proargmodes => '{i,o}', proargnames => '{start_lsn,is_ok}',
> +  prosrc => 'binary_upgrade_validate_wal_records' },
> 
> In this many of the fields seem bogus. For example, we don't need
> prorows => '10', proretset => 't' for this function. Similarly
> proargmodes also look incorrect as we don't have any out parameter.
>

That part was written in old versions and had been kept until now. I rechecked the
fields and changed them as below:

* This function just returns a boolean, so proretset was changed to 'f'.
* Based on the above, prorows should be zero. Removed.
* The returned value depends heavily on internal state, so provolatile was
  changed to 'v'.
* There are no OUT or INOUT arguments, so there is no need to set proallargtypes
  and proargmodes. Removed.
* Anonymous arguments are allowed, so proargnames was removed (left NULL).
* This function is not expected to be called in parallel, so proparallel was set to 'u'.
* The argument must not be NULL, and we should error out in that case, so proisstrict
  was changed to 'f'. Also, the check was added to the function.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB586615579356A84A8CF29A00F5F9A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Michael Paquier

Date:

September 21, 2023, 10:40:00

On Wed, Sep 20, 2023 at 11:28:33AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

I am getting slowly up to date with this patch..  But before going in
depth with more review, there is something that I got to ask: why is
there no option to control if the slots are copied across the upgrade?
At least, I would have imagined that an option to disable the copy of
the slots would be adapted, say a --no-slot-copy or similar to get
back to the old behavior if need be.

+ * This is because before that the logical slots are not saved at shutdown, so
+ * there is no guarantee that the latest confirmed_flush_lsn is saved to disk

Is this comment in get_old_cluster_logical_slot_infos() still true
after e0b2eed047d?
--
Michael

Attachments

signature.asc

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 21, 2023, 11:20:28

On Thu, Sep 21, 2023 at 1:10 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Sep 20, 2023 at 11:28:33AM +0000, Hayato Kuroda (Fujitsu) wrote:
> > Good catch, I could not notice because it worked well in my RHEL. Here is the
> > updated version.
>
> I am getting slowly up to date with this patch..  But before going in
> depth with more review, there is something that I got to ask: why is
> there no option to control if the slots are copied across the upgrade?
> At least, I would have imagined that an option to disable the copy of
> the slots would be adapted, say a --no-slot-copy or similar to get
> back to the old behavior if need be.
>

We have discussed this point. Normally, we don't have such options in
upgrade, so we were hesitant to add a new one for this, but there is a
discussion to add an --exclude-logical-slots option. We are planning
to add that as a separate patch after getting some more consensus on
it. Right now, the idea is to get the main patch ready.

> + * This is because before that the logical slots are not saved at shutdown, so
> + * there is no guarantee that the latest confirmed_flush_lsn is saved to disk
>
> Is this comment in get_old_cluster_logical_slot_infos() still true
> after e0b2eed047d?
>

Yes, we didn't backpatch it, so slots from pre-17 won't be flushed
at shutdown time even if required.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 21, 2023, 13:44:12

Dear Hackers,

> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

I made some cosmetic changes to the patch; the functionality was not changed.
E.g., a macro was replaced with an inline function.

Note that cfbot complained about the old patch, but it seemed to be an
infrastructure-side error. Let's see again.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v43-0001-pg_upgrade-Allow-to-replicate-logical-replicati.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 21, 2023, 14:27:44

On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Good catch, I could not notice because it worked well in my RHEL. Here is the
> updated version.

Thanks for the patch. I have some comments on v42:

1.
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',

+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

Can proisstrict => 'f' be removed so that there's no need for explicit
PG_ARGISNULL check? Any specific reason to keep it?

And the arg is read before the ISNULL check, which isn't good.

2.
+Datum
+binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)

The function name looks too generic in the sense that it validates WAL
records for correctness/corruption, but it is not. Can it be something
like binary_upgrade_{check_for_wal_logical_end,
check_for_logical_end_of_wal} or such?

3.
+    /* Quick exit if the given lsn is larger than current one */
+    if (start_lsn >= GetFlushRecPtr(NULL))
+        PG_RETURN_BOOL(false);
+

An LSN that doesn't exist yet is an error IMO; maybe an error is better here?

4.
+ * This function is used to verify that there are no WAL records (except some
+ * types) after confirmed_flush_lsn of logical slots, which means all the
+ * changes were replicated to the subscriber. There is a possibility that some
+ * WALs are inserted during upgrade, so such types would be ignored.
+ *

This comment before the function better be at the callsite of the
function, because as far as this function is concerned, it checks if
there are any WAL records that are not "certain" types after the given
LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

5. Trying to understand the interaction of this feature with custom
WAL records that a custom WAL resource manager puts in. Is it okay to
have custom WAL records after the "logical WAL end"?
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */

6.
+    if (PQntuples(res) != 1)
+        pg_fatal("could not count the number of logical replication slots");
+

Is the absence of any logical replication slot an error? I think it
must be if (PQntuples(res) == 0) return;?

7. A nit:
+    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
+
+    if (nslots_on_new)

Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

8.
+    if (nslots_on_new)
+        pg_fatal("expected 0 logical replication slots but found %d",
+                 nslots_on_new);

How about "New cluster database is containing logical replication
slots"? Note that some of the fatal messages start with an
upper-case letter.

9.
+    res = executeQueryOrDie(conn, "SHOW wal_level;");
+    res = executeQueryOrDie(conn, "SHOW max_replication_slots;");

Instead of 2 queries to determine required parameters, isn't it better
with a single query like the following?

select setting from pg_settings where name in ('wal_level',
'max_replication_slots') order by name;

10.
Why just wal_level and max_replication_slots, why not
max_worker_processes and max_wal_senders too? I'm looking at
RecoveryRequiresIntParameter, and if they are different on the upgraded
instance, chances are that logical replication won't work, no?

11.
+# 2. Generate extra WAL records. Because these WAL records do not get consumed
+#     it will cause the upcoming pg_upgrade test to fail.
+$old_publisher->safe_psql('postgres',
+    "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
+);
+$old_publisher->stop;

This might be a recipe for sporadic test failures: how is it
guaranteed that the newly generated WAL records aren't consumed?

Maybe stop the subscriber or temporarily disable the subscription and
then generate the WAL records?

12.
+extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
+extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
+

Why not define these functions in xlogreader.h, with elog/ereport
in #ifndef FRONTEND #endif blocks? IMO, xlogreader.h seems the right
location for these functions.

13.
+LogicalReplicationSlotInfo

Where is this structure defined?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 21, 2023, 15:15:00

On Thu, Sep 21, 2023 at 4:57 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Sep 20, 2023 at 7:20 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Good catch, I could not notice because it worked well in my RHEL. Here is the
> > updated version.
>
> Thanks for the patch. I have some comments on v42:
>
> 1.
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
> +  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>
> +    if (PG_ARGISNULL(0))
> +        elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> Can proisstrict => 'f' be removed so that there's no need for explicit
> PG_ARGISNULL check? Any specific reason to keep it?
>

Probably trying to keep it similar to
binary_upgrade_create_empty_extension(). I think it depends on what
behaviour we expect for NULL input.

> And, the before the ISNULL check the arg is read, which isn't good.
>

Right.

> 2.
> +Datum
> +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
>
> The function name looks too generic in the sense that it validates WAL
> records for correctness/corruption, but it is not. Can it be something
> like binary_upgrade_{check_for_wal_logical_end,
> check_for_logical_end_of_wal} or such?
>

How about slightly modified version like
binary_upgrade_validate_wal_logical_end?

> 3.
> +    /* Quick exit if the given lsn is larger than current one */
> +    if (start_lsn >= GetFlushRecPtr(NULL))
> +        PG_RETURN_BOOL(false);
> +
>
> An LSN that doesn't exist yet is an error IMO; maybe raising an error is better here?
>

It will lead to an error at a later point anyway, but there we can
provide more information about all the slots that have an invalid
confirmed_flush LSN.

> 4.
> + * This function is used to verify that there are no WAL records (except some
> + * types) after confirmed_flush_lsn of logical slots, which means all the
> + * changes were replicated to the subscriber. There is a possibility that some
> + * WALs are inserted during upgrade, so such types would be ignored.
> + *
>
> This comment before the function better be at the callsite of the
> function, because as far as this function is concerned, it checks if
> there are any WAL records that are not "certain" types after the given
> LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
>

Yeah, we should give information at the callsite but I guess we need
to give some context atop this function as well so that it is easier
to explain the functionality.

> 5. Trying to understand the interaction of this feature with custom
> WAL records that a custom WAL resource manager puts in. Is it okay to
> have custom WAL records after the "logical WAL end"?
> +        /*
> +         * There is a possibility that following records may be generated
> +         * during the upgrade.
> +         */
>

I don't think so. The only valid records for the checks in this
function are probably the ones that can get generated by the upgrade
process because we ensure that walsender sends all the records before
it exits at shutdown time.

>
> 10.
> Why just wal_level and max_replication_slots, why not
> max_worker_processes and max_wal_senders too?

Isn't it sufficient to check the parameters that are required to
create a slot, i.e. what we check in the function
CheckLogicalDecodingRequirements()? We are only creating logical slots
here, so I think that should be sufficient.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 21, 2023, 16:24:45

Dear Bharath,

Thank you for reviewing! Before addressing them, I would like to reply to some comments.

> 6.
> +    if (PQntuples(res) != 1)
> +        pg_fatal("could not count the number of logical replication slots");
> +
> 
> Is not having a single logical replication slot an error? I think it
> must be if (PQntuples(res) == 0) return;?
>

The query executes "SELECT count(*)..."; IIUC it always returns exactly 1 row.

> 7. A nit:
> +    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
> +
> +    if (nslots_on_new)
> 
> Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?

Note that the value is also used in the upcoming pg_fatal. I prefer the current style
because repeating atoi(PQgetvalue(res, 0, 0)) is not so pretty.

> 
> 11.
> +# 2. Generate extra WAL records. Because these WAL records do not get consumed
> +#     it will cause the upcoming pg_upgrade test to fail.
> +$old_publisher->safe_psql('postgres',
> +    "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;"
> +);
> +$old_publisher->stop;
> 
> This might be a recipe for sporadic test failures - how is it
> guaranteed that the newly generated WAL records aren't consumed?

You pointed at line 118, but at that point the logical replication system has not been created yet.
The subscriber is created at line 163.
Therefore the WAL would not be consumed automatically.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Michael Paquier

Date:

September 22, 2023, 02:47:23

On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:
> We have discussed this point. Normally, we don't have such options in
> upgrade, so we were hesitant to add a new one for this but there is a
> discussion to add an --exclude-logical-slots option. We are planning
> to add that as a separate patch after getting some more consensus on
> it. Right now, the idea is to get the main patch ready.

Okay.  I am wondering if the subscriber part is OK now without an
option, but that could also be considered separately, as well.  At
least I hope so.
--
Michael

Attachments

signature.asc

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 22, 2023, 08:26:56

On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Thanks for the patch. I have some comments on v42:
>
> Probably it is trying to stay similar to
> binary_upgrade_create_empty_extension(). I think it depends on what
> behaviour we expect for NULL input.

confirmed_flush_lsn for a logical slot can be null (for instance,
before confirmed_flush is updated for a newly created logical slot if
someone calls pg_stat_replication -> pg_get_replication_slots) and
when it is, binary_upgrade_validate_wal_records errors out.
Is this behaviour wanted? I think returning null on null
input is the better behaviour for the function here.

> > 2.
> > +Datum
> > +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
> >
> > The function name looks too generic in the sense that it validates WAL
> > records for correctness/corruption, but it is not. Can it be something
> > like binary_upgrade_{check_for_wal_logical_end,
> > check_for_logical_end_of_wal} or such?
> >
>
> How about slightly modified version like
> binary_upgrade_validate_wal_logical_end?

Works for me.

> > 3.
> > +    /* Quick exit if the given lsn is larger than current one */
> > +    if (start_lsn >= GetFlushRecPtr(NULL))
> > +        PG_RETURN_BOOL(false);
> > +
> >
> > An LSN that doesn't exist yet is an error IMO; maybe raising an error is better here?
> >
>
> It will anyway lead to error at later point but we will provide more
> information about all the slots that have invalid value of
> confirmed_flush LSN.

I disagree with the function returning false for a non-existent LSN.
IMO, failing fast when an LSN that doesn't exist yet is supplied to
the function is the right approach. You never know: the on-disk slot
contents can get corrupted for some reason, leaving confirmed_flush_lsn
as 'FFFFFFFF/FFFFFFFF' or some other non-existent LSN.

> > 4.
> > + * This function is used to verify that there are no WAL records (except some
> > + * types) after confirmed_flush_lsn of logical slots, which means all the
> > + * changes were replicated to the subscriber. There is a possibility that some
> > + * WALs are inserted during upgrade, so such types would be ignored.
> > + *
> >
> > This comment before the function better be at the callsite of the
> > function, because as far as this function is concerned, it checks if
> > there are any WAL records that are not "certain" types after the given
> > LSN, it doesn't know logical slots or confirmed_flush_lsn or such.
> >
>
> Yeah, we should give information at the callsite but I guess we need
> to give some context atop this function as well so that it is easier
> to explain the functionality.

At the callsite a detailed description is good. At the function
definition just a reference to the callsite is good.

> > 5. Trying to understand the interaction of this feature with custom
> > WAL records that a custom WAL resource manager puts in. Is it okay to
> > have custom WAL records after the "logical WAL end"?
> > +        /*
> > +         * There is a possibility that following records may be generated
> > +         * during the upgrade.
> > +         */
> >
>
> I don't think so. The only valid records for the checks in this
> function are probably the ones that can get generated by the upgrade
> process because we ensure that walsender sends all the records before
> it exits at shutdown time.

Can you help me understand how the list of WAL records that pg_upgrade
can generate was put together? Were they identified by running some tests?

> > 10.
> > Why just wal_level and max_replication_slots, why not
> > max_worker_processes and max_wal_senders too?
>
> Isn't it sufficient to check the parameters that are required to
> create a slot aka what we check in the function
> CheckLogicalDecodingRequirements()? We are only creating logical slots
> here so I think that should be sufficient.

Ah, that makes sense.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 22, 2023, 09:29:14

On Thu, Sep 21, 2023 at 6:54 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 6.
> > +    if (PQntuples(res) != 1)
> > +        pg_fatal("could not count the number of logical replication slots");
> > +
> >
> > Is not having a single logical replication slot an error? I think it
> > must be if (PQntuples(res) == 0) return;?
> >
>
> The query executes "SELECT count(*)...", IIUC it exactly returns 1 row.

Ah, got it.

> > 7. A nit:
> > +    nslots_on_new = atoi(PQgetvalue(res, 0, 0));
> > +
> > +    if (nslots_on_new)
> >
> > Just do if(atoi(PQgetvalue(res, 0, 0)) > 0) and get rid of nslots_on_new?
>
> Note that the value is also used in the upcoming pg_fatal. I prefer the current style
> because repeating atoi(PQgetvalue(res, 0, 0)) is not so pretty.

+1.

> You mentioned at line 118, but at that time logical replication system is not created.
> The subscriber is created at line 163.
> Therefore WALs would not be consumed automatically.

So, not calling pg_logical_slot_get_changes() on test_slot1 won't
consume the WAL?

A few more comments:

1.
+    /*
+     * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+     * checkpointer process.  If WALs required by logical replication slots
+     * are removed, the slots are unusable.  This setting prevents the
+     * invalidation of slots during the upgrade. We set this option when

IIUC, during upgrade we don't want the checkpointer to remove WAL that
may be needed by logical slots; for that, the patch overrides the
user-set value of max_slot_wal_keep_size. What if the WAL is removed
because of the wal_keep_size setting?

2.
+++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl

How about a more descriptive and pointed name for the TAP test file,
something like 003_upgrade_logical_replication_slots.pl?

3. Does this patch support upgrading of logical replication slots on a
streaming standby? If yes, isn't it a good idea to add one test for
upgrading standby with logical replication slots?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 22, 2023, 09:41:27

On Fri, Sep 22, 2023 at 10:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > 3.
> > > +    /* Quick exit if the given lsn is larger than current one */
> > > +    if (start_lsn >= GetFlushRecPtr(NULL))
> > > +        PG_RETURN_BOOL(false);
> > > +
> > >
> > > An LSN that doesn't exist yet is an error IMO; maybe raising an error is better here?
> > >
> >
> > It will anyway lead to error at later point but we will provide more
> > information about all the slots that have invalid value of
> > confirmed_flush LSN.
>
> I disagree with the function returning false for non-existing LSN.
> IMO, failing fast when an LSN that doesn't exist yet is supplied to
> the function is the right approach. We never know, the slots on disk
> content can get corrupted for some reason and confirmed_flush_lsn is
> 'FFFFFFFF/FFFFFFFF' or a non-existing LSN.
>

I don't think it is a big deal whether we fail immediately or slightly
later with more information about the slot. Failing later could be
better because several slots can have the same problem, so we can
report all such slots together.
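
As an illustration only, the "report all such slots together" idea could be modelled roughly as below. This is a standalone sketch and not code from the patch: DemoSlot and demo_collect_invalid_slots are hypothetical names, LSNs are modelled as plain 64-bit integers, and the comparison mirrors the quick-exit test quoted above (a confirmed_flush at or beyond the flush pointer is treated as a problem).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a logical slot; not a PostgreSQL type. */
typedef struct DemoSlot
{
    const char *name;
    uint64_t    confirmed_flush;    /* LSN modelled as a plain integer */
} DemoSlot;

/*
 * Walk every slot and remember each one whose confirmed_flush is at or
 * beyond the current flush pointer, instead of stopping at the first.
 * Returns the number of problem slots; up to max_bad names go into bad[].
 */
static size_t
demo_collect_invalid_slots(const DemoSlot *slots, size_t nslots,
                           uint64_t flush_lsn,
                           const char **bad, size_t max_bad)
{
    size_t      nbad = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].confirmed_flush >= flush_lsn)
        {
            if (nbad < max_bad)
                bad[nbad] = slots[i].name;
            nbad++;
        }
    }
    return nbad;
}
```

A caller would then print every collected name in one report and fail once, which is the trade-off against failing fast on the first bad LSN.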

>
> > > 5. Trying to understand the interaction of this feature with custom
> > > WAL records that a custom WAL resource manager puts in. Is it okay to
> > > have custom WAL records after the "logical WAL end"?
> > > +        /*
> > > +         * There is a possibility that following records may be generated
> > > +         * during the upgrade.
> > > +         */
> > >
> >
> > I don't think so. The only valid records for the checks in this
> > function are probably the ones that can get generated by the upgrade
> > process because we ensure that walsender sends all the records before
> > it exits at shutdown time.
>
> Can you help me understand how the list of WAL records that pg_upgrade
> can generate is put up? Identified them after running some tests?
>

Yeah, both by tests and by manually verifying the WAL records. Basically,
we need to care about records that could be generated by background
processes like checkpointer/bgwriter or that can be generated during system
table scans. You may want to read my latest email for a summary of how
we arrived at this design choice [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com
--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 22, 2023, 09:44:38

On Fri, Sep 22, 2023 at 11:59 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 21, 2023 at 6:54 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
>
> 1.
> +    /*
> +     * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> +     * checkpointer process.  If WALs required by logical replication slots
> +     * are removed, the slots are unusable.  This setting prevents the
> +     * invalidation of slots during the upgrade. We set this option when
>
> IIUC, during upgrade we don't want the checkpointer to remove WAL that
> may be needed by logical slots, for that the patch overrides the user
> set value for max_slot_wal_keep_size. What if the WAL is removed
> because of the wal_keep_size setting?
>

We are fine with WAL removal unless it can invalidate the slots,
and that is prevented by the max_slot_wal_keep_size setting.

>
> 3. Does this patch support upgrading of logical replication slots on a
> streaming standby?
>

No, and the patch adds a note to that effect.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 22, 2023, 11:39:49

On Fri, Sep 22, 2023 at 10:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 21, 2023 at 5:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Thanks for the patch. I have some comments on v42:
> >
> > Probably it is trying to stay similar to
> > binary_upgrade_create_empty_extension(). I think it depends on what
> > behaviour we expect for NULL input.
>
> confirmed_flush_lsn for a logical slot can be null (for instance,
> before confirmed_flush is updated for a newly created logical slot if
> someone calls pg_stat_replication -> pg_get_replication_slots) and
> when it is, binary_upgrade_validate_wal_records errors out.
> Is this behaviour wanted? I think returning null on null
> input is the better behaviour for the function here.
>

I think if we return null on null input, then the caller needs to
add a special case for the null value, as this function returns bool. We
can probably return false in that case. Does that help to address your
concern?
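
To make the two points from this sub-thread concrete (test for NULL before reading the argument, and return false for NULL input), here is a small standalone model. It is only a sketch under stated assumptions: DemoArg and demo_validate_wal_logical_end are hypothetical names, not the fmgr API, and the final comparison is a placeholder for the real WAL scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a nullable SQL argument; not a PostgreSQL type. */
typedef struct DemoArg
{
    bool        isnull;     /* true when the SQL value is NULL */
    uint64_t    lsn;        /* only meaningful when isnull is false */
} DemoArg;

/*
 * Model of a non-strict boolean function: check the null flag first and
 * only then read the value. NULL input yields false rather than an error.
 */
static bool
demo_validate_wal_logical_end(const DemoArg *arg, uint64_t flush_lsn)
{
    assert(arg != NULL);

    if (arg->isnull)
        return false;       /* NULL input: the caller sees "not valid" */

    /* The value is read only after the null check has passed. */
    return arg->lsn < flush_lsn;    /* placeholder for the real WAL scan */
}
```

With this shape the caller only ever handles true or false, so no extra NULL branch is needed on the pg_upgrade side.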

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 23, 2023, 07:48:19

Dear Bharath,

Again, thank you for reviewing! Here is a new version of the patch.

> 1.
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_records', proisstrict => 'f',
> +  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> 
> +    if (PG_ARGISNULL(0))
> +        elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
> 
> Can proisstrict => 'f' be removed so that there's no need for explicit
> PG_ARGISNULL check? Any specific reason to keep it?

Theoretically it could be, but I was not sure. I think you wanted us to follow
the specs of the pg_walinspect functions, but this is just an upgrade function. Normally
users cannot call it. Also, as Amit said [1], the caller must consider the
special case. Currently the function returns false in that case; we can change it
to a more appropriate style later.

> Also, the arg is read before the ISNULL check, which isn't good.

Right, fixed.

> 2.
> +Datum
> +binary_upgrade_validate_wal_records(PG_FUNCTION_ARGS)
> 
> The function name looks too generic in the sense that it validates WAL
> records for correctness/corruption, but it is not. Can it be something
> like binary_upgrade_{check_for_wal_logical_end,
> check_for_logical_end_of_wal} or such?

Per discussion [2], changed to binary_upgrade_validate_wal_logical_end.

> 3.
> +    /* Quick exit if the given lsn is larger than current one */
> +    if (start_lsn >= GetFlushRecPtr(NULL))
> +        PG_RETURN_BOOL(false);
> +
> 
> An LSN that doesn't exist yet is an error IMO; maybe raising an error is better here?

We think that the invalid slots should be listed at the end, so basically we do
not want to error out. This could also be changed if there are better opinions.

> 4.
> + * This function is used to verify that there are no WAL records (except some
> + * types) after confirmed_flush_lsn of logical slots, which means all the
> + * changes were replicated to the subscriber. There is a possibility that some
> + * WALs are inserted during upgrade, so such types would be ignored.
> + *
> 
> This comment before the function better be at the callsite of the
> function, because as far as this function is concerned, it checks if
> there are any WAL records that are not "certain" types after the given
> LSN, it doesn't know logical slots or confirmed_flush_lsn or such.

Hmm, I think it is better to do the reverse, because otherwise we would need to repeat
the same explanation at any other caller of the function. So, I have
adjusted the comments atop the function and at the caller. Thoughts?

> 8.
> +    if (nslots_on_new)
> +        pg_fatal("expected 0 logical replication slots but found %d",
> +                 nslots_on_new);
> 
> How about "New cluster database is containing logical replication
> slots"? Note that some of the fatal messages start with an
> upper-case letter.

I did not use your suggestion, but changed it to upper-case.
Actually, the upper-case rule is broken even within the file. Here I regarded
this sentence as a hint message.

> 9.
> +    res = executeQueryOrDie(conn, "SHOW wal_level;");
> +    res = executeQueryOrDie(conn, "SHOW max_replication_slots;");
> 
> Instead of 2 queries to determine required parameters, isn't it better
> with a single query like the following?
> 
> select setting from pg_settings where name in ('wal_level',
> 'max_replication_slots') order by name;

Modified, but it uses ORDER BY ... DESC. This comes from a previous comment [3].
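
A rough standalone model of how the two rows of such a combined query might be consumed is sketched below. It assumes that, with names sorted in descending order, the wal_level row comes first and the max_replication_slots row second; demo_new_cluster_can_hold_slots is a hypothetical helper, not a function from the patch, and the rows are passed in as plain strings rather than fetched through libpq.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * rows[] models the "setting" column of the combined query, sorted by
 * name in descending order: rows[0] is wal_level and rows[1] is
 * max_replication_slots (assumption of this sketch).
 */
static bool
demo_new_cluster_can_hold_slots(const char *const rows[2], int nslots_on_old)
{
    const char *wal_level = rows[0];
    int         max_replication_slots = atoi(rows[1]);

    /* Logical slots need wal_level = logical on the new cluster. */
    if (strcmp(wal_level, "logical") != 0)
        return false;

    /* The new cluster must have room for every slot being migrated. */
    return nslots_on_old <= max_replication_slots;
}
```

Relying on row order keeps the caller short, but it ties the C code to the ORDER BY clause, so the two have to be changed together.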

> 
> 12.
> +extern XLogReaderState *InitXLogReaderState(XLogRecPtr lsn);
> +extern XLogRecord *ReadNextXLogRecord(XLogReaderState *xlogreader);
> +
> 
> Why not define these functions in xlogreader.h, with elog/ereport
> inside #ifndef FRONTEND ... #endif blocks? IMO, xlogreader.h seems the
> right location for these functions.

I checked comments atop both files, and xlogreader.h seems better. Fixed.

> 13.
> +LogicalReplicationSlotInfo
> 
> Where is this structure defined?

Oops, removed.

[1]: https://www.postgresql.org/message-id/CAA4eK1LxPDeSkTttEAG2MPEWO%3D83vQe_Bja9F4QcCjVn%3DWt9rA%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1L9oJmdxprFR3oob5KLpHUnkJAt5Le4woxO3wHz-SZ%2BTA%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAA4eK1LHH_%3DwbxsEn20%3DW%2Bqz1193OqFj-vvJ-u0uHLMmwLHbRw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v44-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 23, 2023, 07:49:43

Dear Bharath,

> > You mentioned at line 118, but at that time logical replication system is not created.
> > The subscriber is created at line 163.
> > Therefore WALs would not be consumed automatically.
> 
> So, not calling pg_logical_slot_get_changes() on test_slot1 won't
> consume the WAL?

Yes. This slot was created manually and nothing activates it automatically.
pg_logical_slot_get_changes() could consume the WAL, but it is never called.

> 
> 2.
> +++ b/src/bin/pg_upgrade/t/003_logical_replication_slots.pl
> 
> How about a more descriptive and pointed name for the TAP test file,
> something like 003_upgrade_logical_replication_slots.pl?

Good suggestion. Renamed.

> 3. Does this patch support upgrading of logical replication slots on a
> streaming standby? If yes, isn't it a good idea to add one test for
> upgrading standby with logical replication slots?

IIUC pg_upgrade would not be used for a physical standby. The standby would be upgraded by:

* Recreating the database cluster, or
* Executing rsync command.

For more detail, please see the documentation.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 25, 2023, 08:45:31

On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yeah, both by tests and manually verifying the WAL records. Basically,
> we need to care about records that could be generated by background
> processes like checkpointer/bgwriter or can be generated during system
> table scans. You may want to read my latest email for a summary on how
> we reached at this design choice [1].
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com

+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+    {

Why can't the patch allow migration of logical replication slots from
PG versions < 17 to, say, 17 or later? If done, it would be a major
advantage of the patch, since it would enable seamless major version
upgrades of postgres database instances with logical replication
slots.

I'm looking at the changes to the postgres backend that this patch
makes - AFAICS, it does 2 things: 1) implements the
binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
that the logical slots won't get invalidated. For (1), pg_upgrade
itself can read the WAL from the old cluster to determine the logical
WAL end (i.e. implement the functionality of
binary_upgrade_validate_wal_logical_end) because the xlogreader is
available to FRONTEND tools. For (2), it's just an assertion, and the
logic that determines the logical WAL end will anyway determine whether
or not the slots are valid; if needed, the assertion can be backported.

Is there anything else that stops this patch from supporting migration
of logical replication slots from PG versions < 17?
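
For readers unfamiliar with the version gate quoted at the top of this message, the sketch below models it in isolation. It assumes the cluster version is stored in the numeric form where 16.x is 1600NN and 17.x is 1700NN, and that the major-version macro divides by 100; DEMO_GET_MAJOR_VERSION and demo_old_cluster_slots_migratable are names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed shape of the major-version macro: drop the last two digits. */
#define DEMO_GET_MAJOR_VERSION(v)   ((v) / 100)

/*
 * Model of the gate: slots are considered for migration only when the
 * old cluster is newer than the 16.x series.
 */
static bool
demo_old_cluster_slots_migratable(uint32_t old_major_version)
{
    return DEMO_GET_MAJOR_VERSION(old_major_version) > 1600;
}
```

Under these assumptions an old cluster reporting 160004 is skipped and one reporting 170000 is processed, which is the restriction being questioned here.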

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 25, 2023, 10:00:07

On Mon, Sep 25, 2023 at 11:15 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Yeah, both by tests and manually verifying the WAL records. Basically,
> > we need to care about records that could be generated by background
> > processes like checkpointer/bgwriter or can be generated during system
> > table scans. You may want to read my latest email for a summary on how
> > we reached at this design choice [1].
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com
>
> +    /* Logical slots can be migrated since PG17. */
> +    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> +    {
>
> Why can't the patch allow migration of logical replication slots from
> PG versions < 17 to say 17 or later? If done, it will be a main
> advantage of the patch since it will enable seamless major version
> upgrades of postgres database instances with logical replication
> slots.
>
> I'm looking at the changes to the postgres backend that this patch
> does - AFAICS, it does 2 things 1) implements
> binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
> that the logical slots won't get invalidated. For (1), pg_upgrade
> itself can read the WAL from the old cluster to determine the logical
> WAL end (i.e. implement the functionality of
> binary_upgrade_validate_wal_logical_end ) because the xlogreader is
> available to FRONTEND tools. For (2), it's just an assertion and
> logical WAL end determining logic will anyway determine whether or not
> the slots are valid; if needed, the assertion can be backported.
>
> Is there anything else that stops this patch from supporting migration
> of logical replication slots from PG versions < 17?

IMHO one of the main changes we are making in PG 17 is that, at the
shutdown checkpoint, if the confirmed flush LSN has been updated
since the last checkpoint and is not yet synced to disk, we now
write it out.  I think this is the most important change; otherwise
many slots for which we have already streamed all the WAL might give
an error, on the assumption that there is pending WAL for those slots
that is not yet confirmed.
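
The behaviour described here can be pictured with a small standalone model. This is a simplified sketch, not the server code: DemoSlotState and demo_checkpoint_slot are hypothetical, the on-disk state is just another struct field, and non-shutdown checkpoints are modelled as never writing the slot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one logical slot's in-memory and on-disk state. */
typedef struct DemoSlotState
{
    uint64_t    confirmed_flush;            /* current in-memory value */
    uint64_t    last_saved_confirmed_flush; /* value at the last write */
    uint64_t    on_disk_confirmed_flush;    /* models the slot's file */
} DemoSlotState;

/*
 * At a shutdown checkpoint, write the slot out if confirmed_flush moved
 * since the last write. Returns true when a write happened.
 */
static bool
demo_checkpoint_slot(DemoSlotState *slot, bool is_shutdown)
{
    if (is_shutdown &&
        slot->confirmed_flush != slot->last_saved_confirmed_flush)
    {
        slot->on_disk_confirmed_flush = slot->confirmed_flush;
        slot->last_saved_confirmed_flush = slot->confirmed_flush;
        return true;
    }
    return false;
}
```

Without the shutdown-time write, the on-disk value in this model stays behind the in-memory one, so a later check against the on-disk confirmed_flush would see WAL that looks unconsumed even though it was already streamed.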

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 25, 2023, 10:02:16

On Mon, Sep 25, 2023 at 12:30 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 11:15 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Fri, Sep 22, 2023 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Yeah, both by tests and manually verifying the WAL records. Basically,
> > > we need to care about records that could be generated by background
> > > processes like checkpointer/bgwriter or can be generated during system
> > > table scans. You may want to read my latest email for a summary on how
> > > we reached at this design choice [1].
> > >
> > > [1] - https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com
> >
> > +    /* Logical slots can be migrated since PG17. */
> > +    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> > +    {
> >
> > Why can't the patch allow migration of logical replication slots from
> > PG versions < 17 to say 17 or later? If done, it will be a main
> > advantage of the patch since it will enable seamless major version
> > upgrades of postgres database instances with logical replication
> > slots.
> >
> > I'm looking at the changes to the postgres backend that this patch
> > does - AFAICS, it does 2 things 1) implements
> > binary_upgrade_validate_wal_logical_end function, 2) adds an assertion
> > that the logical slots won't get invalidated. For (1), pg_upgrade
> > itself can read the WAL from the old cluster to determine the logical
> > WAL end (i.e. implement the functionality of
> > binary_upgrade_validate_wal_logical_end ) because the xlogreader is
> > available to FRONTEND tools. For (2), it's just an assertion and
> > logical WAL end determining logic will anyway determine whether or not
> > the slots are valid; if needed, the assertion can be backported.
> >
> > Is there anything else that stops this patch from supporting migration
> > of logical replication slots from PG versions < 17?
>
> IMHO one of the main change we are doing in PG 17 is that on shutdown
> checkpoint we are ensuring that if the confirmed flush lsn is updated
> since the last checkpoint and that is not yet synched to the disk then
> we are doing so.  I think this is the most important change otherwise
> many slots for which we have already streamed all the WAL might give
> an error assuming that there are pending WAL from the slots which are
> not yet confirmed.
>

You might need to refer to [1] for the change I am talking about

[1] https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 25, 2023, 10:36:16

On Sat, Sep 23, 2023 at 10:18 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Again, thank you for reviewing! Here is a new version patch.

Here are some more comments/thoughts on the v44 patch:

1.
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_fails(
+    [

Add a test case to hit fprintf(script, "The slot \"%s\" is invalid\n",
file as well?

2.
+    'run of pg_upgrade where the new cluster has insufficient max_replication_slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");

+    'run of pg_upgrade where the new cluster has the wrong wal_level');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");

+    'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+    "pg_upgrade_output.d/ not removed after pg_upgrade failure");

How do these tests recognize that the failures are the intended ones? I
mean, for instance, when pg_upgrade fails because of unused replication
slots/unconsumed WAL records, just looking at the presence of
pg_upgrade_output.d might not be sufficient, no? Using
command_fails_like instead of command_fails and looking at the
contents of invalid_logical_replication_slots.txt might help make these
tests more focused.

3.
+        pg_log(PG_REPORT, "fatal");
+        pg_fatal("Your installation contains invalid logical replication slots.\n"
+                 "These slots can't be copied, so this cluster cannot be upgraded.\n"
+                 "Consider removing such slots or consuming the pending WAL if any,\n"
+                 "and then restart the upgrade.\n"
+                 "A list of invalid logical replication slots is in the file:\n"
+                 "    %s", output_path);

It's not just the invalid logical replication slots, but also the
slots with unconsumed WAL, which aren't invalid and can be upgraded
once the WAL is consumed. So, a better wording would be:
        pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
                 "These slots can't be copied, so this cluster cannot be upgraded.\n"
                 "Consider removing invalid slots and/or consuming the pending WAL if any,\n"
                 "and then restart the upgrade.\n"
                 "List of all such logical replication slots is in the file:\n"
                 "    %s", output_path);

4.
+        /*
+         * There is a possibility that following records may be generated
+         * during the upgrade.
+         */
+        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_SHUTDOWN) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_CHECKPOINT_ONLINE) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_FPI_FOR_HINT) ||
+            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_PARAMETER_CHANGE) ||
+            is_xlog_record_type(rmid, info, RM_STANDBY_ID, XLOG_RUNNING_XACTS) ||
+            is_xlog_record_type(rmid, info, RM_HEAP2_ID, XLOG_HEAP2_PRUNE);

What if we fail to capture some WAL records that may be generated
during the upgrade?

What happens if a custom WAL resource manager generates table/index AM
WAL records during upgrade?

What happens if new WAL record types are added that may be generated
during the upgrade? Isn't keeping this code extensible and in sync with
future changes a problem? Or would it be better to say that if any
custom WAL records are found after the slot's confirmed flush LSN, then
the slot isn't upgraded?

5. In continuation to the above comment:

Why can't this logic be something like: if any WAL record seen after a
slot's confirmed flush LSN is of a type generated by a WAL resource
manager that has the rm_decode function defined, then the slot can't
be upgraded?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 25, 2023, 10:53:19

On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > > Is there anything else that stops this patch from supporting migration
> > > of logical replication slots from PG versions < 17?
> >
> > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > since the last checkpoint and that is not yet synched to the disk then
> > we are doing so.  I think this is the most important change otherwise
> > many slots for which we have already streamed all the WAL might give
> > an error assuming that there are pending WAL from the slots which are
> > not yet confirmed.
> >
>
> You might need to refer to [1] for the change I am talking about
>
> [1]
https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com

I see. IIUC, without commit e0b2eed [1], the slot's on-disk
confirmed_flush LSN value can be higher than the WAL LSN that's
flushed to disk, no? If so, can't we detect whether the WAL at the
confirmed_flush LSN is valid when reading WAL with the xlogreader
machinery?

What if commit e0b2eed [1] is treated as a bug fix, per the reasoning
in [2], and backpatched? If that is done, it's easy to support
upgrading/migrating logical replication slots from PG versions < 17,
no?

[1]
commit e0b2eed047df9045664da6f724cb42c10f8b12f0
Author: Amit Kapila <akapila@postgresql.org>
Date:   Thu Sep 14 08:56:13 2023 +0530

    Flush logical slots to disk during a shutdown checkpoint if required.

[2]
    It can also help avoid processing the same transactions again in some
    boundary cases after the clean shutdown and restart.  Say, we process
    some transactions for which we didn't send anything downstream (the
    changes got filtered) but the confirm_flush LSN is updated due to
    keepalives.  As we don't flush the latest value of confirm_flush LSN, it
    may lead to processing the same changes again without this patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

September 25, 2023, 11:33:41

On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > > Is there anything else that stops this patch from supporting migration
> > > > of logical replication slots from PG versions < 17?
> > >
> > > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > > since the last checkpoint and that is not yet synched to the disk then
> > > we are doing so.  I think this is the most important change otherwise
> > > many slots for which we have already streamed all the WAL might give
> > > an error assuming that there are pending WAL from the slots which are
> > > not yet confirmed.
> > >
> >
> > You might need to refer to [1] for the change I am talking about
> >
> > [1]
https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
>
> I see. IIUC, without that commit e0b2eed [1], it may happen that the
> slot's on-disk confirmed_flush LSN value can be higher than the WAL
> LSN that's flushed to disk, no? If so, can't it be detected if the WAL
> at confirmed_flush LSN is valid or not when reading WAL with
> xlogreader machinery?

Actually, without this commit the slot's in-memory confirmed_flush LSN
can be higher than the on-disk value. If you look at
LogicalConfirmReceivedLocation(), when only confirmed_flush changes,
the slot is not marked dirty, which means the slot will not be
persisted to disk on shutdown. Logically this does not cause any
issue, so we cannot treat it as a bug: it may cause us to process some
extra records after the restart, but that is not really a bug.
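
The behaviour described above can be modelled with a minimal sketch. All
names here (SlotModel, model_confirm_received, model_shutdown_checkpoint)
are hypothetical stand-ins and not PostgreSQL identifiers, the LSNs are
plain integers chosen for illustration, and the flush_if_advanced flag
only approximates what this thread says commit e0b2eed does at a
shutdown checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a logical replication slot. */
typedef struct SlotModel
{
    uint64_t mem_confirmed_flush;   /* value held in memory */
    uint64_t disk_confirmed_flush;  /* value last written to disk */
    bool     dirty;                 /* true if a checkpoint must write the slot */
} SlotModel;

/*
 * Models an acknowledgment that advances only confirmed_flush: the
 * in-memory value moves forward, but the slot is NOT marked dirty.
 */
static void
model_confirm_received(SlotModel *slot, uint64_t lsn)
{
    assert(slot != NULL);
    if (lsn > slot->mem_confirmed_flush)
        slot->mem_confirmed_flush = lsn;
}

/*
 * Models a shutdown checkpoint. With flush_if_advanced set (an
 * approximation of the behaviour after commit e0b2eed), a slot whose
 * in-memory value differs from the on-disk value is written as well.
 */
static void
model_shutdown_checkpoint(SlotModel *slot, bool flush_if_advanced)
{
    assert(slot != NULL);
    if (flush_if_advanced &&
        slot->mem_confirmed_flush != slot->disk_confirmed_flush)
        slot->dirty = true;

    if (slot->dirty)
    {
        slot->disk_confirmed_flush = slot->mem_confirmed_flush;
        slot->dirty = false;
    }
}

/* Returns the on-disk confirmed_flush after one ack followed by shutdown. */
static uint64_t
model_disk_lsn_after_shutdown(uint64_t disk_lsn, uint64_t acked_lsn,
                              bool flush_if_advanced)
{
    SlotModel slot = {disk_lsn, disk_lsn, false};

    model_confirm_received(&slot, acked_lsn);
    model_shutdown_checkpoint(&slot, flush_if_advanced);
    return slot.disk_confirmed_flush;
}
```

In this model, a slot that is at 80 on disk and receives an
acknowledgment for 100 is still at 80 on disk after shutdown unless the
flush is forced, which corresponds to the "extra records after the
restart" effect mentioned above.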

> What if the commit e0b2eed [1] is treated to be fixing a bug with the
> reasoning [2] and backpatch? When done so, it's easy to support
> upgradation/migration of logical replication slots from PG versions <
> 17, no?

Maybe this could be backpatched in order to support this upgrade from
the older version but not as a bug fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 25, 2023, 11:36:33

On Mon, Sep 25, 2023 at 1:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 12:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > > Is there anything else that stops this patch from supporting migration
> > > > of logical replication slots from PG versions < 17?
> > >
> > > IMHO one of the main change we are doing in PG 17 is that on shutdown
> > > checkpoint we are ensuring that if the confirmed flush lsn is updated
> > > since the last checkpoint and that is not yet synched to the disk then
> > > we are doing so.  I think this is the most important change otherwise
> > > many slots for which we have already streamed all the WAL might give
> > > an error assuming that there are pending WAL from the slots which are
> > > not yet confirmed.
> > >
> >
> > You might need to refer to [1] for the change I am talking about
> >
> > [1]
https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
>
> I see. IIUC, without that commit e0b2eed [1], it may happen that the
> slot's on-disk confirmed_flush LSN value can be higher than the WAL
> LSN that's flushed to disk, no?
>

No, without that commit, there is a very high possibility that even if
we have sent the WAL to the subscriber and got the acknowledgment of
the same, we would miss updating it before shutdown. This would lead
to upgrade failures because upgrades have no way to later identify
whether the remaining WAL records are sent to the subscriber.

> If so, can't it be detected if the WAL
> at confirmed_flush LSN is valid or not when reading WAL with
> xlogreader machinery?
>
> What if the commit e0b2eed [1] is treated to be fixing a bug with the
> reasoning [2] and backpatch? When done so, it's easy to support
> upgradation/migration of logical replication slots from PG versions <
> 17, no?
>

Yeah, we could try to make a case to backpatch it, but when I raised
that point there was not much consensus on backpatching it. We are
aware that if we could backpatch it, slots from prior versions could
be upgraded, but the case for backpatching needs broader consensus.
For now, the idea is to get the core of the functionality committed;
then we can see whether we get consensus on backpatching the commit
you mentioned and probably changing the version checks in this work.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 25, 2023, 14:01:08

Dear Bharath,

Thank you for the comments! Before addressing them,
I wanted to reply to some of them.

> 4.
> +        /*
> +         * There is a possibility that following records may be generated
> +         * during the upgrade.
> +         */
> +        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_SHUTDOWN) ||
> +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_CHECKPOINT_ONLINE) ||
> +            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
> +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_FPI_FOR_HINT) ||
> +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> XLOG_PARAMETER_CHANGE) ||
> +            is_xlog_record_type(rmid, info, RM_STANDBY_ID,
> XLOG_RUNNING_XACTS) ||
> +            is_xlog_record_type(rmid, info, RM_HEAP2_ID,
> XLOG_HEAP2_PRUNE);
> 
> What if we missed to capture the WAL records that may be generated
> during upgrade?

If such records are generated before calling binary_upgrade_validate_wal_logical_end(),
the upgrade would fail. Otherwise it would succeed. Anyway, we don't care about
such records because they aren't required to be replicated. The main thing we
want to ensure is that we don't miss any record generated before server shutdown.
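
As a rough illustration of the check being discussed, here is a minimal
sketch of scanning the records at or after confirmed_flush and accepting
only an allow-list of record types. Everything prefixed with Demo/demo_
is hypothetical: the enum values are placeholders rather than
PostgreSQL's real rmgr IDs or info bits, WAL is modelled as a plain
array, and treating records at exactly confirmed_flush as unconsumed is
an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder identifiers for this sketch; the values are NOT PostgreSQL's. */
typedef enum DemoRmgr
{
    DEMO_RM_XLOG, DEMO_RM_STANDBY, DEMO_RM_HEAP2, DEMO_RM_HEAP
} DemoRmgr;

typedef enum DemoInfo
{
    DEMO_CHECKPOINT_SHUTDOWN, DEMO_CHECKPOINT_ONLINE, DEMO_SWITCH,
    DEMO_FPI_FOR_HINT, DEMO_PARAMETER_CHANGE, DEMO_RUNNING_XACTS,
    DEMO_HEAP2_PRUNE, DEMO_HEAP_INSERT
} DemoInfo;

/* Simplified WAL record: only position and type matter here. */
typedef struct DemoRecord
{
    uint64_t lsn;
    DemoRmgr rmid;
    DemoInfo info;
} DemoRecord;

/* Record types the sketch tolerates after confirmed_flush. */
static const struct { DemoRmgr rmid; DemoInfo info; } demo_allowed[] = {
    {DEMO_RM_XLOG, DEMO_CHECKPOINT_SHUTDOWN},
    {DEMO_RM_XLOG, DEMO_CHECKPOINT_ONLINE},
    {DEMO_RM_XLOG, DEMO_SWITCH},
    {DEMO_RM_XLOG, DEMO_FPI_FOR_HINT},
    {DEMO_RM_XLOG, DEMO_PARAMETER_CHANGE},
    {DEMO_RM_STANDBY, DEMO_RUNNING_XACTS},
    {DEMO_RM_HEAP2, DEMO_HEAP2_PRUNE},
};

/* True if the record's type is on the allow-list. */
static bool
demo_record_is_ignorable(const DemoRecord *rec)
{
    size_t n = sizeof(demo_allowed) / sizeof(demo_allowed[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (rec->rmid == demo_allowed[i].rmid &&
            rec->info == demo_allowed[i].info)
            return true;
    }
    return false;
}

/*
 * True if every record at or after confirmed_flush is ignorable, i.e.
 * nothing that would need to be replicated is still pending.
 */
static bool
demo_wal_is_consumed(const DemoRecord *recs, size_t nrecs,
                     uint64_t confirmed_flush)
{
    assert(recs != NULL || nrecs == 0);
    for (size_t i = 0; i < nrecs; i++)
    {
        if (recs[i].lsn < confirmed_flush)
            continue;           /* already confirmed by the subscriber */
        if (!demo_record_is_ignorable(&recs[i]))
            return false;
    }
    return true;
}
```

The allow-list is what makes the extensibility question above concrete:
any record type missing from the table causes the sketch to report the
WAL as unconsumed.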

> 
> What happens if a custom WAL resource manager generates table/index AM
> WAL records during upgrade?

If such records are found, we definitely cannot tell whether they are acceptable,
because we have no way to know the properties of custom WAL records. We didn't
worry about this, as there are other problems with the approach if such a
facility is invoked. Please see the similar discussion [1].

> 
> What happens if new WAL records are added that may be generated during
> the upgrade? Isn't keeping this code extensible and in sync with
> future changes a problem? 

Actually, others have raised a similar point. Originally we just checked
confirmed_flush_lsn and the "latest checkpoint lsn" reported by pg_controldata,
but found an issue: the upgrade cannot pass if users run pg_upgrade --check
just before the actual upgrade. We then discussed some ideas, but they had
disadvantages, so we settled on the current one. Here is a summary that
describes the current situation, which would be quite helpful [2]
(maybe you already know it).

> Or we'd better say that any custom WAL
> records are found after the slot's confirmed flush LSN, then the slot
> isn't upgraded?

Once we conclude how to ensure this, we can add the sentence accordingly.


> 
> 5. In continuation to the above comment:
> 
> Why can't this logic be something like - if there's any WAL record
> seen after a slot's confirmed flush LSN is of type generated by WAL
> resource manager having the rm_decode function defined, then the slot
> can't be upgraded.

Thank you for suggesting a new approach! We have not seen this approach before,
but at least the XLOG and HEAP2 rmgrs have a decode function. So
XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and XLOG_HEAP2_PRUNE could not
be ignored under this approach, which makes it seem inappropriate.
If you have another approach, I'd be very happy if you posted it.

[1]: https://www.postgresql.org/message-id/ZNZ4AxUMIrnMgRbo%40momjian.us
[2]: https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 26, 2023, 07:48:40

On Monday, September 25, 2023 7:01 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> To: 'Bharath Rupireddy' <bharath.rupireddyforpostgres@gmail.com>
> Cc: Amit Kapila <amit.kapila16@gmail.com>; Dilip Kumar
> >
> > 5. In continuation to the above comment:
> >
> > Why can't this logic be something like - if there's any WAL record
> > seen after a slot's confirmed flush LSN is of type generated by WAL
> > resource manager having the rm_decode function defined, then the slot
> > can't be upgraded.
> 
> Thank you for giving new approach! We have never seen the approach before,
> but at least XLOG and HEAP2 rmgr have a decode function. So that
> XLOG_CHECKPOINT_SHUTDOWN, XLOG_CHECKPOINT_ONLINE, and
> XLOG_HEAP2_PRUNE cannot be ignored the approach, seems not appropriate.
> If you have another approach, I'm very happy if you post.

Another idea around decoding is to check if there is any decoding output for
the WAL records.

For example, we can create a temporary slot and use test_decoding to decode the
WAL from the confirmed_flush_lsn of the existing logical replication slots. If
there is any output from the output plugin, then we consider that the WAL has
not been consumed yet.

But this means we need to ignore some WAL records, like XLOG_XACT_INVALIDATIONS,
which won't be decoded into the output. Also, this approach could be costly as
it needs to do extra decoding and output, and we need to assume that "all the
WAL records, including custom records, will be decoded and output if they need
to be consumed".

So it may not be better; I'm just sharing it for reference.

Best Regards,
Hou zj

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 26, 2023, 08:21:48

Dear Bharath,

Again, thank you for reviewing! PSA a new version.

> 
> Here are some more comments/thoughts on the v44 patch:
> 
> 1.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_fails(
> +    [
> 
> Add a test case to hit fprintf(script, "The slot \"%s\" is invalid\n",
> file as well?

Added. The test had not been added because 002_pg_upgrade.pl did not do similar
checks, but it is worth verifying. One difficulty was that the output directory
has a millisecond timestamp, so the absolute path could not be predicted.
File::Find::find was therefore used to locate the file.

> 2.
> +    'run of pg_upgrade where the new cluster has insufficient
> max_replication_slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> +    'run of pg_upgrade where the new cluster has the wrong wal_level');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> +    'run of pg_upgrade of old cluster with idle replication slots');
> +ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
> +    "pg_upgrade_output.d/ not removed after pg_upgrade failure");
> 
> How do these tests recognize the failures are the intended ones? I
> mean, for instance when pg_upgrade fails for unused replication
> slots/unconsumed WAL records, then just looking at the presence of
> pg_upgrade_output.d might not be sufficient, no? Using
> command_fails_like instead of command_fails and looking at the
> contents of invalid_logical_relication_slots.txt might help make these
> tests more focused.

Yeah, the output was not checked until now. I checked and found that pg_upgrade
outputs all messages (including error messages) to stdout, so
command_fails_like() could not be used. Therefore, command_checks_all() was used
instead.

> 3.
> +        pg_log(PG_REPORT, "fatal");
> +        pg_fatal("Your installation contains invalid logical
> replication slots.\n"
> +                 "These slots can't be copied, so this cluster cannot
> be upgraded.\n"
> +                 "Consider removing such slots or consuming the
> pending WAL if any,\n"
> +                 "and then restart the upgrade.\n"
> +                 "A list of invalid logical replication slots is in
> the file:\n"
> +                 "    %s", output_path);
> 
> It's not just the invalid logical replication slots, but also the
> slots with unconsumed WALs which aren't invalid and can be upgraded if
> ensured the WAL is consumed. So, a better wording would be:
>         pg_fatal("Your installation contains logical replication slots
> that cannot be upgraded.\n"
>                  "List of all such logical replication slots is in the file:\n"
>                  "These slots can't be copied, so this cluster cannot
> be upgraded.\n"
>                  "Consider removing invalid slots and/or consuming the
> pending WAL if any,\n"
>                  "and then restart the upgrade.\n"
>                  "    %s", output_path);

Fixed.

Also, I ran pgperltidy. Some formatting was changed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v45-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 26, 2023, 14:12:57

On Tue, Sep 26, 2023 at 10:51 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Again, thank you for reviewing! PSA a new version.

Thanks for the new patch. Here's a comment on v46:

1.
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS
+{ oid => '8046', descr => 'for use by pg_upgrade',
+  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
+  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
+  proargtypes => 'pg_lsn',
+  prosrc => 'binary_upgrade_validate_wal_logical_end' },

I think this patch can avoid catalog changes by turning
binary_upgrade_validate_wal_logical_end into a FRONTEND-only function
sitting in xlogreader.c, after making InitXLogReaderState() and
ReadNextXLogRecord() FRONTEND-friendly (replace elog/ereport with
pg_fatal or such). With this change and back-porting of commit
e0b2eed0 to save logical slots at shutdown, the patch can help support
upgrading logical replication slots on PG versions < 17.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 27, 2023, 07:34:16

Dear Bharath,

Thank you for reviewing!

> Thanks for the new patch. Here's a comment on v46:
> 
> 1.
> +Datum
> +binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS
> +{ oid => '8046', descr => 'for use by pg_upgrade',
> +  proname => 'binary_upgrade_validate_wal_logical_end', proisstrict => 'f',
> +  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> +  proargtypes => 'pg_lsn',
> +  prosrc => 'binary_upgrade_validate_wal_logical_end' },
> 
> I think this patch can avoid catalog changes by turning
> binary_upgrade_validate_wal_logical_end a FRONTEND-only function
> sitting in xlogreader.c after making InitXLogReaderState(),
> ReadNextXLogRecord() FRONTEND-friendly (replace elog/ereport with
> pg_fatal or such). With this change and back-porting of commit
> e0b2eed0 to save logical slots at shutdown, the patch can help support
> upgrading logical replication slots on PG versions < 17.

Hmm, I think your suggestion may be questionable.

If we implement the upgrading function as FRONTEND-only (I have not checked its
feasibility), it means pg_upgrade uses the latest version's WAL reader API to read
WAL in the old version's cluster, which I don't think is advisable.

Each WAL page header has a magic number, XLOG_PAGE_MAGIC, which indicates the
content of WAL. The value has sometimes been changed when the WAL contents
changed, and some functions require that the magic number match the expected
one. E.g., the startup process and pg_walinspect functions require that.
Typically XLogReaderValidatePageHeader() checks that they are equal.

Some functions are now ported from pg_walinspect, so the upgrading function has
the same restriction. I think we should not ease the restriction, in order to
verify the completeness of the files. The following are the call stacks of the
ported functions down to XLogReaderValidatePageHeader().

```
InitXLogReaderState()
XLogFindNextRecord()
ReadPageInternal()
XLogReaderValidatePageHeader()
```

```
ReadNextXLogRecord()
XLogReadRecord()
XLogReadAhead()
XLogDecodeNextRecord()
ReadPageInternal()
XLogReaderValidatePageHeader()
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 28, 2023, 08:14:06

On Mon, Sep 25, 2023 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > [1]
https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
> >
> > I see. IIUC, without that commit e0b2eed [1], it may happen that the
> > slot's on-disk confirmed_flush LSN value can be higher than the WAL
> > LSN that's flushed to disk, no?
> >
>
> No, without that commit, there is a very high possibility that even if
> we have sent the WAL to the subscriber and got the acknowledgment of
> the same, we would miss updating it before shutdown. This would lead
> to upgrade failures because upgrades have no way to later identify
> whether the remaining WAL records are sent to the subscriber.

Thanks for clarifying. I'm trying to understand what happens without
commit e0b2eed0 with an illustration:

step 1: publisher - confirmed_flush LSN in replication slot on disk
structure is 80
step 2: publisher - sends WAL at LSN 100
step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
step 4: publisher - shuts down without writing the new confirmed_flush
LSN as 100 to disk, note that commit e0b2eed0 is not in place
step 5: publisher - restarts
step 6: subscriber - upon publisher restart, the subscriber requests
WAL from publisher from LSN 100 as it tracks the last applied LSN in
replication origin

Now, if the pg_upgrade with the patch in this thread is run on
publisher after step 4, it complains with "The slot \"%s\" has not
consumed the WAL yet".

Is my above understanding right?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 28, 2023, 10:36:37

On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 25, 2023 at 2:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > [1]
https://www.postgresql.org/message-id/CAA4eK1%2BLtWDKXvxS7gnJ562VX%2Bs3C6%2B0uQWamqu%3DUuD8hMfORg%40mail.gmail.com
> > >
> > > I see. IIUC, without that commit e0b2eed [1], it may happen that the
> > > slot's on-disk confirmed_flush LSN value can be higher than the WAL
> > > LSN that's flushed to disk, no?
> > >
> >
> > No, without that commit, there is a very high possibility that even if
> > we have sent the WAL to the subscriber and got the acknowledgment of
> > the same, we would miss updating it before shutdown. This would lead
> > to upgrade failures because upgrades have no way to later identify
> > whether the remaining WAL records are sent to the subscriber.
>
> Thanks for clarifying. I'm trying understand what happens without
> commit e0b2eed0 with an illustration:
>
> step 1: publisher - confirmed_flush LSN  in replication slot on disk
> structure is 80
> step 2: publisher - sends WAL at LSN 100
> step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
> step 4: publisher - shuts down without writing the new confirmed_flush
> LSN as 100 to disk, note that commit e0b2eed0 is not in place
> step 5: publisher - restarts
> step 6: subscriber - upon publisher restart, the subscriber requests
> WAL from publisher from LSN 100 as it tracks the last applied LSN in
> replication origin
>
> Now, if the pg_upgrade with the patch in this thread is run on
> publisher after step 4, it complains with "The slot \"%s\" has not
> consumed the WAL yet".
>
> Is my above understanding right?
>

Yes.


--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 28, 2023, 10:53:58

On Thu, Sep 28, 2023 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > > No, without that commit, there is a very high possibility that even if
> > > we have sent the WAL to the subscriber and got the acknowledgment of
> > > the same, we would miss updating it before shutdown. This would lead
> > > to upgrade failures because upgrades have no way to later identify
> > > whether the remaining WAL records are sent to the subscriber.
> >
> > Thanks for clarifying. I'm trying understand what happens without
> > commit e0b2eed0 with an illustration:
> >
> > step 1: publisher - confirmed_flush LSN  in replication slot on disk
> > structure is 80
> > step 2: publisher - sends WAL at LSN 100
> > step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
> > step 4: publisher - shuts down without writing the new confirmed_flush
> > LSN as 100 to disk, note that commit e0b2eed0 is not in place
> > step 5: publisher - restarts
> > step 6: subscriber - upon publisher restart, the subscriber requests
> > WAL from publisher from LSN 100 as it tracks the last applied LSN in
> > replication origin
> >
> > Now, if the pg_upgrade with the patch in this thread is run on
> > publisher after step 4, it complains with "The slot \"%s\" has not
> > consumed the WAL yet".
> >
> > Is my above understanding right?
> >
>
> Yes.

Thanks. Trying things with replication lag - when there's a lag, the
pg_upgrade can't proceed further and it complains "The slot "mysub"
has not consumed the WAL yet".

I think the best way to upgrade a postgres instance with logical
replication slots is: 1) ensure no replication lag for the logical
slots; 2) perform pg_upgrade --check first; 3) perform pg_upgrade if
there are no complaints.

With the above understanding, it looks to me that the commit e0b2eed0
isn't necessary for back branches. Because, without it the pg_upgrade
complains "The slot "mysub" has not consumed the WAL yet", and then
the user has to restart the instance to ensure the WAL is consumed
(IOW, to get the correct confirmed_flush LSN to the disk).

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 28, 2023, 11:52:21

On Fri, Sep 22, 2023 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:
> > We have discussed this point. Normally, we don't have such options in
> > upgrade, so we were hesitent to add a new one for this but there is a
> > discussion to add an --exclude-logical-slots option. We are planning
> > to add that as a separate patch after getting some more consensus on
> > it. Right now, the idea is to get the main patch ready.
>
> Okay.  I am wondering if the subscriber part is OK now without an
> option, but that could also be considered separately, as well.  At
> least I hope so.

+1 for an option to skip upgrading logical replication slots, for the
following reasons:
- one may not want the logical replication slots on the upgraded
instance immediately - unless the upgraded instance is tested and
determined to be performant.
- one may not want the logical replication slots on the upgraded
instance immediately - no logical replication setup is wanted on the
new instance perhaps because of an architectural/organizational
decision.
- one may take a backup of the postgres instance with logical
replication slots using any of the file system/snapshot based backup
mechanisms (not pg_basebackup), essentially getting the on-disk
replication slots data as well; the pg_upgrade may fail on the
backed-up instance.

I agree to have it as a 0002 patch once the design and things are
finalized for the main patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 28, 2023, 11:57:15

On Thu, Sep 28, 2023 at 1:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 1:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 28, 2023 at 10:44 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > > No, without that commit, there is a very high possibility that even if
> > > > we have sent the WAL to the subscriber and got the acknowledgment of
> > > > the same, we would miss updating it before shutdown. This would lead
> > > > to upgrade failures because upgrades have no way to later identify
> > > > whether the remaining WAL records are sent to the subscriber.
> > >
> > > Thanks for clarifying. I'm trying understand what happens without
> > > commit e0b2eed0 with an illustration:
> > >
> > > step 1: publisher - confirmed_flush LSN  in replication slot on disk
> > > structure is 80
> > > step 2: publisher - sends WAL at LSN 100
> > > step 3: subscriber - acknowledges the apply LSN or confirmed_flush LSN as 100
> > > step 4: publisher - shuts down without writing the new confirmed_flush
> > > LSN as 100 to disk, note that commit e0b2eed0 is not in place
> > > step 5: publisher - restarts
> > > step 6: subscriber - upon publisher restart, the subscriber requests
> > > WAL from publisher from LSN 100 as it tracks the last applied LSN in
> > > replication origin
> > >
> > > Now, if the pg_upgrade with the patch in this thread is run on
> > > publisher after step 4, it complains with "The slot \"%s\" has not
> > > consumed the WAL yet".
> > >
> > > Is my above understanding right?
> > >
> >
> > Yes.
>
> Thanks. Trying things with replication lag - when there's a lag, the
> pg_upgrade can't proceed further and it complains "The slot "mysub"
> has not consumed the WAL yet".
>
> I think the best way to upgrade a postgres instance with logical
> replication slots is: 1) ensure no replication lag for the logical
> slots; 2) perform pg_upgrade --check first; 3) perform pg_upgrade if
> there are no complaints.
>
> With the above understanding, it looks to me that the commit e0b2eed0
> isn't necessary for back branches. Because, without it the pg_upgrade
> complains "The slot "mysub" has not consumed the WAL yet", and then
> the user has to restart the instance to ensure the WAL is consumed
> (IOW, to get the correct confirmed_flush LSN to the disk).
>

The point is it will be difficult for users to ensure that all the WAL
is consumed because it may have already been sent even after restart
and shutdown but the check will still fail. I think the argument to
support upgrade from branches where we don't have commit e0b2eed0 has
some merits and we can change the checks if there is broader agreement
on it. Let's try to agree on whether the core patch is good as is
especially what we want to achieve via validate_wal_records. Once we
agree on the main patch and commit it, the other work including
considering having an option to upgrade slots can be done as top-up
patches.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 28, 2023, 12:02:01

On Thu, Sep 28, 2023 at 2:22 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 22, 2023 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Sep 21, 2023 at 01:50:28PM +0530, Amit Kapila wrote:
> > > We have discussed this point. Normally, we don't have such options in
> > > upgrade, so we were hesitent to add a new one for this but there is a
> > > discussion to add an --exclude-logical-slots option. We are planning
> > > to add that as a separate patch after getting some more consensus on
> > > it. Right now, the idea is to get the main patch ready.
> >
> > Okay.  I am wondering if the subscriber part is OK now without an
> > option, but that could also be considered separately, as well.  At
> > least I hope so.
>
> +1 for an option to skip upgrade logical replication slots for the
> following reasons:
> - one may not want the logical replication slots on the upgraded
> instance immediately - unless the upgraded instance is tested and
> determined to be performant.
> - one may not want the logical replication slots on the upgraded
> instance immediately - no logical replication setup is wanted on the
> new instance perhaps because of an architectural/organizational
> decision.
> - one may take backup of the postgres instance with logical
> replication slots using any of the file system/snapshot based backup
> mechanisms (not pg_basebackup), essentially getting the on-disk
> replication slots data as well; the pg_upgrade may fail on the
> backed-up instance.
>
> I agree to have it as a 0002 patch once the design and things are
> finalized for the main patch.
>

Thanks for understanding that it can be done as a 0002 patch because
we don't have an agreement on this. Jonathan feels exactly the
opposite: if slots are not migrated by default, users would always
need to specify the option, and they may want slots to be migrated by
default. So, we may consider having an --exclude-* option instead.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 28, 2023, 12:32:21

On Mon, Sep 25, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > 4.
> > +        /*
> > +         * There is a possibility that following records may be generated
> > +         * during the upgrade.
> > +         */
> > +        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > XLOG_CHECKPOINT_SHUTDOWN) ||
> > +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > XLOG_CHECKPOINT_ONLINE) ||
> > +            is_xlog_record_type(rmid, info, RM_XLOG_ID, XLOG_SWITCH) ||
> > +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > XLOG_FPI_FOR_HINT) ||
> > +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > XLOG_PARAMETER_CHANGE) ||
> > +            is_xlog_record_type(rmid, info, RM_STANDBY_ID,
> > XLOG_RUNNING_XACTS) ||
> > +            is_xlog_record_type(rmid, info, RM_HEAP2_ID,
> > XLOG_HEAP2_PRUNE);
> >
> > What if we missed to capture the WAL records that may be generated
> > during upgrade?
>
> If such records are generated before calling binary_upgrade_validate_wal_logical_end(),
> the upgrading would fail. Otherwise it would be succeeded. Anyway, we don't care
> such records because those aren't required to be replicated. The main thing we
> want to detect is that we don't miss any record generated before server shutdown.

I read this https://www.postgresql.org/message-id/20230725170319.h423jbthfohwgnf7@awork3.anarazel.de
and understand that the current patch implements the approach
suggested there - "scan the end of the WAL for records that should
have been streamed out". I think the WAL records that should have been
streamed out are all WAL record types in XXXX_decode functions except
the ones that have a no-op or an op unrelated to logical decoding. For
instance,
- for xlog_decode, if the records of type {XLOG_CHECKPOINT_ONLINE,
XLOG_PARAMETER_CHANGE, XLOG_NOOP, XLOG_NEXTOID, XLOG_SWITCH,
XLOG_BACKUP_END, XLOG_RESTORE_POINT, XLOG_FPW_CHANGE,
XLOG_FPI_FOR_HINT, XLOG_FPI, XLOG_OVERWRITE_CONTRECORD} are found
after confirmed_flush LSN, it is fine.
- for xact_decode, if the records of type {XLOG_XACT_ASSIGNMENT} are
found after confirmed_flush LSN, it is fine.
- for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
- for heap2_decode, if the records of type {XLOG_HEAP2_REWRITE,
XLOG_HEAP2_FREEZE_PAGE, XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
XLOG_HEAP2_VISIBLE, XLOG_HEAP2_LOCK_UPDATED} are found after
confirmed_flush LSN, it is fine.
- for heap_decode, if the records of type {XLOG_HEAP_LOCK} are found
after confirmed_flush LSN, it is fine.

I think all of the above WAL records are okay to be present after
confirmed_flush LSN. If any WAL records other than the above are found
after confirmed_flush LSN, those are the ones that should have been
streamed out and pg_upgrade must complain with "The slot "foo" has
not consumed the WAL yet" for all such slots, right? But the function
binary_upgrade_validate_wal_logical_end checks for only a handful of
the above record types. I know that the list was arrived at based on
testing, but any of the above WAL records may be generated and
present before/during/after pg_upgrade, and a pg_upgrade failure
isn't wanted in those cases.

Perhaps we could add a function in logical/decode.c that reports a WAL
record as valid if its type is any of the above. A note in
replication/decode.h and/or access/rmgrlist.h asking rmgr adders to
categorize each new WAL record type in that function based on its
decoding operation might help with future WAL record type
additions.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

September 28, 2023, 15:38:20

On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:

Hi,

> 
> On Mon, Sep 25, 2023 at 4:31 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > 4.
> > > +        /*
> > > +         * There is a possibility that following records may be generated
> > > +         * during the upgrade.
> > > +         */
> > > +        is_valid = is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > > XLOG_CHECKPOINT_SHUTDOWN) ||
> > > +            is_xlog_record_type(rmid, info, RM_XLOG_ID,
> > > XLOG_CHECKPOINT_ONLINE) ||
...
> > >
> > > What if we missed to capture the WAL records that may be generated
> > > during upgrade?
> >
> > If such records are generated before calling
> > binary_upgrade_validate_wal_logical_end(),
> > the upgrading would fail. Otherwise it would be succeeded. Anyway, we
> > don't care such records because those aren't required to be
> > replicated. The main thing we want to detect is that we don't miss any record
> > generated before server shutdown.
> 
> I read this
> https://www.postgresql.org/message-id/20230725170319.h423jbthfohwgnf7@awork3.anarazel.de
> and understand that the current patch implements the approach suggested
> there - "scan the end of the WAL for records that should have been streamed
> out". I think the WAL records that should have been streamed out are all WAL
> record types in XXXX_decode functions except the ones that have a no-op or an
> op unrelated to logical decoding. For instance,
> - for xlog_decode, if the records of type {XLOG_CHECKPOINT_ONLINE,
> XLOG_PARAMETER_CHANGE, XLOG_NOOP, XLOG_NEXTOID, XLOG_SWITCH,
> XLOG_BACKUP_END, XLOG_RESTORE_POINT, XLOG_FPW_CHANGE,
> XLOG_FPI_FOR_HINT, XLOG_FPI, XLOG_OVERWRITE_CONTRECORD} are found
> after confirmed_flush LSN, it is fine.
> - for xact_decode, if the records of type {XLOG_XACT_ASSIGNMENT} are found
> after confirmed_flush LSN, it is fine.
> - for standby_decode, if the records of type {XLOG_STANDBY_LOCK,
> XLOG_INVALIDATIONS} are found after confirmed_flush LSN, it is fine.
> - for heap2_decode, if the records of type {XLOG_HEAP2_REWRITE,
> XLOG_HEAP2_FREEZE_PAGE, XLOG_HEAP2_PRUNE, XLOG_HEAP2_VACUUM,
> XLOG_HEAP2_VISIBLE, XLOG_HEAP2_LOCK_UPDATED} are found after
> confirmed_flush LSN, it is fine.
> - for heap_decode, if the records of type {XLOG_HEAP_LOCK} are found after
> confirmed_flush LSN, it is fine.
> 
> I think all of the above WAL records are okay to be present after confirmed_flush
> LSN. If any WAL records other than the above are found after confirmed_flush
> LSN, those are the ones that should have been streamed out and the pg_upgrade
> must complain with "The slot "foo" has not consumed the WAL yet" for all such
> slots, right? But, the function binary_upgrade_validate_wal_logical_end checks
> for only a handful of the above record types. I know that the list is arrived at
> based on testing, but it may happen that any of the above WAL records may be
> generated and present before/during/after pg_upgrade for which pg_upgrade
> failure isn't wanted.
> 
> Perhaps, a function in logical/decode.c returning the WAL record as valid if the
> record type is any of the above. A note in replication/decode.h and/or
> access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
> new function based on its decoding operation might help with future new WAL
> record type additions.
> 
> Thoughts?

I think this approach can work, but I am not sure if it's better than other
approaches, mainly because it has almost the same maintenance burden as the
current approach, i.e. we need to verify and update the check function each
time we add a new WAL record type.

Apart from the WAL scan approach, we also considered alternative approaches that
do not impose an additional maintenance burden and could potentially be less
complex.  For example, we can add a new field in pg_controldata to record the
last checkpoint that happens in non-upgrade mode, so that we can compare the
slot's confirmed_flush_lsn with this value. If they are the same, the WAL
should have been consumed; otherwise we disallow upgrading this slot. I would
appreciate it if you could share your thoughts about this approach.
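
As a rough illustration of the pg_controldata idea, the check would reduce to one comparison per slot. The sketch below is hypothetical: last_normal_checkpoint stands for the proposed new control-file field, which does not exist today, the function name is made up, and XLogRecPtr is redeclared locally so that the snippet compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical check for the pg_controldata idea: a slot may be upgraded
 * only if its confirmed_flush_lsn equals the LSN of the last checkpoint
 * written outside binary-upgrade mode (an assumed new field).
 */
static bool
slot_consumed_all_wal(XLogRecPtr confirmed_flush_lsn,
                      XLogRecPtr last_normal_checkpoint)
{
    assert(last_normal_checkpoint != InvalidXLogRecPtr);
    return confirmed_flush_lsn == last_normal_checkpoint;
}
```

With this shape, no WAL record types need to be inspected at all; the cost is the extra field and the bookkeeping needed to keep it from being updated in upgrade mode.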

And if we decide to use the WAL scan approach, instead of checking each record, we
could directly check whether the WAL records can be decoded into meaningful results
by using test_decoding to decode them. This approach also doesn't add a new
maintenance burden, as we need to update test_decoding anyway whenever the decode
logic changes for a new record. This was also mentioned in [1].

What do you think?

[1]
https://www.postgresql.org/message-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

September 29, 2023, 10:30:04

On Thu, Sep 28, 2023 at 6:08 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, September 28, 2023 5:32 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > Perhaps, a function in logical/decode.c returning the WAL record as valid if the
> > record type is any of the above. A note in replication/decode.h and/or
> > access/rmgrlist.h asking rmgr adders to categorize the WAL record type in the
> > new function based on its decoding operation might help with future new WAL
> > record type additions.
> >
> > Thoughts?
>
> I think this approach can work, but I am not sure if it's better than other
> approaches. Mainly because it has almost the same maintenance burden as the
> current approach, i.e. we need to verify and update the check function each
> time we add a new WAL record type.

I think that's not a big problem if we have comments in
replication/decode.h, access/rmgrlist.h, docs to categorize the new
WAL records as decodable. Currently, the WAL record types adders will
have to do certain things based on notes in comments or docs anyways.

Another idea to enforce categorizing decodability of WAL records is to
have a new RMGR API rm_is_record_decodable or such, the RMGR
implementers will then add respective functions returning true/false
if a given WAL record is decodable or not:
    void        (*rm_decode) (struct LogicalDecodingContext *ctx,
                              struct XLogRecordBuffer *buf);
    bool        (*rm_is_record_decodable) (uint8 type);
} RmgrData;

PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL,
NULL, NULL, xlog_is_record_decodable), then the
xlog_is_record_decodable can look something like [1].

This approach can also enforce/help custom RMGR implementers to define
the decodability of the WAL records.

> Apart from the WAL scan approach, we also considered alternative approach that
> do not impose an additional maintenance burden and could potentially be less
> complex.  For example, we can add a new field in pg_controldata to record the
> last checkpoint that happens in non-upgrade mode, so that we can compare the
> slot's confirmed_flush_lsn with this value, If they are the same, the WAL
> should have been consumed otherwise we disallow upgrading this slot. I would
> appreciate if you can share your thought about this approach.

I read this
https://www.postgresql.org/message-id/CAA4eK1JVKZGRHLOEotWi%2Be%2B09jucNedqpkkc-Do4dh5FTAU%2B5w%40mail.gmail.com
and I agree with the concern about adding a new field in pg_controldata
just for this purpose and spreading the IsBinaryUpgrade code in the
checkpointer. Another concern for me with the new-field-in-pg_controldata
approach is that it makes it hard to make this patch
support back branches. Therefore, -1 for this approach from me.

> And if we decided to use WAL scan approach, instead of checking each record, we
> could directly check if the WAL record can be decoded into meaningful results
> by use test_decoding to decode them. This approach also doesn't add new
> maintenance burden as we anyway need to update the test_decoding if any decode
> logic for new record changes. This was also mentioned [1].
>
> What do you think ?
>
> [1]
> https://www.postgresql.org/message-id/OS0PR01MB5716FC0F814D78E82E4CC3B894C3A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-1 for decoding the WAL with test_decoding, I don't think it's a great
idea to create temp slots and launch walsenders during upgrade.

IMO, the WAL scanning approach looks better. However, we could optimize
it: instead of scanning WAL records separately for every replication
slot's confirmed_flush_lsn (CFL), start with the lowest CFL (min of all
slots' CFLs) and scan till the end of WAL. The
binary_upgrade_validate_wal_logical_end function can return an array
of LSNs at which decodable WAL records are found. Then, use the CFLs of
all other slots and this array to determine if the slots have
unconsumed WAL. Following is an illustration of this idea:

1. Slots s1, s2, s3, s4, s5 with CFLs 100, 90, 110, 70, 80 respectively.
2. Min of all CFLs is 70 for slot s4.
3. Start scanning WAL from min CFL 70 for slot s4; say there is
unconsumed WAL at LSNs {85, 89}.
4. Now, without scanning WAL again for the rest of the slots, determine
whether each slot has unconsumed WAL by looking at the array of
unconsumed WAL LSNs {85, 89}:
4.1. CFL of slot s1 is 100: no unconsumed WAL at or after LSN 100.
4.2. CFL of slot s2 is 90: no unconsumed WAL at or after LSN 90.
4.3. CFL of slot s3 is 110: no unconsumed WAL at or after LSN 110.
4.4. CFL of slot s4 is 70: there's unconsumed WAL at or after LSN 70.
4.5. CFL of slot s5 is 80: there's unconsumed WAL at or after LSN 80.

With this approach, the WAL is scanned only once as opposed to the
current approach the patch implements.
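
The single-scan idea can be sketched in a few lines of stand-alone C. This is only a sketch under assumptions: both function names are hypothetical, XLogRecPtr is redeclared locally, and the array of decodable LSNs is taken as already produced by one scan that started at the minimum CFL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for PostgreSQL's 64-bit WAL position type. */
typedef uint64_t XLogRecPtr;

/* Smallest confirmed_flush LSN of all slots: where the one scan starts. */
static XLogRecPtr
min_confirmed_flush(const XLogRecPtr *cfl, size_t nslots)
{
    assert(nslots > 0);

    XLogRecPtr min = cfl[0];

    for (size_t i = 1; i < nslots; i++)
    {
        if (cfl[i] < min)
            min = cfl[i];
    }
    return min;
}

/*
 * A slot has unconsumed WAL if any decodable record found by the single
 * scan lies at or after the slot's confirmed_flush LSN.
 */
static bool
slot_has_unconsumed_wal(XLogRecPtr cfl,
                        const XLogRecPtr *decodable, size_t ndecodable)
{
    for (size_t i = 0; i < ndecodable; i++)
    {
        if (decodable[i] >= cfl)
            return true;
    }
    return false;
}
```

With the numbers from the illustration (CFLs 100, 90, 110, 70, 80 and decodable records at 85 and 89), the scan starts at 70, and only s4 and s5 are reported as having unconsumed WAL.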

Thoughts?

[1]
bool
xlog_is_record_decodable(uint8 type)
{
    switch (info)
    {
        case XLOG_CHECKPOINT_SHUTDOWN:
        case XLOG_END_OF_RECOVERY:
            return true;
        case XLOG_CHECKPOINT_ONLINE:
        case XLOG_PARAMETER_CHANGE:
        case XLOG_NOOP:
        case XLOG_NEXTOID:
        case XLOG_SWITCH:
        case XLOG_BACKUP_END:
        case XLOG_RESTORE_POINT:
        case XLOG_FPW_CHANGE:
        case XLOG_FPI_FOR_HINT:
        case XLOG_FPI:
        case XLOG_OVERWRITE_CONTRECORD:
            return false;
        default:
            elog(ERROR, "unexpected RM_XLOG_ID record type: %u", info);
    }
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

September 29, 2023, 13:59:19

On Fri, Sep 29, 2023 at 1:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Sep 28, 2023 at 6:08 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
>
> IMO, WAL scanning approach looks better. However, if were to optimize
> it by not scanning WAL records for every replication slot
> confirmed_flush_lsn (CFL), start with lowest CFL (min of all slots
> CFL), and scan till the end of WAL.
>

Earlier, I also thought of something like that, but I guess it won't
matter much as most of the slots will be up-to-date at shutdown time.
That would mean we would read just one or two records. Personally, I
feel it is better to build consensus on the WAL scanning approach
first: is it okay to decide as the patch is currently doing, or
should we expose an API from the decode module as you are
proposing? OTOH, if we want to go with another approach, like adding
a field in pg_controldata, then we don't need to deal with WAL record
types at all.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

September 29, 2023, 14:57:51

Dear Bharath,

Thanks for giving your idea!

> > I think this approach can work, but I am not sure if it's better than other
> > approaches. Mainly because it has almost the same maintenance burden as the
> > current approach, i.e. we need to verify and update the check function each
> > time we add a new WAL record type.
> 
> I think that's not a big problem if we have comments in
> replication/decode.h, access/rmgrlist.h, docs to categorize the new
> WAL records as decodable. Currently, the WAL record types adders will
> have to do certain things based on notes in comments or docs anyways.
> 
> Another idea to enforce categorizing decodability of WAL records is to
> have a new RMGR API rm_is_record_decodable or such, the RMGR
> implementers will then add respective functions returning true/false
> if a given WAL record is decodable or not:
>     void        (*rm_decode) (struct LogicalDecodingContext *ctx,
>                               struct XLogRecordBuffer *buf);
>     bool        (*rm_is_record_decodable) (uint8 type);
> } RmgrData;
> 
> PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL,
> NULL, NULL, xlog_is_record_decodable), then the
> xlog_is_record_decodable can look something like [1].
> 
> This approach can also enforce/help custom RMGR implementers to define
> the decodability of the WAL records.

Yeah, the approach forces developers to check the decodability.
But the benefit seems smaller than the effort it requires, because the function
would be used only by pg_upgrade. Could you tell me if you have another use case
in mind? We may be able to adopt it if there is one.
Also, this approach cannot be backported.

Anyway, let's see what senior members say.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

October 3, 2023, 07:28:44

On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Yeah, the approach enforces developers to check the decodability.
> But the benefit seems smaller than required efforts for it because the function
> would be used only by pg_upgrade. Could you tell me if you have another use case
> in mind? We may able to adopt if we have...

I'm attaching 0002 patch (on top of v45) which implements the new
decodable callback approach that I have in mind. IMO, this new
approach is extensible, better than the current approach (hard-coding
of certain WAL records that may be generated during pg_upgrade) taken
by the patch, and helps deal with the issue that custom WAL resource
managers can have with the current approach taken by the patch.

> Also, this approach cannot be backported.

Neither can the current patch as-is. I'm not looking at backporting this
feature right now, but at making it as robust and extensible as possible
for PG17.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

October 3, 2023, 07:42:36

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Yeah, the approach enforces developers to check the decodability.
> > But the benefit seems smaller than required efforts for it because the function
> > would be used only by pg_upgrade. Could you tell me if you have another use case
> > in mind? We may able to adopt if we have...
>
> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible, better than the current approach (hard-coding
> of certain WAL records that may be generated during pg_upgrade) taken
> by the patch, and helps deal with the issue that custom WAL resource
> managers can have with the current approach taken by the patch.

I have not looked at the patch, but I like this approach better.  I mean, this
approach does not check which record types are generated during upgrade;
instead, it directly targets which types of records must not appear after
the confirmed_flush_lsn.  So if the rmgr says that no decodable record was
generated after confirmed_flush_lsn, then we are safe to
upgrade that slot.  So this seems an extensible approach.


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

October 3, 2023, 12:40:26

Dear Bharath,

> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible, better than the current approach (hard-coding
> of certain WAL records that may be generated during pg_upgrade) taken
> by the patch, and helps deal with the issue that custom WAL resource
> managers can have with the current approach taken by the patch.

Thanks for sharing your PoC! I tested it and it worked well. I have also implemented
the decoding approach locally, but your approach is conceptually faster. It
still checks the record types one by one, so I am not sure whether it is acceptable,
but at least the checks are centralized. We must hear opinions from others. What do others think?

Here are comments on your patch. I attached a txt file with the changes; please include them if they are OK.

1.
According to your post, we must have comments to notify developers that the
is_decodable API must be implemented. Please share them too if you have an idea.

 
2.
The existence of is_decodable should be checked in RegisterCustomRmgr().

3.
Another rmgr API (rm_identify) takes the uint8 info without any bit operation applied:
the callbacks themselves do "info & ~XLR_INFO_MASK". Should we follow that?

4.
It would be helpful for developers to add such a function to the test_custom_rmgrs module.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

kuroda_mod.txt

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

October 3, 2023, 13:09:35

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Yeah, the approach enforces developers to check the decodability.
> > But the benefit seems smaller than required efforts for it because the function
> > would be used only by pg_upgrade. Could you tell me if you have another use case
> > in mind? We may able to adopt if we have...
>
> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible, better than the current approach (hard-coding
> of certain WAL records that may be generated during pg_upgrade) taken
> by the patch, and helps deal with the issue that custom WAL resource
> managers can have with the current approach taken by the patch.
>

+xlog_is_record_decodable(uint8 info)
+{
+ switch (info)
+ {
+ case XLOG_CHECKPOINT_SHUTDOWN:
+ case XLOG_END_OF_RECOVERY:
+ return true;
+ case XLOG_CHECKPOINT_ONLINE:
+ case XLOG_PARAMETER_CHANGE:
...
+ return false;
+}

I think this won't behave correctly. Without your patch, we consider
both XLOG_CHECKPOINT_SHUTDOWN and XLOG_CHECKPOINT_ONLINE as valid
records but after patch only one of these will be considered valid
which won't lead to desired behavior.

BTW, the API proposed in your patch returns true for a WAL record type
if there is something we do for it during decoding, but the check
in the upgrade function expects the reverse value. For example, for WAL
record type XLOG_HEAP_INSERT, the API returns true and that is an
indication to the caller that this is an expected record after the
confirmed_flush LSN location, which doesn't seem correct. Am I missing
something?
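
The polarity concern can be shown with a tiny stand-alone sketch. Everything in it is a hypothetical stand-in rather than the PoC's code (the record constants, the callback type, and record_is_acceptable): it only shows that a callback answering "is this record decodable?" has to be negated before it can answer "may this record appear after the confirmed_flush LSN?".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record types of a made-up resource manager. */
#define DEMO_INSERT 0x00        /* produces a change that must be streamed */
#define DEMO_LOCK   0x10        /* nothing to stream */

/* Shape of the proposed callback: true means "decoding does something". */
typedef bool (*is_decodable_fn) (uint8_t info);

static bool
demo_is_record_decodable(uint8_t info)
{
    return info == DEMO_INSERT;
}

/*
 * What the upgrade check needs: true means the record is harmless after
 * the confirmed_flush LSN, i.e. the negation of the callback's answer.
 */
static bool
record_is_acceptable(is_decodable_fn is_decodable, uint8_t info)
{
    assert(is_decodable != NULL);
    return !is_decodable(info);
}
```

Using the callback's return value directly as is_valid would accept DEMO_INSERT here, which is the same inversion as accepting XLOG_HEAP_INSERT.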

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

October 4, 2023, 04:00:34

Dear Bharath,

While checking more, I found some problems in your PoC.

1. rm_is_record_decodable() returns true when WAL records are decodable.
   Based on that, should is_valid be false when the function returns true?
   E.g., XLOG_HEAP_INSERT is accepted in the PoC.
2. XLOG_CHECKPOINT_SHUTDOWN and XLOG_RUNNING_XACTS should return false because
   these records may be generated during the upgrade but they are acceptable.
3. Bit operations are done to extract the WAL record type, but the mask
   differs depending on the rmgr. E.g., XLOG uses XLR_INFO_MASK, but XACT uses
   XLOG_XACT_OPMASK.
4. There is a possibility that "XLOG_HEAP_INSERT | XLOG_HEAP_INIT_PAGE" is inserted,
   but it is not handled.

Regarding point 2, maybe we should say "if the reorderbuffer is modified while decoding,
rm_is_record_decodable must return false" or something. If so, the return value
of XLOG_END_OF_RECOVERY and XLOG_HEAP2_NEW_CID should be also changed.
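
Points 3 and 4 can be illustrated with a small stand-alone sketch. The constants are copied by hand from the PostgreSQL headers named in the comments and should be checked against the source tree being built; the three helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Copied by hand for illustration; verify against the real headers. */
#define XLR_INFO_MASK       0x0F    /* access/xlogrecord.h */
#define XLOG_XACT_OPMASK    0x70    /* access/xact.h */
#define XLOG_XACT_COMMIT    0x00
#define XLOG_XACT_HAS_INFO  0x80
#define XLOG_HEAP_OPMASK    0x70    /* access/heapam_xlog.h */
#define XLOG_HEAP_INSERT    0x00
#define XLOG_HEAP_INIT_PAGE 0x80

/* Generic extraction: keeps all four rmgr-owned bits, including flag bits. */
static uint8_t
generic_op(uint8_t info)
{
    return (uint8_t) (info & ~XLR_INFO_MASK);
}

/* XACT extraction: drops the XLOG_XACT_HAS_INFO flag bit. */
static uint8_t
xact_op(uint8_t info)
{
    return (uint8_t) (info & XLOG_XACT_OPMASK);
}

/* HEAP extraction: drops the XLOG_HEAP_INIT_PAGE flag bit. */
static uint8_t
heap_op(uint8_t info)
{
    return (uint8_t) (info & XLOG_HEAP_OPMASK);
}
```

For info = XLOG_HEAP_INSERT | XLOG_HEAP_INIT_PAGE, generic_op() yields 0x80, which matches no heap opcode in a switch, while heap_op() yields XLOG_HEAP_INSERT. This is why a callback that receives only "info & ~XLR_INFO_MASK" still has to apply its own rmgr-specific mask.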

I attached a fix patch for the above. What do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v2_kuroda_mod.txt

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

October 4, 2023, 23:18:02

On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Yeah, the approach enforces developers to check the decodability.
> > But the benefit seems smaller than required efforts for it because the function
> > would be used only by pg_upgrade. Could you tell me if you have another use case
> > in mind? We may able to adopt if we have...
>
> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible, better than the current approach (hard-coding
> of certain WAL records that may be generated during pg_upgrade) taken
> by the patch, and helps deal with the issue that custom WAL resource
> managers can have with the current approach taken by the patch.
>

Today, I discussed this problem with Andres at PGConf NYC and he
suggested the following. To verify whether there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL; if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream), we can
return false, indicating that there is unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it to pg_resetwal by using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.
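
The segment arithmetic involved can be sketched as follows. This is illustrative only and is not pg_resetwal's actual code: the function names are hypothetical, the exact offset of restart_lsn within the new segment is left out, and the file name follows the usual 24-hex-digit WAL segment naming (timeline, then the segment number split in two).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Number of the segment following the one that contains lsn. */
static XLogSegNo
next_segment_no(XLogRecPtr lsn, uint64_t wal_segment_size)
{
    assert(wal_segment_size > 0);
    return lsn / wal_segment_size + 1;
}

/* First byte of a segment: a candidate location for the new slots. */
static XLogRecPtr
segment_start(XLogSegNo segno, uint64_t wal_segment_size)
{
    return segno * wal_segment_size;
}

/* Format the segment's file name, as would be given to pg_resetwal -l. */
static void
wal_file_name(char *buf, size_t len, uint32_t tli,
              XLogSegNo segno, uint64_t wal_segment_size)
{
    uint64_t segs_per_id = UINT64_C(0x100000000) / wal_segment_size;

    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / segs_per_id),
             (unsigned int) (segno % segs_per_id));
}
```

For example, with 16 MB segments an old end-of-WAL position of 0/3000148 lies in segment 3, so the next segment is 4, it starts at 0/4000000, and on timeline 1 its file name is 000000010000000000000004.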

The other potential problem Andres pointed out is that during shutdown
if due to some reason, the walreceiver goes down, we won't be able to
send the required WAL and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

Thoughts?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Dilip Kumar

Date:

October 5, 2023, 11:58:53

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Yeah, the approach enforces developers to check the decodability.
> > > But the benefit seems smaller than required efforts for it because the function
> > > would be used only by pg_upgrade. Could you tell me if you have another use case
> > > in mind? We may able to adopt if we have...
> >
> > I'm attaching 0002 patch (on top of v45) which implements the new
> > decodable callback approach that I have in mind. IMO, this new
> > approach is extensible, better than the current approach (hard-coding
> > of certain WAL records that may be generated during pg_upgrade) taken
> > by the patch, and helps deal with the issue that custom WAL resource
> > managers can have with the current approach taken by the patch.
> >
>
> Today, I discussed this problem with Andres at PGConf NYC and he
> suggested as following. To verify, if there is any pending unexpected
> WAL after shutdown, we can have an API like
> pg_logical_replication_slot_advance() which will simply process
> records without actually sending anything downstream.

So I assume in each lower-level decode function (e.g. heap_decode())
we will add a check: if we are examining the WAL for an upgrade,
then from that level we will return true or false based on whether the
WAL is decodable or not.  Is my understanding correct?  At first
thought, this approach looks better and more generic.

> In this new API,
> we will start with each slot's restart_lsn location and try to process
> till the end of WAL, if we encounter any WAL that needs to be
> processed (like we need to send the decoded WAL downstream) we can
> return a false indicating that there is an unexpected WAL. The reason
> to start with restart_lsn is that it is the location that we use to
> start scanning the WAL anyway.

Yeah, that makes sense.

> Then, we should also try to create slots before invoking pg_resetwal.
> The idea is that we can write a new binary mode function that will do
> exactly what pg_resetwal does to compute the next segment and use that
> location as a new location (restart_lsn) to create the slots in a new
> node. Then, pass it pg_resetwal by using the existing option '-l
> walfile'. As we don't have any API that takes restart_lsn as input, we
> can write a new API probably for binary mode to create slots that do
> take restart_lsn as input. This will ensure that there is no new WAL
> inserted by background processes between resetwal and the creation of
> slots.

Yeah, that looks cleaner IMHO.

> The other potential problem Andres pointed out is that during shutdown
> if due to some reason, the walreceiver goes down, we won't be able to
> send the required WAL and users won't be able to ensure that because
> even after restart the same situation can happen. The ideal way is to
> have something that puts the system in READ ONLY state during shutdown
> and then we can probably allow walreceivers to reconnect and receive
> the required WALs. As we don't have such functionality available and
> it won't be easy to achieve the same, we can leave this for now.

+1

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

05 October 2023, 13:06:43

Dear Amit, Andres,

Thank you for making the decision! I will basically follow your idea and prepare
a patch accordingly.

> Today, I discussed this problem with Andres at PGConf NYC and he
> suggested as following. To verify, if there is any pending unexpected
> WAL after shutdown, we can have an API like
> pg_logical_replication_slot_advance() which will simply process
> records without actually sending anything downstream. In this new API,
> we will start with each slot's restart_lsn location and try to process
> till the end of WAL, if we encounter any WAL that needs to be
> processed (like we need to send the decoded WAL downstream) we can
> return a false indicating that there is an unexpected WAL. The reason
> to start with restart_lsn is that it is the location that we use to
> start scanning the WAL anyway.

The approach seems similar to Hou-san's suggestion [1], but it lets us avoid
using test_decoding. My plan is for the upgrade function to decode the WAL
and check whether any reorder buffer changes are produced.
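
A minimal sketch of the check described above, written as a self-contained toy
model rather than PostgreSQL code. ToyWalRecord, its queues_change field, and
toy_wal_is_fully_consumed() are hypothetical names introduced only for
illustration; in the real server the decision would come from running the
logical decoding machinery over each record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a WAL record. */
typedef struct ToyWalRecord
{
	uint64_t	lsn;			/* start position of the record */
	bool		queues_change;	/* would decoding queue a change to send? */
} ToyWalRecord;

/*
 * Scan every record at or after restart_lsn. Return true if none of them
 * would queue a change, i.e. nothing is left to send downstream.
 */
bool
toy_wal_is_fully_consumed(const ToyWalRecord *records, size_t nrecords,
						  uint64_t restart_lsn)
{
	assert(records != NULL || nrecords == 0);

	for (size_t i = 0; i < nrecords; i++)
	{
		/* Records before the slot's starting point are not examined. */
		if (records[i].lsn < restart_lsn)
			continue;

		/* One pending change is enough to fail the check. */
		if (records[i].queues_change)
			return false;
	}

	return true;
}
```

The scan stops at the first record that would produce a change, because a
single pending change is already enough to reject the slot.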

> Then, we should also try to create slots before invoking pg_resetwal.
> The idea is that we can write a new binary mode function that will do
> exactly what pg_resetwal does to compute the next segment and use that
> location as a new location (restart_lsn) to create the slots in a new
> node. Then, pass it pg_resetwal by using the existing option '-l
> walfile'. As we don't have any API that takes restart_lsn as input, we
> can write a new API probably for binary mode to create slots that do
> take restart_lsn as input. This will ensure that there is no new WAL
> inserted by background processes between resetwal and the creation of
> slots.

It seems better because we can create all objects before pg_resetwal.

I will handle the above two points, and let's see how it works.

[1]:
https://www.postgresql.org/message-id/OS0PR01MB5716506A1A1B20EFBFA7B52994C1A%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

05 October 2023, 13:54:20

On Thu, Oct 5, 2023 at 2:29 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
> > > <kuroda.hayato@fujitsu.com> wrote:
> > > >
> > > > Yeah, the approach enforces developers to check the decodability.
> > > > But the benefit seems smaller than required efforts for it because the function
> > > > would be used only by pg_upgrade. Could you tell me if you have another use case
> > > > in mind? We may able to adopt if we have...
> > >
> > > I'm attaching 0002 patch (on top of v45) which implements the new
> > > decodable callback approach that I have in mind. IMO, this new
> > > approach is extensible, better than the current approach (hard-coding
> > > of certain WAL records that may be generated during pg_upgrade) taken
> > > by the patch, and helps deal with the issue that custom WAL resource
> > > managers can have with the current approach taken by the patch.
> > >
> >
> > Today, I discussed this problem with Andres at PGConf NYC and he
> > suggested as following. To verify, if there is any pending unexpected
> > WAL after shutdown, we can have an API like
> > pg_logical_replication_slot_advance() which will simply process
> > records without actually sending anything downstream.
>
> So I assume in each lower-level decode function (e.g. heap_decode() )
> we will add the check that if we are checking the WAL for an upgrade
> then from that level we will return true or false based on whether the
> WAL is decodable or not. Is my understanding correct?
>

Yes, this is one way to achieve it, but I think it will require changing the
return value of many APIs. Can we somehow just get this via
LogicalDecodingContext, or some other way at the caller, by allowing some
variable to be set at the required places?

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

05 October 2023, 14:56:17

On Thu, Oct 5, 2023 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream.

+1 for this approach. It looks neat.

I think we also need to add TAP tests to generate decodable WAL
records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
pg_upgrade as described here
https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com.
Basically, these were the exceptional WAL records that may be
generated by pg_upgrade, so having tests for them is good.

> > So I assume in each lower-level decode function (e.g. heap_decode() )
> > we will add the check that if we are checking the WAL for an upgrade
> > then from that level we will return true or false based on whether the
> > WAL is decodable or not.  Is my understanding correct?
> >
>
> Yes, this is one way to achieve but I think this will require changing return
> value of many APIs. Can we somehow just get this via LogicalDecodingContext
> or some other way at the caller by allowing to set some variable at required
> places?

+1 for adding the required flags to the decoding context similar to
fast_forward.

Another way without adding any new variables is to pass the WAL record
to LogicalDecodingProcessRecord, and upon return check the reorder
buffer if there's any decoded change generated for the xid associated
with the WAL record. If any decoded change related to the WAL record
xid is found, then that's the end for the new function. Here's what I
think [1], haven't tested it.

[1]
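change_found = false;
end_of_wal = false;
ctx = CreateDecodingContext();	/* arguments elided in this sketch */

XLogBeginRead(ctx->reader, MyReplicationSlot->data.restart_lsn);

/* Stop at the end of WAL or at the first decoded change. */
while (!end_of_wal && !change_found)
{
    XLogRecord *record;
    TransactionId xid;
    ReorderBufferTXN *txn;
    char       *errm = NULL;

    record = XLogReadRecord(ctx->reader, &errm);

    /* No further record could be read: treat it as the end of WAL. */
    if (record == NULL)
    {
        end_of_wal = true;
        break;
    }

    LogicalDecodingProcessRecord(ctx, ctx->reader);

    /* XLogRecGetXid() takes the reader state, not the record. */
    xid = XLogRecGetXid(ctx->reader);

    /* Look up (without creating) a transaction entry for this xid. */
    txn = ReorderBufferTXNByXid(ctx->reorder, xid, false, NULL,
                                InvalidXLogRecPtr, false);

    if (txn != NULL)
    {
        change_found = true;
        break;
    }

    CHECK_FOR_INTERRUPTS();
}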

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

05 October 2023, 16:13:30

On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Then, we should also try to create slots before invoking pg_resetwal.
> The idea is that we can write a new binary mode function that will do
> exactly what pg_resetwal does to compute the next segment and use that
> location as a new location (restart_lsn) to create the slots in a new
> node. Then, pass it pg_resetwal by using the existing option '-l
> walfile'. As we don't have any API that takes restart_lsn as input, we
> can write a new API probably for binary mode to create slots that do
> take restart_lsn as input. This will ensure that there is no new WAL
> inserted by background processes between resetwal and the creation of
> slots.

+1. I think this approach makes it foolproof. pg_resetwal uses
FindEndOfXLOG and we need that to be in a binary mode SQL callable
function. FindEndOfXLOG ignores TLI to compute the new WAL file name,
but that seems to be okay for the new binary mode function because
pg_upgrade uses TLI 1 anyways and doesn't copy WAL files from old
cluster.
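
To make the file name computation concrete, here is a small standalone sketch.
It assumes the usual WAL file naming (timeline, then the segment number split
into two 8-digit hex fields) and assumes that the new segment is simply the one
after the highest existing segment. toy_segments_per_xlogid() and
toy_next_wal_file_name() are hypothetical helpers, not the FindEndOfXLOG code
itself.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_WAL_FNAME_LEN 24	/* three fields of 8 hex digits each */

/* Number of WAL segments contained in one 4 GB "xlogid" range. */
uint64_t
toy_segments_per_xlogid(uint32_t wal_segsz_bytes)
{
	assert(wal_segsz_bytes > 0);
	return UINT64_C(0x100000000) / wal_segsz_bytes;
}

/*
 * Build the WAL file name for the segment that follows highest_segno.
 * buf must have room for TOY_WAL_FNAME_LEN characters plus the terminator.
 */
void
toy_next_wal_file_name(char *buf, size_t buflen, uint32_t tli,
					   uint64_t highest_segno, uint32_t wal_segsz_bytes)
{
	uint64_t	per_id = toy_segments_per_xlogid(wal_segsz_bytes);
	uint64_t	segno = highest_segno + 1;

	assert(buflen > TOY_WAL_FNAME_LEN);
	snprintf(buf, buflen, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
			 tli, (uint32_t) (segno / per_id), (uint32_t) (segno % per_id));
	assert(strlen(buf) == TOY_WAL_FNAME_LEN);
}
```

The timeline is an explicit parameter here, so passing 1 matches the point
above that pg_upgrade always uses TLI 1.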

FWIW, pg_upgrade does use -l in copy_xact_xlog_xid; I'm not sure if
it has anything to do with the above proposed change.

> The other potential problem Andres pointed out is that during shutdown
> if due to some reason, the walreceiver goes down, we won't be able to
> send the required WAL and users won't be able to ensure that because
> even after restart the same situation can happen. The ideal way is to
> have something that puts the system in READ ONLY state during shutdown
> and then we can probably allow walreceivers to reconnect and receive
> the required WALs. As we don't have such functionality available and
> it won't be easy to achieve the same, we can leave this for now.
>
> Thoughts?

You mean walreceiver for streaming replication? Or the apply workers
going down for logical replication? If there's yet-to-be-sent-out WAL,
pg_upgrade will fail, no? How is the above scenario a problem for
pg_upgrade of a cluster with just logical replication slots?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

06 October 2023, 16:00:13

Dear hackers,

Based on comments, I revised my patch. PSA the file.

> 
> > Today, I discussed this problem with Andres at PGConf NYC and he
> > suggested as following. To verify, if there is any pending unexpected
> > WAL after shutdown, we can have an API like
> > pg_logical_replication_slot_advance() which will simply process
> > records without actually sending anything downstream. In this new API,
> > we will start with each slot's restart_lsn location and try to process
> > till the end of WAL, if we encounter any WAL that needs to be
> > processed (like we need to send the decoded WAL downstream) we can
> > return a false indicating that there is an unexpected WAL. The reason
> > to start with restart_lsn is that it is the location that we use to
> > start scanning the WAL anyway.

I implemented this by using a decoding context. The binary upgrade function
processes WAL from confirmed_flush and returns false if any meaningful
changes are found.

Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
When the decoding context is in this mode, the output plugin is not loaded, but
all WAL records are decoded without skipping. A new flag, "did_process", is also
added. It is set if a wrapper for an output plugin callback is called during
silent mode. The upgrade function checks both the reorder buffer and the new
flag because both transactional and non-transactional changes should be
detected. If we only checked the reorder buffer, we would miss the
non-transactional ones.

fast_forward was changed into a variant of the decoding mode.

Currently the function is called for all the valid slots. If the approach seems
good, we can refactor as Bharath said [1].

> 
> > Then, we should also try to create slots before invoking pg_resetwal.
> > The idea is that we can write a new binary mode function that will do
> > exactly what pg_resetwal does to compute the next segment and use that
> > location as a new location (restart_lsn) to create the slots in a new
> > node. Then, pass it pg_resetwal by using the existing option '-l
> > walfile'. As we don't have any API that takes restart_lsn as input, we
> > can write a new API probably for binary mode to create slots that do
> > take restart_lsn as input. This will ensure that there is no new WAL
> > inserted by background processes between resetwal and the creation of
> > slots.

Based on that, I added another binary function, binary_upgrade_create_logical_replication_slot().
This function is similar to pg_create_logical_replication_slot(), but
restart_lsn and confirmed_flush are set to the *next* WAL segment. The name of
that WAL file is returned and passed to the pg_resetwal command.
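
As a rough illustration of pointing a slot at the next WAL segment, the sketch
below rounds an LSN up to the first byte of the following segment. It treats an
LSN as a plain 64-bit byte position; toy_next_segment_start() is a hypothetical
helper and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the LSN of the first byte of the segment that follows the one
 * containing lsn. An LSN already on a segment boundary still moves to the
 * following segment, since "next" is taken relative to the current segment.
 */
uint64_t
toy_next_segment_start(uint64_t lsn, uint32_t wal_segsz_bytes)
{
	uint64_t	segno;

	assert(wal_segsz_bytes > 0);

	segno = lsn / wal_segsz_bytes;	/* segment that contains lsn */
	return (segno + 1) * wal_segsz_bytes;
}
```

With 16 MB segments, a position of 0x1000028 maps to 0x2000000, the start of
the following segment.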

One consideration is that pg_log_standby_snapshot() must be executed before
the slots consume changes. The new cluster does not have RUNNING_XACTS records, so
a decoding context on the new cluster cannot create a consistent snapshot as-is.
This may lead to changes being discarded during the upcoming consumption. To
prevent this, the function is called after the final pg_resetwal.

What do you think?

Acknowledgment: I would like to thank Hou for discussing with me.

[1]: https://www.postgresql.org/message-id/CALj2ACWAdYxgzOpXrP%3DJMiOaWtAT2VjPiKw7ryGbipkSkocJ%3Dg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v46-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 October 2023, 01:16:18

On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping.
>

I think it may be okay not to load the output plugin as we are not
going to process any record in this case but is that the only reason
or you have something else in mind as well?

> Also, a new flag "did_process" is also
> added. This flag is set if wrappers for output plugin callbacks are called during
> the silent mode.
>

Isn't it sufficient to add a test for silent mode in
begin/stream_start/begin_prepare kind of APIs and set
ctx->did_process? In all other APIs, we can assert that did_process
shouldn't be set and we never reach there when decoding mode is
silent.
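
The suggestion could look roughly like the following toy model. It is only a
sketch under stated assumptions: ToyDecodingContext, ToyDecodingMode, and the
two wrapper functions are hypothetical stand-ins, and plugin_calls merely
counts how often a real output plugin callback would have been invoked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoding modes, loosely mirroring the ones discussed. */
typedef enum ToyDecodingMode
{
	TOY_MODE_NORMAL,
	TOY_MODE_FAST_FORWARD,
	TOY_MODE_SILENT
} ToyDecodingMode;

typedef struct ToyDecodingContext
{
	ToyDecodingMode mode;
	bool		did_process;	/* set when silent mode sees a transaction */
	int			plugin_calls;	/* stands in for calls into the plugin */
} ToyDecodingContext;

/* Transaction-start wrapper: in silent mode, record the fact and stop. */
void
toy_begin_wrapper(ToyDecodingContext *ctx)
{
	if (ctx->mode == TOY_MODE_SILENT)
	{
		ctx->did_process = true;
		return;
	}

	ctx->plugin_calls++;
}

/* Any later wrapper is expected to be unreachable in silent mode. */
void
toy_change_wrapper(ToyDecodingContext *ctx)
{
	assert(ctx->mode != TOY_MODE_SILENT);
	assert(!ctx->did_process);

	ctx->plugin_calls++;
}
```

Only the wrappers that start a transaction need the silent-mode branch; the
remaining ones can rely on assertions.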

> The upgrading function checks both reorder buffer and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>

+ /* Check whether the meaningful change was found */
+ found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
+ ctx->did_process);

Are you talking about this check in the patch? If so, can you please
explain when the first check helps?

> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slot. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> filename is returned and it is passed to pg_resetwal command.
>

I am not sure if it is a good idea that a
binary_upgrade_create_logical_replication_slot() API does the logfile
name calculation.

> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consuming changes. New cluster does not have RUNNING_XACTS records so that
> decoding context on new cluster cannot be create a consistent snapshot as-is.
> This may lead to discard changes during the upcoming consuming event. To
> prevent it the function is called after the final pg_resetwal.
>
> How do you think?
>

+ /*
+ * Also, we mu execute pg_log_standby_snapshot() when logical replication
+ * slots are migrated. Because RUNNING_XACTS record is required to create
+ * a consistent snapshot.
+ */
+ if (count_old_cluster_logical_slots())
+ create_consistent_snapshot();

We shouldn't do this separately. Instead
binary_upgrade_create_logical_replication_slot() should ensure that
corresponding WAL is reserved similar to what we do in
ReplicationSlotReserveWal() and then similarly invoke
LogStandbySnapshot() to ensure that we have enough information to
start.

Few minor comments:
==================
1. The commit message and other comments, like the one atop
get_old_cluster_logical_slot_infos(), need to be adjusted as per
recent changes.
2.
@@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
ReorderBufferTXN *txn,
  LogicalErrorCallbackState state;
  ErrorContextCallback errcallback;

- Assert(!ctx->fast_forward);
+ /*
+ * In silent mode all the two-phase callbacks are not set so that the
+ * wrapper should not be called.
+ */
+ Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);

This and other similar comments don't seem to be consistent, as the
function name and comments do not match.

With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 October 2023, 03:09:34

On Thu, Oct 5, 2023 at 6:43 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Oct 5, 2023 at 1:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > The other potential problem Andres pointed out is that during shutdown
> > if due to some reason, the walreceiver goes down, we won't be able to
> > send the required WAL and users won't be able to ensure that because
> > even after restart the same situation can happen. The ideal way is to
> > have something that puts the system in READ ONLY state during shutdown
> > and then we can probably allow walreceivers to reconnect and receive
> > the required WALs. As we don't have such functionality available and
> > it won't be easy to achieve the same, we can leave this for now.
> >
> > Thoughts?
>
> You mean walreceiver for streaming replication? Or the apply workers
> going down for logical replication?
>

Apply workers.

>
> If there's yet-to-be-sent-out WAL,
> pg_upgrade will fail no? How does the above scenario a problem for
> pg_upgrade of a cluster with just logical replication slots?
>

Even if there is WAL yet to be sent, the walsender will simply exit
as it will receive PqMsg_Terminate ('X') from the standby. See
ProcessRepliesIfAny(). After that, the shutdown checkpoint will finish. So,
in this case the upgrade can fail due to the slots. But I think the server
should be able to succeed in consecutive runs. Does this make sense?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

09 October 2023, 11:59:23

On Fri, 6 Oct 2023 at 18:30, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> Based on comments, I revised my patch. PSA the file.
>
> >
> > > Today, I discussed this problem with Andres at PGConf NYC and he
> > > suggested as following. To verify, if there is any pending unexpected
> > > WAL after shutdown, we can have an API like
> > > pg_logical_replication_slot_advance() which will simply process
> > > records without actually sending anything downstream. In this new API,
> > > we will start with each slot's restart_lsn location and try to process
> > > till the end of WAL, if we encounter any WAL that needs to be
> > > processed (like we need to send the decoded WAL downstream) we can
> > > return a false indicating that there is an unexpected WAL. The reason
> > > to start with restart_lsn is that it is the location that we use to
> > > start scanning the WAL anyway.
>
> I implemented this by using decoding context. The binary upgrade function
> processes WALs from the confirmed_flush, and returns false if some meaningful
> changes are found.
>
> Internally, I added a new decoding mode - DECODING_MODE_SILENT - and used it.
> If the decoding context is in the mode, the output plugin is not loaded, but
> any WALs are decoded without skipping. Also, a new flag "did_process" is also
> added. This flag is set if wrappers for output plugin callbacks are called during
> the silent mode. The upgrading function checks both reorder buffer and the new
> flag because both (non-)transactional changes should be detected. If we only
> check reorder buffer, we miss the non-transactional one.
>
> fast_forward was changed as a variant of decoding mode.
>
> Currently the function is called for all the valid slot. If the approach seems
> good, we can refactor like Bharath said [1].
>
> >
> > > Then, we should also try to create slots before invoking pg_resetwal.
> > > The idea is that we can write a new binary mode function that will do
> > > exactly what pg_resetwal does to compute the next segment and use that
> > > location as a new location (restart_lsn) to create the slots in a new
> > > node. Then, pass it pg_resetwal by using the existing option '-l
> > > walfile'. As we don't have any API that takes restart_lsn as input, we
> > > can write a new API probably for binary mode to create slots that do
> > > take restart_lsn as input. This will ensure that there is no new WAL
> > > inserted by background processes between resetwal and the creation of
> > > slots.
>
> Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> This function is similar to pg_create_logical_replication_slot(), but the
> restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> filename is returned and it is passed to pg_resetwal command.
>
> One consideration is that pg_log_standby_snapshot() must be executed before
> slots consuming changes. New cluster does not have RUNNING_XACTS records so that
> decoding context on new cluster cannot be create a consistent snapshot as-is.
> This may lead to discard changes during the upcoming consuming event. To
> prevent it the function is called after the final pg_resetwal.

Few comments:
1)  Should we add the binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
this function too:
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */
+       bool            two_phase = PG_GETARG_BOOL(2);

2) We generally specify the slot name in this case, so is the slot
name NULL check required:
+Datum
+binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
+{
+       Name            slot_name;
+       XLogRecPtr      end_of_wal;
+       LogicalDecodingContext *ctx = NULL;
+       bool            has_record;
+
+       CHECK_IS_BINARY_UPGRADE;
+
+       /* Quick exit if the input is NULL */
+       if (PG_ARGISNULL(0))
+               PG_RETURN_BOOL(false);

3) Since this is similar to pg_create_logical_replication_slot, can we
add a comment saying any change in pg_create_logical_replication_slot
would also need the same check to be added in
binary_upgrade_create_logical_replication_slot:
+/*
+ * SQL function for creating a new logical replication slot.
+ *
+ * This function is almost same as pg_create_logical_replication_slot(), but
+ * this can specify the restart_lsn.
+ */
+Datum
+binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
+{
+       Name            name = PG_GETARG_NAME(0);
+       Name            plugin = PG_GETARG_NAME(1);
+
+       /* Temporary slots is never handled in this function */

4) Is there any conclusion on this try/catch comment? Do you want to note which
setting should be reverted in the catch? If try/catch is not required, we
can remove this comment:
+       ReplicationSlotAcquire(NameStr(*slot_name), true);
+
+       /* XXX: Is PG_TRY/CATCH needed around here? */
+
+       /*
+        * We use silent mode here to decode all changes without outputting them,
+        * allowing us to detect all the records that could be sent downstream.
+        */

5) I felt these 2 comments can be combined as both are trying to say
the same thing:
+ * This is a special purpose function to ensure that there are no WAL records
+ * pending to be decoded after the given LSN.
+ *
+ * It is used to ensure that there is no pending WAL to be consumed for
+ * the logical slots.

6) I feel this memset is not required as we are initializing at the
beginning of function, if you want to keep the memset, the
initialization can be removed:
+       values[2] = CStringGetTextDatum(xlogfilename);
+
+       memset(nulls, 0, sizeof(nulls));
+
+       tuple = heap_form_tuple(tupdesc, values, nulls);

7) looks like a typo, "mu" should be "must":
+       /*
+        * Also, we mu execute pg_log_standby_snapshot() when logical replication
+        * slots are migrated. Because RUNNING_XACTS record is required to create
+        * a consistent snapshot.
+        */
+       if (count_old_cluster_logical_slots())
+               create_consistent_snapshot();

8) consitent should be consistent:
+/*
+ * Log the details of the current snapshot to the WAL, allowing the snapshot
+ * state to be reconstructed for logical decoding on the upgraded slots.
+ */
+static void
+create_consistent_snapshot(void)
+{
+       DbInfo     *old_db = &old_cluster.dbarr.dbs[0];
+       PGconn     *conn;
+
+       prep_status("Creating a consitent snapshot on new cluster");

Regards,
Vignesh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

10 October 2023, 10:40:38

On Sat, Oct 7, 2023 at 3:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 6, 2023 at 6:30 PM Hayato Kuroda (Fujitsu)
> >
> > Based on that, I added another binary function binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
>
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
>

The other problem is that pg_resetwal removes all pre-existing WAL
files which in this case could lead to the removal of the WAL file
corresponding to restart_lsn. This is because at least the shutdown
checkpoint record will be written after the creation of slots which
could be in the new file used for restart_lsn. Then when we invoke
pg_resetwal, it can remove that file.

One idea to deal with this could be to do the reset WAL stuff
(FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
create slots. If we do this, then we additionally need an option in
pg_resetwal which skips resetting the WAL as that would have been done
before creating the slots.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

10 October 2023, 14:21:23

Dear Amit,

Thank you for reviewing! PSA new version.

> > Internally, I added a new decoding mode - DECODING_MODE_SILENT - and
> used it.
> > If the decoding context is in the mode, the output plugin is not loaded, but
> > any WALs are decoded without skipping.
> >
> 
> I think it may be okay not to load the output plugin as we are not
> going to process any record in this case but is that the only reason
> or you have something else in mind as well?

My main concern was to avoid having to set output plugin options. Even for the
pgoutput plugin, some options like protocol_version, publications, etc. are
required while loading the plugin. We cannot predict the requirements of external
plugins. Based on that, I thought output plugins should not be loaded during the
decoding.

> > Also, a new flag "did_process" is also
> > added. This flag is set if wrappers for output plugin callbacks are called during
> > the silent mode.
>
> Isn't it sufficient to add a test for silent mode in
> begin/stream_start/begin_prepare kind of APIs and set
> ctx->did_process? In all other APIs, we can assert that did_process
> shouldn't be set and we never reach there when decoding mode is
> silent.
>
> 
> + /* Check whether the meaningful change was found */
> + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> + ctx->did_process);
> 
> Are you talking about this check in the patch? If so, can you please
> explain when does the first check help?

I changed this area, so let me describe it once again.

A flag (output_skipped) is set when a transaction is decoded to its end in
silent mode. This is done in DecodeTXNNeedSkip() because that function is on the
common path for both committed and aborted transactions. Also, DecodeTXNNeedSkip()
returns true when the decoding context is in silent mode, so none of the
cb_wrapper functions are called anymore. DecodingContextHasdecodedItems() just
returns output_skipped.

This approach needs to read WAL up to the end of a transaction before the
upgrade function can return, but the code looks simpler than the previous version.

> >
> > Based on that, I added another binary function
> binary_upgrade_create_logical_replication_slot().
> > This function is similar to pg_create_logical_replication_slot(), but the
> > restart_lsn and confirmed_flush are set to *next* WAL segment. The pointed
> > filename is returned and it is passed to pg_resetwal command.
> >
> 
> I am not sure if it is a good idea that a
> binary_upgrade_create_logical_replication_slot() API does the logfile
> name calculation.
> 
> > One consideration is that pg_log_standby_snapshot() must be executed before
> > slots consuming changes. New cluster does not have RUNNING_XACTS records
> so that
> > decoding context on new cluster cannot be create a consistent snapshot as-is.
> > This may lead to discard changes during the upcoming consuming event. To
> > prevent it the function is called after the final pg_resetwal.
> >
> > How do you think?
> >
> 
> + /*
> + * Also, we mu execute pg_log_standby_snapshot() when logical replication
> + * slots are migrated. Because RUNNING_XACTS record is required to create
> + * a consistent snapshot.
> + */
> + if (count_old_cluster_logical_slots())
> + create_consistent_snapshot();
> 
> We shouldn't do this separately. Instead
> binary_upgrade_create_logical_replication_slot() should ensure that
> corresponding WAL is reserved similar to what we do in
> ReplicationSlotReserveWal() and then similarly invoke
> LogStandbySnapshot() to ensure that we have enough information to
> start.

I did not handle these parts because they need more analysis. Let's discuss
them in later versions.
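
To make the "next WAL segment" idea in the quoted text concrete, here is a
minimal stand-alone sketch of how a segment start and its file name can be
derived from an LSN. It is not code from the patch: next_segment_start() and
wal_file_name() are hypothetical names, and the 24-hex-digit layout (timeline,
then the two halves of the segment number) is assumed to mirror the server's
XLogFileName() macro.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-alone model of an LSN; not the server's XLogRecPtr. */
typedef uint64_t Lsn;

/* Return the start of the WAL segment that follows the one containing lsn. */
static Lsn
next_segment_start(Lsn lsn, uint64_t wal_segsz)
{
    assert(wal_segsz > 0);
    return (lsn / wal_segsz + 1) * wal_segsz;
}

/* Format the name of the WAL file that contains lsn (24 hex digits). */
static void
wal_file_name(char *buf, size_t len, uint32_t tli, Lsn lsn, uint64_t wal_segsz)
{
    uint64_t segno;
    uint64_t segs_per_xlogid;

    assert(buf != NULL && wal_segsz > 0);
    segno = lsn / wal_segsz;
    /* Number of segments that fit into one 4GB "xlogid" */
    segs_per_xlogid = UINT64_C(0x100000000) / wal_segsz;

    snprintf(buf, len, "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_xlogid),
             (uint32_t) (segno % segs_per_xlogid));
}
```

With 16MB segments, an LSN of 0/3000028 lies in segment 3, so the next segment
starts at 0/4000000; on timeline 1 its file name is 000000010000000000000004.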

> 
> Few minor comments:
> ==================
> 1. The commit message and other comments like atop
> get_old_cluster_logical_slot_infos() needs to be adjusted as per
> recent changes.

I revisited the comments and updated them.

> 2.
> @@ -1268,7 +1346,11 @@ stream_start_cb_wrapper(ReorderBuffer *cache,
> ReorderBufferTXN *txn,
>   LogicalErrorCallbackState state;
>   ErrorContextCallback errcallback;
> 
> - Assert(!ctx->fast_forward);
> + /*
> + * In silent mode all the two-phase callbacks are not set so that the
> + * wrapper should not be called.
> + */
> + Assert(ctx->decoding_mode == DECODING_MODE_NORMAL);
> 
> This and other similar comments doesn't seems to be consistent as the
> function name and comments are not matching.

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v47-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

10 October 2023, 14:22:27

Dear Vignesh,

Thanks for reviewing! The new version is available in [1].

> 
> Few comments:
> 1)  Should we add binary upgrade check "CHECK_IS_BINARY_UPGRADE" for
> this funcion too:
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +       Name            name = PG_GETARG_NAME(0);
> +       Name            plugin = PG_GETARG_NAME(1);
> +
> +       /* Temporary slots is never handled in this function */
> +       bool            two_phase = PG_GETARG_BOOL(2);

Yes, it is needed. I had left it out for testing purposes, but it should be there.
Added.

> 2) Generally we are specifying the slot name in this case, is slot
> name null check required:
> +Datum
> +binary_upgrade_validate_wal_logical_end(PG_FUNCTION_ARGS)
> +{
> +       Name            slot_name;
> +       XLogRecPtr      end_of_wal;
> +       LogicalDecodingContext *ctx = NULL;
> +       bool            has_record;
> +
> +       CHECK_IS_BINARY_UPGRADE;
> +
> +       /* Quick exit if the input is NULL */
> +       if (PG_ARGISNULL(0))
> +               PG_RETURN_BOOL(false);


A NULL check was added; I felt that we should raise an ERROR.

> 3) Since this is similar to pg_create_logical_replication_slot, can we
> add a comment saying any change in pg_create_logical_replication_slot
> would also need the same check to be added in
> binary_upgrade_create_logical_replication_slot:
> +/*
> + * SQL function for creating a new logical replication slot.
> + *
> + * This function is almost same as pg_create_logical_replication_slot(), but
> + * this can specify the restart_lsn.
> + */
> +Datum
> +binary_upgrade_create_logical_replication_slot(PG_FUNCTION_ARGS)
> +{
> +       Name            name = PG_GETARG_NAME(0);
> +       Name            plugin = PG_GETARG_NAME(1);
> +
> +       /* Temporary slots is never handled in this function */

Added.

> 4) Any conclusion on this try catch comment, do you want to add which
> setting you want to revert in catch, if try/catch is not required we
> can remove this comment:
> +       ReplicationSlotAcquire(NameStr(*slot_name), true);
> +
> +       /* XXX: Is PG_TRY/CATCH needed around here? */
> +
> +       /*
> +        * We use silent mode here to decode all changes without
> outputting them,
> +        * allowing us to detect all the records that could be sent downstream.
> +        */

After considering this more, it is OK to raise an ERROR because the caller can
detect it. Also, there are no settings to be reverted. The comment was removed.

> 5) I felt these 2 comments can be combined as both are trying to say
> the same thing:
> + * This is a special purpose function to ensure that there are no WAL records
> + * pending to be decoded after the given LSN.
> + *
> + * It is used to ensure that there is no pending WAL to be consumed for
> + * the logical slots.

The latter part was removed.

> 6) I feel this memset is not required as we are initializing at the
> beginning of function, if you want to keep the memset, the
> initialization can be removed:
> +       values[2] = CStringGetTextDatum(xlogfilename);
> +
> +       memset(nulls, 0, sizeof(nulls));
> +
> +       tuple = heap_form_tuple(tupdesc, values, nulls);

The initialization was removed to follow pg_create_logical_replication_slot.

> 7) looks like a typo, "mu" should be "must":
> +       /*
> +        * Also, we mu execute pg_log_standby_snapshot() when logical
> replication
> +        * slots are migrated. Because RUNNING_XACTS record is
> required to create
> +        * a consistent snapshot.
> +        */
> +       if (count_old_cluster_logical_slots())
> +               create_consistent_snapshot();

Fixed.

> 8) consitent should be consistent:
> +/*
> + * Log the details of the current snapshot to the WAL, allowing the snapshot
> + * state to be reconstructed for logical decoding on the upgraded slots.
> + */
> +static void
> +create_consistent_snapshot(void)
> +{
> +       DbInfo     *old_db = &old_cluster.dbarr.dbs[0];
> +       PGconn     *conn;
> +
> +       prep_status("Creating a consitent snapshot on new cluster");

Fixed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

10 October 2023, 14:23:04

Dear Bharath,

Thanks for the comments, and apologies for the late reply.
The new version is available in [1].

> +1 for this approach. It looks neat.
> 
> I think we also need to add TAP tests to generate decodable WAL
> records (RUNNING_XACT, CHECKPOINT_ONLINE, XLOG_FPI_FOR_HINT,
> XLOG_SWITCH, XLOG_PARAMETER_CHANGE, XLOG_HEAP2_PRUNE) during
> pg_upgrade as described here
> https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256
> B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com.
> Basically, these were the exceptional WAL records that may be
> generated by pg_upgrade, so having tests for them is good.

Hmm, I'm not sure that is really a good idea. If we add such a test, we may have
to add further tests in the future whenever a new WAL record type generated
during upgrade is introduced. Currently we do not have an if-statement for each
WAL record type, so I think it would not improve coverage. Another concern is
that I'm not sure how we can simply and reliably generate XLOG_HEAP2_PRUNE.

Based on the above, I did not add the test case for now.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866068CB6591C8AE1F9690BF5CDA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

10 October 2023, 15:47:39

On Tue, Oct 10, 2023 at 4:51 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> >
> > Isn't it sufficient to add a test for silent mode in
> > begin/stream_start/begin_prepare kind of APIs and set
> > ctx->did_process? In all other APIs, we can assert that did_process
> > shouldn't be set and we never reach there when decoding mode is
> > silent.
> >
> >
> > + /* Check whether the meaningful change was found */
> > + found = (ctx->reorder->by_txn_last_xid != InvalidTransactionId ||
> > + ctx->did_process);
> >
> > Are you talking about this check in the patch? If so, can you please
> > explain when does the first check help?
>
> I changed around here so I describe once again.
>
> A flag (output_skipped) is set when the transaction is decoded till the end in
> silent mode. It is done in DecodeTXNNeedSkip() because the function is the common
> path for both committed/aborted transactions. Also, DecodeTXNNeedSkip() returns
> true when the decoding context is in the silent mode. Therefore, any cb_wrapper
> functions would not be called anymore. DecodingContextHasdecodedItems() just
> returns output_skipped.
>
> This approach needs to read WALs till end of transactions before returning the
> upgrading function, but codes look simpler than the previous version.
>

 DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
    Oid txn_dbid, RepOriginId origin_id)
 {
- return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
- (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
- ctx->fast_forward || FilterByOrigin(ctx, origin_id));
+ bool need_skip;
+
+ need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
+ (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
+ ctx->decoding_mode != DECODING_MODE_NORMAL ||
+ FilterByOrigin(ctx, origin_id));
+
+ /* Set a flag if we are in the slient mode */
+ if (ctx->decoding_mode == DECODING_MODE_SILENT)
+ ctx->output_skipped = true;
+
+ return need_skip;

I think you need to set the new flag only when we are not skipping the
transaction or in other words when we decide to process the
transaction. Otherwise, how will you distinguish the case where the
xact is already decoded and sent to client?

--
With Regards,
Amit Kapila

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

11 October 2023, 10:43:08

On Tue, Oct 10, 2023 at 6:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>  DecodeTXNNeedSkip(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
>     Oid txn_dbid, RepOriginId origin_id)
>  {
> - return (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
> - (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
> - ctx->fast_forward || FilterByOrigin(ctx, origin_id));
> + bool need_skip;
> +
> + need_skip = (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
> + (txn_dbid != InvalidOid && txn_dbid != ctx->slot->data.database) ||
> + ctx->decoding_mode != DECODING_MODE_NORMAL ||
> + FilterByOrigin(ctx, origin_id));
> +
> + /* Set a flag if we are in the slient mode */
> + if (ctx->decoding_mode == DECODING_MODE_SILENT)
> + ctx->output_skipped = true;
> +
> + return need_skip;
>
> I think you need to set the new flag only when we are not skipping the
> transaction or in other words when we decide to process the
> transaction. Otherwise, how will you distinguish the case where the
> xact is already decoded and sent to client?
>

In the attached patch atop your v47*, I have changed it to show you
what I have in mind.

A few more comments:
=================
1.
+
+ /*
+ * Did the logical decoding context skip outputting any changes?
+ *
+ * This flag is used only when the context is in the silent mode.
+ */
+ bool output_skipped;
 } LogicalDecodingContext;

This doesn't seem to convey the meaning to the caller. How about
processing_required? BTW, I have made this change as well in the
patch.

2.
@@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
*/
if (TransactionIdIsValid(xid))
{
- if (!ctx->fast_forward)
+ if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferAddInvalidations(reorder, xid,
  buf->origptr,
  invals->nmsgs,
@@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf)
ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
  buf->origptr);
}
- else if ((!ctx->fast_forward))
+ else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
ReorderBufferImmediateInvalidation(ctx->reorder,
   invals->nmsgs,
   invals->msgs);

We don't need to execute the invalidations even in silent mode. Looking at
this and other changes in the patch related to silent mode, I wonder
whether we really need to introduce 'silent_mode'. Can't we simply set
processing_required when 'fast_forward' mode is true and then let the
caller decide whether it needs to further process the WAL?

--
With Regards,
Amit Kapila.

Attachments

v47_changes_amit_1.patch.txt

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

11 October 2023, 13:57:45

Dear Amit,

Thank you for reviewing! PSA new version.

> > I think you need to set the new flag only when we are not skipping the
> > transaction or in other words when we decide to process the
> > transaction. Otherwise, how will you distinguish the case where the
> > xact is already decoded and sent to client?

Actually, I had wondered what the right behavior should be, but I followed the
previous version. Indeed, we should exclude the case in which the xact has
already been sent. But I was not sure about the other conditions, such as
transactions for another database; IIUC the previous version regarded them as
not acceptable.

On reconsideration, these cases can be ignored because they would not be sent to
the subscriber. The consistency between publisher and subscriber would not be
broken even if these WAL records remain.

> In the attached patch atop your v47*, I have changed it to show you
> what I have in mind.

Thanks, it was included.

> A few more comments:
> =================
> 1.
> +
> + /*
> + * Did the logical decoding context skip outputting any changes?
> + *
> + * This flag is used only when the context is in the silent mode.
> + */
> + bool output_skipped;
>  } LogicalDecodingContext;
> 
> This doesn't seem to convey the meaning to the caller. How about
> processing_required? BTW, I have made this change as well in the
> patch.

LGTM, changed like that.

> 2.
> @@ -295,7 +295,7 @@ xact_decode(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf)
> */
> if (TransactionIdIsValid(xid))
> {
> - if (!ctx->fast_forward)
> + if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
> ReorderBufferAddInvalidations(reorder, xid,
>   buf->origptr,
>   invals->nmsgs,
> @@ -303,7 +303,7 @@ xact_decode(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf)
> ReorderBufferXidSetCatalogChanges(ctx->reorder, xid,
>   buf->origptr);
> }
> - else if ((!ctx->fast_forward))
> + else if (ctx->decoding_mode != DECODING_MODE_FAST_FORWARD)
> ReorderBufferImmediateInvalidation(ctx->reorder,
>    invals->nmsgs,
>    invals->msgs);
> 
> We don't to execute the invalidations even in silent mode. Looking at
> this and other changes in the patch related to silent mode, I wonder
> whether we really need to introduce 'silent_mode'. Can't we simply set
> processing_required when 'fast_forward' mode is true and then let the
> caller decide whether it needs to further process the WAL?

After considering this again, I agree to remove the silent mode. Initially, it
was introduced because the did_process flag was set in the XXX_cb_wrapper
functions and in the reorderbuffer layer. Now processing_required is set in
DecodeCommit()->DecodeTXNNeedSkip(), which means that individual records do not
need to be decoded. Based on that, I removed the silent mode and use the
fast-forward mode instead.
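
The flag handling described above can be modelled in isolation as follows. This
is a simplified sketch, not the patch itself: DecodeCtxModel, txn_need_skip()
and the single "filtered" argument (standing in for the snapshot, database and
origin checks) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the decoding context. */
typedef struct DecodeCtxModel
{
    bool fast_forward;          /* decode without producing output */
    bool processing_required;   /* set if real work was skipped */
} DecodeCtxModel;

/*
 * Decide whether a transaction is skipped.
 *
 * "filtered" stands for the cases that would be skipped in any mode
 * (already sent to the client, another database, filtered origin). Only a
 * transaction that would otherwise have been processed sets the flag.
 */
static bool
txn_need_skip(DecodeCtxModel *ctx, bool filtered)
{
    assert(ctx != NULL);

    if (filtered)
        return true;

    if (ctx->fast_forward)
    {
        /* Without fast-forward mode this transaction would be decoded. */
        ctx->processing_required = true;
        return true;
    }

    return false;
}
```

In this model the caller runs the decoder in fast-forward mode up to the end of
WAL and then inspects processing_required to decide whether pending changes
exist.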

Also, some parts (mostly code comments) were modified.

Acknowledgement: Thanks Peter and Hou for discussing with me.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v48-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

12 October 2023, 12:28:48

On Wed, Oct 11, 2023 at 4:27 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thank you for reviewing! PSA new version.
>

Some more comments:
1. Let's restructure binary_upgrade_validate_wal_logical_end() a bit.
First, let's change its name to binary_upgrade_slot_has_pending_wal()
or something like that. Then move the context creation and free
related code into DecodingContextHasDecodedItems(). We can rename
DecodingContextHasDecodedItems() as
pg_logical_replication_slot_has_pending_wal() and place it in
slotfuncs.c. This will make the code structure similar to other slot
functions like pg_replication_slot_advance().

2. + * Returns true if there are no changes after the confirmed_flush_lsn.

How about something like: "Returns true if there are no decodable WAL
records after the confirmed_flush_lsn."?

3. Shouldn't we need to call CheckSlotPermissions() in
binary_upgrade_validate_wal_logical_end?

4.
+ /*
+ * Also, set processing_required flag if the message is not
+ * transactional. It is needed to notify the message's existence to
+ * the caller side. Usually, the flag is set when either the COMMIT or
+ * ABORT records are decoded, but this must be turned on here because
+ * the non-transactional logical message is decoded without waiting
+ * for these records.
+ */

The first sentence of the comments doesn't seem to be required as that
just says what the code does. So, let's slightly change it to: "We
need to set processing_required flag to notify the message's existence
to the caller side. Usually, the flag is set when either the COMMIT or
ABORT records are decoded, but this must be turned on here because the
non-transactional logical message is decoded without waiting for these
records."

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

12 October 2023, 14:41:01

Dear Amit,

Thanks for your suggestion! PSA new version.

> The other problem is that pg_resetwal removes all pre-existing WAL
> files which in this case could lead to the removal of the WAL file
> corresponding to restart_lsn. This is because at least the shutdown
> checkpoint record will be written after the creation of slots which
> could be in the new file used for restart_lsn. Then when we invoke
> pg_resetwal, it can remove that file.
> 
> One idea to deal with this could be to do the reset WAL stuff
> (FindEndOfXLOG(), KillExistingXLOG(), KillExistingArchiveStatus(),
> WriteEmptyXLOG()) in a separate function (say in pg_upgrade) and then
> create slots. If we do this, then we additionally need an option in
> pg_resetwal which skips resetting the WAL as that would have been done
> before creating the slots.

Based on the above idea, I made a new version of the patch in which some
functionality is exported from pg_resetwal. In this approach, pg_upgrade itself
removes the WAL files and then creates the logical slots; pg_resetwal is then
called with a new option, --no-switch, which avoids switching to a new WAL
segment file. The option is used only for upgrade purposes, so it is not
described in the docs or in usage(). This option would not be required if
pg_resetwal -o did not discard WAL records. Please see the fork thread [1].

We no longer have to reserve a future restart_lsn while creating a slot, so the
binary function binary_upgrade_create_logical_replication_slot() was removed.

Another advantage of this approach is that it avoids calling
pg_log_standby_snapshot() after pg_resetwal. The call was needed for the two
reasons below, but both are now resolved automatically.
  1) pg_resetwal removes all WAL files.
  2) Logical slots require a RUNNING_XACTS record for building a snapshot.
 
[1]: https://www.postgresql.org/message-id/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v49-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

12 October 2023, 14:42:10

Dear Amit,

Thanks for reviewing! New patch is available at [1].

> 
> Some more comments:
> 1. Let's restruture binary_upgrade_validate_wal_logical_end() a bit.
> First, let's change its name to binary_upgrade_slot_has_pending_wal()
> or something like that. Then move the context creation and free
> related code into DecodingContextHasDecodedItems(). We can rename
> DecodingContextHasDecodedItems() as
> pg_logical_replication_slot_has_pending_wal() and place it in
> slotfuncs.c. This will make the code structure similar to other slot
> functions like pg_replication_slot_advance().

Seems clearer than mine. Fixed.

> 2. + * Returns true if there are no changes after the confirmed_flush_lsn.
> 
> How about something like: "Returns true if there are no decodable WAL
> records after the confirmed_flush_lsn."?

Fixed.

> 3. Shouldn't we need to call CheckSlotPermissions() in
> binary_upgrade_validate_wal_logical_end?

Added, but actually it is not needed, because only superusers can connect
to the server while upgrading. Please see the code below in InitPostgres().

```
    if (IsBinaryUpgrade && !am_superuser)
    {
        ereport(FATAL,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be superuser to connect in binary upgrade mode")));
    }
```

> 4.
> + /*
> + * Also, set processing_required flag if the message is not
> + * transactional. It is needed to notify the message's existence to
> + * the caller side. Usually, the flag is set when either the COMMIT or
> + * ABORT records are decoded, but this must be turned on here because
> + * the non-transactional logical message is decoded without waiting
> + * for these records.
> + */
> 
> The first sentence of the comments doesn't seem to be required as that
> just says what the code does. So, let's slightly change it to: "We
> need to set processing_required flag to notify the message's existence
> to the caller side. Usually, the flag is set when either the COMMIT or
> ABORT records are decoded, but this must be turned on here because the
> non-transactional logical message is decoded without waiting for these
> records."

Fixed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866B0614F80CE9F5EF051BDF5D3A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

14 October 2023, 08:15:27

Dear hackers,

Here is a new patch.

Previously I wrote:
> Based on above idea, I made new version patch which some functionalities were
> exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs and
> then create logical slots, then pg_resetwal would be called with new option
> --no-switch, which avoid to switch a WAL segment file. The option is only used
> for the upgrading purpose so it is not written in doc and usage(). This option
> is not required if pg_resetwal -o does not discard WAL records. Please see the
> fork thread [1].

But for now, these changes were reverted because changing the pg_resetwal -o
behavior may be a bit risky. It has been in place for more than ten years, so we
should be careful about modifying it.
Also, I cannot come up with any problems that arise if slots are created after
pg_resetwal. Background processes would not generate decodable changes (listed
in [1]), and BGworkers from extensions could be ignored [2].
Depending on the discussion in the forked thread [3], and if the change is
accepted there, we will apply these changes again.

Also, some comments and function names were improved.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58660273EACEFC5BF256B133F50DA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/CAA4eK1L4JB%2BKH_4EQryDEhyaLBPW6V20LqjdzOxCWyL7rbxqsA%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/flat/CAA4eK1KRyPMiY4fW98qFofsYrPd87Oc83zDNxSeHfTYh_asdBg%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v50-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

16 October 2023, 12:13:57

On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Here is a new patch.
>
> Previously I wrote:
> > Based on above idea, I made new version patch which some functionalities were
> > exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs and
> > then create logical slots, then pg_resetwal would be called with new option
> > --no-switch, which avoid to switch a WAL segment file. The option is only used
> > for the upgrading purpose so it is not written in doc and usage(). This option
> > is not required if pg_resetwal -o does not discard WAL records. Please see the
> > fork thread [1].
>
> But for now, these changes were reverted because changing pg_resetwal -o stuff
> may be a bit risky. This has been located more than ten years so that we should
> be more careful for modifying.
> Also, I cannot come up with problems if slots are created after the pg_resetwal.
> Background processes would not generate decodable changes (listed in [1]), and
> BGworkers by extensions could be ignored [2].
> Based on the discussion on forked thread [3] and if it is accepted, we will apply
> again.
>

Yeah, I think introducing additional complexity unless it is really
required sounds a bit scary to me as well. BTW, please find attached
some cosmetic changes.

One minor additional comment:
+# Initialize subscriber cluster
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

Why do we need to set wal_level as logical for subscribers?

--
With Regards,
Amit Kapila.

Attachments

v50_changes_amit_1.patch.txt

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

16 October 2023, 17:58:28

On Mon, 16 Oct 2023 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 14, 2023 at 10:45 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Here is a new patch.
> >
> > Previously I wrote:
> > > Based on above idea, I made new version patch which some functionalities were
> > > exported from pg_resetwal. In this approach, pg_upgrade itself removed WALs and
> > > then create logical slots, then pg_resetwal would be called with new option
> > > --no-switch, which avoid to switch a WAL segment file. The option is only used
> > > for the upgrading purpose so it is not written in doc and usage(). This option
> > > is not required if pg_resetwal -o does not discard WAL records. Please see the
> > > fork thread [1].
> >
> > But for now, these changes were reverted because changing pg_resetwal -o stuff
> > may be a bit risky. This has been located more than ten years so that we should
> > be more careful for modifying.
> > Also, I cannot come up with problems if slots are created after the pg_resetwal.
> > Background processes would not generate decodable changes (listed in [1]), and
> > BGworkers by extensions could be ignored [2].
> > Based on the discussion on forked thread [3] and if it is accepted, we will apply
> > again.
> >

1) Should this:
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Tests for upgrading replication slots
+
be:
"Tests for upgrading logical replication slots"

2)  This statement is not entirely true:
+     <listitem>
+      <para>
+       The old cluster has replicated all the changes to subscribers.
+      </para>

If there are changes like shutdown_checkpoint, the upgrade passes; if there
are changes like CREATE VIEW, which will not be replicated, the upgrade
fails.

3) All these includes are not required except for "logical.h"
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -11,14 +11,20 @@

 #include "postgres.h"

+#include "access/xlogutils.h"
+#include "access/xlog_internal.h"
 #include "catalog/binary_upgrade.h"
 #include "catalog/heap.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
 #include "commands/extension.h"
+#include "funcapi.h"
 #include "miscadmin.h"
+#include "replication/logical.h"
+#include "replication/slot.h"
 #include "utils/array.h"
 #include "utils/builtins.h"
+#include "utils/pg_lsn.h"

4) We could print two_phase as true/false instead of 0/1:
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+       /* Quick return if there are no logical slots. */
+       if (slot_arr->nslots == 0)
+               return;
+
+       pg_log(PG_VERBOSE, "Logical replication slots within the database:");
+
+       for (int slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+       {
+               LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+
+               pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\",
two_phase: %d",
+                          slot_info->slotname,
+                          slot_info->plugin,
+                          slot_info->two_phase);
+       }
+}

5) test passes without the below, maybe this is not required:
+# 2. Consume WAL records to avoid another type of upgrade failure. It will be
+#       tested in subsequent cases.
+$old_publisher->safe_psql('postgres',
+       "SELECT count(*) FROM
pg_logical_slot_get_changes('test_slot1', NULL, NULL);"
+);

6) This message "run of pg_upgrade of old cluster with idle
replication slots" seems wrong:
+# pg_upgrade will fail because the slot still has unconsumed WAL records
+command_checks_all(
+       [
+               'pg_upgrade', '--no-sync',
+               '-d', $old_publisher->data_dir,
+               '-D', $new_publisher->data_dir,
+               '-b', $bindir,
+               '-B', $bindir,
+               '-s', $new_publisher->host,
+               '-p', $old_publisher->port,
+               '-P', $new_publisher->port,
+               $mode,
+       ],
+       1,
+       [
+               qr/Your installation contains invalid logical
replication slots./
+       ],
+       [qr//],
+       'run of pg_upgrade of old cluster with idle replication slots');
+ok( -d $new_publisher->data_dir . "/pg_upgrade_output.d",
+       "pg_upgrade_output.d/ not removed after pg_upgrade failure");

7) You could run pgindent and pgperltidy; they show that there are a few
issues present with the patch.
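
For comment 4 above, a minimal sketch of printing two_phase as true/false
rather than 0/1. SlotInfoModel and format_slot_info() are hypothetical
stand-ins for LogicalSlotInfo and the pg_log() call; snprintf() is used only so
that the sketch compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reduced version of the slot description. */
typedef struct SlotInfoModel
{
    const char *slotname;
    const char *plugin;
    bool        two_phase;
} SlotInfoModel;

/* Format one slot line, spelling the boolean out instead of using %d. */
static int
format_slot_info(char *buf, size_t len, const SlotInfoModel *slot)
{
    assert(buf != NULL && slot != NULL);

    return snprintf(buf, len,
                    "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
                    slot->slotname,
                    slot->plugin,
                    slot->two_phase ? "true" : "false");
}
```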

Regards,
Vignesh

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

17 October 2023, 15:15:04

Dear Amit,

Thanks for reviewing! PSA new version.

> 
> Yeah, I think introducing additional complexity unless it is really
> required sounds a bit scary to me as well. BTW, please find attached
> some cosmetic changes.

Basically LGTM, but the part below conflicts with Bharath's comment [1].

```
@@ -1607,7 +1605,7 @@ check_old_cluster_for_valid_slots(bool live_check)
         fclose(script);
 
         pg_log(PG_REPORT, "fatal");
-        pg_fatal("Your installation contains logical replication slots that cannot be upgraded.\n"
+        pg_fatal("Your installation contains invalid logical replication slots.\n"
```

How about "Your installation contains logical replication slots that can't be upgraded."?

> One minor additional comment:
> +# Initialize subscriber cluster
> +my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
> +$subscriber->init(allows_streaming => 'logical');
> 
> Why do we need to set wal_level as logical for subscribers?

It is not mandatory. The line was copied from the tests in src/test/subscription.
I removed the setting from my patch. I feel that it could be removed elsewhere
as well, so I will fork a new thread and post a patch.


Also, I made some improvements on top of v50, mainly for the tests.

1. The test file was refactored. pg_upgrade was executed many times in the test,
   so the test time was increasing. The following refactorings were done.

===
a. Checks for both transactional and non-transactional changes are done at the
   same time.
b. Removed the dry-run test. It did not improve the coverage.
c. Removed the wal_level test. Other tests, such as subscription and
   test_decoding, do not contain tests for GUCs, so I thought this would be
   acceptable. Removing all the GUC tests (i.e. also the one for
   max_replication_slots) might be risky, so that one was kept.
===
===

2. Supported cross-version checks. If the environment variable "oldinstall"
   is set, its binaries are used for the old cluster. If the specified version
   is PG16 or older, the test verifies that logical replication slots are not
   migrated. 002_pg_upgrade.pl requires that $ENV{olddump} is also defined, but
   that is not needed for our test. I tried to support versions back to PG9.2,
   which is the oldest version for the cross-version upgrade test [2]. See the
   0002 patch for this. IIUC pg_create_logical_replication_slot() has been
   available since PG9.4, so the tests are skipped if older executables are
   specified, like:

```
$ oldinstall=/home/hayato/older/pg92/ make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. skipped: Logical replication slots can be available since PG9.4
Files=1, Tests=0,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.08 cusr  0.02 csys =  0.13 CPU)
Result: NOTESTS
```

[1]: https://www.postgresql.org/message-id/CALj2ACXp%2BLXioY_%3D9mboEbLD--4c4nnpJCZ%2Bj4fckBdSOQhENA%40mail.gmail.com
[2]:
https://github.com/PGBuildFarm/client-code/releases#:~:text=support%20for%20testing%20cross%20version%20upgrade%20extended%20back%20to%209.2

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear Peter,

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

> Here are some review comments for v51-0001
> 
> ======
> src/bin/pg_upgrade/check.c
> 
> 0.
> +check_old_cluster_for_valid_slots(bool live_check)
> +{
> + char output_path[MAXPGPATH];
> + FILE    *script = NULL;
> +
> + prep_status("Checking for valid logical replication slots");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "invalid_logical_relication_slots.txt");
> 
> 0a
> typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

Fixed.

> 0b.
> Since the non-upgradable slots are not strictly "invalid", is this an
> appropriate filename for the bad ones?
> 
> But I don't have very good alternatives. Maybe:
> - non_upgradable_logical_replication_slots.txt
> - problem_logical_replication_slots.txt

Per discussion [1], I kept current style.

> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# ------------------------------
> +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> +#
> +# There are two requirements for GUCs - wal_level and max_replication_slots,
> +# but only max_replication_slots will be tested here. This is because to
> +# reduce the execution time of the test.
> 
> 
> SUGGESTION
> # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> #
> # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> # reduce the test execution time, only 'max_replication_slots' is tested here.

First part was fixed. Second part was removed per [1].

> 2.
> +# Preparations for the subsequent test:
> +# 1. Create two slots on the old cluster
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot1',
> 'test_decoding', false, true);"
> +);
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);"
> +);
> 
> 
> Can't you combine those SQL in the same $old_publisher->safe_psql.

Combined.

> 3.
> +# Clean up
> +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> +# Set max_replication_slots to the same value as the number of slots. Both of
> +# slots will be used for subsequent tests.
> +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> 
> The code doesn't seem to match the comment - is this correct? The
> old_publisher created 2 slots, so why are you setting new_publisher
> "max_replication_slots = 1" again?

Fixed to "max_replication_slots = 2". Note that the previous test worked well because
GUC checking on the new cluster is done after checking the status of slots.

> 4.
> +# Preparations for the subsequent test:
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> +
> +# 2. Advance the slot test_slot2 up to the current WAL location
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> +# this slot will be also reported by upcoming pg_upgrade.
> +$old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> +);
> 
> 
> I felt this test would be clearer if you emphasised the state of the
> test_slot1 also. e.g.
> 
> 4a.
> BEFORE
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> 
> SUGGESTION
> # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> #    has consumed them.

Fixed.

> 4b.
> BEFORE
> +# 2. Advance the slot test_slot2 up to the current WAL location
> 
> SUGGESTION
> # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> #    still has unconsumed WAL records.

IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think 
"but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.

> 5.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_checks_all(
> 
> /because the slot still has/because there are slots still having/

Fixed.

> 6.
> + [qr//],
> + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> +);
> 
> /slot/slots/

Fixed.

> 7.
> +# And check the content. Both of slots must be reported that they have
> +# unconsumed WALs after confirmed_flush_lsn.
> 
> SUGGESTION
> # Check the file content. Both slots should be reporting that they have
> # unconsumed WAL records.

Fixed.

> 
> 8.
> +# Preparations for the subsequent test:
> +# 1. Setup logical replication
> +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> +
> +$old_publisher->start;
> +
> +$old_publisher->safe_psql('postgres',
> + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> +$old_publisher->safe_psql('postgres',
> + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> +
> +$old_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> 
> 
> 8a.
> /Setup logical replication/Setup logical replication (first, cleanup
> slots from the previous tests)/

Fixed.

> 8b.
> Can't you combine all those SQL in the same $old_publisher->safe_psql.

Combined.

> 9.
> +
> +# Actual run, successful upgrade is expected
> +command_ok(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d', $old_publisher->data_dir,
> + '-D', $new_publisher->data_dir,
> + '-b', $bindir,
> + '-B', $bindir,
> + '-s', $new_publisher->host,
> + '-p', $old_publisher->port,
> + '-P', $new_publisher->port,
> + $mode,
> + ],
> + 'run of pg_upgrade of old cluster');
> 
> Now that the "Dry run" part is removed, it seems unnecessary to say
> "Actual run" for this part.
> 
> 
> SUGGESTION
> # pg_upgrade should be successful.

Fixed.

[1]:
https://www.postgresql.org/message-id/CAA4eK1%2BAHSWPs2_jn%3DftJKRqz-NXU6o%3DrPQ3f%3DH-gcPsgpPFrw%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v52-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

October 18, 2023, 12:27:09

Dear Peter,

Thank you for reviewing! New patch is available in [1].

> ======
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
> +# Such clusters regard max_wal_senders as part of max_connections, but the
> +# current TAP tester sets these GUCs to the same value.
> +if ($old_publisher->pg_version < 12)
> +{
> + $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
> +}
> 
> 1a.
> I was initially unsure what the above comment meant -- thanks for the
> offline explanation.
> 
> SUGGESTION
> The TAP Cluster.pm assigns default 'max_wal_senders' and
> 'max_connections' to the same value (10) but PG12 and prior considered
> max_walsenders as a subset of max_connections, so setting the same
> value will fail.

Fixed.

> 1b.
> I also felt it is better to explicitly set both values in the < PG12
> configuration because otherwise, you are still assuming knowledge that
> the TAP default max_connections is 10.
> 
> SUGGESTION
> $old_publisher->append_conf('postgresql.conf', qq{
> max_wal_senders = 5
> max_connections = 10
> });

Fixed.

> 2.
> +# Switch workloads depend on the major version of the old cluster.  Upgrading
> +# logical replication slots has been supported since PG17.
> +if ($old_publisher->pg_version <= 16)
> +{
> + test_for_16_and_prior($old_publisher, $new_publisher, $mode);
> +}
> +else
> +{
> + test_for_17_and_later($old_publisher, $new_publisher, $mode);
> +}
> 
> IMO it is less confusing to have fewer version numbers floating around
> in comments and names and code. So instead of referring to 16 and 17,
> how about just referring to 17 everywhere?
> 
> For example
> 
> SUGGESTION
> # Test according to the major version of the old cluster.
> # Upgrading logical replication slots has been supported only since PG17.
> 
> if ($old_publisher->pg_version >= 17)
> {
>   test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
> }
> else
> {
>   test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
> }

In HEAD, pg_version seems to be "17devel", and that string seems to compare as smaller
than 17 in Perl (i.e., "17devel" >= 17 is false).
To compare only the major version, pg_version->major is now used.
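
As an illustration only (this is not code from the patch or from Cluster.pm), the idea of comparing just the major number can be sketched in C. The helpers `parse_major_version` and `old_cluster_slots_are_migrated` are hypothetical names, and the sketch only looks at the leading number of the version string, so a pre-10 version such as "9.4" is reduced to 9.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: return the leading number of a version string
 * such as "17devel", "16.1" or "12", or -1 if there are no leading digits.
 */
static int
parse_major_version(const char *version)
{
    char   *end;
    long    major;

    assert(version != NULL);

    major = strtol(version, &end, 10);
    if (end == version || major < 0)
        return -1;              /* no usable leading number */

    return (int) major;
}

/*
 * Hypothetical helper: slots are expected to be migrated only when the
 * old cluster's major version is 17 or later, whatever suffix follows.
 */
static bool
old_cluster_slots_are_migrated(const char *old_version)
{
    return parse_major_version(old_version) >= 17;
}
```

With this shape, a development version string like "17devel" takes the same branch as a released 17.x, which is the behaviour the test wants.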

Also, I removed the support for PG9.4 and older. I cannot find a description, but according to [2],
Cluster.pm does not support such binaries.
(cluster_name is set when the server process is started, but that GUC was added in PG9.5.)

[1]:
https://www.postgresql.org/message-id/TYCPR01MB5870EBEBC89F5224F6B3788CF5D5A%40TYCPR01MB5870.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/YsUrUDrRhUbuU/6k%40paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

October 19, 2023, 04:29:56

Here are some review comments for v52-0001

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+ # 2. max_replication_slots is set to smaller than the number of slots (2)
+ # present on the old cluster

SUGGESTION
2. Set 'max_replication_slots' to be less than the number of slots (2)
present on the old cluster.

~~~

2.
+ # Set max_replication_slots to the same value as the number of slots. Both
+ # of slots will be used for subsequent tests.

SUGGESTION
Set 'max_replication_slots' to match the number of slots (2) present
on the old cluster.
Both slots will be used for subsequent tests.

~~~

3.
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+ );

SUGGESTION
3. Emit a non-transactional message. This will cause test_slot2 to
detect the unconsumed WAL record.

~~~

4.
+ # Preparations for the subsequent test:
+ # 1. Generate extra WAL records. At this point neither test_slot1 nor
+ # test_slot2 has consumed them.
+ $old_publisher->start;
+ $old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
+ );
+
+ $old_publisher->stop;

All of the above are sequentially executed on the
old_publisher->safe_psql, so consider if it is worth combining them
all in a single call (keeping the comments 1,2,3 separate still)

For example,

$old_publisher->start;
$old_publisher->safe_psql('postgres', qq[
  CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
  SELECT pg_replication_slot_advance('test_slot2', NULL);
  SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
]);
$old_publisher->stop;

~~~

5.
+ # Clean up
+ $subscriber->stop();
+ $new_publisher->stop();

Should this also drop the 'test_slot1' and 'test_slot2'?

~~~

6.
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+ my ($old_publisher, $new_publisher, $mode) = @_;
+
+ my $oldbindir = $old_publisher->config_data('--bindir');
+ my $newbindir = $new_publisher->config_data('--bindir');

SUGGESTION (let's not mention lots of different numbers; just refer to 17)
This function will be executed when the old cluster version is prior to PG17.

~~

7.
+ # Actual run, successful upgrade is expected
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $oldbindir,
+ '-B', $newbindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');
+
+ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");

7a.
The comment is wrong?

SUGGESTION
# pg_upgrade should NOT be successful

~

7b.
There is a blank line here before the ok() function, but in the other
tests, there was none. Better to be consistent.

~~~

8.
+ # Clean up
+ $new_publisher->stop();

Should this also drop the 'test_slot'?

~~~

9.
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
+# the same value (10) but PG12 and prior considered max_walsenders as a subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

If the comment is correct, then PG12 *and* prior should be testing
"<= 12", not "< 12", right?

~~~

10.
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)

This comment seems wrong IMO. I think we are always running the latest
version of pg_upgrade, so slot migration is always "supported" from now
on. IIUC you intended this comment to say something about the
old_publisher slots.

BEFORE
Upgrading logical replication slots has been supported only since PG17.

SUGGESTION
Upgrading logical replication slots from versions older than PG17 is
not supported.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

October 19, 2023, 06:02:00

On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ======
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE    *script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# ------------------------------
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> > #    has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> > #    still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.
> > +# And check the content. Both of slots must be reported that they have
> > +# unconsumed WALs after confirmed_flush_lsn.
> >
> > SUGGESTION
> > # Check the file content. Both slots should be reporting that they have
> > # unconsumed WAL records.
>
> Fixed.
>
> >
> > 8.
> > +# Preparations for the subsequent test:
> > +# 1. Setup logical replication
> > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> > +
> > +$old_publisher->start;
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> >
> >
> > 8a.
> > /Setup logical replication/Setup logical replication (first, cleanup
> > slots from the previous tests)/
>
> Fixed.
>
> > 8b.
> > Can't you combine all those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 9.
> > +
> > +# Actual run, successful upgrade is expected
> > +command_ok(
> > + [
> > + 'pg_upgrade', '--no-sync',
> > + '-d', $old_publisher->data_dir,
> > + '-D', $new_publisher->data_dir,
> > + '-b', $bindir,
> > + '-B', $bindir,
> > + '-s', $new_publisher->host,
> > + '-p', $old_publisher->port,
> > + '-P', $new_publisher->port,
> > + $mode,
> > + ],
> > + 'run of pg_upgrade of old cluster');
> >
> > Now that the "Dry run" part is removed, it seems unnecessary to say
> > "Actual run" for this part.
> >
> >
> > SUGGESTION
> > # pg_upgrade should be successful.
>
> Fixed.

A few comments:
1) We will be able to override the value of max_slot_wal_keep_size by
using --new-options like '--new-options  "-c
max_slot_wal_keep_size=val"':
+       /*
+        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
+        * checkpointer process.  If WALs required by logical replication slots
+        * are removed, the slots are unusable.  This setting prevents the
+        * invalidation of slots during the upgrade. We set this option when
+        * cluster is PG17 or later because logical replication slots can only be
+        * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
+        */
+       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
+               appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");

Should there be a check to throw an error if this option is specified
or do we need some documentation that this option should not be
specified?
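
One possible shape for such a check, as a minimal sketch and not code from the patch: `user_options_mention_guc` is a hypothetical helper that does a plain substring search over the user-supplied option string. A real check would need to parse the options properly, since a substring match can report false positives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report whether a user-supplied server option
 * string (as passed via --new-options) mentions the given GUC name.
 *
 * This is only a substring test, so the name appearing inside another
 * value would also be reported.
 */
static bool
user_options_mention_guc(const char *user_opts, const char *guc_name)
{
    assert(guc_name != NULL);

    if (user_opts == NULL)
        return false;           /* no extra options were given */

    return strstr(user_opts, guc_name) != NULL;
}
```

A caller could then raise an error, for example through pg_fatal(), when the helper returns true for "max_slot_wal_keep_size". Whether to reject the option or only document it is the open question here.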

2) Because we are able to override max_slot_wal_keep_size, there is a
chance of a slot getting invalidated and the Assert being hit:
+               /*
+                * The logical replication slots shouldn't be invalidated as
+                * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
+                *
+                * The following is just a sanity check.
+                */
+               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+               {
+                       Assert(max_slot_wal_keep_size_mb == -1);
+                       elog(ERROR, "replication slots must not be invalidated during the upgrade");
+               }

3) File 003_logical_replication_slots.pl has been renamed to
003_upgrade_logical_replication_slots.pl, so it should be changed here too
accordingly:
index 5834513add..815d1a7ca1 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -3,6 +3,9 @@
 PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
 PGAPPICON = win32

+# required for 003_logical_replication_slots.pl
+EXTRA_INSTALL=contrib/test_decoding
+

Regards,
Vignesh

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

October 19, 2023, 07:46:07

On Wednesday, October 18, 2023 5:26 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.

Thanks for updating the patch. Here are a few comments for the test.

1.

>
# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
# the same value (10) but PG12 and prior considered max_walsenders as a subset
# of max_connections, so setting the same value will fail.
if ($old_publisher->pg_version->major < 12)
{
    $old_publisher->append_conf(
        'postgresql.conf', qq[
    max_wal_senders = 5
    max_connections = 10
    ]);
>

I think we already set max_wal_senders to 5 in the init() function (in Cluster.pm),
so is this necessary? Also, 002_pg_upgrade.pl doesn't seem to set this.
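
For reference, the constraint described in the quoted comment can be modeled with a small C sketch. This is a hypothetical model and not PostgreSQL source: `wal_sender_settings_ok` is an invented name, and the cutoff version is passed in as a parameter because the quoted test code assumes 12 while the comment wording is still under review.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the rule under discussion: on old majors, WAL
 * senders are counted within max_connections, so max_wal_senders must
 * be strictly smaller than max_connections.
 *
 * first_unrestricted_major is the first major version without that
 * rule; the test code under review assumes 12.
 */
static bool
wal_sender_settings_ok(int major, int first_unrestricted_major,
                       int max_wal_senders, int max_connections)
{
    assert(max_wal_senders >= 0 && max_connections > 0);

    if (major >= first_unrestricted_major)
        return true;            /* the two limits are independent */

    return max_wal_senders < max_connections;
}
```

Under this model, equal values of 10 and 10 are rejected on an old major, while 5 and 10 are accepted, which is what the quoted append_conf() call relies on.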

2.

        SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
        SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);

I think we don't need to set the last two parameters here as we don't check
this info in the tests.

3.

# Set extra params if cross-version checks are required. This is needed to
# avoid using previously initdb'd cluster
if (defined($ENV{oldinstall}))
{
    my @initdb_params = ();
    push @initdb_params, ('--encoding', 'UTF-8');
    push @initdb_params, ('--locale', 'C');

I am not sure I understand the comment. Would it be possible to provide a bit more
explanation about the purpose of this setting? Also, I see 002_pg_upgrade always
has these settings even if oldinstall is not defined, so shall we follow the
same approach?

4.

+    command_ok(
+        [
+            'pg_upgrade', '--no-sync',
+            '-d', $old_publisher->data_dir,
+            '-D', $new_publisher->data_dir,
+            '-b', $oldbindir,
+            '-B', $newbindir,
+            '-s', $new_publisher->host,
+            '-p', $old_publisher->port,
+            '-P', $new_publisher->port,
+            $mode,
+        ],

I think all the pg_upgrade commands in the test are the same, so we can save the cmd
in a variable and pass it to command_xx(). I think it can save some effort when
checking the differences between commands and can also reduce the amount of code.

Best Regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Shlok Kyal

Date:

October 19, 2023, 09:21:27

> Few comments:
> 1) We will be able to override the value of max_slot_wal_keep_size by
> using --new-options like '--new-options  "-c
> max_slot_wal_keep_size=val"':
> +       /*
> +        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> +        * checkpointer process.  If WALs required by logical replication slots
> +        * are removed, the slots are unusable.  This setting prevents the
> +        * invalidation of slots during the upgrade. We set this option when
> +        * cluster is PG17 or later because logical replication slots
> can only be
> +        * migrated since then. Besides, max_slot_wal_keep_size is
> added in PG13.
> +        */
> +       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> +               appendPQExpBufferStr(&pgoptions, " -c
> max_slot_wal_keep_size=-1");
>
> Should there be a check to throw an error if this option is specified
> or do we need some documentation that this option should not be
> specified?

I have tested the above scenario. We are able to override
max_slot_wal_keep_size by using '--new-options  "-c
max_slot_wal_keep_size=val"'. Also, with some insert statements
during pg_upgrade, old WAL files were deleted and logical replication
slots were invalidated. Since the slots were invalidated, replication
was not happening after the upgrade.

Thanks,
Shlok Kumar Kyal

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

October 19, 2023, 09:28:12

On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ======
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE    *script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# ------------------------------
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> > #    has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> > #    still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.
> > +# And check the content. Both of slots must be reported that they have
> > +# unconsumed WALs after confirmed_flush_lsn.
> >
> > SUGGESTION
> > # Check the file content. Both slots should be reporting that they have
> > # unconsumed WAL records.
>
> Fixed.
>
> >
> > 8.
> > +# Preparations for the subsequent test:
> > +# 1. Setup logical replication
> > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> > +
> > +$old_publisher->start;
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> >
> >
> > 8a.
> > /Setup logical replication/Setup logical replication (first, cleanup
> > slots from the previous tests)/
>
> Fixed.
>
> > 8b.
> > Can't you combine all those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 9.
> > +
> > +# Actual run, successful upgrade is expected
> > +command_ok(
> > + [
> > + 'pg_upgrade', '--no-sync',
> > + '-d', $old_publisher->data_dir,
> > + '-D', $new_publisher->data_dir,
> > + '-b', $bindir,
> > + '-B', $bindir,
> > + '-s', $new_publisher->host,
> > + '-p', $old_publisher->port,
> > + '-P', $new_publisher->port,
> > + $mode,
> > + ],
> > + 'run of pg_upgrade of old cluster');
> >
> > Now that the "Dry run" part is removed, it seems unnecessary to say
> > "Actual run" for this part.
> >
> >
> > SUGGESTION
> > # pg_upgrade should be successful.
>
> Fixed.

Few comments:
1) Even if we comment out the 3rd point, "Emit a non-transactional message",
test_slot2 still appears in the invalid_logical_replication_slots.txt
file. There is something wrong here.
+       # 2. Advance the slot test_slot2 up to the current WAL location, but
+       #        test_slot1 still has unconsumed WAL records.
+       $old_publisher->safe_psql('postgres',
+               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+       # 3. Emit a non-transactional message. test_slot2 detects the message so
+       #        that this slot will be also reported by upcoming pg_upgrade.
+       $old_publisher->safe_psql('postgres',
+               "SELECT count(*) FROM pg_logical_emit_message('false',
'prefix', 'This is a non-transactional message');"
+       );

2) If the test fails here, it is difficult to debug because the
pg_upgrade_output.d directory has been removed, so it would be better to keep the
directory in this case:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+
+       # Clean up
+       rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");

3) The below could be changed:
+       # Check the file content. Both slots should be reporting that they have
+       # unconsumed WAL records.
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');
+       like(
+               slurp_file($slots_filename),
+               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
+               'the previous test failed due to unconsumed WALs');

to:
my $result = slurp_file($slots_filename);
is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
The slot "test_slot2" has not consumed the WAL yet
),
'the previous test failed due to unconsumed WALs');

Regards,
Vignesh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Shlok Kyal

Date:

19 October 2023, 11:22:24

I tested the following scenario:
I started a new publisher with the 'max_replication_slots' parameter set
to '1' and set up streaming replication with the new publisher as the
primary node.
Then I ran pg_upgrade from the old publisher to the new publisher. The
upgrade failed with the following error:

Restoring logical replication slots in the new cluster
SQL command failed
SELECT * FROM pg_catalog.pg_create_logical_replication_slot('test1',
'pgoutput', false, false);
ERROR:  all replication slots are in use
HINT:  Free one or increase max_replication_slots.

Failure, exiting

Should we document that the existing replication slots are taken into
consideration when setting the 'max_replication_slots' value on the new
publisher?

Thanks
Shlok Kumar Kyal

On Wed, 18 Oct 2023 at 15:01, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ======
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE    *script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# ------------------------------
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor test_slot2
> > #    has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but test_slot2
> > #    still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.
> > +# And check the content. Both of slots must be reported that they have
> > +# unconsumed WALs after confirmed_flush_lsn.
> >
> > SUGGESTION
> > # Check the file content. Both slots should be reporting that they have
> > # unconsumed WAL records.
>
> Fixed.
>
> >
> > 8.
> > +# Preparations for the subsequent test:
> > +# 1. Setup logical replication
> > +my $old_connstr = $old_publisher->connstr . ' dbname=postgres';
> > +
> > +$old_publisher->start;
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot1');");
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT * FROM pg_drop_replication_slot('test_slot2');");
> > +
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE PUBLICATION regress_pub FOR ALL TABLES;");
> >
> >
> > 8a.
> > /Setup logical replication/Setup logical replication (first, cleanup
> > slots from the previous tests)/
>
> Fixed.
>
> > 8b.
> > Can't you combine all those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 9.
> > +
> > +# Actual run, successful upgrade is expected
> > +command_ok(
> > + [
> > + 'pg_upgrade', '--no-sync',
> > + '-d', $old_publisher->data_dir,
> > + '-D', $new_publisher->data_dir,
> > + '-b', $bindir,
> > + '-B', $bindir,
> > + '-s', $new_publisher->host,
> > + '-p', $old_publisher->port,
> > + '-P', $new_publisher->port,
> > + $mode,
> > + ],
> > + 'run of pg_upgrade of old cluster');
> >
> > Now that the "Dry run" part is removed, it seems unnecessary to say
> > "Actual run" for this part.
> >
> >
> > SUGGESTION
> > # pg_upgrade should be successful.
>
> Fixed.
>
> [1]:
https://www.postgresql.org/message-id/CAA4eK1%2BAHSWPs2_jn%3DftJKRqz-NXU6o%3DrPQ3f%3DH-gcPsgpPFrw%40mail.gmail.com
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 12:24:04

Dear Shlok,

Thanks for testing the feature!

> 
> I tested a test scenario:
> I started a new publisher with 'max_replication_slots' parameter set
> to '1' and created a streaming replication with the new publisher as
> primary node.

Just to confirm what you did: you set up physical replication and the
target of pg_upgrade was the primary, right?

I think we can assume that the new cluster (the target of pg_upgrade) is not in use yet.
The documentation describes the usage [1], and it says that we must initialize
the cluster (at step 4) and then run pg_upgrade (at step 10).

Therefore I don't think we should document anything about it.

[1]: https://www.postgresql.org/docs/devel/pgupgrade.html#:~:text=Initialize%20the%20new%20PostgreSQL%20cluster

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 13:42:50

Dear Peter,

Thanks for reviewing! PSA new version.

> ======
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> + # 2. max_replication_slots is set to smaller than the number of slots (2)
> + # present on the old cluster
> 
> SUGGESTION
> 2. Set 'max_replication_slots' to be less than the number of slots (2)
> present on the old cluster.

Fixed.

> 2.
> + # Set max_replication_slots to the same value as the number of slots. Both
> + # of slots will be used for subsequent tests.
> 
> SUGGESTION
> Set 'max_replication_slots' to match the number of slots (2) present
> on the old cluster.
> Both slots will be used for subsequent tests.

Fixed.

> 
> 3.
> + # 3. Emit a non-transactional message. test_slot2 detects the message so
> + # that this slot will be also reported by upcoming pg_upgrade.
> + $old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> + );
> 
> SUGGESTION
> 3. Emit a non-transactional message. This will cause test_slot2 to
> detect the unconsumed WAL record.

Fixed.

> 
> 4.
> + # Preparations for the subsequent test:
> + # 1. Generate extra WAL records. At this point neither test_slot1 nor
> + # test_slot2 has consumed them.
> + $old_publisher->start;
> + $old_publisher->safe_psql('postgres',
> + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> +
> + # 2. Advance the slot test_slot2 up to the current WAL location, but
> + # test_slot1 still has unconsumed WAL records.
> + $old_publisher->safe_psql('postgres',
> + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> + # 3. Emit a non-transactional message. test_slot2 detects the message so
> + # that this slot will be also reported by upcoming pg_upgrade.
> + $old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> + );
> +
> + $old_publisher->stop;
> 
> All of the above are sequentially executed on the
> old_publisher->safe_psql, so consider if it is worth combining them
> all in a single call (keeping the comments 1,2,3 separate still)
> 
> For example,
> 
> $old_publisher->start;
> $old_publisher->safe_psql('postgres', qq[
>   CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
>   SELECT pg_replication_slot_advance('test_slot2', NULL);
>   SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');
> ]);
> $old_publisher->stop;

Fixed.

> 
> 5.
> + # Clean up
> + $subscriber->stop();
> + $new_publisher->stop();
> 
> Should this also drop the 'test_slot1' and 'test_slot2'?

'test_slot1' and 'test_slot2' have already been removed during the preparation for the
"Successful upgrade" case. Also, I don't think objects have to be removed at the
end. That is tested by other parts, and removing them may make the test more difficult to
debug if there are failures.

> 6.
> +# Verify that logical replication slots cannot be migrated.  This function
> +# will be executed when the old cluster is PG16 and prior.
> +sub test_upgrade_from_pre_PG17
> +{
> + my ($old_publisher, $new_publisher, $mode) = @_;
> +
> + my $oldbindir = $old_publisher->config_data('--bindir');
> + my $newbindir = $new_publisher->config_data('--bindir');
> 
> SUGGESTION (let's not mention lots of different numbers; just refer to 17)
> This function will be executed when the old cluster version is prior to PG17.

Fixed.


> 7.
> + # Actual run, successful upgrade is expected
> + command_ok(
> + [
> + 'pg_upgrade', '--no-sync',
> + '-d', $old_publisher->data_dir,
> + '-D', $new_publisher->data_dir,
> + '-b', $oldbindir,
> + '-B', $newbindir,
> + '-s', $new_publisher->host,
> + '-p', $old_publisher->port,
> + '-P', $new_publisher->port,
> + $mode,
> + ],
> + 'run of pg_upgrade of old cluster');
> +
> + ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
> + "pg_upgrade_output.d/ removed after pg_upgrade success");
> 
> 7a.
> The comment is wrong?
> 
> SUGGESTION
> # pg_upgrade should NOT be successful

No, pg_upgrade will succeed, but no logical replication slots are migrated.
Comments were added to document this.

> 7b.
> There is a blank line here before the ok() function, but in the other
> tests, there was none. Better to be consistent.

Removed.

> 8.
> + # Clean up
> + $new_publisher->stop();
> 
> Should this also drop the 'test_slot'?

I don't think so. Please see above.

> 
> 9.
> +# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
> to
> +# the same value (10) but PG12 and prior considered max_walsenders as a
> subset
> +# of max_connections, so setting the same value will fail.
> +if ($old_publisher->pg_version->major < 12)
> +{
> + $old_publisher->append_conf(
> + 'postgresql.conf', qq[
> + max_wal_senders = 5
> + max_connections = 10
> + ]);
> +}
> 
> If the comment is correct, then PG12 *and* prior, should be testing
> "<= 12", not "< 12". right?

I analyzed this further and I was wrong: we must set the GUCs here only for PG9.6 and prior.
For PG11 and PG10, the corresponding subclass is chosen in new() [a],
and these instances set max_wal_senders to 5 [b].
For PG9.6 and prior, the related package has not been defined yet, so no such
workaround is applied. So we must set the values manually.

Actually, this part will not be needed once Cluster.pm supports PG9.6 and prior. If needed,
we can start another thread and add that support. For now the case is handled ad hoc.
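
As a rough illustration of the constraint being worked around, the C sketch
below models only the one rule at issue: on versions before PG12,
max_wal_senders must be smaller than max_connections. This rule is stated here
as an assumption about the older servers' startup check, not something taken
from the patch; the helper name and the pg_upgrade-style version numbers
(906 for 9.6, 1200 for 12) are purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of the patch or of Cluster.pm).
 *
 * "major" uses the pg_upgrade convention: 906 for 9.6, 1100 for 11,
 * 1200 for 12, and so on.
 *
 * Assumption: before PG12, max_wal_senders had to be strictly less than
 * max_connections. From PG12 on, this sketch treats the two settings as
 * independent and does not model any other startup check.
 */
bool
wal_sender_settings_ok(int major, int max_wal_senders, int max_connections)
{
	if (major < 1200)
		return max_wal_senders < max_connections;
	return true;
}
```

With the defaults mentioned above (both values set to 10), the helper rejects
the pair for 9.6 and accepts it once max_wal_senders is lowered to 5.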

> 10.
> +# Test according to the major version of the old cluster.
> +# Upgrading logical replication slots has been supported only since PG17.
> +if ($old_publisher->pg_version->major >= 17)
> 
> This comment seems wrong IMO. I think we always running the latest
> version of pg_upgrade so slot migration is always "supported" from now
> on. IIUC you intended this comment to be saying something about the
> old_publisher slots.
> 
> BEFORE
> Upgrading logical replication slots has been supported only since PG17.
> 
> SUGGESTION
> Upgrading logical replication slots from versions older than PG17 is
> not supported.

Fixed.

[a]:
```
    # Use a subclass as defined below (or elsewhere) if this version
    # isn't fully compatible. Warn if the version is too old and thus we don't
    # have a subclass of this class.
    if (ref $ver && $ver < $min_compat)
    {
        my $maj = $ver->major(separator => '_');
        my $subclass = $class . "::V_$maj";
        if ($subclass->isa($class))
        {
            bless $node, $subclass;
        }
```

[b]:
```
sub init
{
    my ($self, %params) = @_;
    $self->SUPER::init(%params);
    $self->adjust_conf('postgresql.conf', 'max_wal_senders',
        $params{allows_streaming} ? 5 : 0);
}
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v53-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 13:43:57

Dear Vignesh,

Thanks for reviewing! The new patch is available in [1].

> 
> Few comments:
> 1) We will be able to override the value of max_slot_wal_keep_size by
> using --new-options like '--new-options  "-c
> max_slot_wal_keep_size=val"':
> +       /*
> +        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> +        * checkpointer process.  If WALs required by logical replication slots
> +        * are removed, the slots are unusable.  This setting prevents the
> +        * invalidation of slots during the upgrade. We set this option when
> +        * cluster is PG17 or later because logical replication slots
> can only be
> +        * migrated since then. Besides, max_slot_wal_keep_size is
> added in PG13.
> +        */
> +       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> +               appendPQExpBufferStr(&pgoptions, " -c
> max_slot_wal_keep_size=-1");
> 
> Should there be a check to throw an error if this option is specified
> or do we need some documentation that this option should not be
> specified?

Hmm, I don't think we have to add checks. Other settings, like synchronous_commit
and fsync, can also be overridden, but pg_upgrade has never checked them. Therefore,
it is the user's responsibility not to set max_slot_wal_keep_size to a dangerous
value.
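
For reference, the version gate in the quoted hunk can be reproduced in
isolation. The sketch below assumes the pg_upgrade.h definition
GET_MAJOR_VERSION(v) = (v) / 100, so a cluster version of 170000 maps to 1700;
the helper name is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match pg_upgrade.h: 170000 -> 1700, 90600 -> 906. */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/*
 * Hypothetical helper: returns true when pg_upgrade itself would append
 * "-c max_slot_wal_keep_size=-1" for a cluster of this version, following
 * the condition shown in the quoted hunk.
 */
bool
needs_slot_wal_keep_override(uint32_t major_version)
{
	return GET_MAJOR_VERSION(major_version) >= 1700;
}
```

This only decides whether pg_upgrade adds the option on its own; as discussed
above, a value passed through --new-options can still override it.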

> 2) Because we are able to override max_slot_wal_keep_size there is a
> chance of slot getting invalidated and Assert being hit:
> +               /*
> +                * The logical replication slots shouldn't be invalidated as
> +                * max_slot_wal_keep_size GUC is set to -1 during the
> upgrade.
> +                *
> +                * The following is just a sanity check.
> +                */
> +               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> +               {
> +                       Assert(max_slot_wal_keep_size_mb == -1);
> +                       elog(ERROR, "replication slots must not be
> invalidated during the upgrade");
> +               }

Hmm, so how about removing the Assert and making the error message more
appropriate? I still think this seldom occurs.

> 3) File 003_logical_replication_slots.pl is now changed to
> 003_upgrade_logical_replication_slots.pl, it should be change here too
> accordingly:
> index 5834513add..815d1a7ca1 100644
> --- a/src/bin/pg_upgrade/Makefile
> +++ b/src/bin/pg_upgrade/Makefile
> @@ -3,6 +3,9 @@
>  PGFILEDESC = "pg_upgrade - an in-place binary upgrade utility"
>  PGAPPICON = win32
> 
> +# required for 003_logical_replication_slots.pl
> +EXTRA_INSTALL=contrib/test_decoding
> +

Fixed.

[1]:
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 13:44:27

Dear Hou,

Thanks for reviewing! The new patch is available in [1].

> Thanks for updating the patch, here are few comments for the test.
> 
> 1.
> 
> >
> # The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections'
> to
> # the same value (10) but PG12 and prior considered max_walsenders as a subset
> # of max_connections, so setting the same value will fail.
> if ($old_publisher->pg_version->major < 12)
> {
>     $old_publisher->append_conf(
>         'postgresql.conf', qq[
>     max_wal_senders = 5
>     max_connections = 10
>     ]);
> >
> 
> I think we already set max_wal_senders to 5 in init() function(in Cluster.pm),
> so is this necessary ? And 002_pg_upgrade.pl doesn't seems set this.

I think you are referring to Cluster::V_11::init(). I analyzed the code based on that and
found a mistake in my earlier understanding. Could you please check [1]?

> 2.
> 
>         SELECT pg_create_logical_replication_slot('test_slot1',
> 'test_decoding', false, true);
>         SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);
> 
> I think we don't need to set the last two parameters here as we don't check
> these info in the tests.

Removed.

> 3.
> 
> # Set extra params if cross-version checks are required. This is needed to
> # avoid using previously initdb'd cluster
> if (defined($ENV{oldinstall}))
> {
>     my @initdb_params = ();
>     push @initdb_params, ('--encoding', 'UTF-8');
>     push @initdb_params, ('--locale', 'C');
> 
> I am not sure I understand the comment, would it be possible provide a bit more
> explanation about the purpose of this setting ? And I see 002_pg_upgrade always
> have these setting even if oldinstall is not defined, so shall we follow the
> same ?

Fixed.
Actually, the settings are not needed for the new cluster, but it seems better to follow 002.

> 4.
> 
> +    command_ok(
> +        [
> +            'pg_upgrade', '--no-sync',
> +            '-d', $old_publisher->data_dir,
> +            '-D', $new_publisher->data_dir,
> +            '-b', $oldbindir,
> +            '-B', $newbindir,
> +            '-s', $new_publisher->host,
> +            '-p', $old_publisher->port,
> +            '-P', $new_publisher->port,
> +            $mode,
> +        ],
> 
> I think all the pg_upgrade commands in the test are the same, so we can save the
> cmd
> in a variable and pass them to command_xx(). I think it can save some effort to
> check the difference of each command and can also reduce some codes.

Fixed.

[1]:
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 13:45:26

Dear Shlok,

> 
> I have tested the above scenario. We are able to override the
> max_slot_wal_keep_size by using  '--new-options  "-c
> max_slot_wal_keep_size=val"'. And also with some insert statements
> during pg_upgrade, old WAL file were deleted and logical replication
> slots were invalidated. Since the slots were invalidated replication
> was not happening after the upgrade.

Yeah, theoretically it could be overridden, but I still think we do not have to
guard against it. Also, connections must not be established during the upgrade [1].
I improved the ereport() message in the new patch [2]. What do you think?

[1]: https://www.postgresql.org/message-id/ZNZ4AxUMIrnMgRbo%40momjian.us
[2]:
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 13:45:53

Dear Vignesh,

Thanks for reviewing! The new patch is available in [1].

> Few comments:
> 1) Even if we comment 3rd point "Emit a non-transactional message",
> test_slot2 still appears in the invalid_logical_replication_slots.txt
> file. There is something wrong here.
> +       # 2. Advance the slot test_slot2 up to the current WAL location, but
> +       #        test_slot1 still has unconsumed WAL records.
> +       $old_publisher->safe_psql('postgres',
> +               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> +       # 3. Emit a non-transactional message. test_slot2 detects the message
> so
> +       #        that this slot will be also reported by upcoming pg_upgrade.
> +       $old_publisher->safe_psql('postgres',
> +               "SELECT count(*) FROM pg_logical_emit_message('false',
> 'prefix', 'This is a non-transactional message');"
> +       );

The comment was updated based on others' feedback. What do you think?

> 2) If the test fails here, it is difficult to debug as the
> pg_upgrade_output.d directory was removed, so better to keep the
> directory as it is this case:
> +       # Check the file content. Both slots should be reporting that they have
> +       # unconsumed WAL records.
> +       like(
> +               slurp_file($slots_filename),
> +               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
> +               'the previous test failed due to unconsumed WALs');
> +       like(
> +               slurp_file($slots_filename),
> +               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
> +               'the previous test failed due to unconsumed WALs');
> +
> +       # Clean up
> +       rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");

Right. The current style just follows the 002 test. I removed rmtree().

> 3) The below could be changed:
> +       # Check the file content. Both slots should be reporting that they have
> +       # unconsumed WAL records.
> +       like(
> +               slurp_file($slots_filename),
> +               qr/The slot \"test_slot1\" has not consumed the WAL yet/m,
> +               'the previous test failed due to unconsumed WALs');
> +       like(
> +               slurp_file($slots_filename),
> +               qr/The slot \"test_slot2\" has not consumed the WAL yet/m,
> +               'the previous test failed due to unconsumed WALs');
> 
> to:
> my $result = slurp_file($slots_filename);
> is( $result, qq(The slot "test_slot1" has not consumed the WAL yet
> The slot "test_slot2" has not consumed the WAL yet
> ),
> 'the previous test failed due to unconsumed WALs');
>

Replaced, but the formatting does not look good. I would like to hear opinions from others.

[1]:
https://www.postgresql.org/message-id/TYCPR01MB587007EA2F9AB92F0E1F5957F5D4A%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

19 October 2023, 15:53:51

Dear hackers,

> Thanks for reviewing! PSA new version.

Hmm. The cfbot got angry, although the test passes on my machine.
It seems that the ordering of entries in invalid_logical_replication_slots.txt is not deterministic.

The change to how the content is checked was reverted. With that, it passes on my CI.
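
The ordering problem is not specific to Perl. As a minimal sketch in C, the
function below checks each expected line independently, which is the property
that per-slot like() checks have and an exact comparison of the whole file
lacks. The function name is hypothetical; the message text is the one used in
the test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if every expected line occurs somewhere
 * in the report, regardless of the order in which the slots were written.
 */
bool
report_contains_all(const char *report, const char *const *expected, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		/* A missing line fails the check; position does not matter. */
		if (strstr(report, expected[i]) == NULL)
			return false;
	}
	return true;
}
```

A report that lists test_slot2 before test_slot1 passes this check, whereas it
would fail an exact string comparison against a fixed ordering.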

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v54-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

20 October 2023, 04:49:59

Here are some review comments for v54-0001

======
src/backend/replication/slot.c

1.
+ if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
+ {
+ ereport(ERROR,
+ errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("replication slots must not be invalidated during the upgrade"),
+ errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the
upgrade"));
+ }

This new error is replacing the old code:
+ Assert(max_slot_wal_keep_size_mb == -1);

Is that errhint correct? Shouldn't it say "must" instead of "must not"?

======
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

2. General formatting

Some of the "]);" formatting and indenting for the multiple SQL
commands is inconsistent.

For example,

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
+ SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
+ ]
+ );

versus

+ $old_publisher->safe_psql(
+ 'postgres', qq[
+ CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
+ SELECT pg_replication_slot_advance('test_slot2', NULL);
+ SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
+ ]);

~~~

3.
+# Set up some settings for the old cluster, so that we can ensures that initdb
+# will be done.
+my @initdb_params = ();
+push @initdb_params, ('--encoding', 'UTF-8');
+push @initdb_params, ('--locale', 'C');
+$node_params{extra} = \@initdb_params;
+
+$old_publisher->init(%node_params);

Why would initdb not be done if these were not set? I didn't
understand the comment.

/so that we can ensures/to ensure/

~~~

4.
+# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
+# 'max_connections' to the same value (10). But these versions considered
+# max_wal_senders as a subset of max_connections, so setting the same value
+# will fail. This adjustment will not be needed when packages for older
+#versions are defined.
+if ($old_publisher->pg_version->major <= 9.6)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

4a.
IMO remove the complicated comment trying to explain the problem and
just unconditionally set the values you want.

SUGGESTION#1
# Older PG versions had different rules for the inter-dependency of
# 'max_wal_senders' and 'max_connections', so assign values which will
# work for all PG versions.
$old_publisher->append_conf(
  'postgresql.conf', qq[
  max_wal_senders = 5
  max_connections = 10
  ]);

~~

4b.
If you really want to put special code here then I think the comment
needs to be more descriptive like below. IMO this suggestion is
overkill, #4a above is much simpler.

SUGGESTION#2
# Versions prior to PG12 considered max_wal_senders as a subset of
# max_connections, so setting the same value will fail.
#
# The TAP Cluster.pm assigns default 'max_wal_senders' and
# 'max_connections' as follows:
# PG_11:  'max_wal_senders=5' and 'max_connections=10'
# PG_10:  'max_wal_senders=5' and 'max_connections=10'
# Everything else: 'max_wal_senders=10' and 'max_connections=10'
#
# The following code is needed to make adjustments for versions not
# already being handled by Cluster.pm.

~

4c.
Alternatively, make necessary adjustments in the Cluster.pm to set
appropriate defaults for all older versions. Then probably you can
remove all this code entirely.
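
For illustration only, the version rule described in SUGGESTION#2 can be written as a small standalone C check. This is a sketch, not code from the patch or from Cluster.pm: the helper name wal_sender_settings_ok and the simplified integer major_version argument are assumptions, and the only rule it encodes is the one stated above (versions prior to PG12 treat max_wal_senders as a subset of max_connections).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true if the given
 * pair of settings is expected to be accepted at server start.
 *
 * Assumption taken from the review comment: before PG12, WAL senders were
 * counted within max_connections, so max_wal_senders had to be strictly
 * smaller. major_version is simplified to an integer (9 stands for 9.x).
 */
bool
wal_sender_settings_ok(int major_version, int max_wal_senders,
                       int max_connections)
{
    assert(max_wal_senders >= 0 && max_connections > 0);

    if (major_version < 12)
        return max_wal_senders < max_connections;

    /* From PG12 on, the two limits are treated as independent here. */
    return true;
}
```

Under these assumptions the pair (10, 10) is rejected for a pre-PG12 server, while (5, 10) is accepted for every version, which is why SUGGESTION#1 can set the values unconditionally.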

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

20 October 2023, 06:19:08

On Thu, 19 Oct 2023 at 16:14, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for reviewing! New patch can be available in [1].
>
> >
> > Few comments:
> > 1) We will be able to override the value of max_slot_wal_keep_size by
> > using --new-options like '--new-options  "-c
> > max_slot_wal_keep_size=val"':
> > +       /*
> > +        * Use max_slot_wal_keep_size as -1 to prevent the WAL removal by the
> > +        * checkpointer process.  If WALs required by logical replication slots
> > +        * are removed, the slots are unusable.  This setting prevents the
> > +        * invalidation of slots during the upgrade. We set this option when
> > +        * cluster is PG17 or later because logical replication slots can only be
> > +        * migrated since then. Besides, max_slot_wal_keep_size is added in PG13.
> > +        */
> > +       if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> > +               appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
> >
> > Should there be a check to throw an error if this option is specified
> > or do we need some documentation that this option should not be
> > specified?
>
> Hmm, I don't think we have to add checks. Other settings, like synchronous_commit
> and fsync, can be also overwritten, but pg_upgrade has never checked. Therefore,
> it's user's responsibility to not set max_slot_wal_keep_size to a dangerous
> value.
>
> > 2) Because we are able to override max_slot_wal_keep_size there is a
> > chance of slot getting invalidated and Assert being hit:
> > +               /*
> > +                * The logical replication slots shouldn't be invalidated as
> > +                * max_slot_wal_keep_size GUC is set to -1 during the upgrade.
> > +                *
> > +                * The following is just a sanity check.
> > +                */
> > +               if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> > +               {
> > +                       Assert(max_slot_wal_keep_size_mb == -1);
> > +                       elog(ERROR, "replication slots must not be invalidated during the upgrade");
> > +               }
>
> Hmm, so how about removing an assert and changing the error message more
> appropriate? I still think it seldom occurs.

As this scenario can occur by overriding max_slot_wal_keep_size, it is
better to remove the Assert.

Regards,
Vignesh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

20 October 2023, 06:24:23

On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Vignesh,
>
> Thanks for reviewing! New patch can be available in [1].
>
> > Few comments:
> > 1) Even if we comment 3rd point "Emit a non-transactional message",
> > test_slot2 still appears in the invalid_logical_replication_slots.txt
> > file. There is something wrong here.
> > +       # 2. Advance the slot test_slot2 up to the current WAL location, but
> > +       #        test_slot1 still has unconsumed WAL records.
> > +       $old_publisher->safe_psql('postgres',
> > +               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +       # 3. Emit a non-transactional message. test_slot2 detects the message so
> > +       #        that this slot will be also reported by upcoming pg_upgrade.
> > +       $old_publisher->safe_psql('postgres',
> > +               "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
> > +       );
>
> The comment was updated based on others. How do you think?

I mean that if we comment out or remove this statement, as in the attached
patch, the test still passes with 'The slot "test_slot2" has not
consumed the WAL yet'. Should test_slot2 still be reported as invalid in
this case, given that we have called pg_replication_slot_advance for it?

Regards,
Vignesh

Attachments

test_issue.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

20 October 2023, 18:20:51

On Friday, October 20, 2023 9:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Here are some review comments for v54-0001

Thanks for the review.

> 
> ======
> src/backend/replication/slot.c
> 
> 1.
> + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade)
> + {
> + ereport(ERROR,
> + errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("replication slots must not be invalidated during the upgrade"),
> + errhint("\"max_slot_wal_keep_size\" must not be set to -1 during the upgrade"));
> + }
> 
> This new error is replacing the old code:
> + Assert(max_slot_wal_keep_size_mb == -1);
> 
> Is that errhint correct? Shouldn't it say "must" instead of "must not"?

Fixed.

> 
> ======
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 2. General formatting
> 
> Some of the "]);" formatting and indenting for the multiple SQL commands is
> inconsistent.
> 
> For example,
> 
> + $old_publisher->safe_psql(
> + 'postgres', qq[
> + SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
> + SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');
> + ]
> + );
> 
> versus
> 
> + $old_publisher->safe_psql(
> + 'postgres', qq[
> + CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
> + SELECT pg_replication_slot_advance('test_slot2', NULL);
> + SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');
> + ]);
> 

Fixed.

> ~~~
> 
> 3.
> +# Set up some settings for the old cluster, so that we can ensures that initdb
> +# will be done.
> +my @initdb_params = ();
> +push @initdb_params, ('--encoding', 'UTF-8');
> +push @initdb_params, ('--locale', 'C');
> +$node_params{extra} = \@initdb_params;
> +
> +$old_publisher->init(%node_params);
> 
> Why would initdb not be done if these were not set? I didn't understand the
> comment.
> 
> /so that we can ensures/to ensure/

node->init() reuses a previously initialized cluster if no parameters are
specified, but that cluster could be of the wrong version in a cross-version
test, so we set some parameters to force initdb to run.

I added some explanation in the comment.

> ~~~
> 
> 4.
> +# XXX: For PG9.6 and prior, the TAP Cluster.pm assigns 'max_wal_senders' and
> +# 'max_connections' to the same value (10). But these versions considered
> +# max_wal_senders as a subset of max_connections, so setting the same value
> +# will fail. This adjustment will not be needed when packages for older
> +#versions are defined.
> +if ($old_publisher->pg_version->major <= 9.6)
> +{
> + $old_publisher->append_conf(
> + 'postgresql.conf', qq[
> + max_wal_senders = 5
> + max_connections = 10
> + ]);
> +}
> 
> 4a.
> IMO remove the complicated comment trying to explain the problem and just
> to unconditionally set the values you want.
> 
> SUGGESTION#1
> # Older PG versions had different rules for the inter-dependency of
> # 'max_wal_senders' and 'max_connections', so assign values which will
> # work for all PG versions.
> $old_publisher->append_conf(
>   'postgresql.conf', qq[
>   max_wal_senders = 5
>   max_connections = 10
>   ]);
> 
> ~~

As Kuroda-san mentioned, we may fix Cluster.pm later, so I kept the XXX comment
but simplified it based on your suggestion.

Attached is the new version of the patch.

Best Regards,
Hou zj

Attachments

v55-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

20 October 2023, 18:21:23

On Friday, October 20, 2023 11:24 AM vignesh C <vignesh21@gmail.com> wrote:
> 
> On Thu, 19 Oct 2023 at 16:16, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Vignesh,
> >
> > Thanks for reviewing! New patch can be available in [1].
> >
> > > Few comments:
> > > 1) Even if we comment 3rd point "Emit a non-transactional message",
> > > test_slot2 still appears in the invalid_logical_replication_slots.txt
> > > file. There is something wrong here.
> > > +       # 2. Advance the slot test_slot2 up to the current WAL location, but
> > > +       #        test_slot1 still has unconsumed WAL records.
> > > +       $old_publisher->safe_psql('postgres',
> > > +               "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > > +
> > > +       # 3. Emit a non-transactional message. test_slot2 detects the message so
> > > +       #        that this slot will be also reported by upcoming pg_upgrade.
> > > +       $old_publisher->safe_psql('postgres',
> > > +               "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix', 'This is a non-transactional message');"
> > > +       );
> >
> > The comment was updated based on others. How do you think?
> 
> I mean if we comment or remove this statement like in the attached patch, the
> test is still passing with 'The slot "test_slot2" has not consumed the WAL yet', in
> this case should the test_slot2 be still invalid as we have called
> pg_replication_slot_advance for test_slot2.

It's because we pass NULL to pg_replication_slot_advance(). We should pass
pg_current_wal_lsn() instead. I have fixed it in v55.

Best Regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

21 October 2023, 03:11:46

On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Attach the new version patch.

Thanks. Here are some comments on v55 patch:

1. A nit:
+
+    /*
+     * We also skip decoding in 'fast_forward' mode. In passing set the
+     * 'processing_required' flag to indicate, were it not for this mode,
+     * processing *would* have been required.
+     */
How about "We also skip decoding in fast_forward mode. In passing set
the processing_required flag to indicate that if it were not for
fast_forward mode, processing would have been required."?

2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?

+    /* Clean up */
+    FreeDecodingContext(ctx);

3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
all timetravel entries with InvalidateSystemCaches(), no?

4. The following assertion better be an error? Or we ensure that
binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
at all?
+
+    /* Slots must be valid as otherwise we won't be able to scan the WAL */
+    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

5. This better be an error instead of returning false? IMO, null value
for slot name is an error.
+    /* Quick exit if the input is NULL */
+    if (PG_ARGISNULL(0))
+        PG_RETURN_BOOL(false);

6. A nit: how about is_decodable_txn or is_decodable_change or some
other instead of just a plain name processing_required?
+    /* Do we need to process any change in 'fast_forward' mode? */
+    bool        processing_required;

7. Can the following pg_fatal message be consistent and start with
lowercase letter something like "expected 0 logical replication slots
...."?
+        pg_fatal("Expected 0 logical replication slots but found %d.",
+                 nslots_on_new);

8. s/problem/problematic - "A list of problematic slots is in the file:\n"
+                 "A list of the problem slots is in the file:\n"

9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
better, meaningful and consistent despite a bit long than just
binary_upgrade_slot_has_caught_up.

10. How about an assert that the passed-in replication slot is logical
in binary_upgrade_slot_has_caught_up?

11. How about adding CheckLogicalDecodingRequirements too in
binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
case?

12. Not necessary but adding ReplicationSlotValidateName(slot_name,
ERROR); for the passed-in slotname in
binary_upgrade_slot_has_caught_up may be a good idea, at least in
assert builds to help with input validations.

13. Can the functionality of LogicalReplicationSlotHasPendingWal be
moved to binary_upgrade_slot_has_caught_up and get rid of a separate
function LogicalReplicationSlotHasPendingWal? Or is it that the
function exists in logical.c to avoid extra dependencies between
logical.c and pg_upgrade_support.c?

14. I think it's better to check if the old cluster contains the
necessary function binary_upgrade_slot_has_caught_up instead of just
relying on major version.
+    /* Logical slots can be migrated since PG17. */
+    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+        return;

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

23 October 2023, 08:39:59

Dear Bharath,

Thank you for reviewing! PSA new version.

> 1. A nit:
> +
> +    /*
> +     * We also skip decoding in 'fast_forward' mode. In passing set the
> +     * 'processing_required' flag to indicate, were it not for this mode,
> +     * processing *would* have been required.
> +     */
> How about "We also skip decoding in fast_forward mode. In passing set
> the processing_required flag to indicate that if it were not for
> fast_forward mode, processing would have been required."?

Fixed.

> 2. Don't we need InvalidateSystemCaches() after FreeDecodingContext()?
> 
> +    /* Clean up */
> +    FreeDecodingContext(ctx);

Right. Older system caches should be thrown away here for upcoming pg_dump.

> 3. Don't we need to put CreateDecodingContext in PG_TRY-PG_CATCH with
> InvalidateSystemCaches() in PG_CATCH block? I think we need to clear
> all timetravel entries with InvalidateSystemCaches(), no?

Added.

> 4. The following assertion better be an error? Or we ensure that
> binary_upgrade_slot_has_caught_up isn't called for an invalidated slot
> at all?
> +
> +    /* Slots must be valid as otherwise we won't be able to scan the WAL */
> +    Assert(MyReplicationSlot->data.invalidated == RS_INVAL_NONE);

I kept the Assert() because pg_upgrade won't call this function for invalidated
slots.

> 5. This better be an error instead of returning false? IMO, null value
> for slot name is an error.
> +    /* Quick exit if the input is NULL */
> +    if (PG_ARGISNULL(0))
> +        PG_RETURN_BOOL(false);

Hmm, OK, changed to elog(ERROR).
If the current style were kept and NULL were passed, an empty string might be reported
as the slot name in invalid_logical_replication_slots.txt, which would be quite strange.
Note again that a NULL input is not expected.

> 6. A nit: how about is_decodable_txn or is_decodable_change or some
> other instead of just a plain name processing_required?
> +    /* Do we need to process any change in 'fast_forward' mode? */
> +    bool        processing_required;

I prefer the current name, because not only decodable transactions but also
non-transactional changes and empty transactions are processed.

> 7. Can the following pg_fatal message be consistent and start with
> lowercase letter something like "expected 0 logical replication slots
> ...."?
> +        pg_fatal("Expected 0 logical replication slots but found %d.",
> +                 nslots_on_new);

Note that the upper/lower case rule is not followed consistently in this file. Lower case was
used here because I regarded this sentence as a hint message. Please see previous
posts [1] [2].


> 8. s/problem/problematic - "A list of problematic slots is in the file:\n"
> +                 "A list of the problem slots is in the file:\n"

Fixed.

> 9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
> better, meaningful and consistent despite a bit long than just
> binary_upgrade_slot_has_caught_up.

Fixed.

> 10. How about an assert that the passed-in replication slot is logical
> in binary_upgrade_slot_has_caught_up?

Fixed.

> 11. How about adding CheckLogicalDecodingRequirements too in
> binary_upgrade_slot_has_caught_up after CheckSlotPermissions just in
> case?

Not added. CheckLogicalDecodingRequirements() ensures that WAL can be decoded
and that the changes can be applied, but neither is needed in fast_forward
mode. Also, the pre-existing function pg_logical_replication_slot_advance() does not
call it.

> 12. Not necessary but adding ReplicationSlotValidateName(slot_name,
> ERROR); for the passed-in slotname in
> binary_upgrade_slot_has_caught_up may be a good idea, at least in
> assert builds to help with input validations.

Not added, because ReplicationSlotAcquire() can report an error even if an invalid
name is passed. Also, the pre-existing function pg_logical_replication_slot_advance() does not
call it.

> 13. Can the functionality of LogicalReplicationSlotHasPendingWal be
> moved to binary_upgrade_slot_has_caught_up and get rid of a separate
> function LogicalReplicationSlotHasPendingWal? Or is it that the
> function exists in logical.c to avoid extra dependencies between
> logical.c and pg_upgrade_support.c?

I kept the current style. I think upgrade functions should be short, so the actual
work should be done elsewhere. SetAttrMissing() is called only from an
upgrade function, so we have no policy against splitting functions this way.
Also, LogicalDecodingProcessRecord() is called only from files in src/backend/replication,
so we can keep them.

> 14. I think it's better to check if the old cluster contains the
> necessary function binary_upgrade_slot_has_caught_up instead of just
> relying on major version.
> +    /* Logical slots can be migrated since PG17. */
> +    if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> +        return;

I kept the current style because I could not find a benefit in that approach. If the
patch is committed, PG17.X will surely have binary_upgrade_logical_replication_slot_has_caught_up().
Also, other upgrade functions are not checked against the pg_proc catalog. If you
have something else in mind, please reply here.
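
For readers following item 14, the version gate can be illustrated in isolation. The block below is a self-contained approximation, not the patch's code: GET_MAJOR_VERSION is redefined locally in the form it is assumed to have in pg_upgrade.h (the numeric server version divided by 100), and old_cluster_may_have_migratable_slots is a hypothetical helper name introduced only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local stand-in for the pg_upgrade macro; assumed form, shown here so the
 * example compiles on its own. 170000 maps to 1700, 160004 maps to 1600.
 */
#define GET_MAJOR_VERSION(v) ((v) / 100)

/*
 * Hypothetical helper: returns false when the quoted check
 * "GET_MAJOR_VERSION(old_cluster.major_version) <= 1600" would make the
 * caller return early, i.e. when the old cluster is PG16 or older.
 */
bool
old_cluster_may_have_migratable_slots(int version_num)
{
    assert(version_num > 0);

    return GET_MAJOR_VERSION(version_num) > 1600;
}
```

With this rule, any PG16.x or older source cluster skips the slot checks purely on the version number, without consulting pg_proc for the support function.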

[1]:
https://www.postgresql.org/message-id/TYAPR01MB586642D33208D190F67CDD7BF5F2A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYAPR01MB58666936A0DB0EEDCC929CEEF5FEA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v56-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

23 October 2023, 11:30:00

On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Thank you for reviewing! PSA new version.

> > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > other instead of just a plain name processing_required?
> > +    /* Do we need to process any change in 'fast_forward' mode? */
> > +    bool        processing_required;
>
> I preferred current one. Because not only decodable txn, non-txn change and
> empty transactions also be processed.

Right. It's not the txn, but the change. processing_required seems too
generic IMV. A nit: is_change_decodable or something?

Thanks for the patch. Here are few comments on v56 patch:

1.
+ *
+ * Although this function is currently used only during pg_upgrade, there are
+ * no reasons to restrict it, so IsBinaryUpgrade is not checked here.

This comment isn't required IMV, because anyone looking at the code
and callsites can understand it.

2. A nit: IMV "This is a special purpose ..." statement seems redundant.
+ *
+ * This is a special purpose function to ensure that the given slot can be
+ * upgraded without data loss.

How about

Verify that the given replication slot has consumed all the WAL changes.
If there's any decodable WAL record after the slot's
confirmed_flush_lsn, the slot's consumer will lose that data after the
slot is upgraded.
Returns true if there are no decodable WAL records after the
confirmed_flush_lsn. Otherwise false.

3.
+    if (PG_ARGISNULL(0))
+        elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");

I can see the above style is referenced from
binary_upgrade_create_empty_extension, but IMV the following looks
better and latest (ereport is new style than elog)

        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("replication slot name cannot be null")));

4. The following comment seems frivolous, the code tells it all.
Please remove the comment.
+
+                /* No need to check this slot, seek to new one */
+                continue;

5. A typo - s/gets/Gets
+ * gets the LogicalSlotInfos for all the logical replication slots of the

6. An optimization in count_old_cluster_logical_slots(void): Turn
slot_count to a function static variable so that the for loop isn't
required every time because the slot count is prepared in
get_old_cluster_logical_slot_infos only once and won't change later
on. Do you see any problem with the following? This saves a few CPU
cycles when there are large number of replication slots.
{
    static int slot_count = 0;
    static bool first_time = true;

    if (first_time)
    {
        for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
            slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

        first_time = false;
    }

    return slot_count;
}

7. A typo: s/slotname/slot name. "slot name" looks better in user
visible messages.
+        pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",

8.
+else
+{
+    test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
+        @pg_upgrade_cmd);
+}
Will this ever be tested in current TAP test framework? I mean, will
the TAP test framework allow testing upgrades from one PG version to
another PG version?

9. A nit: Can single quotes around variable names in the comments be
removed just to be consistent?
+     * We also skip decoding in 'fast_forward' mode. This check must be last
+    /* Do we need to process any change in 'fast_forward' mode? */

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

24 October 2023, 07:16:37

On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Thank you for reviewing! PSA new version.
>
> > > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > > other instead of just a plain name processing_required?
> > > +    /* Do we need to process any change in 'fast_forward' mode? */
> > > +    bool        processing_required;
> >
> > I preferred current one. Because not only decodable txn, non-txn change and
> > empty transactions also be processed.
>
> Right. It's not the txn, but the change. processing_required seems too
> generic IMV. A nit: is_change_decodable or something?
>

If we don't want to keep it generic then we should use something like
'contains_decodable_change'. 'is_change_decodable' could have suited
here if we were checking a particular change.

> Thanks for the patch. Here are few comments on v56 patch:
>
> 1.
> + *
> + * Although this function is currently used only during pg_upgrade, there are
> + * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
>
> This comment isn't required IMV, because anyone looking at the code
> and callsites can understand it.
>
> 2. A nit: IMV "This is a special purpose ..." statement seems redundant.
> + *
> + * This is a special purpose function to ensure that the given slot can be
> + * upgraded without data loss.
>
> How about
>
> Verify that the given replication slot has consumed all the WAL changes.
> If there's any decodable WAL record after the slot's
> confirmed_flush_lsn, the slot's consumer will lose that data after the
> slot is upgraded.
> Returns true if there are no decodable WAL records after the
> confirmed_flush_lsn. Otherwise false.
>

Personally, I find the current comment succinct and clear.

> 3.
> +    if (PG_ARGISNULL(0))
> +        elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
>
> I can see the above style is referenced from
> binary_upgrade_create_empty_extension, but IMV the following looks
> better and latest (ereport is new style than elog)
>
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("replication slot name cannot be null")));
>

Do you have any rationale for changing elog to ereport? I am not completely
sure, but as this and the related functions are used internally, using
elog seems reasonable. Also, keeping it consistent with the
existing error message seems reasonable. We can change both later
together if we get a broader agreement.

> 4. The following comment seems frivolous, the code tells it all.
> Please remove the comment.
> +
> +                /* No need to check this slot, seek to new one */
> +                continue;
>
> 5. A typo - s/gets/Gets
> + * gets the LogicalSlotInfos for all the logical replication slots of the
>
> 6. An optimization in count_old_cluster_logical_slots(void): Turn
> slot_count to a function static variable so that the for loop isn't
> required every time because the slot count is prepared in
> get_old_cluster_logical_slot_infos only once and won't change later
> on. Do you see any problem with the following? This saves a few CPU
> cycles when there are large number of replication slots.
> {
>     static int slot_count = 0;
>     static bool first_time = true;
>
>     if (first_time)
>     {
>         for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
>             slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
>
>         first_time = false;
>     }
>
>     return slot_count;
> }
>

This may not be a problem but this is also not a function that will be
used frequently. I am not sure if adding such code optimizations is
worth it.

> 7. A typo: s/slotname/slot name. "slot name" looks better in user
> visible messages.
> +        pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
>

If we want to follow other parameters then we can even use slot_name.
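
For reference, the trade-off discussed in item 6 can be sketched as standalone C. The structs below are simplified stand-ins for pg_upgrade's types (only the fields the loop touches), and both function names are hypothetical, so this shows the idea rather than the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for pg_upgrade's structures (assumed shapes). */
typedef struct { int nslots; } SlotArr;
typedef struct { SlotArr slot_arr; } DbInfo;
typedef struct { DbInfo *dbs; int ndbs; } DbInfoArr;

/* Plain version: walks every database on each call. */
int
count_logical_slots(const DbInfoArr *dbarr)
{
    int slot_count = 0;

    assert(dbarr != NULL);
    for (int dbnum = 0; dbnum < dbarr->ndbs; dbnum++)
        slot_count += dbarr->dbs[dbnum].slot_arr.nslots;

    return slot_count;
}

/*
 * Cached version: the loop runs only on the first call. Later calls return
 * the remembered value even if the arrays have changed in the meantime.
 */
int
count_logical_slots_cached(const DbInfoArr *dbarr)
{
    static int  slot_count = 0;
    static bool first_time = true;

    if (first_time)
    {
        slot_count = count_logical_slots(dbarr);
        first_time = false;
    }

    return slot_count;
}
```

The cached variant saves the loop on repeated calls, but it returns a stale value if the slot arrays change after the first call, so the saving only matters for a function that is called often.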

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

24 October 2023, 07:21:08

On Sat, Oct 21, 2023 at 5:41 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Oct 20, 2023 at 8:51 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:

>
> 9. IMO, binary_upgrade_logical_replication_slot_has_caught_up seems
> better, meaningful and consistent despite a bit long than just
> binary_upgrade_slot_has_caught_up.
>

I think logical_replication is specific to our pub-sub model but we
can have manually created slots as well. So, it would be better to
name it as binary_upgrade_logical_slot_has_caught_up().

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

24 October 2023, 09:02:21

Dear Bharath, Amit,

Thanks for reviewing! PSA new version.
I addressed comments which have not been claimed.

> On Mon, Oct 23, 2023 at 2:00 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Oct 23, 2023 at 11:10 AM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Thank you for reviewing! PSA new version.
> >
> > > > 6. A nit: how about is_decodable_txn or is_decodable_change or some
> > > > other instead of just a plain name processing_required?
> > > > +    /* Do we need to process any change in 'fast_forward' mode? */
> > > > +    bool        processing_required;
> > >
> > > I preferred current one. Because not only decodable txn, non-txn change and
> > > empty transactions also be processed.
> >
> > Right. It's not the txn, but the change. processing_required seems too
> > generic IMV. A nit: is_change_decodable or something?
> >
> 
> If we don't want to keep it generic then we should use something like
> 'contains_decodable_change'. 'is_change_decodable' could have suited
> here if we were checking a particular change.

I kept the name for now. What does Bharath think?

> > Thanks for the patch. Here are few comments on v56 patch:
> >
> > 1.
> > + *
> > + * Although this function is currently used only during pg_upgrade, there are
> > + * no reasons to restrict it, so IsBinaryUpgrade is not checked here.
> >
> > This comment isn't required IMV, because anyone looking at the code
> > and callsites can understand it.

Removed.

> > 2. A nit: IMV "This is a special purpose ..." statement seems redundant.
> > + *
> > + * This is a special purpose function to ensure that the given slot can be
> > + * upgraded without data loss.
> >
> > How about
> >
> > Verify that the given replication slot has consumed all the WAL changes.
> > If there's any decodable WAL record after the slot's
> > confirmed_flush_lsn, the slot's consumer will lose that data after the
> > slot is upgraded.
> > Returns true if there are no decodable WAL records after the
> > confirmed_flush_lsn. Otherwise false.
> >
> 
> Personally, I find the current comment succinct and clear.

I kept current one.

> > 3.
> > +    if (PG_ARGISNULL(0))
> > +        elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");
> >
> > I can see the above style is referenced from
> > binary_upgrade_create_empty_extension, but IMV the following looks
> > better and latest (ereport is new style than elog)
> >
> >         ereport(ERROR,
> >                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >                  errmsg("replication slot name cannot be null")));
> >
> 
> Do you have any theory for making elog to ereport? I am not completely
> sure but as this and related function is used internally, so using
> elog seems reasonable. Also, I find keeping it consistent with the
> existing error message is also reasonable. We can change both later
> together if we get a broader agreement.

I kept the current style. elog() was used here because I regard this as a
"cannot happen" error. According to the docs [1], elog() is still used
for that purpose.

> > 4. The following comment seems frivolous, the code tells it all.
> > Please remove the comment.
> > +
> > +                /* No need to check this slot, seek to new one */
> > +                continue;

Removed.

> > 5. A typo - s/gets/Gets
> > + * gets the LogicalSlotInfos for all the logical replication slots of the

Replaced.

> > 6. An optimization in count_old_cluster_logical_slots(void): Turn
> > slot_count to a function static variable so that the for loop isn't
> > required every time because the slot count is prepared in
> > get_old_cluster_logical_slot_infos only once and won't change later
> > on. Do you see any problem with the following? This saves a few CPU
> > cycles when there are large number of replication slots.
> > {
> >     static int slot_count = 0;
> >     static bool first_time = true;
> >
> >     if (first_time)
> >     {
> >         for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> >             slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> >
> >         first_time = false;
> >     }
> >
> >     return slot_count;
> > }
> >
> 
> This may not be a problem but this is also not a function that will be
> used frequently. I am not sure if adding such code optimizations is
> worth it.

Not addressed.

> > 7. A typo: s/slotname/slot name. "slot name" looks better in user
> > visible messages.
> > +        pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %s",
> >
> 
> If we want to follow other parameters then we can even use slot_name.

Changed to slot_name.

Below are replies to the remaining comments:

>8.
>+else
>+{
>+    test_upgrade_from_pre_PG17($old_publisher, $new_publisher,
>+        @pg_upgrade_cmd);
>+}
>Will this ever be tested in current TAP test framework? I mean, will
>the TAP test framework allow testing upgrades from one PG version to
>another PG version?

Yes, the TAP test framework allows cross-version upgrades. According to
the src/bin/pg_upgrade/TESTING file:

```
Testing an upgrade from a different PG version is also possible, and
provides a more thorough test that pg_upgrade does what it's meant for.
```

The commands below are an example of such a test.

```
# test PG9.5 -> patched HEAD
$ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/003_upgrade_logical_replication_slots.pl .. ok   
All tests successful.
Files=1, Tests=3, 11 wallclock secs ( 0.03 usr  0.01 sys +  2.78 cusr  1.08 csys =  3.90 CPU)
Result: PASS

# grep the output to find evidence that the cross-version check was done
$ cat tmp_check/log/regress_log_003_upgrade_logical_replication_slots | grep 'check the slot does not exist on new cluster'
[05:14:22.322](0.139s) ok 3 - check the slot does not exist on new cluster

```

>9. A nit: Can single quotes around variable names in the comments be
>removed just to be consistent?
>+     * We also skip decoding in 'fast_forward' mode. This check must be last
>+    /* Do we need to process any change in 'fast_forward' mode? */

Removed.

Also, based on a comment [2], the upgrade function was renamed to 
'binary_upgrade_logical_slot_has_caught_up'.

[1]: https://www.postgresql.org/docs/devel/error-message-reporting.html
[2]: https://www.postgresql.org/message-id/CAA4eK1%2BYZP3j1H4ChhzSR23k6MPryW-cgGstyvqbek2CMJoHRA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v57-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

24 October 2023, 10:50:02

On Tue, Oct 24, 2023 at 11:32 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > If we don't want to keep it generic then we should use something like
> > 'contains_decodable_change'. 'is_change_decodable' could have suited
> > here if we were checking a particular change.
>
> I kept the name for now. How does Bharath think?

No more bikeshedding from my side. +1 for processing_required as-is.

> > > 6. An optimization in count_old_cluster_logical_slots(void): Turn
> > > slot_count to a function static variable so that the for loop isn't
> > > required every time because the slot count is prepared in
> > > get_old_cluster_logical_slot_infos only once and won't change later
> > > on. Do you see any problem with the following? This saves a few CPU
> > > cycles when there are large number of replication slots.
> > > {
> > >     static int slot_count = 0;
> > >     static bool first_time = true;
> > >
> > >     if (first_time)
> > >     {
> > >         for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> > >             slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;
> > >
> > >         first_time = false;
> > >     }
> > >
> > >     return slot_count;
> > > }
> > >
> >
> > This may not be a problem but this is also not a function that will be
> > used frequently. I am not sure if adding such code optimizations is
> > worth it.
>
> Not addressed.

count_old_cluster_logical_slots is called 3 times during pg_upgrade,
and recounting the slots across all the databases every time seems
redundant IMV, especially given that the slot count is computed once at
the beginning and never changes. When the cluster has a large number of
replication slots, recounting every time *may* prove costly. Also,
using static variables isn't a huge change requiring a different set
of infra; it's a simple pattern.

Having said that, if others don't see merit in it, I'm okay with
withdrawing my comment.
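
For readers who want to try the pattern outside the pg_upgrade tree, here is a minimal self-contained sketch of the cached-count idea discussed above. The struct definitions are hypothetical, trimmed-down stand-ins for pg_upgrade's real cluster structures (only the fields the loop touches are kept); the function body follows the snippet quoted in the review.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for pg_upgrade's cluster structures. Only the
 * fields used by the counting loop are modelled here.
 */
typedef struct { int nslots; } LogicalSlotInfoArr;
typedef struct { LogicalSlotInfoArr slot_arr; } DbInfo;
typedef struct { DbInfo *dbs; int ndbs; } DbInfoArr;
typedef struct { DbInfoArr dbarr; } ClusterInfo;

static ClusterInfo old_cluster;

/*
 * Count logical slots across all databases of the old cluster.
 * The sum is computed on the first call and cached in function-static
 * variables, so later calls skip the loop entirely.
 */
static int
count_old_cluster_logical_slots(void)
{
    static int  slot_count = 0;
    static bool first_time = true;

    if (first_time)
    {
        for (int dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
            slot_count += old_cluster.dbarr.dbs[dbnum].slot_arr.nslots;

        first_time = false;
    }

    return slot_count;
}
```

One trade-off worth noting with this sketch: the cached value is whatever the arrays held at the first call, so a call made before the slot information is populated would cache a stale count. That is safe only as long as the "computed once and never changes" assumption holds.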

> Below commands are an example of the test.
>
> ```
> # test PG9.5 -> patched HEAD
> $ oldinstall=/home/hayato/older/pg95 make check PROVE_TESTS='t/003_upgrade_logical_replication_slots.pl'

Oh, I get it. Thanks.

> Also, based on a comment [2], the upgrade function was renamed to
> 'binary_upgrade_logical_slot_has_caught_up'.

+1.

I spent some time on the v57 patch and it looks good to me - tests are
passing, no complaints from pgindent and pgperltidy. I turned the CF
entry https://commitfest.postgresql.org/45/4273/ to RfC.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

25 October 2023, 09:09:07

On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> I spent some time on the v57 patch and it looks good to me - tests are
> passing, no complaints from pgindent and pgperltidy. I turned the CF
> entry https://commitfest.postgresql.org/45/4273/ to RfC.
>

Thanks, the patch looks mostly good to me but I am not convinced of
keeping the tests across versions in this form. I don't think they are
tested in the BF; one can only manually create a setup to test them.
Shall we remove them for now and then consider them separately?

Apart from that, I have made minor modifications in the docs to adjust
the order of various prerequisites.

--
With Regards,
Amit Kapila.

Attachments

v58-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

25 October 2023, 11:05:08

Dear Amit,

Based on your advice, I revised the patch again. 

> >
> > I spent some time on the v57 patch and it looks good to me - tests are
> > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> >
> 
> Thanks, the patch looks mostly good to me but I am not convinced of
> keeping the tests across versions in this form. I don't think they are
> tested in BF, only one can manually create a setup to test.

I analyzed this and agree that the current BF client does not use the TAP
test framework for cross-version checks.

> Shall we
> remove it for now and then consider it separately?

OK, the parts for cross-version checks were removed.

> Apart from that, I have made minor modifications in the docs to adjust
> the order of various prerequisites.

Thanks, included.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

v59-0001-pg_upgrade-Allow-to-replicate-logical-replicatio.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

25 October 2023, 11:09:36

On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >
> > I spent some time on the v57 patch and it looks good to me - tests are
> > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> >
>
> Thanks, the patch looks mostly good to me but I am not convinced of
> keeping the tests across versions in this form. I don't think they are
> tested in BF, only one can manually create a setup to test. Shall we
> remove it for now and then consider it separately?

I think we can retain the test_upgrade_from_pre_PG17 because it is not
only possible to trigger it manually but also one can write a CI
workflow to trigger it.

> Apart from that, I have made minor modifications in the docs to adjust
> the order of various prerequisites.

+    <para>
+     <application>pg_upgrade</application> attempts to migrate logical
+     replication slots. This helps avoid the need for manually defining the
+     same replication slots on the new publisher. Migration of logical
+     replication slots is only supported when the old cluster is version 17.0
+     or later. Logical replication slots on clusters before version 17.0 will
+     silently be ignored.
+    </para>

+       The new cluster must not have permanent logical replication slots, i.e.,

How about using "logical slots" in place of "logical replication
slots" to be more generic? We agreed and changed the function name to

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

25 October 2023, 11:19:58

On Wed, Oct 25, 2023 at 1:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Oct 25, 2023 at 11:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Oct 24, 2023 at 1:20 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > >
> > > I spent some time on the v57 patch and it looks good to me - tests are
> > > passing, no complaints from pgindent and pgperltidy. I turned the CF
> > > entry https://commitfest.postgresql.org/45/4273/ to RfC.
> > >
> >
> > Thanks, the patch looks mostly good to me but I am not convinced of
> > keeping the tests across versions in this form. I don't think they are
> > tested in BF, only one can manually create a setup to test. Shall we
> > remove it for now and then consider it separately?
>
> I think we can retain the test_upgrade_from_pre_PG17 because it is not
> only possible to trigger it manually but also one can write a CI
> workflow to trigger it.
>

It would be better to gauge its value separately and add it once the
main patch is committed. I am slightly unhappy even with the hack used
for pre-version testing in the previous patch, which is as follows:
+# XXX: Older PG version had different rules for the inter-dependency of
+# 'max_wal_senders' and 'max_connections', so assign values which will work for
+# all PG versions. If Cluster.pm is fixed this code is not needed.
+$old_publisher->append_conf(
+ 'postgresql.conf', qq[
+max_wal_senders = 5
+max_connections = 10
+]);

There should be a way to avoid this but we can decide it afterwards. I
don't want to hold the main patch for this point. What do you think?

> > Apart from that, I have made minor modifications in the docs to adjust
> > the order of various prerequisites.
>
> +    <para>
> +     <application>pg_upgrade</application> attempts to migrate logical
> +     replication slots. This helps avoid the need for manually defining the
> +     same replication slots on the new publisher. Migration of logical
> +     replication slots is only supported when the old cluster is version 17.0
> +     or later. Logical replication slots on clusters before version 17.0 will
> +     silently be ignored.
> +    </para>
>
> +       The new cluster must not have permanent logical replication slots, i.e.,
>
> How about using "logical slots" in place of "logical replication
> slots" to be more generic? We agreed and changed the function name to
>

Yeah, I am fine with that and I can take care of it before committing
unless there is more to change.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

25 October 2023, 11:48:06

On Wed, Oct 25, 2023 at 1:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> It would be better to gauge its value separately and add it once the
> main patch is committed.
> There should be a way to avoid this but we can decide it afterwards. I
> don't want to hold the main patch for this point. What do you think?

+1 to go with the main patch first. We also have another thing to take
care of - pg_upgrade option to not migrate logical slots.

> > How about using "logical slots" in place of "logical replication
> > slots" to be more generic? We agreed and changed the function name to
> >
>
> Yeah, I am fine with that and I can take care of it before committing
> unless there is more to change.

+1. I have no other comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

26 October 2023, 17:41:03

Hi,

The BF animal fairywren[1] failed when testing
003_upgrade_logical_replication_slots.pl.

From the log, I can see pg_upgrade failed to open the
invalid_logical_replication_slots.txt:

# Checking for valid logical replication slots                  
# could not open file
"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
No such file or directory
 
# Failure, exiting

The reason could be that the length of this path (262) exceeds the Windows path
limit (260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
reducing the path somehow.

In this case, I think one approach is to shorten the file and test names to
xxx_logical_slots instead of xxx_logical_replication_slots. We will analyze
further and share a fix soon.

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

Best Regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

26 October 2023, 18:26:06

On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> The BF animal fairywren[1] failed when testing
> 003_upgrade_logical_replication_slots.pl.
>
> From the log, I can see pg_upgrade failed to open the
> invalid_logical_replication_slots.txt:
>
> # Checking for valid logical replication slots
> # could not open file
> "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
> No such file or directory
> # Failure, exiting
>
> The reason could be the length of this path(262) exceed the windows path
> limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> reducing the path somehow.

Nice catch. Windows docs say that the file/directory path name can't
exceed MAX_PATH, which is defined as 260 characters. However, one must
opt-in to enable longer path names -
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
and
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.

> In this case, I think one approach is to reduce the file and testname to
> xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
> and share fix soon.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54

+1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
In fact, we've used "logical slots" instead of "logical replication
slots" in the docs to be generic. By looking at the generated
directory path name, I think we can use shorter node names - instead
of old_publisher, new_publisher, subscriber - either use node1 (for
old publisher), node2 (for subscriber), node3 (for new publisher) or
use alpha (for old publisher), bravo (for subscriber), charlie (for
new publisher) or such shorter names. We don't have to be that
descriptive and long in node names, one can look at the test file to
know which one is what.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

27 October 2023, 00:57:57

On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > The BF animal fairywren[1] failed when testing
> > 003_upgrade_logical_replication_slots.pl.
> >
> > From the log, I can see pg_upgrade failed to open the
> > invalid_logical_replication_slots.txt:
> >
> > # Checking for valid logical replication slots
> > # could not open file
> > "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
> > No such file or directory
> > # Failure, exiting
> >
> > The reason could be the length of this path(262) exceed the windows path
> > limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> > reducing the path somehow.
>
> Nice catch. Windows docs say that the file/directory path name can't
> exceed MAX_PATH, which is defined as 260 characters. However, one must
> opt-in to enable longer path names -
> https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
> and
> https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.
>
> > In this case, I think one approach is to reduce the file and testname to
> > xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
> > and share fix soon.
> >
> > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54
>
> +1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
> In fact, we've used "logical slots" instead of "logical replication
> slots" in the docs to be generic. By looking at the generated
> directory path name, I think we can use shorter node names - instead
> of old_publisher, new_publisher, subscriber - either use node1 (for
> old publisher), node2 (for subscriber), node3 (for new publisher) or
> use alpha (for old publisher), bravo (for subscriber), charlie (for
> new publisher) or such shorter names. We don't have to be that
> descriptive and long in node names, one can look at the test file to
> know which one is what.
>

Some more ideas for shortening the filename:

1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
"upgrade" is redundant in that filename (earlier patches never had
this). The test file lives under "pg_upgrade/t" so I felt that
upgrading is already implied.

2. If the node names will be shortened they should still retain *some*
meaning if possible:
old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
nothing without studying the tests)
old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
nothing without studying the tests)
How about:
old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
or similar...

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

27 October 2023, 05:36:36

On Fri, Oct 27, 2023 at 3:28 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Oct 27, 2023 at 2:26 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Oct 26, 2023 at 8:11 PM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > The BF animal fairywren[1] failed when testing
> > > 003_upgrade_logical_replication_slots.pl.
> > >
> > > From the log, I can see pg_upgrade failed to open the
> > > invalid_logical_replication_slots.txt:
> > >
> > > # Checking for valid logical replication slots
> > > # could not open file
> > > "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_replication_slots/data/t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_replication_slots.txt":
> > > No such file or directory
> > > # Failure, exiting
> > >
> > > The reason could be the length of this path(262) exceed the windows path
> > > limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> > > reducing the path somehow.
> >
> > Nice catch. Windows docs say that the file/directory path name can't
> > exceed MAX_PATH, which is defined as 260 characters. However, one must
> > opt-in to enable longer path names -
> > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry
> > and
> > https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later.
> >
> > > In this case, I think one approach is to reduce the file and testname to
> > > xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
> > > and share fix soon.
> > >
> > > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-10-26%2009%3A04%3A54
> >
> > +1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.

+1. The proposed file name sounds reasonable.

> > In fact, we've used "logical slots" instead of "logical replication
> > slots" in the docs to be generic. By looking at the generated
> > directory path name, I think we can use shorter node names - instead
> > of old_publisher, new_publisher, subscriber - either use node1 (for
> > old publisher), node2 (for subscriber), node3 (for new publisher) or
> > use alpha (for old publisher), bravo (for subscriber), charlie (for
> > new publisher) or such shorter names. We don't have to be that
> > descriptive and long in node names, one can look at the test file to
> > know which one is what.
> >
>
> Some more ideas for shortening the filename:
>
> 1. "003_upgrade_logical_replication_slots.pl" -- IMO the word
> "upgrade" is redundant in that filename (earlier patches never had
> this). The test file lives under "pg_upgrade/t" so I felt that
> upgrading is already implied.
>

Agreed. So, how about 003_upgrade_logical_slots.pl or simply
003_upgrade_slots.pl?

> 2. If the node names will be shortened they should still retain *some*
> meaning if possible:
> old_publisher/subscriber/new_publisher --> node1/node2/node3 (means
> nothing without studying the tests)
> old_publisher/subscriber/new_publisher --> alpha/bravo/charlie (means
> nothing without studying the tests)
> How about:
> old_publisher/subscriber/new_publisher --> node_p1/node_s/node_p2
> or similar...
>

Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

27 October 2023, 06:07:35

On Fri, Oct 27, 2023 at 8:06 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > +1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
>
> +1. The proposed file name sounds reasonable.
>
> Agreed. So, how about 003_upgrade_logical_slots.pl or simply
> 003_upgrade_slots.pl?
>
> Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?

+1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
oldpub/sub/newpub. With these changes, the path name is brought down
to ~220 chars. These names look good to me iff nothing else in the
path name is dynamic enough to cross the MAX_PATH limit (260 chars).


C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt
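
As a quick way to sanity-check path lengths like the two above, the following is a minimal sketch in plain C. The helper name path_exceeds_win_max_path and the constant WIN_MAX_PATH are hypothetical names introduced only for this illustration; the sketch assumes the documented Windows behaviour that MAX_PATH (260) counts the terminating NUL. The two strings are copied from the failure log and from the shortened path in this thread.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Windows MAX_PATH as quoted in this thread. Assumption: per the Windows
 * documentation, the 260 characters include the terminating NUL.
 */
#define WIN_MAX_PATH 260

/* Hypothetical helper: would this path overflow a MAX_PATH-sized buffer? */
static bool
path_exceeds_win_max_path(const char *path)
{
    return strlen(path) + 1 > WIN_MAX_PATH;
}

/* Path reported in the fairywren failure log. */
static const char *const failing_path =
    "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/"
    "pg_upgrade/003_upgrade_logical_replication_slots/data/"
    "t_003_upgrade_logical_replication_slots_new_publisher_data/pgdata/"
    "pg_upgrade_output.d/20231026T112558.309/"
    "invalid_logical_replication_slots.txt";

/* Same path after the renames proposed in this thread. */
static const char *const shortened_path =
    "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/"
    "pg_upgrade/003_upgrade_logical_slots/data/"
    "t_003_upgrade_logical_slots_newpub_data/pgdata/"
    "pg_upgrade_output.d/20231026T112558.309/"
    "invalid_logical_slots.txt";
```

Calling path_exceeds_win_max_path() on each string shows the first one over the limit and the second one under it. The timestamped directory has a fixed width, so the remaining headroom depends mainly on the buildfarm animal's root directory.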

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

27 October 2023, 07:40:43

Dear Hou,

> The BF animal fairywren[1] failed when testing
> 003_upgrade_logical_replication_slots.pl.

Good catch!

> 
> The reason could be the length of this path(262) exceed the windows path
> limit(260 IIRC). If so, I recall we fixed similar things before (e213de8e7) by
> reducing the path somehow.

Yeah, as Bharath has already reported, I agree that the reason is [1].

```
In the Windows API (with some exceptions discussed in the following paragraphs),
the maximum length for a path is MAX_PATH, which is defined as 260 characters.
```

> In this case, I think one approach is to reduce the file and testname to
> xxx_logical_slots instead of xxx_logical_replication_slots. But we will analyze more
> and share fix soon.
>

Here is a patch that renames the test to 003_logical_slots. I also received a comment off-list, so that change is included as well.

```
-# Setup a pg_upgrade command. This will be used anywhere.
+# Setup a common pg_upgrade command to be used by all the test cases
```

[1]: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

0001-Shorten-some-files.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

27 October 2023, 07:41:35

Dear Bharath, Amit, Peter,

Thank you for the discussion! A patch is available in [1].

> > > > +1 for s/003_upgrade_logical_replication_slots.pl/003_upgrade_logical_slots.pl
> > > > and s/invalid_logical_replication_slots.txt/invalid_logical_slots.txt.
> >
> > +1. The proposed file name sounds reasonable.
> >
> > Agreed. So, how about 003_upgrade_logical_slots.pl or simply
> > 003_upgrade_slots.pl?
> >
> > Why not simply oldpub/sub/newpub or old_pub/sub/new_pub?
> 
> +1 for invalid_logical_slots.txt, 003_upgrade_logical_slots.pl and
> oldpub/sub/newpub. With these changes, the path name is brought down
> to ~220 chars. These names look good to me iff other things in the
> path name aren't dynamic crossing MAX_PATH limit (260 chars).
> 
> C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_upgrade_logical_slots/data/t_003_upgrade_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231026T112558.309/invalid_logical_slots.txt

Renamed to invalid_logical_slots.txt, 003_logical_slots.pl, and oldpub/sub/newpub.
Regarding the test filename, some client apps (e.g., pg_ctl) do not have a prefix,
and others (e.g., pg_dump) do. Either way seems acceptable, so I chose to
remove the prefix.

```
$ ls pg_ctl/t/
001_start_stop.pl  002_status.pl  003_promote.pl  004_logrotate.pl

$ ls pg_dump/t/
001_basic.pl  002_pg_dump.pl  003_pg_dump_with_server.pl  004_pg_dump_parallel.pl  010_dump_connstr.pl
```

[1]:
https://www.postgresql.org/message-id/TYCPR01MB5870A6A8FBB23554EDE8F5F3F5DCA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Michael Paquier

Date:

27 October 2023, 08:13:49

On Fri, Oct 27, 2023 at 04:40:43AM +0000, Hayato Kuroda (Fujitsu) wrote:
> Yeah, Bharath has already reported, I agreed that the reason was [1].
>
> ```
> In the Windows API (with some exceptions discussed in the following paragraphs),
> the maximum length for a path is MAX_PATH, which is defined as 260 characters.
> ```

-                        "invalid_logical_replication_slots.txt");
+                        "invalid_logical_slots.txt");

Or you could do something even shorter, with "invalid_slots.txt".
--
Michael

Attachments

signature.asc

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

27 October 2023, 08:39:01

On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Oct 27, 2023 at 04:40:43AM +0000, Hayato Kuroda (Fujitsu) wrote:
> > Yeah, Bharath has already reported, I agreed that the reason was [1].
> >
> > ```
> > In the Windows API (with some exceptions discussed in the following paragraphs),
> > the maximum length for a path is MAX_PATH, which is defined as 260 characters.
> > ```
>
> -                        "invalid_logical_replication_slots.txt");
> +                        "invalid_logical_slots.txt");
>
> Or you could do something even shorter, with "invalid_slots.txt".
>

I also thought of it but if we want to keep it that way, we should
slightly adjust the messages like: "The slot \"%s\" is invalid" to
include slot_type. This will contain only logical slots, so the
current one probably seems okay.


--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

27 October 2023, 08:49:21

Dear Michael,

> Or you could do something even shorter, with "invalid_slots.txt".

I think the current one is better, because we only support logical replication
slots for now. We can extend it as you said when we support physical slots as well.
Also, the proposed length is sufficient for fairywren [1].

[1]: https://www.postgresql.org/message-id/CALj2ACVc-WSx_fvfynt-G3j8rjhNTMZ8DHu2wiKgCEiV9EO86g%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

27 October 2023, 08:50:17

On Fri, Oct 27, 2023 at 11:09 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 27, 2023 at 10:43 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > -                        "invalid_logical_replication_slots.txt");
> > +                        "invalid_logical_slots.txt");
> >
> > Or you could do something even shorter, with "invalid_slots.txt".
> >
>
> I also thought of it but if we want to keep it that way, we should
> slightly adjust the messages like: "The slot \"%s\" is invalid" to
> include slot_type. This will contain only logical slots, so the
> current one probably seems okay.

+1 for invalid_logical_slots.txt as file name (which can fix Windows
path name issue) and contents as-is "The slot \"%s\" is invalid\n" and
"The slot \"%s\" has not consumed the WAL yet\n".

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

27 October 2023, 08:53:00

On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Here is a patch for fixing to 003_logical_slots. Also, I got a comment off list so that it was included.
>
> ```
> -# Setup a pg_upgrade command. This will be used anywhere.
> +# Setup a common pg_upgrade command to be used by all the test cases
> ```

The patch LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

27 October 2023, 08:57:19

On Fri, Oct 27, 2023 at 11:24 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Oct 27, 2023 at 10:10 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Here is a patch for fixing to 003_logical_slots. Also, I got a comment off list so that it was included.
> >
> > ```
> > -# Setup a pg_upgrade command. This will be used anywhere.
> > +# Setup a common pg_upgrade command to be used by all the test cases
> > ```
>
> The patch LGTM.
>

Thanks, I'll push it in some time.

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

27 October 2023, 10:44:15

Dear Amit,

I found that several buildfarm machines are failing (e.g. [1]) because of a missing update to meson.build. Sorry for that.
Please see the attached patch to fix it.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2023-10-27%2006%3A08%3A31

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

fix_meson.patch

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

07 November 2023, 07:14:25

Dear hackers,

PSA the patch to solve the issue [1].

Peter E. and Andrew kindly raised an issue that delete_old_cluster.sh is
generated in the source directory, even with a VPATH/meson build.
This can be avoided by changing the directory explicitly.

[1]:
https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

change_dir.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Peter Smith

Date:

07 November 2023, 07:23:28

On Tue, Nov 7, 2023 at 3:14 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> PSA the patch to solve the issue [1].
>
> Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when the VPATH/meson build.
> This can avoid by changing the directory explicitly.
>

Hi Kuroda-san,

Thanks for the patch.

I reproduced the bug and then, after applying your patch, confirmed that
the problem is fixed. I used a VPATH build.

~~~

BEFORE
t/001_basic.pl .......... ok
t/002_pg_upgrade.pl ..... ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.05 usr  0.01 sys + 12.90
cusr  7.43 csys = 20.39 CPU)
Result: PASS

OBSERVE THE BUG
Look in the source folder and notice the file that should not be there.

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
delete_old_cluster.sh

~~~

AFTER
# +++ tap check in src/bin/pg_upgrade +++
t/001_basic.pl .......... ok
t/002_pg_upgrade.pl ..... ok
t/003_logical_slots.pl .. ok
All tests successful.
Files=3, Tests=39, 128 wallclock secs ( 0.06 usr  0.01 sys + 13.02
cusr  7.28 csys = 20.37 CPU)
Result: PASS

CONFIRM THE FIX
Check the offending file is no longer in the src folder

[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/oss_postgres_misc/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls *.sh
ls: cannot access *.sh: No such file or directory

Instead, it is found in the VPATH folder
[postgres@CentOS7-x64 pg_upgrade]$ pwd
/home/postgres/vpath_dir/src/bin/pg_upgrade
[postgres@CentOS7-x64 pg_upgrade]$ ls tmp_check/
delete_old_cluster.sh  log  results

======
Kind Regards,
Peter Smith.
Fujitsu Australia

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Zhijie Hou (Fujitsu)"

Date:

07 November 2023, 07:30:59

On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> 
> Dear hackers,
> 
> PSA the patch to solve the issue [1].
> 
> Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> generated in the source directory, even when the VPATH/meson build.
> This can avoid by changing the directory explicitly.
> 
> [1]:
> https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf

Thanks for the patch, I have confirmed that the files won't be generated
in source directory after applying the patch.

After running: "meson test -C build/ --suite pg_upgrade",
The files are in the test directory:
./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh

Best regards,
Hou zj

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 November 2023, 10:55:33

On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear hackers,
> >
> > PSA the patch to solve the issue [1].
> >
> > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > generated in the source directory, even when the VPATH/meson build.
> > This can avoid by changing the directory explicitly.
> >
> > [1]:
> > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
>
> Thanks for the patch, I have confirmed that the files won't be generated
> in source directory after applying the patch.
>
> After running: "meson test -C build/ --suite pg_upgrade",
> The files are in the test directory:
> ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
>

Thanks for the patch and verification. Pushed the fix.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

08 November 2023, 06:13:50

On Tue, 7 Nov 2023 at 13:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear hackers,
> > >
> > > PSA the patch to solve the issue [1].
> > >
> > > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > > generated in the source directory, even when the VPATH/meson build.
> > > This can avoid by changing the directory explicitly.
> > >
> > > [1]:
> > > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
> >
> > Thanks for the patch, I have confirmed that the files won't be generated
> > in source directory after applying the patch.
> >
> > After running: "meson test -C build/ --suite pg_upgrade",
> > The files are in the test directory:
> > ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
> >
>
> Thanks for the patch and verification. Pushed the fix.

While verifying the subscriber upgrade patch, I found an issue with
upgrade in verbose mode.
I was able to reproduce this issue by performing an upgrade with the
verbose option.

The trace for the same is given below:
Program received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
126        ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
#1  0x000055555556f572 in dopr (target=0x7fffffffbb90,
format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
args=0x7fffffffdc40) at snprintf.c:444
#2  0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
\"ication slots within the database:", count=8192, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
    args=0x7fffffffdc40) at snprintf.c:195
#3  0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
ap=0x7fffffffdc40) at util.c:184
#4  0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
"slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
#5  0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
at info.c:813
#6  0x000055555556186e in print_db_infos (db_arr=0x555555587518
<new_cluster+120>) at info.c:782
#7  0x00005555555606da in get_db_rel_and_slot_infos
(cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
#8  0x000055555555839a in check_new_cluster () at check.c:215
#9  0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
pg_upgrade.c:136

This issue occurs because we are accessing uninitialized slot array information.

We could fix it in a couple of ways: a) initialize the whole of
dbinfos by using pg_malloc0 instead of pg_malloc, which ensures
that the slot information is set to 0; b) initialize only the slot
information. The attached patch has the changes for both approaches.
Thoughts?
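
As a rough illustration of approach (a), the sketch below uses calloc as a stand-in for pg_malloc0. The struct definitions are hypothetical, trimmed-down versions of the per-database bookkeeping and are not copied from pg_upgrade; the point is only that a zeroed allocation leaves the slot count at 0 and the slot pointer NULL, so a loop that prints slots has nothing to dereference.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for pg_upgrade's per-database structs. */
typedef struct
{
    char       *slotname;
    char       *plugin;
} SlotInfo;

typedef struct
{
    int         nslots;
    SlotInfo   *slots;
} SlotInfoArr;

typedef struct
{
    char       *db_name;
    SlotInfoArr slot_arr;
} DbInfo;

/*
 * calloc plays the role of pg_malloc0 here: every field starts out as
 * 0 / NULL (assuming all-bits-zero is a NULL pointer, as on common platforms).
 */
static DbInfo *
alloc_dbinfos_zeroed(size_t ndbs)
{
    return calloc(ndbs, sizeof(DbInfo));
}

/* Sum the slot counts; only meaningful if nslots was initialized. */
static int
count_slots(const DbInfo *dbs, size_t ndbs)
{
    int         total = 0;

    for (size_t i = 0; i < ndbs; i++)
        total += dbs[i].slot_arr.nslots;
    return total;
}
```

With plain malloc, nslots would hold whatever bytes happened to be in the allocation, which matches the kind of garbage pointer that strlen trips over in the trace above.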

Regards,
Vignesh

On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 7 Nov 2023 at 13:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Nov 7, 2023 at 10:01 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Tuesday, November 7, 2023 12:14 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> > > >
> > > > Dear hackers,
> > > >
> > > > PSA the patch to solve the issue [1].
> > > >
> > > > Kindly Peter E. and Andrew raised an issue that delete_old_cluster.sh is
> > > > generated in the source directory, even when the VPATH/meson build.
> > > > This can avoid by changing the directory explicitly.
> > > >
> > > > [1]:
> > > > https://www.postgresql.org/message-id/flat/7b8a9460-5668-b372-04e6-7b
> > > > 52e9308493%40dunslane.net#554090099bbbd12c94bf570665a6badf
> > >
> > > Thanks for the patch, I have confirmed that the files won't be generated
> > > in source directory after applying the patch.
> > >
> > > After running: "meson test -C build/ --suite pg_upgrade",
> > > The files are in the test directory:
> > > ./build/testrun/pg_upgrade/003_logical_slots/data/delete_old_cluster.sh
> > >
> >
> > Thanks for the patch and verification. Pushed the fix.
>
> While verifying upgrade of subscriber patch, I found one issue with
> upgrade in verbose mode.
> I was able to reproduce this issue by performing a upgrade with a
> verbose option.
>
> The trace for the same is given below:
> Program received signal SIGSEGV, Segmentation fault.
> __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> 126        ../sysdeps/x86_64/multiarch/strlen-vec.S: No such file or directory.
> (gdb) bt
> #0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/strlen-vec.S:126
> #1  0x000055555556f572 in dopr (target=0x7fffffffbb90,
> format=0x55555557859e "\", plugin: \"%s\", two_phase: %s",
> args=0x7fffffffdc40) at snprintf.c:444
> #2  0x000055555556ed95 in pg_vsnprintf (str=0x7fffffffbc10 "slot_name:
> \"ication slots within the database:", count=8192, fmt=0x555555578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
>     args=0x7fffffffdc40) at snprintf.c:195
> #3  0x00005555555667e3 in pg_log_v (type=PG_VERBOSE,
> fmt=0x555555578590 "slot_name: \"%s\", plugin: \"%s\", two_phase: %s",
> ap=0x7fffffffdc40) at util.c:184
> #4  0x0000555555566b38 in pg_log (type=PG_VERBOSE, fmt=0x555555578590
> "slot_name: \"%s\", plugin: \"%s\", two_phase: %s") at util.c:264
> #5  0x0000555555561a06 in print_slot_infos (slot_arr=0x555555595ed0)
> at info.c:813
> #6  0x000055555556186e in print_db_infos (db_arr=0x555555587518
> <new_cluster+120>) at info.c:782
> #7  0x00005555555606da in get_db_rel_and_slot_infos
> (cluster=0x5555555874a0 <new_cluster>, live_check=false) at info.c:308
> #8  0x000055555555839a in check_new_cluster () at check.c:215
> #9  0x0000555555563010 in main (argc=13, argv=0x7fffffffdf08) at
> pg_upgrade.c:136
>
> This issue occurs because we are accessing uninitialized slot array information.
>
> We could fix it by a couple of ways: a) Initialize the whole of
> dbinfos by using pg_malloc0 instead of pg_malloc which will ensure
> that the slot information is set to 0. b) Setting only slot
> information. Attached patch has the changes for both the approaches.
> Thoughts?

Here is a small improvement where num_slots need not be initialized,
as it is now used only after the result is assigned. The attached
patch has the changes for the same.

Regards,
Vignesh

Attachments

Upgrade_verbose_issue_fix_v2.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

09 November 2023, 13:06:56

On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:
>
> Here is a small improvisation where num_slots need not be initialized
> as it will be used only after assigning the result now. The attached
> patch has the changes for the same.
>

Pushed!

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

John Naylor

Date:

22 November 2023, 10:59:59

On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:
> >
> > Here is a small improvisation where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
> >
>
> Pushed!

Hi all, the CF entry for this is marked RfC, and CI is trying to apply
the last patch committed. Is there further work that needs to be
re-attached and/or rebased?

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

22 November 2023, 11:47:40

On Wed, Nov 22, 2023 at 1:30 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Thu, Nov 9, 2023 at 5:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Here is a small improvisation where num_slots need not be initialized
> > > as it will be used only after assigning the result now. The attached
> > > patch has the changes for the same.
> > >
> >
> > Pushed!
>
> Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> the last patch committed. Is there further work that needs to be
> re-attached and/or rebased?
>

No. I have marked it as committed.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

28 November 2023, 08:35:23

On Thu, Nov 9, 2023 at 7:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C <vignesh21@gmail.com> wrote:
> >
> > Here is a small improvisation where num_slots need not be initialized
> > as it will be used only after assigning the result now. The attached
> > patch has the changes for the same.
> >
>
> Pushed!
>

Thank you for your work on this feature!

One month has already passed since the main patch got committed,
but reading this change, I have some questions on the new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that the other functions in
pg_upgrade_support.c must be called only in binary upgrade mode
because they make some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we made this function usable in normal mode, the user
would be able to check each slot's upgradability without the pg_upgrade
--check command (or without stopping the server, if the user can ensure
no more meaningful WAL records are generated).

---
Also, the function checks if the user has the REPLICATION privilege
but I think that only superuser can connect to the server in binary
upgrade mode in the first place.

---
The following error message doesn't match the function name:

    /* We must check before dereferencing the argument */
    if (PG_ARGISNULL(0))
        elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");

---
{ oid => '8046', descr => 'for use by pg_upgrade',
  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
  proargtypes => 'name',
  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not strict, yet it checks internally that the passed
argument is not null. I think it would be clearer to make
it a strict function.

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
guess it's more suitable to be in slotfuncs.c, where similar functions
such as pg_logical_replication_slot_advance() are also defined.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Bharath Rupireddy

Date:

28 November 2023, 11:02:41

On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> One month has already been passed since this main patch got committed
> but reading this change, I have some questions on new
> binary_upgrade_logical_slot_has_caught_up() function:
>
> Is there any reason why this function can be executed only in binary
> upgrade mode? It seems to me that other functions in
> pg_upgrade_support.c must be called only in binary upgrade mode
> because it does some hacky changes internally. On the other hand,
> binary_upgrade_logical_slot_has_caught_up() just calls
> LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> internally. If we make this function usable in normal mode, the user
> would be able to  check each slot's upgradability without pg_upgrade
> --check command (or without stopping the server if the user can ensure
> no more meaningful WAL records are generated).

It may happen that such a user-facing function reports that there's no
unconsumed WAL, but WAL then gets generated later during pg_upgrade.
The information the function gives would therefore turn out to be
incorrect. I don't see a real-world use-case for such a function right
now. If there's one, it's not a big change to turn it into a
user-facing function.

> ---
> Also, the function checks if the user has the REPLICATION privilege
> but I think that only superuser can connect to the server in binary
> upgrade mode in the first place.

If that were true, I don't see a problem in having
CheckSlotPermissions() there; in fact, it can act as an assertion.

> ---
> The following error message doesn't match the function name:
>
>     /* We must check before dereferencing the argument */
>     if (PG_ARGISNULL(0))
>         elog(ERROR, "null argument to
> binary_upgrade_validate_wal_records is not allowed");
>
> ---
> { oid => '8046', descr => 'for use by pg_upgrade',
>   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
>   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>   proargtypes => 'name',
>   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
>
> The function is not a strict function but we check in the function if
> the passed argument is not null. I think it would be clearer to make
> it a strict function.

I think it has been done that way similar to
binary_upgrade_create_empty_extension().

> ---
> LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> guess it's more suitable to be in slotfunc.s where similar functions
> such as pg_logical_replication_slot_advance() is also defined.

Why not in logicalfuncs.c?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

28 November 2023, 12:50:25

On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > One month has already been passed since this main patch got committed
> > but reading this change, I have some questions on new
> > binary_upgrade_logical_slot_has_caught_up() function:
> >
> > Is there any reason why this function can be executed only in binary
> > upgrade mode? It seems to me that other functions in
> > pg_upgrade_support.c must be called only in binary upgrade mode
> > because it does some hacky changes internally. On the other hand,
> > binary_upgrade_logical_slot_has_caught_up() just calls
> > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > internally. If we make this function usable in normal mode, the user
> > would be able to  check each slot's upgradability without pg_upgrade
> > --check command (or without stopping the server if the user can ensure
> > no more meaningful WAL records are generated).
>
> It may happen that such a user-facing function tells there's no
> unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> Therefore, the information the function gives turns out to be
> incorrect. I don't see a real-world use-case for such a function right
> now. If there's one, it's not a big change to turn it into a
> user-facing function.
>

Yeah, as of now, I don't see a use case for it and in fact, it could
lead to unpredictable results. Immediately after calling the function,
there could be more activity on the server which could make the
results incorrect. I think to check the slot's upgradeability, one can
rely on the results of the pg_upgrade --check functionality.

> > ---
> > Also, the function checks if the user has the REPLICATION privilege
> > but I think that only superuser can connect to the server in binary
> > upgrade mode in the first place.
>
> If that were true, I don't see a problem in having
> CheckSlotPermissions() there, in fact it can act as an assertion.
>

I think we can change it to an assertion or maybe elog(ERROR, ...) with a
comment as to why we don't expect this to happen.

> > ---
> > The following error message doesn't match the function name:
> >
> >     /* We must check before dereferencing the argument */
> >     if (PG_ARGISNULL(0))
> >         elog(ERROR, "null argument to
> > binary_upgrade_validate_wal_records is not allowed");
> >

This should be fixed.

> > ---
> > { oid => '8046', descr => 'for use by pg_upgrade',
> >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
> >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> >   proargtypes => 'name',
> >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> >
> > The function is not a strict function but we check in the function if
> > the passed argument is not null. I think it would be clearer to make
> > it a strict function.
>
> I think it has been done that way similar to
> binary_upgrade_create_empty_extension().
>
> > ---
> > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > guess it's more suitable to be in slotfunc.s where similar functions
> > such as pg_logical_replication_slot_advance() is also defined.
>
> Why not in logicalfuncs.c?
>

I am not sure if either of those is better than logical.c. IIRC, I
thought it was okay to keep it in logical.c, as the others primarily deal
with exposed SQL functions, and I felt it somewhat matches the intent
of logical.c ("The goal is to encapsulate most of the internal
complexity for consumers of logical decoding, so they can create and
consume a changestream with a low amount of code..").

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

28 November 2023, 13:04:38

Dear Bharath, Sawada-san,

Welcome back!

> >
> > ---
> > { oid => '8046', descr => 'for use by pg_upgrade',
> >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
> >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> >   proargtypes => 'name',
> >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> >
> > The function is not a strict function but we check in the function if
> > the passed argument is not null. I think it would be clearer to make
> > it a strict function.
> 
> I think it has been done that way similar to
> binary_upgrade_create_empty_extension().

Yeah, we followed binary_upgrade_create_empty_extension(). Also, we made it
non-strict to keep the caller simpler.

Currently get_old_cluster_logical_slot_infos() executes a query that contains
binary_upgrade_logical_slot_has_caught_up(). In the pg_upgrade layer, we assume
that either true or false is returned.

But if proisstrict is changed to true, we must handle the case where NULL is returned.
It is a small change, but it adds work on the caller side.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

28 November 2023, 15:32:19

On Tue, Nov 28, 2023 at 6:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Nov 28, 2023 at 1:32 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Nov 28, 2023 at 11:06 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > One month has already been passed since this main patch got committed
> > > but reading this change, I have some questions on new
> > > binary_upgrade_logical_slot_has_caught_up() function:
> > >
> > > Is there any reason why this function can be executed only in binary
> > > upgrade mode? It seems to me that other functions in
> > > pg_upgrade_support.c must be called only in binary upgrade mode
> > > because it does some hacky changes internally. On the other hand,
> > > binary_upgrade_logical_slot_has_caught_up() just calls
> > > LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> > > internally. If we make this function usable in normal mode, the user
> > > would be able to  check each slot's upgradability without pg_upgrade
> > > --check command (or without stopping the server if the user can ensure
> > > no more meaningful WAL records are generated).
> >
> > It may happen that such a user-facing function tells there's no
> > unconsumed WAL, but later on the WAL gets generated during pg_upgrade.
> > Therefore, the information the function gives turns out to be
> > incorrect. I don't see a real-world use-case for such a function right
> > now. If there's one, it's not a big change to turn it into a
> > user-facing function.
> >
>
> Yeah, as of now, I don't see a use case for it and in fact, it could
> lead to unpredictable results. Immediately after calling the function,
> there could be more activity on the server which could make the
> results incorrect. I think to check the slot's upgradeability, one can
> rely on the results of the pg_upgrade --check functionality.

Fair point.

This function is already a user-executable function as it's in
pg_catalog, but it is restricted to be executed only in binary upgrade
mode even though it doesn't change anything internally. So it wasn't clear
to me why we put such a restriction in place.

>
> > > ---
> > > Also, the function checks if the user has the REPLICATION privilege
> > > but I think that only superuser can connect to the server in binary
> > > upgrade mode in the first place.
> >
> > If that were true, I don't see a problem in having
> > CheckSlotPermissions() there, in fact it can act as an assertion.
> >
>
> I think we can change it to assertion or may elog(ERROR, ...) with a
> comment as to why we don't expect this can happen.

+1 for an assertion, to match other checks in the function.

>
> > > ---
> > > The following error message doesn't match the function name:
> > >
> > >     /* We must check before dereferencing the argument */
> > >     if (PG_ARGISNULL(0))
> > >         elog(ERROR, "null argument to
> > > binary_upgrade_validate_wal_records is not allowed");
> > >
>
> This should be fixed.
>
> > > ---
> > > { oid => '8046', descr => 'for use by pg_upgrade',
> > >   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
> > >   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
> > >   proargtypes => 'name',
> > >   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> > >
> > > The function is not a strict function but we check in the function if
> > > the passed argument is not null. I think it would be clearer to make
> > > it a strict function.
> >
> > I think it has been done that way similar to
> > binary_upgrade_create_empty_extension().

binary_upgrade_create_empty_extension() needs to be a non-strict
function since it needs to accept NULL in some arguments such as
extConfig. On the other hand,
binary_upgrade_logical_slot_has_caught_up() doesn't handle NULL and
it's conventional to make such a function a strict function.
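
The difference can be sketched with a small, self-contained model. This is not PostgreSQL's fmgr code; the types SqlFunc and SqlBool and the function names are hypothetical. It only models the documented behaviour that a strict function is not executed when an argument is NULL and that the result is NULL instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a SQL-callable function taking one text argument. */
typedef struct
{
    bool        is_strict;              /* models proisstrict */
    bool        (*impl) (const char *arg);  /* the function body */
} SqlFunc;

/* Hypothetical model of a nullable boolean result. */
typedef struct
{
    bool        is_null;                /* true means SQL NULL */
    bool        value;
} SqlBool;

/* A body that assumes a non-NULL argument, as a strict function may. */
static bool
slot_name_is_nonempty(const char *arg)
{
    return arg[0] != '\0';
}

/*
 * Models the caller: for a strict function, a NULL argument short-circuits
 * to a NULL result and the body is never entered.  A non-strict function
 * is always entered and has to check for NULL itself.
 */
static SqlBool
call_func(const SqlFunc *f, const char *arg)
{
    SqlBool     r = {false, false};

    if (arg == NULL && f->is_strict)
    {
        r.is_null = true;
        return r;
    }
    r.value = f->impl(arg);
    return r;
}
```

In this model, marking the function strict removes the need for an explicit NULL check in the body, at the cost of the caller possibly seeing a NULL result.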

> >
> > > ---
> > > LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> > > guess it's more suitable to be in slotfunc.s where similar functions
> > > such as pg_logical_replication_slot_advance() is also defined.
> >
> > Why not in logicalfuncs.c?
> >
>
> I am not sure if either of those is better than logical.c. IIRC, I
> thought it was okay to keep in logical.c as others primarily deal with
> exposed SQL functions and I felt it somewhat matches with the intent
> of logical.c ("The goal is to encapsulate most of the internal
> complexity for consumers of logical decoding, so they can create and
> consume a changestream with a low amount of code..").

I see your point. To me it looks like the functions in logical.c are
APIs and internal functions to manage the logical decoding context and
replication slots (e.g., restart_lsn). On the other hand,
LogicalReplicationSlotHasPendingWal() seems to be a user of
logical decoding. But anyway, it seems that three hackers have
different opinions. So we can keep it unless someone has a good reason
to change it.

On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
>
> Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> un-strict to keep a caller function simpler.
>
> Currently get_old_cluster_logical_slot_infos() executes a query and it contains
> binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
> either true or false is returned.
>
> But if proisstrict is changed true, we must handle the case when NULL is returned.
> It is small but backseat operation.

In which cases are you concerned that pg_upgrade could pass NULL to
binary_upgrade_logical_slot_has_caught_up()?

I've not tested it yet but even if it returns NULL, perhaps
get_old_cluster_logical_slot_infos() would still set curr->caught_up
to false, no?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

28 November 2023, 16:58:01

Dear Sawada-san,

> On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> >
> > Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> > un-strict to keep a caller function simpler.
> >
> > Currently get_old_cluster_logical_slot_infos() executes a query and it contains
> > binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
> > either true or false is returned.
> >
> > But if proisstrict is changed true, we must handle the case when NULL is returned.
> > It is small but backseat operation.
> 
> Which cases are you concerned pg_upgrade could pass NULL to
> binary_upgrade_logical_slot_has_caught_up()?

Actually, we do not expect NULL to be passed. IIUC, every slot has a
slot_name, and the subquery uses that name. But will that hold forever? I think we
can avoid any risk.

> I've not tested it yet but even if it returns NULL, perhaps
> get_old_cluster_logical_slot_infos() would still set curr->caught_up
> to false, no?

Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
case where NULL is passed. So it depends on the implementation.

[1]: https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Masahiko Sawada

Date:

28 November 2023, 22:30:37

On Tue, Nov 28, 2023 at 10:58 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> > On Tue, Nov 28, 2023 at 7:04 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > >
> > > Yeah, we followed binary_upgrade_create_empty_extension(). Also, we set as
> > > un-strict to keep a caller function simpler.
> > >
> > > Currently get_old_cluster_logical_slot_infos() executes a query and it contains
> > > > binary_upgrade_logical_slot_has_caught_up(). In pg_upgrade layer, we assumed
> > > > either true or false is returned.
> > > >
> > > > But if proisstrict is changed true, we must handle the case when NULL is returned.
> > > > It is small but backseat operation.
> >
> > Which cases are you concerned pg_upgrade could pass NULL to
> > binary_upgrade_logical_slot_has_caught_up()?
>
> Actually, we do not expect that it won't input NULL. IIUC all of slots have
> slot_name, and subquery uses its name. But will it be kept forever? I think we
> can avoid any risk.
>
> > I've not tested it yet but even if it returns NULL, perhaps
> > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > to false, no?
>
> Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
> case when the NULL is input. So it depends implementation.

I think PQgetvalue() returns an empty string if the result value is null.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

29 November 2023, 05:03:00

Dear Sawada-san,

> > Actually, we do not expect that it won't input NULL. IIUC all of slots have
> > slot_name, and subquery uses its name. But will it be kept forever? I think we
> > can avoid any risk.
> >
> > > I've not tested it yet but even if it returns NULL, perhaps
> > > get_old_cluster_logical_slot_infos() would still set curr->caught_up
> > > to false, no?
> >
> > Hmm. I checked the C99 specification [1] of strcmp, but it does not define the
> > case when the NULL is input. So it depends implementation.
> 
> I think PQgetvalue() returns an empty string if the result value is null.
>

Oh, you are right... I found the paragraph below in [1].

> An empty string is returned if the field value is null. See PQgetisnull to distinguish
> null values from empty-string values.

So I agree with what you said: the current code can accept NULL.
But I am still not sure whether the error message would be appropriate.
If we regard an empty string as false, a slot that has an empty name will be reported like:
"The slot \"\" has not consumed the WAL yet" in check_old_cluster_for_valid_slots().
Isn't that inappropriate?

(Note again: currently we have not found such a case, so it may be overkill)
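
The behaviour discussed above can be sketched without a server. The snippet below is only an illustration under stated assumptions, not the pg_upgrade code: fake_getvalue() is a hypothetical stand-in for PQgetvalue() that, like the documented libpq behaviour, yields an empty string for a null field, and parse_caught_up() assumes that the caller compares the text value against "t".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for PQgetvalue(): a null field is reported as an
 * empty string, never as a NULL pointer (as the libpq documentation says).
 */
const char *
fake_getvalue(const char *field)
{
    return field ? field : "";
}

/*
 * Assumed caller-side check: only the text value "t" counts as true.
 * The input is never a NULL pointer here, so strcmp() is well defined.
 */
bool
parse_caught_up(const char *value)
{
    return strcmp(value, "t") == 0;
}

/* Small self-check: a null field is indistinguishable from "false". */
void
demo_caught_up(void)
{
    assert(parse_caught_up(fake_getvalue("t")));
    assert(!parse_caught_up(fake_getvalue("f")));
    assert(!parse_caught_up(fake_getvalue(NULL)));
}
```

With this shape a null result silently becomes false, which is why the resulting error message would name the slot as "not caught up" rather than flag the unexpected NULL. Telling the two apart would need a separate PQgetisnull() check.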

[1]: https://www.postgresql.org/docs/devel/libpq-exec.html#LIBPQ-PQGETVALUE

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

29 November 2023, 12:26:26

Dear hackers,

> > >
> > > Pushed!
> >
> > Hi all, the CF entry for this is marked RfC, and CI is trying to apply
> > the last patch committed. Is there further work that needs to be
> > re-attached and/or rebased?
> >
> 
> No. I have marked it as committed.
>

I found another failure related to the commit [1]. I think it is caused by
autovacuum. I want to propose a patch that disables autovacuum on the old publisher.

For more details, please see below.

# Analysis of the failure

Summary: this failure occurs when autovacuum starts after the subscription
is disabled but before pg_upgrade runs.

According to the regress file, pg_upgrade failed unexpectedly [2]. There is
no way the slots could have been invalidated, so some WAL seems to have been generated
after disabling the subscriber.

Also, the server log of oldpub [3] shows that an autovacuum worker was terminated when
the server stopped. This occurred after the walsender released the logical slots. The WAL records
generated by the autovacuum worker could not be consumed by the slots, so the upgrade
function returned false.

# How to reproduce

I made a small file for reproducing the failure. Please see reproduce.txt. It contains
changes that launch autovacuum workers very often and ensure that they do actual
work. After applying it, I could reproduce the same failure every time.

# How to fix

I think it is sufficient to fix only the test code.
The easiest way is to disable autovacuum on the old publisher. PSA the patch file.

What do you think?


[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-11-27%2020%3A52%3A10
[2]:
```
...
Checking for contrib/isn with bigint-passing mismatch         ok
Checking for valid logical replication slots                  fatal

Your installation contains logical replication slots that can't be upgraded.
You can remove invalid slots and/or consume the pending WAL for other slots,
and then restart the upgrade.
A list of the problematic slots is in the file:

/home/bf/bf-build/skink-master/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231127T220024.480/invalid_logical_slots.txt
Failure, exiting
[22:01:20.362](86.645s) not ok 10 - run of pg_upgrade of old cluster
...
```
[3]:
```
...
2023-11-27 22:00:23.546 UTC [3567962][walsender][4/0:0] LOG:  released logical replication slot "regress_sub"
2023-11-27 22:00:23.549 UTC [3559042][postmaster][:0] LOG:  received fast shutdown request
2023-11-27 22:00:23.552 UTC [3559042][postmaster][:0] LOG:  aborting any active transactions
*2023-11-27 22:00:23.663 UTC [3568793][autovacuum worker][5/3:738] FATAL:  terminating autovacuum process due to administrator command*
2023-11-27 22:00:23.775 UTC [3559042][postmaster][:0] LOG:  background worker "logical replication launcher" (PID 3560674) exited with exit code 1
...
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Dear hackers,

I found another failure related to the commit [1]. This is caused by a missing
wait in the test code. Amit helped me with the analysis and the fix.

# Analysis of the failure

The failure is that the restored slot has two_phase = false, whereas the slot was
created with two_phase = true. This is because pg_upgrade was executed before all
tables were in the ready state.

# How to fix

I think the test is at fault. Other subscription tests related to
2PC additionally wait until subtwophasestate becomes 'e'. The same wait should be
added here as well. PSA the patch.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2023-12-01%2016%3A59%3A30

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

add_wait.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

05 December 2023, 08:16:57

On Mon, Dec 4, 2023 at 11:59 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> I found another failure related with the commit [1]. This is caused by missing
> wait on the test code. Amit helped me for this analysis and fix.
>

Pushed!

--
With Regards,
Amit Kapila.

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

05 December 2023, 08:41:09

Dear Sawada-san, hackers,

Based on the comments, I made a fix. PSA the patch.

> 
> Is there any reason why this function can be executed only in binary
> upgrade mode? It seems to me that other functions in
> pg_upgrade_support.c must be called only in binary upgrade mode
> because it does some hacky changes internally. On the other hand,
> binary_upgrade_logical_slot_has_caught_up() just calls
> LogicalReplicationSlotHasPendingWal(), which doesn't change anything
> internally. If we make this function usable in normal mode, the user
> would be able to  check each slot's upgradability without pg_upgrade
> --check command (or without stopping the server if the user can ensure
> no more meaningful WAL records are generated).

I kept the function usable only in binary upgrade mode because subsequent operations might generate
WAL. See [1].

> Also, the function checks if the user has the REPLICATION privilege
> but I think that only superuser can connect to the server in binary
> upgrade mode in the first place.

CheckSlotPermissions() was replaced with Assert().

> The following error message doesn't match the function name:
> 
>     /* We must check before dereferencing the argument */
>     if (PG_ARGISNULL(0))
>         elog(ERROR, "null argument to binary_upgrade_validate_wal_records is not allowed");

Per the comment below, this elog(ERROR) is no longer needed. Removed.

> { oid => '8046', descr => 'for use by pg_upgrade',
>   proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
>   provolatile => 'v', proparallel => 'u', prorettype => 'bool',
>   proargtypes => 'name',
>   prosrc => 'binary_upgrade_logical_slot_has_caught_up' },
> 
> The function is not a strict function but we check in the function if
> the passed argument is not null. I think it would be clearer to make
> it a strict function.

Per the conclusion in [2], I changed the function to a strict one. As shown below,
binary_upgrade_logical_slot_has_caught_up() returns NULL when the input is NULL.

```
postgres=# SELECT * FROM pg_create_logical_replication_slot('slot', 'test_decoding');
 slot_name |    lsn    
-----------+-----------
 slot      | 0/152E7E0
(1 row)

postgres=# SELECT * FROM binary_upgrade_logical_slot_has_caught_up(NULL);
 binary_upgrade_logical_slot_has_caught_up 
-------------------------------------------
 
(1 row)
```
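
The NULL result above follows from strictness itself: for a strict function, the function body is not invoked when an argument is NULL, and the result is NULL. A rough model in plain C is sketched below. NullableBool, call_strict() and slot_has_caught_up_body() are hypothetical names for illustration and are not PostgreSQL APIs; the body's logic is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical nullable boolean result; not a PostgreSQL type. */
typedef struct NullableBool
{
    bool        isnull;
    bool        value;
} NullableBool;

/* Signature of a function body that must never see a NULL argument. */
typedef bool (*slot_check_fn) (const char *slot_name);

/* Counts invocations so that skipped calls can be observed. */
int         body_calls = 0;

bool
slot_has_caught_up_body(const char *slot_name)
{
    body_calls++;
    /* Placeholder logic only: a real check would inspect pending WAL. */
    return slot_name[0] != '\0';
}

/*
 * Model of a strict call: a NULL argument short-circuits to a NULL result
 * without running the function body.
 */
NullableBool
call_strict(slot_check_fn fn, const char *arg)
{
    NullableBool result = {true, false};

    if (arg == NULL)
        return result;

    result.isnull = false;
    result.value = fn(arg);
    return result;
}
```

Under this model the body needs no NULL check of its own, which matches the removal of the PG_ARGISNULL() test mentioned above.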

> LogicalReplicationSlotHasPendingWal() is defined in logical.c but I
> guess it's more suitable to be in slotfuncs.c where similar functions
> such as pg_logical_replication_slot_advance() are also defined.

Committers had different opinions about it, so I kept the current style [3].

[1]: https://www.postgresql.org/message-id/CALj2ACW7H-kAHia%3DvCbmdWDueGA_3pQfyzARfAQX0aGzHY57Zw%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAA4eK1LzK0NvMkWAY6RJ6yN%2BYYUgMg1f%3DmNOGV8CPXLT43FHMw%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAD21AoDkyyC%3Dwa2%3D1Ruo_L8g16xf_W5Xyhp-%3D3j9urT916b9gA%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments

followup_for_upgrade.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

vignesh C

Date:

06 December 2023, 07:10:43

On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san, hackers,
>
> Based on comments I made a fix. PSA the patch.
>

Thanks for the patch, the changes look good to me.

Regards,
Vignesh

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

06 December 2023, 07:32:28

On Wed, Dec 6, 2023 at 9:40 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Sawada-san, hackers,
> >
> > Based on comments I made a fix. PSA the patch.
> >
>
> Thanks for the patch, the changes look good to me.
>

Thanks, I have added a comment and updated the commit message. I'll
push this tomorrow unless there are more comments.

--
With Regards,
Amit Kapila.

Attachments

v2-0001-Fix-issues-in-binary_upgrade_logical_slot_has_cau.patch

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Amit Kapila

Date:

07 December 2023, 09:29:00

On Wed, Dec 6, 2023 at 10:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 6, 2023 at 9:40 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 5 Dec 2023 at 11:11, Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > Dear Sawada-san, hackers,
> > >
> > > Based on comments I made a fix. PSA the patch.
> > >
> >
> > Thanks for the patch, the changes look good to me.
> >
>
> Thanks, I have added a comment and updated the commit message. I'll
> push this tomorrow unless there are more comments.
>

Pushed.

--
With Regards,
Amit Kapila.

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Thomas Munro

Date:

17 December 2023, 07:02:35

FYI fairywren failed in this test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06

===8<===
Restoring database schemas in the new cluster
*failure*

Consult the last few lines of

"C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
for
the probable cause of the failure.
Failure, exiting
[22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
[22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
#   at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
line 177.
===8<===

Without that log it might be hard to figure out what went wrong though :-/

Re: [PoC] pg_upgrade: allow to upgrade publisher node

From

Alexander Lakhin

Date:

17 December 2023, 08:00:00

17.12.2023 07:02, Thomas Munro wrote:
> FYI fairywren failed in this test:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06
>
> ===8<===
> Restoring database schemas in the new cluster
> *failure*
>
> Consult the last few lines of
>
> "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
> for
> the probable cause of the failure.
> Failure, exiting
> [22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
> [22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
> #   at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
> line 177.
> ===8<===
>
> Without that log it might be hard to figure out what went wrong though :-/
>

Yes, but most probably it's the same failure as

https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best regards,
Alexander

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

17 December 2023, 18:03:33

Dear Thomas, Alexander,

> 17.12.2023 07:02, Thomas Munro wrote:
> > FYI fairywren failed in this test:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-12-16%2022%3A03%3A06
> >
> > ===8<===
> > Restoring database schemas in the new cluster
> > *failure*
> >
> > Consult the last few lines of
> >
> > "C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231216T221418.035/log/pg_upgrade_dump_1.log"
> > for
> > the probable cause of the failure.
> > Failure, exiting
> > [22:14:34.598](22.801s) not ok 10 - run of pg_upgrade of old cluster
> > [22:14:34.600](0.001s) #   Failed test 'run of pg_upgrade of old cluster'
> > #   at C:/tools/nmsys64/home/pgrunner/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl
> > line 177.
> > ===8<===
> >
> > Without that log it might be hard to figure out what went wrong though :-/
> >
> 
> Yes, but most probably it's the same failure as
>  

Thanks for reporting. Yes, I have already reported it [1], and the server
log was provided by Andrew [2]. The issue was that a file creation failed
because the same file had been unlink()'d just before and was still in STATUS_DELETE_PENDING
status. Alexander kindly proposed a fix [3] and it looks good to me, but
confirmation from senior and Windows-savvy developers is needed to move forward.
(At first we thought the issue had been solved by an update, but that was not correct.)

I know that you have worked in this area, so I would be very happy if you could check the
forked thread.

[1]:
https://www.postgresql.org/message-id/flat/TYAPR01MB5866AB7FD922CE30A2565B8BF5A8A%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2]:
https://www.postgresql.org/message-id/TYAPR01MB5866A4E7342088E91362BEF0F5BBA%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[3]: https://www.postgresql.org/message-id/976479cf-dd66-ca19-f40c-5640e30700cb%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

RE: [PoC] pg_upgrade: allow to upgrade publisher node

From

"Hayato Kuroda (Fujitsu)"

Date:

18 December 2023, 10:40:43

Dear Thomas, Alexander,

> Thanks for reporting. Yes, it has been already reported by me [1], and the server
> log was provided by Andrew [2]. The issue was that a file creation was failed
> because the same one was unlink()'d just before but it was in
> STATUS_DELETE_PENDING
> status. Kindly Alexander proposed a fix [3] and it looks good to me, but
> confirmations by senior and windows-friendly developers are needed to move
> forward.
> (at first we thought the issue was solved by updating, but it was not correct)
> 
> I know that you have developed there region, so I'm very happy if you check the
> forked thread.

I forgot to mention an important point. The issue was not introduced by this feature.
The feature merely exposed a latent failure that is specific to the Windows environment.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
