Thread: Introduce wait_for_subscription_sync for TAP tests
Hi,

In TAP tests for logical replication, we have the following code in many places:

$node_publisher->wait_for_catchup('tap_sub');
my $synced_query =
  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
$node_subscriber->poll_query_until('postgres', $synced_query)
  or die "Timed out while waiting for subscriber to synchronize data";

Also, we sometimes forget to add one of these checks, as was fixed in commit 1f50918a6fb02207d151e7cb4aae4c36de9d827c.

I think we can have a new function to wait for all subscriptions to synchronize data. The attached patch introduces a new function, wait_for_subscription_sync(). With this function, we can replace the above code with a single call:

$node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments
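For reference, a minimal sketch of the proposed helper as it could look inside the TAP test node module, based on the call shape shown above and on the patch excerpt quoted in the review below. The argument handling, the default 'postgres' database, and the reliance on Carp's croak being available in the module are assumptions of this sketch, not the literal attached patch:

# A minimal sketch, not the attached patch itself.  Assumes it lives in
# the node module (PostgresNode/Cluster), where Carp is already imported.
sub wait_for_subscription_sync
{
    my ($self, $publisher, $subname, $dbname) = @_;
    my $name = $self->name;

    # Defaulting to the 'postgres' database is an assumption of this sketch.
    $dbname = 'postgres' unless defined($dbname);

    # Wait for the publisher's walsender to catch up, if a publisher
    # node was passed in.
    if (defined($publisher))
    {
        croak 'subscription name must be specified' unless defined($subname);
        $publisher->wait_for_catchup($subname);
    }

    # Wait until every table of every subscription on this node has
    # reached the 'syncdone' ('s') or 'ready' ('r') state.
    print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
    my $query = qq[SELECT count(1) = 0
      FROM pg_subscription_rel
      WHERE srsubstate NOT IN ('r', 's');];
    $self->poll_query_until($dbname, $query)
      or croak "timed out waiting for subscriber to synchronize data";

    return;
}

With something like that in place, the boilerplate quoted above collapses to the single call shown in the proposal.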
On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Hi,
>
> In tap tests for logical replication, we have the following code in many places:
>
> $node_publisher->wait_for_catchup('tap_sub');
>     my $synced_query =
>       "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
> IN ('r', 's');";
>     $node_subscriber->poll_query_until('postgres', $synced_query)
>       or die "Timed out while waiting for subscriber to synchronize data";
>
> Also, we sometime forgot to check either one, like we fixed in commit
> 1f50918a6fb02207d151e7cb4aae4c36de9d827c.
>
> I think we can have a new function to wait for all subscriptions to
> synchronize data. The attached patch introduce a new function
> wait_for_subscription_sync(). With this function, we can replace the
> above code with this one function as follows:
>
> $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub');
>

+1. This reduces quite some code in various tests and will make it easier to write future tests.

A few comments/questions:
====================
1.
-$node_publisher->wait_for_catchup('mysub1');
-
-# Also wait for initial table sync to finish.
-my $synced_query =
-  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('r', 's');";
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
-
 # Also wait for initial table sync to finish.
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1');

It seems to me that without your patch there is an extra poll in the above test. If so, we can probably remove that in a separate patch?

2.
+	# wait for the replication to catchup if required.
+	if (defined($publisher))
+	{
+		croak 'subscription name must be specified' unless defined($subname);
+		$publisher->wait_for_catchup($subname, 'replay');
+	}
+
+	# then, wait for all table states to be ready.
+	print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
+	my $query = qq[SELECT count(1) = 0
+	  FROM pg_subscription_rel
+	  WHERE srsubstate NOT IN ('r', 's');];
+	$self->poll_query_until($dbname, $query)
+	  or croak "timed out waiting for subscriber to synchronize data";

In the tests, I noticed that a few places did wait_for_catchup after the subscription check, and at other places we did that check before, as you have it here. Ideally, I think wait_for_catchup should be after confirming the initial sync is over, as without the initial sync the publisher node won't be completely in sync with the subscriber. What do you think?

3. In the code quoted in the previous point, why did you pass the second parameter as 'replay' when we have not used that in the tests otherwise?

--
With Regards,
Amit Kapila.
On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 7:07 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > Hi, > > > > In tap tests for logical replication, we have the following code in many places: > > > > $node_publisher->wait_for_catchup('tap_sub'); > > my $synced_query = > > "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT > > IN ('r', 's');"; > > $node_subscriber->poll_query_until('postgres', $synced_query) > > or die "Timed out while waiting for subscriber to synchronize data"; > > > > Also, we sometime forgot to check either one, like we fixed in commit > > 1f50918a6fb02207d151e7cb4aae4c36de9d827c. > > > > I think we can have a new function to wait for all subscriptions to > > synchronize data. The attached patch introduce a new function > > wait_for_subscription_sync(). With this function, we can replace the > > above code with this one function as follows: > > > > $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); > > > > +1. This reduces quite some code in various tests and will make it > easier to write future tests. > > Few comments/questions: > ==================== > 1. > -$node_publisher->wait_for_catchup('mysub1'); > - > -# Also wait for initial table sync to finish. > -my $synced_query = > - "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT > IN ('r', 's');"; > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > - > # Also wait for initial table sync to finish. > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_subscriber->wait_for_subscription_sync($node_publisher, 'mysub1'); > > It seems to me without your patch there is an extra poll in the above > test. If so, we can probably remove that in a separate patch? Agreed. > > 2. > + # wait for the replication to catchup if required. > + if (defined($publisher)) > + { > + croak 'subscription name must be specified' unless defined($subname); > + $publisher->wait_for_catchup($subname, 'replay'); > + } > + > + # then, wait for all table states to be ready. > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > + my $query = qq[SELECT count(1) = 0 > + FROM pg_subscription_rel > + WHERE srsubstate NOT IN ('r', 's');]; > + $self->poll_query_until($dbname, $query) > + or croak "timed out waiting for subscriber to synchronize data"; > > In the tests, I noticed that a few places did wait_for_catchup after > the subscription check, and at other places, we did that check before > as you have it here. Ideally, I think wait_for_catchup should be after > confirming the initial sync is over as without initial sync, the > publisher node won't be completely in sync with the subscriber. What do you mean by the last sentence? I thought the order doesn't matter here. Even if we do wait_for_catchup first then the subscription check, we can make sure that the apply worker caught up and table synchronization has been done, no? > > 3. In the code quoted in the previous point, why did you pass the > second parameter as 'replay' when we have not used that in the tests > otherwise? It makes sure to use the (current) default value of $mode of wait_for_catchup(). But probably it's not necessary, I've removed it. I've attached an updated patch as well as a patch to remove duplicated waits in 007_ddl.pl. 
Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachments
On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached an updated patch as well as a patch to remove duplicated
> waits in 007_ddl.pl.
>

Thanks for your patch. Here are some comments.

1.
I think some comments need to be changed in the patch. For example:

# Also wait for initial table sync to finish
# Wait for initial sync to finish as well

Words like "Also" and "as well" can be removed now; we originally used them because we wait for catchup and "also" wait for initial sync.

2.
In the following places, we can remove the separate wait_for_catchup() call and let wait_for_subscription_sync() perform it instead.

2.1.
030_origin.pl:
@@ -128,8 +120,7 @@ $node_B->safe_psql(

 $node_C->wait_for_catchup($appname_B2);

-$node_B->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_B->wait_for_subscription_sync;

2.2.
031_column_list.pl:
@@ -385,7 +373,7 @@ $node_subscriber->safe_psql(
 	ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3
 ));

-wait_for_subscription_sync($node_subscriber);
+$node_subscriber->wait_for_subscription_sync;

 $node_publisher->wait_for_catchup('sub1');

2.3.
100_bugs.pl:
@@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres',
 $node_publisher->wait_for_catchup('tap_sub');

 # Also wait for initial table sync to finish
-$node_subscriber->poll_query_until('postgres', $synced_query)
-  or die "Timed out while waiting for subscriber to synchronize data";
+$node_subscriber->wait_for_subscription_sync;

 is( $node_subscriber->safe_psql(
 	'postgres', "SELECT * FROM tab_replidentity_index"),

Regards,
Shi yu
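To make point 2.1 concrete, here is a small before/after sketch for the 030_origin.pl hunk quoted above. It assumes $appname_B2 is also the subscription name on node B, as is the convention in these tests:

# Before (as in the current patch version): two separate waits.
$node_C->wait_for_catchup($appname_B2);
$node_B->wait_for_subscription_sync;

# After (the suggestion): let the helper do the catchup as well, by
# passing the publisher node and the subscription name.
$node_B->wait_for_subscription_sync($node_C, $appname_B2);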
On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 2. > > + # wait for the replication to catchup if required. > > + if (defined($publisher)) > > + { > > + croak 'subscription name must be specified' unless defined($subname); > > + $publisher->wait_for_catchup($subname, 'replay'); > > + } > > + > > + # then, wait for all table states to be ready. > > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > > + my $query = qq[SELECT count(1) = 0 > > + FROM pg_subscription_rel > > + WHERE srsubstate NOT IN ('r', 's');]; > > + $self->poll_query_until($dbname, $query) > > + or croak "timed out waiting for subscriber to synchronize data"; > > > > In the tests, I noticed that a few places did wait_for_catchup after > > the subscription check, and at other places, we did that check before > > as you have it here. Ideally, I think wait_for_catchup should be after > > confirming the initial sync is over as without initial sync, the > > publisher node won't be completely in sync with the subscriber. > > What do you mean by the last sentence? I thought the order doesn't > matter here. Even if we do wait_for_catchup first then the > subscription check, we can make sure that the apply worker caught up > and table synchronization has been done, no? > That's right. I thought we should first ensure the subscriber has finished operations if possible, like in this case, it can ensure table sync has finished and then we can ensure whether publisher and subscriber are in sync. That sounds more logical to me. -- With Regards, Amit Kapila.
On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Tue, Jul 26, 2022 3:42 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > I've attached an updated patch as well as a patch to remove duplicated > > waits in 007_ddl.pl. > > > > Thanks for your patch. Here are some comments. Thank you for the comments! > > 1. > I think some comments need to be changed in the patch. > For example: > # Also wait for initial table sync to finish > # Wait for initial sync to finish as well > > Words like "Also" and "as well" can be removed now, we originally used them > because we wait for catchup and "also" wait for initial sync. Agreed. > > 2. > In the following places, we can remove wait_for_catchup() and then call it in > wait_for_subscription_sync(). > > 2.1. > 030_origin.pl: > @@ -128,8 +120,7 @@ $node_B->safe_psql( > > $node_C->wait_for_catchup($appname_B2); > > -$node_B->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_B->wait_for_subscription_sync; > > 2.2. > 031_column_list.pl: > @@ -385,7 +373,7 @@ $node_subscriber->safe_psql( > ALTER SUBSCRIPTION sub1 SET PUBLICATION pub2, pub3 > )); > > -wait_for_subscription_sync($node_subscriber); > +$node_subscriber->wait_for_subscription_sync; > > $node_publisher->wait_for_catchup('sub1'); > > 2.3. > 100_bugs.pl: > @@ -281,8 +276,7 @@ $node_subscriber->safe_psql('postgres', > $node_publisher->wait_for_catchup('tap_sub'); > > # Also wait for initial table sync to finish > -$node_subscriber->poll_query_until('postgres', $synced_query) > - or die "Timed out while waiting for subscriber to synchronize data"; > +$node_subscriber->wait_for_subscription_sync; > > is( $node_subscriber->safe_psql( > 'postgres', "SELECT * FROM tab_replidentity_index"), Agreed. I've attached updated patches that incorporated the above comments as well as the comment from Amit. BTW regarding 0001 patch to remove the duplicated wait, should we backpatch to v15? I think we can do that as it's an obvious fix and it seems to be an oversight in 8f2e2bbf145. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachments
On Wed, Jul 27, 2022 at 8:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Jul 26, 2022 at 1:12 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Tue, Jul 26, 2022 at 2:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 2. > > > + # wait for the replication to catchup if required. > > > + if (defined($publisher)) > > > + { > > > + croak 'subscription name must be specified' unless defined($subname); > > > + $publisher->wait_for_catchup($subname, 'replay'); > > > + } > > > + > > > + # then, wait for all table states to be ready. > > > + print "Waiting for all subscriptions in \"$name\" to synchronize data\n"; > > > + my $query = qq[SELECT count(1) = 0 > > > + FROM pg_subscription_rel > > > + WHERE srsubstate NOT IN ('r', 's');]; > > > + $self->poll_query_until($dbname, $query) > > > + or croak "timed out waiting for subscriber to synchronize data"; > > > > > > In the tests, I noticed that a few places did wait_for_catchup after > > > the subscription check, and at other places, we did that check before > > > as you have it here. Ideally, I think wait_for_catchup should be after > > > confirming the initial sync is over as without initial sync, the > > > publisher node won't be completely in sync with the subscriber. > > > > What do you mean by the last sentence? I thought the order doesn't > > matter here. Even if we do wait_for_catchup first then the > > subscription check, we can make sure that the apply worker caught up > > and table synchronization has been done, no? > > > > That's right. I thought we should first ensure the subscriber has > finished operations if possible, like in this case, it can ensure > table sync has finished and then we can ensure whether publisher and > subscriber are in sync. That sounds more logical to me. Make sense. I've incorporated it in the v3 patch. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
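In other words, the agreed ordering swaps the two steps of the helper, roughly as in the sketch below (same variables as the snippet quoted above; only the ordering changes):

# Reordered sketch: confirm the subscriber side first, then the publisher.

# First, wait for all table states on the subscriber to be ready.
print "Waiting for all subscriptions in \"$name\" to synchronize data\n";
my $query = qq[SELECT count(1) = 0
  FROM pg_subscription_rel
  WHERE srsubstate NOT IN ('r', 's');];
$self->poll_query_until($dbname, $query)
  or croak "timed out waiting for subscriber to synchronize data";

# Then, wait for the publisher's walsender to catch up, if requested.
if (defined($publisher))
{
    croak 'subscription name must be specified' unless defined($subname);
    $publisher->wait_for_catchup($subname);
}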
On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > <shiy.fnst@fujitsu.com> wrote: > > I've attached updated patches that incorporated the above comments as > well as the comment from Amit. > > BTW regarding 0001 patch to remove the duplicated wait, should we > backpatch to v15? > I think it is good to clean this test case even for PG15 even though there is no major harm in keeping it. I'll push this early next week by Tuesday unless someone thinks otherwise. -- With Regards, Amit Kapila.
On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > > <shiy.fnst@fujitsu.com> wrote: > > > > I've attached updated patches that incorporated the above comments as > > well as the comment from Amit. > > > > BTW regarding 0001 patch to remove the duplicated wait, should we > > backpatch to v15? > > > > I think it is good to clean this test case even for PG15 even though > there is no major harm in keeping it. I'll push this early next week > by Tuesday unless someone thinks otherwise. > Pushed this one and now I'll look at your other patch. -- With Regards, Amit Kapila.
On Wed, Aug 3, 2022 at 1:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Jul 30, 2022 at 12:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Jul 28, 2022 at 6:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Wed, Jul 27, 2022 at 7:08 PM shiy.fnst@fujitsu.com > > > <shiy.fnst@fujitsu.com> wrote: > > > > > > I've attached updated patches that incorporated the above comments as > > > well as the comment from Amit. > > > > > > BTW regarding 0001 patch to remove the duplicated wait, should we > > > backpatch to v15? > > > > > > > I think it is good to clean this test case even for PG15 even though > > there is no major harm in keeping it. I'll push this early next week > > by Tuesday unless someone thinks otherwise. > > > > Pushed this one and now I'll look at your other patch. Thanks! Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Pushed this one and now I'll look at your other patch. > I have pushed the second patch as well after making minor changes in the comments. Alvaro [1] and Tom [2] suggest to back-patch this and they sound reasonable to me. Will you be able to produce back branch patches? [1] - https://www.postgresql.org/message-id/20220803104544.k2luy5hr2ugnhgr2%40alvherre.pgsql [2] - https://www.postgresql.org/message-id/2966703.1659535343%40sss.pgh.pa.us -- With Regards, Amit Kapila.
On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Pushed this one and now I'll look at your other patch. > > > > I have pushed the second patch as well after making minor changes in > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > they sound reasonable to me. Will you be able to produce back branch > patches? Yes. I've attached patches for backbranches. The updates are straightforward on v11 - v15. However, on v10, we don't use wait_for_catchup() in some logical replication test cases. The commit bbd3363e128dae refactored the tests to use wait_for_catchup but it's not backpatched. So in the patch for v10, I didn't change the code that was changed by the commit. Also, since wait_for_catchup requires to specify $target_lsn, unlike the one in v11 or later, I changed wait_for_subscription_sync() accordingly. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachments
- REL13_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL15_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL14_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL11_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL12_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL10_v4-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
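For the REL10 branch specifically, the catchup step presumably has to compute and pass a target LSN itself, since wait_for_catchup() there has no default. A hypothetical sketch follows; the choice of the 'write' LSN and the use of the lsn() helper are assumptions about the PG10 PostgresNode API, not a quote from the attached patch:

# Hypothetical REL10 variant of the catchup step only.
if (defined($publisher))
{
    croak 'subscription name must be specified' unless defined($subname);

    # PG10's wait_for_catchup() takes an explicit mode and target LSN
    # (per the message above); which LSN to use is an assumption here.
    $publisher->wait_for_catchup($subname, 'replay',
        $publisher->lsn('write'));
}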
On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila <amit.kapila16@gmail.com> > wrote: > > > > > > Pushed this one and now I'll look at your other patch. > > > > > > > I have pushed the second patch as well after making minor changes in > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > > they sound reasonable to me. Will you be able to produce back branch > > patches? > > Yes. I've attached patches for backbranches. The updates are > straightforward on v11 - v15. However, on v10, we don't use > wait_for_catchup() in some logical replication test cases. The commit > bbd3363e128dae refactored the tests to use wait_for_catchup but it's > not backpatched. So in the patch for v10, I didn't change the code > that was changed by the commit. Also, since wait_for_catchup requires > to specify $target_lsn, unlike the one in v11 or later, I changed > wait_for_subscription_sync() accordingly. > Thanks for your patches. In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the current change in PostgresNode.pm. Right? Regards, Shi yu
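For anyone cross-checking the back-branch patches, the convention in PostgresNode.pm is that each method is preceded by its own POD block, so the missing piece would look roughly like this (the description wording is illustrative, not taken from the patch):

=pod

=item $node->wait_for_subscription_sync(publisher, subname, dbname)

Wait for all tables in all subscriptions on this node to reach the
'ready' or 'syncdone' state, and, if a publisher node is given, for
the publisher to catch up as well.

=cut

sub wait_for_subscription_sync
{
    # ... body as in the master-branch version of the patch ...
}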
Masahiko Sawada <sawada.mshk@gmail.com> writes: > Yes. I've attached patches for backbranches. FWIW, I'd recommend waiting till after next week's wrap before pushing these. While I'm definitely in favor of doing this, the odds of introducing a bug are nonzero, so right before a release deadline doesn't seem like a good time. regards, tom lane
On Thu, Aug 4, 2022 5:49 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> > wrote: > > > > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila > <amit.kapila16@gmail.com> > > wrote: > > > > > > > > Pushed this one and now I'll look at your other patch. > > > > > > > > > > I have pushed the second patch as well after making minor changes in > > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > > > they sound reasonable to me. Will you be able to produce back branch > > > patches? > > > > Yes. I've attached patches for backbranches. The updates are > > straightforward on v11 - v15. However, on v10, we don't use > > wait_for_catchup() in some logical replication test cases. The commit > > bbd3363e128dae refactored the tests to use wait_for_catchup but it's > > not backpatched. So in the patch for v10, I didn't change the code > > that was changed by the commit. Also, since wait_for_catchup requires > > to specify $target_lsn, unlike the one in v11 or later, I changed > > wait_for_subscription_sync() accordingly. > > > > Thanks for your patches. > > In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the > current change in PostgresNode.pm. Right? > By the way, I notice that in 002_types.pl (on master branch), it seems the "as well" in the following comment should be removed. Is it worth being fixed? $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)" ); # Wait for initial sync to finish as well $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); Regards, Shi yu
On Thu, Aug 4, 2022 at 7:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > Yes. I've attached patches for backbranches. > > FWIW, I'd recommend waiting till after next week's wrap before > pushing these. While I'm definitely in favor of doing this, > the odds of introducing a bug are nonzero, so right before a > release deadline doesn't seem like a good time. > Agreed. I was planning to do it only after next week's wrap. Thanks for your suggestion. -- With Regards, Amit Kapila.
On Fri, Aug 5, 2022 at 10:39 AM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > On Thu, Aug 4, 2022 5:49 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote: > > > > On Thu, Aug 4, 2022 2:28 PM Masahiko Sawada <sawada.mshk@gmail.com> > > wrote: > > > > > > On Thu, Aug 4, 2022 at 10:37 AM Amit Kapila <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > On Wed, Aug 3, 2022 at 10:21 AM Amit Kapila > > <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > > > Pushed this one and now I'll look at your other patch. > > > > > > > > > > > > > I have pushed the second patch as well after making minor changes in > > > > the comments. Alvaro [1] and Tom [2] suggest to back-patch this and > > > > they sound reasonable to me. Will you be able to produce back branch > > > > patches? > > > > > > Yes. I've attached patches for backbranches. The updates are > > > straightforward on v11 - v15. However, on v10, we don't use > > > wait_for_catchup() in some logical replication test cases. The commit > > > bbd3363e128dae refactored the tests to use wait_for_catchup but it's > > > not backpatched. So in the patch for v10, I didn't change the code > > > that was changed by the commit. Also, since wait_for_catchup requires > > > to specify $target_lsn, unlike the one in v11 or later, I changed > > > wait_for_subscription_sync() accordingly. > > > > > > > Thanks for your patches. > > > > In the patches for pg11 ~ pg14, it looks we need to add a "=pod" before the > > current change in PostgresNode.pm. Right? > > > > By the way, I notice that in 002_types.pl (on master branch), it seems the "as > well" in the following comment should be removed. Is it worth being fixed? > > $node_subscriber->safe_psql('postgres', > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub WITH (slot_name = tap_sub_slot)" > ); > > # Wait for initial sync to finish as well > $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub'); > Thank you for the comments. I've attached updated version patches. Please review them. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Attachments
- REL14_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL12_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL11_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL13_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL15_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
- REL10_v5-0001-Add-wait_for_subscription_sync-for-TAP-tests.patch
On Wed, Aug 10, 2022 at 10:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Aug 5, 2022 at 10:39 AM shiy.fnst@fujitsu.com > <shiy.fnst@fujitsu.com> wrote: > > Thank you for the comments. I've attached updated version patches. > Please review them. > Pushed. -- With Regards, Amit Kapila.
Amit Kapila <amit.kapila16@gmail.com> writes:
> Pushed.

Recently a number of buildfarm animals have failed at the same
place in src/test/subscription/t/100_bugs.pl [1][2][3][4]:

# Failed test '2x3000 rows in t'
# at t/100_bugs.pl line 149.
#          got: '9000'
#     expected: '6000'
# Looks like you failed 1 test of 7.
[09:30:56] t/100_bugs.pl ......................

This was the last commit to touch that test script. I'm thinking
maybe it wasn't adjusted quite correctly? On the other hand, since
I can't find any similar failures before the last 48 hours, maybe
there is some other more-recent commit to blame. Anyway, something
is wrong there.

			regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2022-09-09%2012%3A03%3A46
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2022-09-09%2011%3A16%3A36
[3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-09-09%2010%3A33%3A19
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2022-09-08%2010%3A56%3A59
On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > Pushed.
>
> Recently a number of buildfarm animals have failed at the same
> place in src/test/subscription/t/100_bugs.pl [1][2][3][4]:
>
> # Failed test '2x3000 rows in t'
> # at t/100_bugs.pl line 149.
> #          got: '9000'
> #     expected: '6000'
> # Looks like you failed 1 test of 7.
> [09:30:56] t/100_bugs.pl ......................
>
> This was the last commit to touch that test script. I'm thinking
> maybe it wasn't adjusted quite correctly? On the other hand, since
> I can't find any similar failures before the last 48 hours, maybe
> there is some other more-recent commit to blame. Anyway, something
> is wrong there.

It seems that this commit is innocent, as it changed only how to wait. Rather, looking at the logs, the tablesync worker errored out at an interesting point:

2022-09-09 09:30:19.630 EDT [631b3feb.840:13] pg_16400_sync_16392_7141371862484106124 ERROR: could not find record while sending logically-decoded data: missing contrecord at 0/1D4FFF8
2022-09-09 09:30:19.630 EDT [631b3feb.840:14] pg_16400_sync_16392_7141371862484106124 STATEMENT: START_REPLICATION SLOT "pg_16400_sync_16392_7141371862484106124" LOGICAL 0/0 (proto_version '3', origin 'any', publication_names '"testpub"')
ERROR: could not find record while sending logically-decoded data: missing contrecord at 0/1D4FFF8
2022-09-09 09:30:19.631 EDT [631b3feb.26e8:2] ERROR: error while shutting down streaming COPY: ERROR: could not find record while sending logically-decoded data: missing contrecord at 0/1D4FFF8

It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef is relevant.

Regards,

--
Masahiko Sawada
Masahiko Sawada <sawada.mshk@gmail.com> writes: > On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Recently a number of buildfarm animals have failed at the same >> place in src/test/subscription/t/100_bugs.pl [1][2][3][4]: >> >> # Failed test '2x3000 rows in t' >> # at t/100_bugs.pl line 149. >> # got: '9000' >> # expected: '6000' >> # Looks like you failed 1 test of 7. >> [09:30:56] t/100_bugs.pl ...................... >> >> This was the last commit to touch that test script. I'm thinking >> maybe it wasn't adjusted quite correctly? On the other hand, since >> I can't find any similar failures before the last 48 hours, maybe >> there is some other more-recent commit to blame. Anyway, something >> is wrong there. > It seems that this commit is innocent as it changed only how to wait. Yeah. I was wondering if it caused us to fail to wait somewhere, but I concur that's not all that likely. > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef > is relevant. Noting that the errors have only appeared in the past couple of days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f (Fix recovery_prefetch with low maintenance_io_concurrency). regards, tom lane
On Sat, Sep 10, 2022 at 6:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Recently a number of buildfarm animals have failed at the same > >> place in src/test/subscription/t/100_bugs.pl [1][2][3][4]: > >> > >> # Failed test '2x3000 rows in t' > >> # at t/100_bugs.pl line 149. > >> # got: '9000' > >> # expected: '6000' > >> # Looks like you failed 1 test of 7. > >> [09:30:56] t/100_bugs.pl ...................... > >> > >> This was the last commit to touch that test script. I'm thinking > >> maybe it wasn't adjusted quite correctly? On the other hand, since > >> I can't find any similar failures before the last 48 hours, maybe > >> there is some other more-recent commit to blame. Anyway, something > >> is wrong there. > > > It seems that this commit is innocent as it changed only how to wait. > > Yeah. I was wondering if it caused us to fail to wait somewhere, > but I concur that's not all that likely. > > > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef > > is relevant. > > Noting that the errors have only appeared in the past couple of > days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f > (Fix recovery_prefetch with low maintenance_io_concurrency). Probably I found the cause of this failure[1]. The commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef didn't fix the problem properly. Regards, [1] https://www.postgresql.org/message-id/CAD21AoAw0Oofi4kiDpJBOwpYyBBBkJj%3DsLUOn4Gd2GjUAKG-fw%40mail.gmail.com -- Masahiko Sawada
On Sat, Sep 10, 2022 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > On Fri, Sep 9, 2022 at 11:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Recently a number of buildfarm animals have failed at the same > >> place in src/test/subscription/t/100_bugs.pl [1][2][3][4]: > >> > >> # Failed test '2x3000 rows in t' > >> # at t/100_bugs.pl line 149. > >> # got: '9000' > >> # expected: '6000' > >> # Looks like you failed 1 test of 7. > >> [09:30:56] t/100_bugs.pl ...................... > >> > >> This was the last commit to touch that test script. I'm thinking > >> maybe it wasn't adjusted quite correctly? On the other hand, since > >> I can't find any similar failures before the last 48 hours, maybe > >> there is some other more-recent commit to blame. Anyway, something > >> is wrong there. > > > It seems that this commit is innocent as it changed only how to wait. > > Yeah. I was wondering if it caused us to fail to wait somewhere, > but I concur that's not all that likely. > > > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef > > is relevant. > > Noting that the errors have only appeared in the past couple of > days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f > (Fix recovery_prefetch with low maintenance_io_concurrency). Yeah, I also just spotted the coincidence of those failures while monitoring the build farm. I'll look into this later today. My initial suspicion is that there was pre-existing code here that was (incorrectly?) relying on the lack of error reporting in that case. But maybe I misunderstood and it was incorrect to report the error for some reason that was not robustly covered with tests.
On Sat, Sep 10, 2022 at 10:00 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Sat, Sep 10, 2022 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef > > > is relevant. > > > > Noting that the errors have only appeared in the past couple of > > days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f > > (Fix recovery_prefetch with low maintenance_io_concurrency). > > Yeah, I also just spotted the coincidence of those failures while > monitoring the build farm. I'll look into this later today. My > initial suspicion is that there was pre-existing code here that was > (incorrectly?) relying on the lack of error reporting in that case. > But maybe I misunderstood and it was incorrect to report the error for > some reason that was not robustly covered with tests. After I wrote that I saw Sawada-san's message and waited for more information, and I see there was now a commit. I noticed that peripatus was already logging the 'missing contrecord' error even when it didn't fail the test, and still does. I'm still looking into that (ie whether I need to take that new report_invalid_record() call out and replace it with errormsg_deferred = true so that XLogReadRecord() returns NULL with no error message in this case).
On Mon, Sep 12, 2022 at 10:42 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Sat, Sep 10, 2022 at 10:00 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Sat, Sep 10, 2022 at 9:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Masahiko Sawada <sawada.mshk@gmail.com> writes: > > > > It's likely that the commit f6c5edb8abcac04eb3eac6da356e59d399b2bcef > > > > is relevant. > > > > > > Noting that the errors have only appeared in the past couple of > > > days, I'm now suspicious of adb466150b44d1eaf43a2d22f58ff4c545a0ed3f > > > (Fix recovery_prefetch with low maintenance_io_concurrency). > > > > Yeah, I also just spotted the coincidence of those failures while > > monitoring the build farm. I'll look into this later today. My > > initial suspicion is that there was pre-existing code here that was > > (incorrectly?) relying on the lack of error reporting in that case. > > But maybe I misunderstood and it was incorrect to report the error for > > some reason that was not robustly covered with tests. > > After I wrote that I saw Sawada-san's message and waited for more > information, and I see there was now a commit. I noticed that > peripatus was already logging the 'missing contrecord' error even when > it didn't fail the test, and still does. I'm still looking into that > (ie whether I need to take that new report_invalid_record() call out > and replace it with errormsg_deferred = true so that XLogReadRecord() > returns NULL with no error message in this case). I will go ahead and remove this new error message added by adb46615. The message was correlated with the problem on peripatus fixed by 88f48831, but not the cause of it -- but it's also not terribly helpful and might be confusing. It might be reported: (1) in pg_waldump when you hit the end of the segment with a missing contrecord after the end, arguably rightfully, but then perhaps someone might complain that they expect an error from pg_waldump only on the final live segment at the end of the WAL, and (2) in a walsender that is asked to shut down while between reads of pages with a spanning contrecord, reported by logical_read_xlog_page() with a messageless error (presumably peripatus's case?), (3) in crash recovery with wal_recycle off (whereas normally you'd expect to see the page_addr message from a recycled file), maybe more legitimately than the above. The problem I needed to solve can be solved without the message just by setting that flag as mentioned above, so I'll do that to remove the new noise.