Обсуждение: Missing import in 035_standby_logical_decoding.pl
Hi, 035_standby_logical_decoding.pl has this code # wait for postgres to terminate foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default) { last if !-f $node_standby->data_dir . '/postmaster.pid'; usleep(100_000); } but it does not import usleep as other tests do (like this) use Time::HiRes qw(usleep); I think it simply hasn't been called because postgres exits and terminates the loop before it has had a chance to be called. If you reorder the sleep and the loop termination condition, it fails immediately because it can't find usleep. On an unrelated note, I also noticed that hot_standby_feedback is not turned on during the test until we are explicitly checking for recovery conflicts. When it is enabled, the comment says "Turn hot_standby_feedback back on" -- but it was never on to begin with. I suspect this doesn't produce test instability because autovacuum is turned off for the whole test. But, I was wondering if the test authors meant for hot_standby_feedback to be enabled during the initial phases of the test (until it is explicitly disabled for testing purposes). - Melanie
On Fri, Aug 1, 2025 at 3:04 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > 035_standby_logical_decoding.pl has this code Another issue I see is that there are two incorrect tests for a log message # message should not be issued ok( !$node_standby->log_contains( "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart), 'inactiveslot slot invalidation is not logged with vacuum on conflict_test' ); ok( !$node_standby->log_contains( "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart), 'activeslot slot invalidation is not logged with vacuum on conflict_test' ); This will never fail because the message string is actually "invalidating obsolete replication slot" It is correctly specified elsewhere in the same test. so, that should be fixed too. - Melanie
On Fri, Aug 1, 2025 at 12:04 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > Hi, > > 035_standby_logical_decoding.pl has this code > > # wait for postgres to terminate > foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default) > { > last if !-f $node_standby->data_dir . '/postmaster.pid'; > usleep(100_000); > } > > but it does not import usleep as other tests do (like this) > use Time::HiRes qw(usleep); > > I think it simply hasn't been called because postgres exits and > terminates the loop before it has had a chance to be called. If you > reorder the sleep and the loop termination condition, it fails > immediately because it can't find usleep. Agreed, usleep should be imported. > > On an unrelated note, I also noticed that hot_standby_feedback is not > turned on during the test until we are explicitly checking for > recovery conflicts. When it is enabled, the comment says "Turn > hot_standby_feedback back on" -- but it was never on to begin with. I > suspect this doesn't produce test instability because autovacuum is > turned off for the whole test. But, I was wondering if the test > authors meant for hot_standby_feedback to be enabled during the > initial phases of the test (until it is explicitly disabled for > testing purposes). While I agree that the first "Turn hot_standby_feedback back on" message is confusing for the reason you argued, I have no idea whether we should enable hot_standby_feedback also during the initial phases of the test. It could be a typo? I'll wait for comments from the authors. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Aug 1, 2025 at 1:58 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > > On Fri, Aug 1, 2025 at 3:04 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > 035_standby_logical_decoding.pl has this code > > Another issue I see is that there are two incorrect tests for a log message > > # message should not be issued > ok( !$node_standby->log_contains( > "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart), > 'inactiveslot slot invalidation is not logged with vacuum on conflict_test' > ); > > ok( !$node_standby->log_contains( > "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart), > 'activeslot slot invalidation is not logged with vacuum on conflict_test' > ); > > This will never fail because the message string is actually > "invalidating obsolete replication slot" > > It is correctly specified elsewhere in the same test. > so, that should be fixed too. +1 Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Fri, Aug 01, 2025 at 03:22:06PM -0700, Masahiko Sawada wrote: > On Fri, Aug 1, 2025 at 1:58 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: >> Another issue I see is that there are two incorrect tests for a log message >> >> # message should not be issued >> ok( !$node_standby->log_contains( >> "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart), >> 'inactiveslot slot invalidation is not logged with vacuum on conflict_test' >> ); >> >> ok( !$node_standby->log_contains( >> "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart), >> 'activeslot slot invalidation is not logged with vacuum on conflict_test' >> ); >> >> This will never fail because the message string is actually >> "invalidating obsolete replication slot" >> >> It is correctly specified elsewhere in the same test. >> so, that should be fixed too. > > +1 Not the author of these two ones, but am I the only one who does not see the point of these two tests? These check that something does not happen, but log_contains() is useful to make sure that an event *actually happens*. Wouldn't it be simpler to remove these checks or switch them to use/add a log message that's triggered with a state that we'd expect to happen in reality? -- Michael
Вложения
Hi, On Fri, Aug 01, 2025 at 03:04:20PM -0400, Melanie Plageman wrote: > Hi, > > 035_standby_logical_decoding.pl has this code > > # wait for postgres to terminate > foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default) > { > last if !-f $node_standby->data_dir . '/postmaster.pid'; > usleep(100_000); > } > > but it does not import usleep as other tests do (like this) > use Time::HiRes qw(usleep); > > I think it simply hasn't been called because postgres exits and > terminates the loop before it has had a chance to be called. If you > reorder the sleep and the loop termination condition, it fails > immediately because it can't find usleep. Thanks for the report! I think it's an oversight in commit 48796a98d5ae. Fixed in the attached. > On an unrelated note, I also noticed that hot_standby_feedback is not > turned on during the test until we are explicitly checking for > recovery conflicts. Yeah, it's expected. > When it is enabled, the comment says "Turn > hot_standby_feedback back on" -- but it was never on to begin with. I > suspect this doesn't produce test instability because autovacuum is > turned off for the whole test. But, I was wondering if the test > authors meant for hot_standby_feedback to be enabled during the > initial phases of the test (until it is explicitly disabled for > testing purposes). It's typo/bad wording fixed in the attached. hot_standby_feedback does not need to be enabled during the initial phases of the test because there is no catalog changes that could produce a conflict. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
Hi, On Fri, Aug 01, 2025 at 04:57:50PM -0400, Melanie Plageman wrote: > On Fri, Aug 1, 2025 at 3:04 PM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > > > 035_standby_logical_decoding.pl has this code > > Another issue I see is that there are two incorrect tests for a log message > > # message should not be issued > ok( !$node_standby->log_contains( > "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart), > 'inactiveslot slot invalidation is not logged with vacuum on conflict_test' > ); > > ok( !$node_standby->log_contains( > "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart), > 'activeslot slot invalidation is not logged with vacuum on conflict_test' > ); > > This will never fail because the message string is actually > "invalidating obsolete replication slot" > > It is correctly specified elsewhere in the same test. > so, that should be fixed too. Thanks for the report! Indeed, fixed in 0001 just shared up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Hi, On Sat, Aug 02, 2025 at 08:09:14AM +0900, Michael Paquier wrote: > On Fri, Aug 01, 2025 at 03:22:06PM -0700, Masahiko Sawada wrote: > > On Fri, Aug 1, 2025 at 1:58 PM Melanie Plageman > > <melanieplageman@gmail.com> wrote: > >> Another issue I see is that there are two incorrect tests for a log message > >> > >> # message should not be issued > >> ok( !$node_standby->log_contains( > >> "invalidating obsolete slot \"no_conflict_inactiveslot\"", $logstart), > >> 'inactiveslot slot invalidation is not logged with vacuum on conflict_test' > >> ); > >> > >> ok( !$node_standby->log_contains( > >> "invalidating obsolete slot \"no_conflict_activeslot\"", $logstart), > >> 'activeslot slot invalidation is not logged with vacuum on conflict_test' > >> ); > >> > >> This will never fail because the message string is actually > >> "invalidating obsolete replication slot" > >> > >> It is correctly specified elsewhere in the same test. > >> so, that should be fixed too. > > > > +1 > > Not the author of these two ones, but am I the only one who does not > see the point of these two tests? These check that something does not > happen, Yeah, we want to check that no conflicts occur for non catalog table. > but log_contains() is useful to make sure that an event > *actually happens*. Not sure why it could not be useful to check that an event does not happen ( 019_replslot_limit.pl also uses it that way). > Wouldn't it be simpler to remove these checks I'm afraid that could lead to false positives or to the test taking longer as waiting for a timeout to occur. > or > switch them to use/add a log message that's triggered with a state > that we'd expect to happen in reality? If no conflicts occur, we do not emit messages in the log, then having nothing to check. I think that's fine that way. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Mon, Aug 4, 2025 at 3:52 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Sat, Aug 02, 2025 at 08:09:14AM +0900, Michael Paquier wrote: > > > Not the author of these two ones, but am I the only one who does not > > see the point of these two tests? These check that something does not > > happen, > > > > Wouldn't it be simpler to remove these checks > > I'm afraid that could lead to false positives or to the test taking longer as > waiting for a timeout to occur. This rationale is interesting. You are basically saying that you are counting on one of the two log_contains() tests failing and never actually reaching the pg_stat_database_conflicts query test in a failure scenario. When I was writing the recovery conflict (031_recovery_conflict.pl) tests, I added both log message and stats checks because we didn't have coverage of pg_stat_database_conflicts in our other stats tests (that would have been hard to do in regress). The stats test was expected to fail when something was wrong with the recovery conflict stats updates -- not so much when something was wrong with the recovery conflict handling logic itself. But that argument makes less sense here in 035_standby_logical_decoding for the happy path test case we are talking about. Though, I suppose someone could write code that _incorrectly_ updated confl_active_logicalslot stats and this would catch that. Anyway, there's probably no reason to remove it (the pg_stat_database_conflicts check for the happy path case) -- we could come up with other, more impactful ways to get the test runtime down, I imagine. This test is not unprecedented in testing the absence of logfile messages -- but it has the same problem those tests have in that they don't fail nicely when the message changes or someone typo'd the matching message string in the test. - Melanie
On Mon, Aug 4, 2025 at 3:49 AM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > On Fri, Aug 01, 2025 at 03:04:20PM -0400, Melanie Plageman wrote: > > > but it does not import usleep as other tests do (like this) > > use Time::HiRes qw(usleep); > > > > I think it simply hasn't been called because postgres exits and > > terminates the loop before it has had a chance to be called. If you > > reorder the sleep and the loop termination condition, it fails > > immediately because it can't find usleep. > > Thanks for the report! I think it's an oversight in commit 48796a98d5ae. Fixed > in the attached. I've committed and backported the fix. > > On an unrelated note, I also noticed that hot_standby_feedback is not > > turned on during the test until we are explicitly checking for > > recovery conflicts. > > Yeah, it's expected. I see. I had to change it to be on for more of the test scenarios in my patchset which sets the VM on-access [1] (v6-0017), but I suppose it doesn't make sense to make this change on master. - Melanie [1] https://www.postgresql.org/message-id/CAAKRu_bGCgUuhmy1Mzkw3yCmbGcjNQAjV%3DOmjuW6hz90PuXKWA%40mail.gmail.com