Обсуждение: Implement generalized sub routine find_in_log for tap test
Hi, The recovery tap test has 4 implementations of find_in_log sub routine for various uses, I felt we can generalize these and have a single function for the same. The attached patch is an attempt to have a generalized sub routine find_in_log which can be used by all of them. Thoughts? Regards, VIgnesh
Вложения
vignesh C <vignesh21@gmail.com> writes: > Hi, > > The recovery tap test has 4 implementations of find_in_log sub routine > for various uses, I felt we can generalize these and have a single > function for the same. The attached patch is an attempt to have a > generalized sub routine find_in_log which can be used by all of them. > Thoughts? +1 on factoring out this common code. Just a few comments on the implementation. > diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm > index a27fac83d2..5c9b2f6c03 100644 > --- a/src/test/perl/PostgreSQL/Test/Utils.pm > +++ b/src/test/perl/PostgreSQL/Test/Utils.pm > @@ -67,6 +67,7 @@ our @EXPORT = qw( > slurp_file > append_to_file > string_replace_file > + find_in_log > check_mode_recursive > chmod_recursive > check_pg_config > @@ -579,6 +580,28 @@ sub string_replace_file > > =pod > > + > +=item find_in_log(node, pattern, offset) > + > +Find pattern in logfile of node after offset byte. > + > +=cut > + > +sub find_in_log > +{ > + my ($node, $pattern, $offset) = @_; > + > + $offset = 0 unless defined $offset; > + my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile); Since this function is in the same package, there's no need to qualify it with the full name. I know the callers you copied it from did, but they wouldn't have had to either, since it's exported by default (in the @EXPORT array above), unless the use statement has an explicit argument list that excludes it. > + return 0 if (length($log) <= 0 || length($log) <= $offset); > + > + $log = substr($log, $offset); Also, the existing callers don't seem to have got the memo that slurp_file() takes an optinal offset parameter, which will cause it to seek to that postion before slurping the file, which is more efficient than reading the whole file in and substr-ing it. There's not much point in the length checks either, since regex-matching against an empty string is very cheap (and if the provide pattern can match the empty string the whole function call is rather pointless). > + return $log =~ m/$pattern/; > +} All in all, it could be simplified to: sub find_in_log { my ($node, $pattern, $offset) = @_; return slurp_file($node->logfile, $offset) =~ $pattern; } However, none of the other functions in ::Utils know anything about node objects, which makes me think it should be a method on the node itself (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains would be a better name, since it just returns a boolean. The name find_in_log makes me think it would return the log lines matching the pattern, or the position of the match in the file. In that case, the slurp_file() call would have to be fully qualified, since ::Cluster uses an empty import list to avoid polluting the method namespace with imported functions. - ilmari
On Thu, May 25, 2023 at 06:34:20PM +0100, Dagfinn Ilmari Mannsåker wrote: > However, none of the other functions in ::Utils know anything about node > objects, which makes me think it should be a method on the node itself > (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains > would be a better name, since it just returns a boolean. The name > find_in_log makes me think it would return the log lines matching the > pattern, or the position of the match in the file. > > In that case, the slurp_file() call would have to be fully qualified, > since ::Cluster uses an empty import list to avoid polluting the method > namespace with imported functions. Hmm. connect_ok() and connect_fails() in Cluster.pm have a similar log comparison logic, feeding from the offset of a log file. Couldn't you use the same code across the board for everything? Note that this stuff is parameterized so as it is possible to check if patterns match or do not match, for multiple patterns. It seems to me that we could use the new log finding routine there as well, so how about extending it a bit more? You would need, at least: - One parameter for log entries matching. - One parameter for log entries not matching. -- Michael
Вложения
On Thu, 25 May 2023 at 23:04, Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote: > > vignesh C <vignesh21@gmail.com> writes: > > > Hi, > > > > The recovery tap test has 4 implementations of find_in_log sub routine > > for various uses, I felt we can generalize these and have a single > > function for the same. The attached patch is an attempt to have a > > generalized sub routine find_in_log which can be used by all of them. > > Thoughts? > > +1 on factoring out this common code. Just a few comments on the implementation. > > > > diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm > > index a27fac83d2..5c9b2f6c03 100644 > > --- a/src/test/perl/PostgreSQL/Test/Utils.pm > > +++ b/src/test/perl/PostgreSQL/Test/Utils.pm > > @@ -67,6 +67,7 @@ our @EXPORT = qw( > > slurp_file > > append_to_file > > string_replace_file > > + find_in_log > > check_mode_recursive > > chmod_recursive > > check_pg_config > > @@ -579,6 +580,28 @@ sub string_replace_file > > > > =pod > > > > + > > +=item find_in_log(node, pattern, offset) > > + > > +Find pattern in logfile of node after offset byte. > > + > > +=cut > > + > > +sub find_in_log > > +{ > > + my ($node, $pattern, $offset) = @_; > > + > > + $offset = 0 unless defined $offset; > > + my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile); > > Since this function is in the same package, there's no need to qualify > it with the full name. I know the callers you copied it from did, but > they wouldn't have had to either, since it's exported by default (in the > @EXPORT array above), unless the use statement has an explicit argument > list that excludes it. I have moved this function to Cluster.pm file now, since it is moved, I had to qualify the name with the full name. > > + return 0 if (length($log) <= 0 || length($log) <= $offset); > > + > > + $log = substr($log, $offset); > > Also, the existing callers don't seem to have got the memo that > slurp_file() takes an optinal offset parameter, which will cause it to > seek to that postion before slurping the file, which is more efficient > than reading the whole file in and substr-ing it. There's not much > point in the length checks either, since regex-matching against an empty > string is very cheap (and if the provide pattern can match the empty > string the whole function call is rather pointless). > > > + return $log =~ m/$pattern/; > > +} > > All in all, it could be simplified to: > > sub find_in_log { > my ($node, $pattern, $offset) = @_; > > return slurp_file($node->logfile, $offset) =~ $pattern; > } Modified in similar lines > However, none of the other functions in ::Utils know anything about node > objects, which makes me think it should be a method on the node itself > (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains > would be a better name, since it just returns a boolean. The name > find_in_log makes me think it would return the log lines matching the > pattern, or the position of the match in the file. Modified > In that case, the slurp_file() call would have to be fully qualified, > since ::Cluster uses an empty import list to avoid polluting the method > namespace with imported functions. Modified. Thanks for the comments, the attached v2 version patch has the changes for the same. Regards, Vignesh
Вложения
On Fri, 26 May 2023 at 04:09, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, May 25, 2023 at 06:34:20PM +0100, Dagfinn Ilmari Mannsåker wrote: > > However, none of the other functions in ::Utils know anything about node > > objects, which makes me think it should be a method on the node itself > > (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains > > would be a better name, since it just returns a boolean. The name > > find_in_log makes me think it would return the log lines matching the > > pattern, or the position of the match in the file. > > > > In that case, the slurp_file() call would have to be fully qualified, > > since ::Cluster uses an empty import list to avoid polluting the method > > namespace with imported functions. > > Hmm. connect_ok() and connect_fails() in Cluster.pm have a similar > log comparison logic, feeding from the offset of a log file. Couldn't > you use the same code across the board for everything? Note that this > stuff is parameterized so as it is possible to check if patterns match > or do not match, for multiple patterns. It seems to me that we could > use the new log finding routine there as well, so how about extending > it a bit more? You would need, at least: > - One parameter for log entries matching. > - One parameter for log entries not matching. I felt adding these to log_contains was making the function slightly complex with multiple checks. I was not able to make it simple with the approach I tried. How about having a common function check_connect_log_contents which has the common log contents check for connect_ok and connect_fails function like the v2-0002 patch attached. Regards, Vignesh
Вложения
On 2023-05-26 Fr 20:35, vignesh C wrote:
On Fri, 26 May 2023 at 04:09, Michael Paquier <michael@paquier.xyz> wrote:On Thu, May 25, 2023 at 06:34:20PM +0100, Dagfinn Ilmari Mannsåker wrote:However, none of the other functions in ::Utils know anything about node objects, which makes me think it should be a method on the node itself (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains would be a better name, since it just returns a boolean. The name find_in_log makes me think it would return the log lines matching the pattern, or the position of the match in the file. In that case, the slurp_file() call would have to be fully qualified, since ::Cluster uses an empty import list to avoid polluting the method namespace with imported functions.Hmm. connect_ok() and connect_fails() in Cluster.pm have a similar log comparison logic, feeding from the offset of a log file. Couldn't you use the same code across the board for everything? Note that this stuff is parameterized so as it is possible to check if patterns match or do not match, for multiple patterns. It seems to me that we could use the new log finding routine there as well, so how about extending it a bit more? You would need, at least: - One parameter for log entries matching. - One parameter for log entries not matching.I felt adding these to log_contains was making the function slightly complex with multiple checks. I was not able to make it simple with the approach I tried. How about having a common function check_connect_log_contents which has the common log contents check for connect_ok and connect_fails function like the v2-0002 patch attached.
+ $offset = 0 unless defined $offset;
This is unnecessary, as slurp_file() handles it appropriately, and in fact doing this is slightly inefficient, as it will cause slurp_file to do a redundant seek.
FYI there's a simpler way to say it if we wanted to:
$offset //= 0;
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Sat, 27 May 2023 at 17:32, Andrew Dunstan <andrew@dunslane.net> wrote: > > > On 2023-05-26 Fr 20:35, vignesh C wrote: > > On Fri, 26 May 2023 at 04:09, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, May 25, 2023 at 06:34:20PM +0100, Dagfinn Ilmari Mannsåker wrote: > > However, none of the other functions in ::Utils know anything about node > objects, which makes me think it should be a method on the node itself > (i.e. in PostgreSQL::Test::Cluster) instead. Also, I think log_contains > would be a better name, since it just returns a boolean. The name > find_in_log makes me think it would return the log lines matching the > pattern, or the position of the match in the file. > > In that case, the slurp_file() call would have to be fully qualified, > since ::Cluster uses an empty import list to avoid polluting the method > namespace with imported functions. > > Hmm. connect_ok() and connect_fails() in Cluster.pm have a similar > log comparison logic, feeding from the offset of a log file. Couldn't > you use the same code across the board for everything? Note that this > stuff is parameterized so as it is possible to check if patterns match > or do not match, for multiple patterns. It seems to me that we could > use the new log finding routine there as well, so how about extending > it a bit more? You would need, at least: > - One parameter for log entries matching. > - One parameter for log entries not matching. > > I felt adding these to log_contains was making the function slightly > complex with multiple checks. I was not able to make it simple with > the approach I tried. How about having a common function > check_connect_log_contents which has the common log contents check for > connect_ok and connect_fails function like the v2-0002 patch attached. > > > + $offset = 0 unless defined $offset; > > > This is unnecessary, as slurp_file() handles it appropriately, and in fact doing this is slightly inefficient, as it willcause slurp_file to do a redundant seek. > > FYI there's a simpler way to say it if we wanted to: > > $offset //= 0; Thanks for the comment, the attached v3 version patch has the changes for the same. Regards, Vignesh
Вложения
On Mon, May 29, 2023 at 07:49:52AM +0530, vignesh C wrote: > Thanks for the comment, the attached v3 version patch has the changes > for the same. -if (find_in_log( - $node, $log_offset, - qr/peer authentication is not supported on this platform/)) +if ($node->log_contains( + qr/peer authentication is not supported on this platform/), + $log_offset,) This looks like a typo to me, the log offset is eaten. Except of that, I am on board with log_contains(). There are two things that bugged me with the refactoring related to connect_ok and connect_fails: - check_connect_log_contents() is a name too complicated. While looking at that I have settled to a simpler log_check(), as we could perfectly use that for something else than connections as long as we want to check multiple patterns at once. - The refactoring of the documentation for the routines of Cluster.pm became incorrect. For example, the patch does not list anymore log_like and log_unlike for connect_ok. With all that in mind I have hacked a few adjustments in a 0003, though I agree with the separation between 0001 and 0002. 033_replay_tsp_drops and 019_replslot_limit are not new to v16, but 003_peer.pl and 035_standby_logical_decoding.pl, making the number of places where find_in_log() exists twice as much. So I would be tempted to refactor these tests in v16. Perhaps anybody from the RMT could comment? We've usually been quite flexible with the tests even in beta. Thoughts? -- Michael
Вложения
On Sun, 4 Jun 2023 at 03:51, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, May 29, 2023 at 07:49:52AM +0530, vignesh C wrote: > > Thanks for the comment, the attached v3 version patch has the changes > > for the same. > > -if (find_in_log( > - $node, $log_offset, > - qr/peer authentication is not supported on this platform/)) > +if ($node->log_contains( > + qr/peer authentication is not supported on this platform/), > + $log_offset,) > > This looks like a typo to me, the log offset is eaten. > > Except of that, I am on board with log_contains(). Thanks for fixing this. > There are two things that bugged me with the refactoring related to > connect_ok and connect_fails: > - check_connect_log_contents() is a name too complicated. While > looking at that I have settled to a simpler log_check(), as we could > perfectly use that for something else than connections as long as we > want to check multiple patterns at once. > - The refactoring of the documentation for the routines of Cluster.pm > became incorrect. For example, the patch does not list anymore > log_like and log_unlike for connect_ok. This new name suggested by you looks simpler, your documentation of having it in connect_ok and connect_fails and referring it to log_check makes it more clearer. Regards, Vignesh
On Sun, Jun 4, 2023 at 3:51 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, May 29, 2023 at 07:49:52AM +0530, vignesh C wrote: > > Thanks for the comment, the attached v3 version patch has the changes > > for the same. > > -if (find_in_log( > - $node, $log_offset, > - qr/peer authentication is not supported on this platform/)) > +if ($node->log_contains( > + qr/peer authentication is not supported on this platform/), > + $log_offset,) > > This looks like a typo to me, the log offset is eaten. > > Except of that, I am on board with log_contains(). > > There are two things that bugged me with the refactoring related to > connect_ok and connect_fails: > - check_connect_log_contents() is a name too complicated. While > looking at that I have settled to a simpler log_check(), as we could > perfectly use that for something else than connections as long as we > want to check multiple patterns at once. > - The refactoring of the documentation for the routines of Cluster.pm > became incorrect. For example, the patch does not list anymore > log_like and log_unlike for connect_ok. > > With all that in mind I have hacked a few adjustments in a 0003, > though I agree with the separation between 0001 and 0002. > > 033_replay_tsp_drops and 019_replslot_limit are not new to v16, but > 003_peer.pl and 035_standby_logical_decoding.pl, making the number of > places where find_in_log() exists twice as much. So I would be > tempted to refactor these tests in v16. Perhaps anybody from the RMT > could comment? We've usually been quite flexible with the tests even > in beta. > Personally, I don't see any problem to do this refactoring for v16. However, sometimes, we do decide to backpatch refactoring in tests to avoid backpatch effort. I am not completely sure if that is the case here. -- With Regards, Amit Kapila.
On Tue, Jun 06, 2023 at 08:05:49AM +0530, Amit Kapila wrote: > Personally, I don't see any problem to do this refactoring for v16. > However, sometimes, we do decide to backpatch refactoring in tests to > avoid backpatch effort. I am not completely sure if that is the case > here. 033_replay_tsp_drops.pl has one find_in_log() down to 11, and 019_replslot_limit.pl has four calls down to 14. Making things consistent everywhere is a rather appealing argument to ease future backpatching. So I am OK to spend a few extra cycles in adjusting these routines all the way down where needed. I'll do that tomorrow once I get back in front of my laptop. Note that connect_ok() and connect_fails() are new to 14, so this part has no need to go further down than that. -- Michael
Вложения
On Mon, Jun 5, 2023 at 9:39 PM vignesh C <vignesh21@gmail.com> wrote: > > On Sun, 4 Jun 2023 at 03:51, Michael Paquier <michael@paquier.xyz> wrote: > > > > This looks like a typo to me, the log offset is eaten. > > > > Except of that, I am on board with log_contains(). > > Thanks for fixing this. +1 for deduplicating find_in_log. How about deduplicating advance_wal too so that 019_replslot_limit.pl, 033_replay_tsp_drops.pl, 035_standby_logical_decoding.pl and 001_stream_rep.pl can benefit immediately? FWIW, a previous discussion related to this is here https://www.postgresql.org/message-id/flat/CALj2ACVUcXtLgHRPbx28ZQQyRM6j%2BeSH3jNUALr2pJ4%2Bf%3DHRGA%40mail.gmail.com. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Tue, Jun 06, 2023 at 10:00:00AM +0530, Bharath Rupireddy wrote: > +1 for deduplicating find_in_log. How about deduplicating advance_wal > too so that 019_replslot_limit.pl, 033_replay_tsp_drops.pl, > 035_standby_logical_decoding.pl and 001_stream_rep.pl can benefit > immediately? As in a small wrapper for pg_switch_wal() that generates N segments at will? I don't see why we could not do that if it proves useful in the long run. -- Michael
Вложения
On Tue, Jun 6, 2023 at 4:11 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jun 06, 2023 at 10:00:00AM +0530, Bharath Rupireddy wrote: > > +1 for deduplicating find_in_log. How about deduplicating advance_wal > > too so that 019_replslot_limit.pl, 033_replay_tsp_drops.pl, > > 035_standby_logical_decoding.pl and 001_stream_rep.pl can benefit > > immediately? > > As in a small wrapper for pg_switch_wal() that generates N segments at > will? Yes. A simpler way of doing it would be to move advance_wal() in 019_replslot_limit.pl to Cluster.pm, something like the attached. CI members don't complain with it https://github.com/BRupireddy/postgres/tree/add_a_function_in_Cluster.pm_to_generate_WAL. Perhaps, we can name it better instead of advance_wal, say generate_wal or some other? > I don't see why we could not do that if it proves useful in the > long run. Besides the beneficiaries listed above, the test case added by https://commitfest.postgresql.org/43/3663/ can use it. And, the test_table bits in 020_pg_receivewal.pl can use it (?). -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Вложения
On Tue, 6 Jun 2023 at 09:36, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jun 06, 2023 at 08:05:49AM +0530, Amit Kapila wrote: > > Personally, I don't see any problem to do this refactoring for v16. > > However, sometimes, we do decide to backpatch refactoring in tests to > > avoid backpatch effort. I am not completely sure if that is the case > > here. > > 033_replay_tsp_drops.pl has one find_in_log() down to 11, and > 019_replslot_limit.pl has four calls down to 14. Making things > consistent everywhere is a rather appealing argument to ease future > backpatching. So I am OK to spend a few extra cycles in adjusting > these routines all the way down where needed. I'll do that tomorrow > once I get back in front of my laptop. > > Note that connect_ok() and connect_fails() are new to 14, so this > part has no need to go further down than that. Please find the attached patches that can be applied on back branches too. v5*master.patch can be applied on master, v5*PG15.patch can be applied on PG15, v5*PG14.patch can be applied on PG14, v5*PG13.patch can be applied on PG13, v5*PG12.patch can be applied on PG12, PG11 and PG10. Regards, Vignesh
Вложения
- v5-0001-Remove-duplicate-find_in_log-sub-routines-from-ta_PG13.patch
- v5-0001-Have-find_in_log-sub-routine-as-a-common-routine_PG12.patch
- v5-0001-Remove-duplicate-find_in_log-sub-routines-from-ta_PG14.patch
- v5-0001-Remove-duplicate-find_in_log-sub-routines-from-ta_master.patch
- v5-0001-Remove-duplicate-find_in_log-sub-routines-from-ta_PG15.patch
- v5-0002-Move-common-connection-log-content-verification-c_master.patch
- v5-0002-Move-common-connection-log-content-verification-c_PG14.patch
- v5-0002-Move-common-connection-log-content-verification-c_PG15.patch
On Tue, Jun 06, 2023 at 05:53:40PM +0530, Bharath Rupireddy wrote: > Yes. A simpler way of doing it would be to move advance_wal() in > 019_replslot_limit.pl to Cluster.pm, something like the attached. CI > members don't complain with it > https://github.com/BRupireddy/postgres/tree/add_a_function_in_Cluster.pm_to_generate_WAL. > Perhaps, we can name it better instead of advance_wal, say > generate_wal or some other? Why not discussing that on a separate thread? What you are proposing is independent of what Vignesh has proposed. Note that the patch format is octet-stream, causing extra CRs to exist in the patch. Something happened on your side when you sent your patch, I guess? -- Michael
Вложения
On Tue, Jun 06, 2023 at 06:43:44PM +0530, vignesh C wrote: > Please find the attached patches that can be applied on back branches > too. v5*master.patch can be applied on master, v5*PG15.patch can be > applied on PG15, v5*PG14.patch can be applied on PG14, v5*PG13.patch > can be applied on PG13, v5*PG12.patch can be applied on PG12, PG11 and > PG10. Thanks. The amount of minimal conflicts across all these branches is always fun to play with. I have finally got around and applied all that, after doing a proper split, applying one part down to 14 and the second back to 11. -- Michael
Вложения
On Fri, Jun 9, 2023 at 8:29 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jun 06, 2023 at 05:53:40PM +0530, Bharath Rupireddy wrote: > > Yes. A simpler way of doing it would be to move advance_wal() in > > 019_replslot_limit.pl to Cluster.pm, something like the attached. CI > > members don't complain with it > > https://github.com/BRupireddy/postgres/tree/add_a_function_in_Cluster.pm_to_generate_WAL. > > Perhaps, we can name it better instead of advance_wal, say > > generate_wal or some other? > > Why not discussing that on a separate thread? What you are proposing > is independent of what Vignesh has proposed. Sure. Here it is - https://www.postgresql.org/message-id/CALj2ACU3R8QFCvDewHCMKjgb2w_-CMCyd6DAK%3DJb-af14da5eg%40mail.gmail.com. > Note that the patch > format is octet-stream, causing extra CRs to exist in the patch. > Something happened on your side when you sent your patch, I guess? Had to attach the patch in .txt format to not block Vignesh's patch from testing by CF Bot (if at all this thread was added there). -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, 9 Jun 2023 at 08:31, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jun 06, 2023 at 06:43:44PM +0530, vignesh C wrote: > > Please find the attached patches that can be applied on back branches > > too. v5*master.patch can be applied on master, v5*PG15.patch can be > > applied on PG15, v5*PG14.patch can be applied on PG14, v5*PG13.patch > > can be applied on PG13, v5*PG12.patch can be applied on PG12, PG11 and > > PG10. > > Thanks. The amount of minimal conflicts across all these branches is > always fun to play with. I have finally got around and applied all > that, after doing a proper split, applying one part down to 14 and the > second back to 11. Thanks for pushing this. Regards, Vignesh