Обсуждение: [PATCH] Add tests for Bitmapset
Hello, I noticed that there are no tests for Bitmapset in src/test/modules as is the case for other similar things like radixtree, rbtree, etc. so I created one. I realize that Bitmapset is already "tested" by all the other code that uses it, but I was able to find one minor oversight[1] in that code with these new tests. I hope I've covered all the bases, but if you have thoughts on other ways to test Bitmapset I'll happily add them to the patch. best. -greg [1] https://www.postgresql.org/message-id/2000A717-1FFE-4031-827B-9330FB2E9065%40getmailspring.com
Вложения
On Fri, Aug 15, 2025 at 11:39:23AM -0400, Greg Burd wrote: > I noticed that there are no tests for Bitmapset in src/test/modules as > is the case for other similar things like radixtree, rbtree, etc. so I > created one. I realize that Bitmapset is already "tested" by all the > other code that uses it, but I was able to find one minor oversight[1] > in that code with these new tests. > > I hope I've covered all the bases, but if you have thoughts on other > ways to test Bitmapset I'll happily add them to the patch. Adding some tests here seems like a good idea. I might look into some ways to trim it down a bit, but that'd just be minor editorialization. One other thing to consider is adding randomness to the tests (see test_radixtree and test_binaryheap for examples). -- nathan
> On Sep 4, 2025, at 10:00 PM, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Fri, Aug 15, 2025 at 11:39:23AM -0400, Greg Burd wrote: >> I noticed that there are no tests for Bitmapset in src/test/modules as >> is the case for other similar things like radixtree, rbtree, etc. so I >> created one. I realize that Bitmapset is already "tested" by all the >> other code that uses it, but I was able to find one minor oversight[1] >> in that code with these new tests. >> >> I hope I've covered all the bases, but if you have thoughts on other >> ways to test Bitmapset I'll happily add them to the patch. > > Adding some tests here seems like a good idea. I might look into some ways > to trim it down a bit, but that'd just be minor editorialization. One > other thing to consider is adding randomness to the tests (see > test_radixtree and test_binaryheap for examples). Nathan, Thank you for your interest in this patch, I appreciate that your time is limited and highly valuable to the community. This patch isn't "earth shattering", but I think it's valuable to have test coverage even in cases where the code being tested is already very well exercised. I looked at both radix tree and binary heap and how they use random sets when testing. Binary heap uses it to create different random sets of numbers to use across multiple tests while radix tree has a single function that focuses on randomized data. I decided not to add randomization into the tests of Bitmapset simply because I like avoiding non-deterministic behavior. But in tests I guess that can be helpful finding future unknown corner cases. I'm on the fence as to the value, your call. :) Let me know if you'd like that or not. > > -- > Nathan best, and thanks again for the attention, -greg
On Fri, Sep 05, 2025 at 10:48:21AM -0400, Burd, Greg wrote: > I looked at both radix tree and binary heap and how they use random sets when > testing. Binary heap uses it to create different random sets of numbers to > use across multiple tests while radix tree has a single function that focuses > on randomized data. I decided not to add randomization into the tests of > Bitmapset simply because I like avoiding non-deterministic behavior. But in > tests I guess that can be helpful finding future unknown corner cases. I'm > on the fence as to the value, your call. :) I'm not too concerned about it. We've lived without a dedicated test suite for Bitmapset for a very long time, so any amount of test coverage is an improvement. Like you said, adding some randomization might be helpful for finding weird bugs we wouldn't have thought to test. And, given the many, many machines that run the tests, IMHO it'd only help build even more confidence in the code. If my suggestion inspires you to update the patch, great, but I'm fine with proceeding with what you already wrote, too. -- nathan
> On Sep 5, 2025, at 2:43 PM, Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Fri, Sep 05, 2025 at 10:48:21AM -0400, Burd, Greg wrote: >> I looked at both radix tree and binary heap and how they use random sets when >> testing. Binary heap uses it to create different random sets of numbers to >> use across multiple tests while radix tree has a single function that focuses >> on randomized data. I decided not to add randomization into the tests of >> Bitmapset simply because I like avoiding non-deterministic behavior. But in >> tests I guess that can be helpful finding future unknown corner cases. I'm >> on the fence as to the value, your call. :) > > I'm not too concerned about it. We've lived without a dedicated test suite > for Bitmapset for a very long time, so any amount of test coverage is an > improvement. Like you said, adding some randomization might be helpful for > finding weird bugs we wouldn't have thought to test. And, given the many, > many machines that run the tests, IMHO it'd only help build even more > confidence in the code. If my suggestion inspires you to update the patch, > great, but I'm fine with proceeding with what you already wrote, too. Nathan, thanks for considering the patch. Honestly, I'm fine with it as is. We can revisit later if needed. This does what I'd intended, test and document in code the API and implementation making future changes to that more transparent. Thanks! -greg -- > nathan
On Mon, Sep 8, 2025 at 11:21 AM Burd, Greg <greg@burd.me> wrote: > > > > On Sep 5, 2025, at 2:43 PM, Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > On Fri, Sep 05, 2025 at 10:48:21AM -0400, Burd, Greg wrote: > >> I looked at both radix tree and binary heap and how they use random sets when > >> testing. Binary heap uses it to create different random sets of numbers to > >> use across multiple tests while radix tree has a single function that focuses > >> on randomized data. I decided not to add randomization into the tests of > >> Bitmapset simply because I like avoiding non-deterministic behavior. But in > >> tests I guess that can be helpful finding future unknown corner cases. I'm > >> on the fence as to the value, your call. :) > > > > I'm not too concerned about it. We've lived without a dedicated test suite > > for Bitmapset for a very long time, so any amount of test coverage is an > > improvement. Like you said, adding some randomization might be helpful for > > finding weird bugs we wouldn't have thought to test. And, given the many, > > many machines that run the tests, IMHO it'd only help build even more > > confidence in the code. If my suggestion inspires you to update the patch, > > great, but I'm fine with proceeding with what you already wrote, too. > > Nathan, thanks for considering the patch. Honestly, I'm fine with it as is. > We can revisit later if needed. This does what I'd intended, test and document > in code the API and implementation making future changes to that more > transparent. I appreciate your work on this. While I agree that adding more tests to bitmapset.c is a good idea, I'm concerned about the minimal improvement in test coverage despite the addition of new test cases (only three lines of code are newly covered). Apart from adding some randomness to the tests we've discussed, given that we're implementing a dedicated test module for bitmapset.c, I would expect to see a more increase in test coverage. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Sep 11, 2025, at 2:48 AM, Masahiko Sawada wrote: > On Mon, Sep 8, 2025 at 11:21 AM Burd, Greg <greg@burd.me> wrote: >> >> >> > On Sep 5, 2025, at 2:43 PM, Nathan Bossart <nathandbossart@gmail.com> wrote: >> > >> > On Fri, Sep 05, 2025 at 10:48:21AM -0400, Burd, Greg wrote: >> >> I looked at both radix tree and binary heap and how they use random sets when >> >> testing. Binary heap uses it to create different random sets of numbers to >> >> use across multiple tests while radix tree has a single function that focuses >> >> on randomized data. I decided not to add randomization into the tests of >> >> Bitmapset simply because I like avoiding non-deterministic behavior. But in >> >> tests I guess that can be helpful finding future unknown corner cases. I'm >> >> on the fence as to the value, your call. :) >> > >> > I'm not too concerned about it. We've lived without a dedicated test suite >> > for Bitmapset for a very long time, so any amount of test coverage is an >> > improvement. Like you said, adding some randomization might be helpful for >> > finding weird bugs we wouldn't have thought to test. And, given the many, >> > many machines that run the tests, IMHO it'd only help build even more >> > confidence in the code. If my suggestion inspires you to update the patch, >> > great, but I'm fine with proceeding with what you already wrote, too. >> >> Nathan, thanks for considering the patch. Honestly, I'm fine with it as is. >> We can revisit later if needed. This does what I'd intended, test and document >> in code the API and implementation making future changes to that more >> transparent. > > I appreciate your work on this. While I agree that adding more tests > to bitmapset.c is a good idea, I'm concerned about the minimal > improvement in test coverage despite the addition of new test cases > (only three lines of code are newly covered). Apart from adding some > randomness to the tests we've discussed, given that we're implementing > a dedicated test module for bitmapset.c, I would expect to see a more > increase in test coverage. Good point and thanks for taking the time to reply. The only thing it identified was a small issue fixed in b463288 so I'd not expect coverage to increase much. I'll take a stab at adding some randomness to the tests and check the delta in coverage. Thanks for the nudge. :) > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com Just for reference I started this not to increase coverage, which is a good goal just not the one I had. I was reviewing the API and considering some changes based on other work I've done. Now that I see how deeply baked in this code is I think that's unlikely. Maybe something else distinct for bitmaps over 64-bit space at some point will be useful. I wrote this code just to capture the API in test form. best, -greg
On Thu, Sep 11, 2025 at 06:56:07AM -0400, Greg Burd wrote: > Just for reference I started this not to increase coverage, which is a good > goal just not the one I had. I was reviewing the API and considering some > changes based on other work I've done. Now that I see how deeply baked in > this code is I think that's unlikely. Maybe something else distinct for > bitmaps over 64-bit space at some point will be useful. I wrote this code > just to capture the API in test form. How much does this measure in terms of numbers produced by coverage-html (see [1])? The paths taken don't always matter as it can also be important to check combinations of code paths that are taken by other tests when checking after edge cases, but that would give an idea of gain vs extra runtime. Not objecting to your patch, just being curious as I am not seeing any numbers posted on this thread. [1]: https://www.postgresql.org/docs/devel/regress-coverage.html -- Michael
Вложения
On Sep 11, 2025, at 9:36 PM, Michael Paquier <michael@paquier.xyz> wrote:On Thu, Sep 11, 2025 at 06:56:07AM -0400, Greg Burd wrote:Just for reference I started this not to increase coverage, which is a goodgoal just not the one I had. I was reviewing the API and considering somechanges based on other work I've done. Now that I see how deeply baked inthis code is I think that's unlikely. Maybe something else distinct forbitmaps over 64-bit space at some point will be useful. I wrote this codejust to capture the API in test form.How much does this measure in terms of numbers produced bycoverage-html (see [1])? The paths taken don't always matter as itcan also be important to check combinations of code paths that aretaken by other tests when checking after edge cases, but that wouldgive an idea of gain vs extra runtime. Not objecting to your patch,just being curious as I am not seeing any numbers posted on thisthread.[1]: https://www.postgresql.org/docs/devel/regress-coverage.html--Michael
Sawada-san, Michael,
Thank you both for the push to measure. Before the patch as it stands now the
coverage for src/backend/nodes/bitmapset.c is 63.5% and after it is 66.5%. Not
an amazing difference, but something. I guess I expected this to be higher given
the degree to which this datatype is used.
I'll review the gaps in coverage and update the tests. I'll look for a way to add
meaningful randomization.
-greg
On Sep 13 2025, at 10:23 am, Greg Burd <greg@burd.me> wrote: > > Sawada-san, Michael, > > Thank you both for the push to measure. Before the patch as it stands > now the > coverage for src/backend/nodes/bitmapset.c is 63.5% and after it is > 66.5%. Not > an amazing difference, but something. I guess I expected this to be > higher given > the degree to which this datatype is used. > > I'll review the gaps in coverage and update the tests. I'll look for > a way to add meaningful randomization. > > -greg Hello hackers, Here's the progress I've made in coverage for testing Bitmapset. coverage: HEAD v1 v2 lines......: 87.1 87.7 91.3 functions..: 100 100 100 branches...: 63.5 66.5 77.9 I've also added a function that performs a variety of operations using random data. For reference radixtree has: coverage: HEAD lines......: 98.3 functions..: 97.2 branches...: 89.4 best, -greg
Вложения
On Mon, Sep 15, 2025 at 2:00 PM Greg Burd <greg@burd.me> wrote: > For reference radixtree has: > > coverage: HEAD > lines......: 98.3 > functions..: 97.2 > branches...: 89.4 + /* Test negative member in bms_make_singleton */ + error_caught = false; + PG_TRY(); + { + bms_make_singleton(-1); + } + PG_CATCH(); + { + error_caught = true; + FlushErrorState(); + } + PG_END_TRY(); + EXPECT_TRUE(error_caught); This is an anti-pattern for PostgreSQL code. You can't just flush an error without aborting a transaction or subtransaction to recover. Even if it could be shown that this were harmless here, I think it's a terrible idea to have code like this in the tree, as it encourages people to do exactly the wrong thing. But backing up a step, this also doesn't really seem like the right way to test the error conditions. It deliberately throws away the error message. All this verifies is that you caught an error. If you let the error escape to the client you could have the expected output test that you got the expected message. I think it would be a better idea to structure this as a set of SQL-callable functions and move a bunch of the logic into SQL. -- Robert Haas EDB: http://www.enterprisedb.com
On Sep 15 2025, at 2:54 pm, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Sep 15, 2025 at 2:00 PM Greg Burd <greg@burd.me> wrote: >> For reference radixtree has: >> >> coverage: HEAD >> lines......: 98.3 >> functions..: 97.2 >> branches...: 89.4 > > + /* Test negative member in bms_make_singleton */ > + error_caught = false; > + PG_TRY(); > + { > + bms_make_singleton(-1); > + } > + PG_CATCH(); > + { > + error_caught = true; > + FlushErrorState(); > + } > + PG_END_TRY(); > + EXPECT_TRUE(error_caught); > > This is an anti-pattern for PostgreSQL code. You can't just flush an > error without aborting a transaction or subtransaction to recover. > Even if it could be shown that this were harmless here, I think it's a > terrible idea to have code like this in the tree, as it encourages > people to do exactly the wrong thing. Fair enough, I'll rework it. > But backing up a step, this also doesn't really seem like the right > way to test the error conditions. It deliberately throws away the > error message. All this verifies is that you caught an error. If you > let the error escape to the client you could have the expected output > test that you got the expected message. > > I think it would be a better idea to structure this as a set of > SQL-callable functions and move a bunch of the logic into SQL. I'll give that approach a try, thanks for the suggestion. -greg > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Sep 15 2025, at 3:03 pm, Greg Burd <greg@burd.me> wrote: > On Sep 15 2025, at 2:54 pm, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Mon, Sep 15, 2025 at 2:00 PM Greg Burd <greg@burd.me> wrote: >>> For reference radixtree has: >>> >>> coverage: HEAD >>> lines......: 98.3 >>> functions..: 97.2 >>> branches...: 89.4 >> >> + /* Test negative member in bms_make_singleton */ >> + error_caught = false; >> + PG_TRY(); >> + { >> + bms_make_singleton(-1); >> + } >> + PG_CATCH(); >> + { >> + error_caught = true; >> + FlushErrorState(); >> + } >> + PG_END_TRY(); >> + EXPECT_TRUE(error_caught); >> >> This is an anti-pattern for PostgreSQL code. You can't just flush an >> error without aborting a transaction or subtransaction to recover. >> Even if it could be shown that this were harmless here, I think it's a >> terrible idea to have code like this in the tree, as it encourages >> people to do exactly the wrong thing. > > Fair enough, I'll rework it. > >> But backing up a step, this also doesn't really seem like the right >> way to test the error conditions. It deliberately throws away the >> error message. All this verifies is that you caught an error. If you >> let the error escape to the client you could have the expected output >> test that you got the expected message. >> >> I think it would be a better idea to structure this as a set of >> SQL-callable functions and move a bunch of the logic into SQL. > > I'll give that approach a try, thanks for the suggestion. > > -greg Robert, Reworked as indicated, thanks for pointing out the anti-pattern I'd missed. best. -greg >> -- >> Robert Haas >> EDB: http://www.enterprisedb.com
Вложения
On Mon, Sep 15, 2025 at 03:56:30PM -0400, Greg Burd wrote: > Reworked as indicated, thanks for pointing out the anti-pattern I'd > missed. Hmm. Should we try to get something closer in shape to what we do for the SLRU tests, with one SQL function mapping to each C function we are testing? This counts for bms_is_member(), bms_num_members(), add, del, mul, hash and make_singleton. +test_bms_del_member_negative(PG_FUNCTION_ARGS) +{ + /* This should throw an error for negative member */ + bms_del_member(NULL, -20); + PG_RETURN_VOID(); For example, this case could use a Bitmapset and a number (or an array of numbers for repeated operations), with NULL being one option for the input Bitmapset. This makes the tests a bit more representative of what they do at SQL level, hence there would be no need to double-check a given line where a failure happens. The Bitmapset could then be passed around as bytea blobs using \gset, for example. This should not be a problem as long as they are palloc'd in the TopMemoryContext. The SQL output for true/false/NULL/non-NULL generated as output of the test could then be used instead of the EXPECT_*() macros reported then in the elogs. Perhaps my view of things is too cute and artistic, but when these test suites grow over time and differ across branches, having to dig into a specific line of a specific branch to get what a problem is about is more error-prone. Particularly annoying when calling one single function that does a lot of various actions, like the proposed test_bitmapset(). Side note: `git diff --check` is reporting a bunch of indents with spaces around the EXPECT macros. -- Michael
Вложения
On Tue, Sep 16, 2025 at 2:04 AM Michael Paquier <michael@paquier.xyz> wrote: > one SQL function mapping to each C function we are testing? Yes, I think we should do this, if possible. -- Robert Haas EDB: http://www.enterprisedb.com
On Sep 16 2025, at 8:02 am, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Sep 16, 2025 at 2:04 AM Michael Paquier <michael@paquier.xyz> wrote: >> one SQL function mapping to each C function we are testing? > > Yes, I think we should do this, if possible. Michael, Robert, Thanks for your time reviewing the proposed code and for providing feedback. This patch started life as simply copy of the test_radixtree module which uses a single test function triggered in SQL, hence my approach. I guess I should have copied the test_slru module instead! :) I see the value in the idea of splitting it up into separate functions and I'll give that a try. Michael, apologies for the white space issues. I'll clean that up, and thank you for the `git diff --check` tip, I'll add that to my toolbox/routine. > -- > Robert Haas > EDB: http://www.enterprisedb.com best. -greg
I've re-written the set of tests as suggested (I hope!). I've not re-run the coverage report (yet) to ensure we're at least as well covered as v3, but attached is v4 for your inspection (amusement?). I've tried to author the SQL tests such that failures clearly indicate what's at gone wrong. This exercise turned into a lot more LOC than I'd expected, I hope it provides some value to the community for testing this important data structure and helps to codify the API clearly. best. -greg
Вложения
On Tue, Sep 16, 2025 at 12:04 PM Greg Burd <greg@burd.me> wrote: > > I've re-written the set of tests as suggested (I hope!). I've not > re-run the coverage report (yet) to ensure we're at least as well > covered as v3, but attached is v4 for your inspection (amusement?). > I've tried to author the SQL tests such that failures clearly indicate > what's at gone wrong. > > This exercise turned into a lot more LOC than I'd expected, I hope it > provides some value to the community for testing this important data > structure and helps to codify the API clearly. Thank you for updating the patch. It seems cfbot caught a regression test error[1] in a 32-bit build. Regards, [1] https://cirrus-ci.com/task/5290864655728640 -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Tue, Sep 16, 2025 at 03:00:40PM -0700, Masahiko Sawada wrote: > Thank you for updating the patch. It seems cfbot caught a regression > test error[1] in a 32-bit build. - bitmap_hash [1,3,5] | 49870778 + bitmap_hash [1,3,5] | 1509752520 This one is able the hash value computation being not portable across architectures. I would just change these to be non-NULL, I guess. -- Michael
Вложения
On Tue, Sep 16, 2025 at 03:04:39PM -0400, Greg Burd wrote: > I've re-written the set of tests as suggested (I hope!). I've not > re-run the coverage report (yet) to ensure we're at least as well > covered as v3, but attached is v4 for your inspection (amusement?). > I've tried to author the SQL tests such that failures clearly indicate > what's at gone wrong. Thanks for the update, that's easier to follow, even if it's more code overall. It looks like all the 29 APIs are present, plus the extras for the array mapping routines required for the tests with the list APIs. +static void +elog_bitmapset(int elevel, const char *label, const Bitmapset *bms) This routine, that's able to translate a bytea into a readable form, is also something that I would have added as an extra function, then used it in some of the tests where I would care about the bitmap in output generated after a given operation, removing the DEBUG parts that I guess have also a risk of rotting over time. That would make the tests for simple operations that generate a bms as a result in easier to read, because we would call less SQL calls. For example, take the "add_member idempotent" case, that calls twice test_bms_make_singleton() to compare the output byteas, we could just do a test_bms_add_member(test_bms_make_singleton(10), 10) and show the output? We could perhaps use the CTE style for the basic calls, to reduce the test_bms_make_singleton() calls. What you are doing feels OK as well, just a thought in passing. FWIW, I tend to write the regression test queries so as these are limited to 72-80 characters per line, to fit even in a small terminals. That's just an old-school take, even if it means more lines in the SQL input file.. Perhaps it would be worth documenting what's the value of test_random_operations? It is a mean of testing add, del and is_member with a modulo 1000 of the member number. With the other functions that offer deterministic tests, I am not sure what's the extra value gained in it. + bms1_data = PG_GETARG_BYTEA_PP(0); + if (VARSIZE_ANY_EXHDR(bms1_data) > 0) + { + bms1 = (Bitmapset *) palloc(VARSIZE_ANY_EXHDR(bms1_data)); + memcpy(bms1, VARDATA_ANY(bms1_data), VARSIZE_ANY_EXHDR(bms1_data)); + } This pattern is repeated 9 times, if my fingers got it right. Perhaps hide it in a macro, or invent an extra static inline routine that does the translation? +DO $$ +BEGIN + BEGIN + PERFORM test_bms_singleton_member(test_bms_from_array(ARRAY[1,2])); + RAISE NOTICE 'ERROR: singleton_member([1,2]) should have failed!'; Do we need all these DO blocks to catch failures? These cases bump on elog(ERROR) in the internals of bitmapset.c, which are not something the end-users would expect, but in a test module that's fine to trigger. The function call would just fail, which is fine. The queries with CTEs and UNION ALL are elegant, nice. > This exercise turned into a lot more LOC than I'd expected, I hope it > provides some value to the community for testing this important data > structure and helps to codify the API clearly. It does, at least that's my opinion. So thanks for caring about this level of testing. -- Michael
Вложения
On Sep 16 2025, at 8:35 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Sep 16, 2025 at 03:00:40PM -0700, Masahiko Sawada wrote: >> Thank you for updating the patch. It seems cfbot caught a regression >> test error[1] in a 32-bit build. > > - bitmap_hash [1,3,5] | 49870778 > + bitmap_hash [1,3,5] | 1509752520 > > This one is able the hash value computation being not portable across > architectures. I would just change these to be non-NULL, I guess. > -- > Michael Thanks Michael, Sawada-san, yep hashing is arch dependent (go figure!), but thankfully it is deterministic. Rather that test for non-NULL I've come up with this approach: -- Architecture-aware hash testing WITH arch_info AS ( SELECT CASE WHEN pg_column_size(1::bigint) = 8 THEN '64bit' ELSE '32bit' END as architecture ), expected_values AS ( SELECT architecture, CASE architecture WHEN '64bit' THEN 0 WHEN '32bit' THEN 0 END as hash_null, CASE architecture WHEN '64bit' THEN 49870778 WHEN '32bit' THEN 1509752520 END as hash_135, CASE architecture WHEN '64bit' THEN -303921606 WHEN '32bit' THEN 0 -- TBD END as hash_246 FROM arch_info ) SELECT 'expected hash NULL' as test, test_bitmap_hash(NULL) = hash_null as result FROM expected_values UNION ALL SELECT 'expected hash [1,3,5]' as test, test_bitmap_hash(test_bms_from_array(ARRAY[1,3,5])) = hash_135 as result FROM expected_values UNION ALL SELECT 'expected hash [2,4,6]' as test, test_bitmap_hash(test_bms_from_array(ARRAY[2,4,6])) = hash_246 as result FROM expected_values; However I'm not sure what the value is for testing hash functions except for coverage and I'm fairly certain that function is well exercised. That said, I think that will work. I'll let cfbot tell me the 32bit hash value on the next run. thanks again for the helpful insights, -greg
On Sep 16 2025, at 11:25 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Sep 16, 2025 at 03:04:39PM -0400, Greg Burd wrote: >> I've re-written the set of tests as suggested (I hope!). I've not >> re-run the coverage report (yet) to ensure we're at least as well >> covered as v3, but attached is v4 for your inspection (amusement?). >> I've tried to author the SQL tests such that failures clearly indicate >> what's at gone wrong. > > Thanks for the update, that's easier to follow, even if it's more > code overall. It looks like all the 29 APIs are present, plus the > extras for the array mapping routines required for the tests with the > list APIs. Thank you for the continued detailed feedback, I think we're zeroing in on a solid patch. > +static void > +elog_bitmapset(int elevel, const char *label, const Bitmapset *bms) > > This routine, that's able to translate a bytea into a readable form, > is also something that I would have added as an extra function, then > used it in some of the tests where I would care about the bitmap in > output generated after a given operation, removing the DEBUG parts > that I guess have also a risk of rotting over time. That would make > the tests for simple operations that generate a bms as a result in > easier to read, because we would call less SQL calls. For example, > take the "add_member idempotent" case, that calls twice > test_bms_make_singleton() to compare the output byteas, we could just > do a test_bms_add_member(test_bms_make_singleton(10), 10) and show the > output? I added a function bms_values() that prints out the values contained in a bitmapset and changed a few tests to use it. I removed the elog_bitmapset() as it isn't being used and now that it's easy to output the set in SQL why keep it for debugging? > We could perhaps use the CTE style for the basic calls, to reduce the > test_bms_make_singleton() calls. What you are doing feels OK as well, > just a thought in passing. I changed a few tests I saw with the pattern you mentioned. If you identify additional tests you feel should be re-written please let me know. > FWIW, I tend to write the regression test queries so as these are > limited to 72-80 characters per line, to fit even in a small > terminals. That's just an old-school take, even if it means more > lines in the SQL input file. I have a vt-100 sitting behind me, I get it. I tend to try to line-wrap at 78. I tried a few times to re-format the SQL accordingly and I found that it became harder to understand and less readable IMO. I'd prefer to keep it as is, unless there is a hard requirement in which case I'll reformat it. > Perhaps it would be worth documenting what's the value of > test_random_operations? It is a mean of testing add, del and > is_member with a modulo 1000 of the member number. With the other > functions that offer deterministic tests, I am not sure what's the > extra value gained in it. That was my thinking and why I didn't have a test that used random data to start with (v1). I'd rather just remove it, but I'm open to ideas. I could replace it with the earlier version of the function. > + bms1_data = PG_GETARG_BYTEA_PP(0); > + if (VARSIZE_ANY_EXHDR(bms1_data) > 0) > + { > + bms1 = (Bitmapset *) palloc(VARSIZE_ANY_EXHDR(bms1_data)); > + memcpy(bms1, VARDATA_ANY(bms1_data), VARSIZE_ANY_EXHDR(bms1_data)); > + } > > This pattern is repeated 9 times, if my fingers got it right. Perhaps > hide it in a macro, or invent an extra static inline routine that does > the translation? That was an oversight on my part, all of those have been reduced to use the same pattern with encode/decode bms() functions. > +DO $$ > +BEGIN > + BEGIN > + PERFORM test_bms_singleton_member(test_bms_from_array(ARRAY[1,2])); > + RAISE NOTICE 'ERROR: singleton_member([1,2]) should have failed!'; > > Do we need all these DO blocks to catch failures? These cases bump on > elog(ERROR) in the internals of bitmapset.c, which are not something > the end-users would expect, but in a test module that's fine to > trigger. The function call would just fail, which is fine. Sure, that's easy enough. > The queries with CTEs and UNION ALL are elegant, nice. Thanks. >> This exercise turned into a lot more LOC than I'd expected, I hope it >> provides some value to the community for testing this important data >> structure and helps to codify the API clearly. > > It does, at least that's my opinion. So thanks for caring about this > level of testing. Again, thank YOU(!) for caring enough to review the work. > -- > Michael Don't forget, we expect the cfbot/CI to fail on 32bit systems and tell us the hash value we need to record in the next version of the patch. Or, if we don't think testing the hash functions is worth it (and I'm on the fence on this one) then I can just remove it. best. -greg
Вложения
On Wed, Sep 17, 2025 at 9:18 AM Greg Burd <greg@burd.me> wrote: > > +static void > > +elog_bitmapset(int elevel, const char *label, const Bitmapset *bms) > > > > I added a function bms_values() that prints out the values contained in > a bitmapset and changed a few tests to use it. I removed the > elog_bitmapset() as it isn't being used and now that it's easy to output > the set in SQL why keep it for debugging? How about using outBitmapset() which already exists? -- Robert Haas EDB: http://www.enterprisedb.com
On Sep 17 2025, at 9:55 am, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 17, 2025 at 9:18 AM Greg Burd <greg@burd.me> wrote: >> > +static void >> > +elog_bitmapset(int elevel, const char *label, const Bitmapset *bms) >> > >> >> I added a function bms_values() that prints out the values contained in >> a bitmapset and changed a few tests to use it. I removed the >> elog_bitmapset() as it isn't being used and now that it's easy to output >> the set in SQL why keep it for debugging? > > How about using outBitmapset() which already exists? Robert, Thanks for chiming in, not sure how I missed that one. :) > -- > Robert Haas > EDB: http://www.enterprisedb.com I've updated the hash value for 32bit systems and changed the bms_values() function to use outBitmapset() via bmsToString(). best. -greg
Вложения
On Wed, Sep 17, 2025 at 10:53:26AM -0400, Greg Burd wrote: > I've updated the hash value for 32bit systems and changed the > bms_values() function to use outBitmapset() via bmsToString(). Oh, right, I also forgot that we have node functions for that. And actually, could it be better to use nodeRead() to translate the inputs? I've just remembered that we have some custom code in there with other lists, which is documented for the purpose of extensions with extensible nodes. -- Michael
Вложения
Hello hackers, :) Thanks for all the feedback and time spent reviewing this patch. I switched out the encode/decode functions to use the nodeToString() and stringToNode() functions and change all the SQL testing function signatures to TEXT from BYTEA. This exercises more code and that's good for testing purposes. I took out the code that checks the hash values, I can't think of a reason that should cause a future failure should that change and output different values. I re-integrated the earlier random testing function along with the newer code. I'm not sure this buys us much if anything. coverage: HEAD v7 lines......: 87.1 90.5 +3.4% functions..: 100.0 100.0 +0.0% branches...: 63.5 75.9 +12.4% best. -greg
Вложения
On Thu, Sep 18, 2025 at 02:50:45PM -0400, Greg Burd wrote: > Thanks for all the feedback and time spent reviewing this patch. I > switched out the encode/decode functions to use the nodeToString() and > stringToNode() functions and change all the SQL testing function > signatures to TEXT from BYTEA. This exercises more code and that's good > for testing purposes. I took out the code that checks the hash values, > I can't think of a reason that should cause a future failure should that > change and output different values. I re-integrated the earlier random > testing function along with the newer code. I'm not sure this buys us > much if anything. > > coverage: HEAD v7 > lines......: 87.1 90.5 +3.4% > functions..: 100.0 100.0 +0.0% > branches...: 63.5 75.9 +12.4% Nathan has given me his blessing, so as I could look at this patch and put my hands on it. I have spotted one bug, and things that could be done a bit better. Please see below. As far as I am concerned there is a mistake with bms_overlap_list(). This API checks a Bitmapset with a list of integers, but you have coded things so as it checks for a list of Bitmapset. I think that test_bms_overlap_list() should take in input a int4 array, then the code in charge of the conversion from the array to the list needs to be tweaked a little bit. With the current code, we could get random failures with negative members included in a Bitmapset, depending on what's on the stack. I am not really convinced about the value brought by bms_values() now that the output is generated for all the SQL functions using the "decode" and "encode" routines (which may be actually less confusing if renamed as "bms_to_text" and "text_to_bms", but that's a personal take), because this results in doing an extra round-trip between text and Bitmapset. The tests don't rely much on NULL as input to translate that as "(b)". bmsToString() is just a direct call to what the decode and encode routines do. Same remark about test_bms_from_array(), that mimicks nodeRead(), in charge of doing a int4[] -> Bitmapset conversion. Wouldn't it be simpler to just use "(b 1 2 3)" in input and let the decode/encode routines do the job? Let's replace the test descriptions in the queries that we know have an individual result by comments. For example all the "Range operations". These descriptions are useful in the UNION queries, where we want to rely on a set of Bitmapsets for multiple operations, to be able to know the description of the result. For individual queries, this bloats the output. The numbering in the tests are not that useful to have. If the test suite is updated, that would mean more updates required if we add a new block in the middle. +SELECT 'overlapping ranges' as test, + bms_values(test_bms_union( + test_bms_add_range(NULL, 50, 150), + test_bms_add_range(NULL, 100, 200) The output generated by this query leads to a bitmapset with 200 members. It would be simpler and more readable to reduce these ranges, or is there a reason for these to be large? Same remark for "difference subtraction edge case" with its 50 members. +SELECT 'add_range large range' as test, bms_values(test_bms_add_range(NULL, 0, 1000)) as result; This one also is funky with its output of 1000 elements. Any reason this is justified in terms of coverage? If yes, we could use some substr() calls to get the end and the beginning of the output generated, perhaps to reduce the output of the whole, or the length, or something like that relevant enough to validate the output. There is a second test a bit down that uses a range of [1000,1100], as well. -- Michael
Вложения
On Sep 19 2025, at 2:36 am, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Sep 18, 2025 at 02:50:45PM -0400, Greg Burd wrote: >> Thanks for all the feedback and time spent reviewing this patch. I >> switched out the encode/decode functions to use the nodeToString() and >> stringToNode() functions and change all the SQL testing function >> signatures to TEXT from BYTEA. This exercises more code and that's good >> for testing purposes. I took out the code that checks the hash values, >> I can't think of a reason that should cause a future failure should that >> change and output different values. I re-integrated the earlier random >> testing function along with the newer code. I'm not sure this buys us >> much if anything. >> >> coverage: HEAD v7 >> lines......: 87.1 90.5 +3.4% >> functions..: 100.0 100.0 +0.0% >> branches...: 63.5 75.9 +12.4% > > Nathan has given me his blessing, so as I could look at this patch and > put my hands on it. I have spotted one bug, and things that could be > done a bit better. Please see below. Hey Michael, Thank you for taking the initiative and reviewing the code. > As far as I am concerned there is a mistake with bms_overlap_list(). > This API checks a Bitmapset with a list of integers, but you have > coded things so as it checks for a list of Bitmapset. I think that > test_bms_overlap_list() should take in input a int4 array, then the > code in charge of the conversion from the array to the list needs to > be tweaked a little bit. With the current code, we could get random > failures with negative members included in a Bitmapset, depending on > what's on the stack. Apologies, you're right. I'm not sure how I went astray on that, but it's fixed now. > I am not really convinced about the value brought by bms_values() now > that the output is generated for all the SQL functions using the > "decode" and "encode" routines (which may be actually less confusing > if renamed as "bms_to_text" and "text_to_bms", but that's a personal > take), because this results in doing an extra round-trip between text > and Bitmapset. Agreed, renamed and removed. New macros are: /* Encode/Decode to/from TEXT and Bitmapset */ #define BITMAPSET_TO_TEXT(bms) (text *) CStringGetTextDatum(nodeToString((bms))) #define TEXT_TO_BITMAPSET(str) (Bitmapset *) stringToNode(TextDatumGetCString((Datum) (str))) > The tests don't rely much on NULL as input to > translate that as "(b)". bmsToString() is just a direct call to what > the decode and encode routines do. Yes, agreed. > Same remark about test_bms_from_array(), that mimicks nodeRead(), in > charge of doing a int4[] -> Bitmapset conversion. Wouldn't it be > simpler to just use "(b 1 2 3)" in input and let the decode/encode > routines do the job? That makes a lot of sense now that things are all string to/from the Bitmapset node representation. Done. > Let's replace the test descriptions in the queries that we know have > an individual result by comments. For example all the "Range > operations". These descriptions are useful in the UNION queries, > where we want to rely on a set of Bitmapsets for multiple operations, > to be able to know the description of the result. For individual > queries, this bloats the output. Okay, done. > The numbering in the tests are not that useful to have. If the test > suite is updated, that would mean more updates required if we add a > new block in the middle. Agreed, removed. > +SELECT 'overlapping ranges' as test, > + bms_values(test_bms_union( > + test_bms_add_range(NULL, 50, 150), > + test_bms_add_range(NULL, 100, 200) > > The output generated by this query leads to a bitmapset with 200 > members. It would be simpler and more readable to reduce these > ranges, or is there a reason for these to be large? Same remark for > "difference subtraction edge case" with its 50 members. Some of these were targeted at corner cases to increase coverage. I'll review and consider ways to either a) eliminate them, or b) shorten the output. > +SELECT 'add_range large range' as test, > bms_values(test_bms_add_range(NULL, 0, 1000)) as result; > > This one also is funky with its output of 1000 elements. Any reason > this is justified in terms of coverage? If yes, we could use some > substr() calls to get the end and the beginning of the output > generated, perhaps to reduce the output of the whole, or the length, > or something like that relevant enough to validate the output. There > is a second test a bit down that uses a range of [1000,1100], as well. I used length() and recorded the expected output in the test. That seems to a) reduce noise in the output and b) remain deterministic(-ish). > -- > Michael I think I've incorporated your feedback which did indeed make the patch more readable/crisp and maintainable, thank you. coverage: HEAD v7 v8 lines......: 87.1 90.5 92.2 functions..: 100.0 100.0 100.0 branches...: 63.5 75.9 76.2 best. -greg
Вложения
On Fri, Sep 19, 2025 at 02:48:43PM -0400, Greg Burd wrote: > I think I've incorporated your feedback which did indeed make the patch > more readable/crisp and maintainable, thank you. > > coverage: HEAD v7 v8 > lines......: 87.1 90.5 92.2 bitmap_match() had some duplicated tests. bms_replace_members() was not covered, and a function was added to the module. I get up to 93.4% for the lines, once added. +PG_FUNCTION_INFO_V1(test_bms_from_array); +PG_FUNCTION_INFO_V1(test_bms_to_array); These two were still declared, are not required. There were a couple of functions that acted as dead code in the module, covered in the tree. As we are trying to cover the gap I think that we should just do the whole thing. Here are the functions: - is_empty - subset_compare - get_singleton_member - nonempty_difference - member_index - join. The test C function had a mistake, because either input may be modified or updated by bms_join(). The code was freeing the right input, leading to errors in the node output function when tested. Mixing the CTEs and the UNION ALL would have been nice if the queries in the UNIONs relied on more than one input value, but none of them did that, so I have removed this part and made the tests leaner. That feels easier for the eye. Testing on equality for the hash functions should be OK, so I have kept these, including the NULL/0 cases. A few incorrect things related to the style, fixed on the way, like an incorrect copyright notice in meson.build, the top of test_bitmapset.c was largely incorrect. It seems like these were copy-pasted from elsewhere. Another thing was testing with -DREALLOCATE_BITMAPSETS, which I've done. A couple of code paths related to the test C functions don't test for NULL input, which is still OK as these don't impact the internal tests or the backend coverage, so I've left these out. The result I had was good enough, so applied. The CI was OK, the buildfarm may have a different opinion. -- Michael
Вложения
On Sep 22 2025, at 3:57 am, Michael Paquier <michael@paquier.xyz> wrote: > On Fri, Sep 19, 2025 at 02:48:43PM -0400, Greg Burd wrote: >> I think I've incorporated your feedback which did indeed make the patch >> more readable/crisp and maintainable, thank you. >> >> coverage: HEAD v7 v8 >> lines......: 87.1 90.5 92.2 Michael, Thank you for your time and help. > bitmap_match() had some duplicated tests. > > bms_replace_members() was not covered, and a function was added to the > module. I get up to 93.4% for the lines, once added. > > +PG_FUNCTION_INFO_V1(test_bms_from_array); > +PG_FUNCTION_INFO_V1(test_bms_to_array); > These two were still declared, are not required. > > There were a couple of functions that acted as dead code in the > module, covered in the tree. As we are trying to cover the gap I > think that we should just do the whole thing. Here are the functions: > - is_empty > - subset_compare > - get_singleton_member > - nonempty_difference > - member_index > - join. The test C function had a mistake, because either input may > be modified or updated by bms_join(). The code was freeing the right > input, leading to errors in the node output function when tested. All great ideas, apologies for the oversights. > Mixing the CTEs and the UNION ALL would have been nice if the queries > in the UNIONs relied on more than one input value, but none of them > did that, so I have removed this part and made the tests leaner. That > feels easier for the eye. Agreed, thank you. > Testing on equality for the hash functions should be OK, so I have > kept these, including the NULL/0 cases. Sounds good to me. > A few incorrect things related to the style, fixed on the way, like an > incorrect copyright notice in meson.build, the top of test_bitmapset.c > was largely incorrect. It seems like these were copy-pasted from > elsewhere. Yes, copied from the radixtree test module. > Another thing was testing with -DREALLOCATE_BITMAPSETS, which I've > done. That seems like a good idea, I thought about doing that as well. > A couple of code paths related to the test C functions don't > test for NULL input, which is still OK as these don't impact the > internal tests or the backend coverage, so I've left these out. Fair. > The result I had was good enough, so applied. The CI was OK, the > buildfarm may have a different opinion. > -- > Michael Thanks for the help getting this done, I really appreciate it. So far the buildfarm seems okay. :) best. -greg
Michael Paquier <michael@paquier.xyz> writes: > The result I had was good enough, so applied. The CI was OK, the > buildfarm may have a different opinion. This patch seems to be rather full of arbitrary casts to or from Datum, which is no longer okay. You need to be using the appropriate conversion macros, such as PointerGetDatum. https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-09-23%2002%3A12%3A36 regards, tom lane
On Tue, Sep 23, 2025 at 01:43:47AM -0400, Tom Lane wrote: > Michael Paquier <michael@paquier.xyz> writes: >> The result I had was good enough, so applied. The CI was OK, the >> buildfarm may have a different opinion. > > This patch seems to be rather full of arbitrary casts to or > from Datum, which is no longer okay. You need to be using > the appropriate conversion macros, such as PointerGetDatum. > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2025-09-23%2002%3A12%3A36 Right, this can ve reproduced with a -m32 added to gcc. I don't see a need for a Datum manipulation in these conversion macros, as we already allocate the results to and from "text" before/after using the GETARG or RETURN macros. Using directly text_to_cstring() and cstring_to_text() takes care of the warnings, as well. -- Michael
Вложения
Michael Paquier <michael@paquier.xyz> writes: > On Tue, Sep 23, 2025 at 01:43:47AM -0400, Tom Lane wrote: >> This patch seems to be rather full of arbitrary casts to or >> from Datum, which is no longer okay. You need to be using >> the appropriate conversion macros, such as PointerGetDatum. > Right, this can ve reproduced with a -m32 added to gcc. Curiously, I don't see these warnings on 32-bit FreeBSD (with clang version 19.1.7). But I tested your patch on mamba's host and confirmed that it makes that gcc happy. > I don't see a need for a Datum manipulation in these conversion > macros, as we already allocate the results to and from "text" > before/after using the GETARG or RETURN macros. Using directly > text_to_cstring() and cstring_to_text() takes care of the warnings, as > well. Yeah, this is better, but I think the new macros could use a bit more parenthesis-twiddling: +#define BITMAPSET_TO_TEXT(bms) cstring_to_text(nodeToString((bms))) The extra parens around "bms" are surely unnecessary. (If they were necessary, it could only be because nodeToString was a macro that failed to parenthesize its argument where needed. But that would be a bug to fix in nodeToString, not here.) +#define TEXT_TO_BITMAPSET(str) (Bitmapset *) stringToNode(text_to_cstring(str)) Here, on the other hand, I think an extra outer set of parens would be a good idea, ie +#define TEXT_TO_BITMAPSET(str) ((Bitmapset *) stringToNode(text_to_cstring(str))) It may be that the C cast binds tightly enough that this cannot be a problem, but I don't think assuming that is project style. regards, tom lane
On Sep 23 2025, at 12:46 pm, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael@paquier.xyz> writes: >> On Tue, Sep 23, 2025 at 01:43:47AM -0400, Tom Lane wrote: >>> This patch seems to be rather full of arbitrary casts to or >>> from Datum, which is no longer okay. You need to be using >>> the appropriate conversion macros, such as PointerGetDatum. Tom, thanks for pointing out the oversight. Apologies, I should have tried that out in my local tests. >> Right, this can ve reproduced with a -m32 added to gcc. > > Curiously, I don't see these warnings on 32-bit FreeBSD > (with clang version 19.1.7). But I tested your patch on > mamba's host and confirmed that it makes that gcc happy. > >> I don't see a need for a Datum manipulation in these conversion >> macros, as we already allocate the results to and from "text" >> before/after using the GETARG or RETURN macros. Using directly >> text_to_cstring() and cstring_to_text() takes care of the warnings, as >> well. > > Yeah, this is better, but I think the new macros could use a bit > more parenthesis-twiddling: > > +#define BITMAPSET_TO_TEXT(bms) cstring_to_text(nodeToString((bms))) Agreed, that was my mistake. > The extra parens around "bms" are surely unnecessary. (If they were > necessary, it could only be because nodeToString was a macro that > failed to parenthesize its argument where needed. But that would > be a bug to fix in nodeToString, not here.) > > +#define TEXT_TO_BITMAPSET(str) (Bitmapset *) stringToNode(text_to_cstring(str)) > > Here, on the other hand, I think an extra outer set of parens > would be a good idea, ie > > +#define TEXT_TO_BITMAPSET(str) ((Bitmapset *) stringToNode(text_to_cstring(str))) Sounds reasonable to me. > It may be that the C cast binds tightly enough that this cannot > be a problem, but I don't think assuming that is project style. > > regards, tom lane patch attached, best. -greg
Вложения
On Tue, Sep 23, 2025 at 02:26:19PM -0400, Greg Burd wrote: > patch attached, best. All that seems to work corrently here, so done. -- Michael
Вложения
On Sep 23 2025, at 7:23 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Sep 23, 2025 at 02:26:19PM -0400, Greg Burd wrote: >> patch attached, best. > > All that seems to work corrently here, so done. > -- > Michael Thanks Michael, Tom, for the help getting this into shape and in the tree. best. -greg
On Wed, Sep 24, 2025 at 07:39:59AM -0400, Greg Burd wrote: > Thanks Michael, Tom, for the help getting this into shape and in the tree. By the way, Greg, do you think that we should aim for a state where we are closer to completion? We have the module up to the point where we are in pretty good shape, with most things and the infrastructure done but it can be improved a bit more, as well. Based on the information provided by the coverage report at https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html, we still have the following things: - bms_equal for different word counts - bms_union, bms_nonempty_difference, bms_is_subset and bms_intersect with shorter word counts. - bms_different with different word counts - A couple more cases with bms_subset_compare - bms_member_index and word counts - bms_overlap_list with negative number in input list. - bms_singleton_member ERROR with empty input. - bms_get_singleton_member with NULL input - bms_del_member with word counts - bms_replace_members and repalloc case - bms_add_range, bms_join and bms_del_members, more word count cases - bms_prev_member and the prevbit business There is not much we can do with the random function, still we could do something about the NULL paths in the internal functions: https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html The coverage of the latter matters less than the coverage of the former, of course. -- Michael
Вложения
On Sep 24 2025, at 7:28 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Sep 24, 2025 at 07:39:59AM -0400, Greg Burd wrote: >> Thanks Michael, Tom, for the help getting this into shape and in the tree. > > By the way, Greg, do you think that we should aim for a state where we > are closer to completion? Hi Michael, We've come this far, why not go all the way? :) > We have the module up to the point where we > are in pretty good shape, with most things and the infrastructure done > but it can be improved a bit more, as well. Agreed. > Based on the information provided by the coverage report at > https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html, > we still have the following things: > - bms_equal for different word counts > - bms_union, bms_nonempty_difference, bms_is_subset and bms_intersect > with shorter word counts. > - bms_different with different word counts > - A couple more cases with bms_subset_compare > - bms_member_index and word counts > - bms_overlap_list with negative number in input list. > - bms_singleton_member ERROR with empty input. > - bms_get_singleton_member with NULL input > - bms_del_member with word counts > - bms_replace_members and repalloc case > - bms_add_range, bms_join and bms_del_members, more word count cases > - bms_prev_member and the prevbit business Thanks for the detailed analysis, I tried to address each of these in the patch. > There is not much we can do with the random function, still we could > do something about the NULL paths in the internal functions: > https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html Agreed. > The coverage of the latter matters less than the coverage of the > former, of course. Agreed. > -- > Michael test_bitmapset.c lines......: 98.9% functions..: 100% branches...: 82.6% Here I removed a bit of code I thought was unnecessary and added a few test cases. The coverage is better overall, not much to say about it. bitmapset.c lines......: 99.8% functions..: 100% branches...: 89.4% A few branches were really tricky, my (least) favorite was the bms_subset_compare(). To get to the first BMS_DIFFERENT return you have to have a test case where the shorter bitmap is long enough to force two or more iterations of the loop and in the first iteration the first word in a needs to be a subset of b and then for the second word the opposite where the second word of b is a subset of a and yet there can be no overlapping bits in the two sets and the same number of words in both sets. test_bms_subset_compare('(b 1 64 65)', '(b 1 2 64)') -> BMS_DIFFERENT Technically the '(b 64 200)', '(b 1 201)' test case is also BMS_DIFFERENT and so redundant, but I was so happy to find the test case for this branch I kept them both. best. -greg
Вложения
On Sat, Sep 27, 2025 at 10:03:14AM -0400, Greg Burd wrote: > > On Sep 24 2025, at 7:28 pm, Michael Paquier <michael@paquier.xyz> wrote: > > > On Wed, Sep 24, 2025 at 07:39:59AM -0400, Greg Burd wrote: > >> Thanks Michael, Tom, for the help getting this into shape and in the tree. > > > > By the way, Greg, do you think that we should aim for a state where we > > are closer to completion? > > Hi Michael, > > We've come this far, why not go all the way? :) > > > We have the module up to the point where we > > are in pretty good shape, with most things and the infrastructure done > > but it can be improved a bit more, as well. > > Agreed. > > > Based on the information provided by the coverage report at > > https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html, > > we still have the following things: > > - bms_equal for different word counts > > - bms_union, bms_nonempty_difference, bms_is_subset and bms_intersect > > with shorter word counts. > > - bms_different with different word counts > > - A couple more cases with bms_subset_compare > > - bms_member_index and word counts > > - bms_overlap_list with negative number in input list. > > - bms_singleton_member ERROR with empty input. > > - bms_get_singleton_member with NULL input > > - bms_del_member with word counts > > - bms_replace_members and repalloc case > > - bms_add_range, bms_join and bms_del_members, more word count cases > > - bms_prev_member and the prevbit business > > Thanks for the detailed analysis, I tried to address each of these in > the patch. > > > There is not much we can do with the random function, still we could > > do something about the NULL paths in the internal functions: > > https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html > > Agreed. > > > The coverage of the latter matters less than the coverage of the > > former, of course. > > Agreed. > test_bitmapset.c > lines......: 98.9% > functions..: 100% > branches...: 82.6% > > Here I removed a bit of code I thought was unnecessary and added a few > test cases. The coverage is better overall, not much to say about it. > > > bitmapset.c > lines......: 99.8% > functions..: 100% > branches...: 89.4% No need for three tests for the word count case in bms_del_member(), only one is enough, but I've let them as they you have proposed, as you wanted to check more positioning with the members. These are cheap, that's fine. Added a comment for bms_del_members() for the two cases that trigger word count changes. The test changed with test_bms_difference() is indeed long, but not that bad with 140-ish characters, so left it as-is. Good catch with bms_del_members() that was missing. There are so many APIs that it's easy to miss one. The changes in test_bms_get_singleton_member() don't seem adapted, because we also want to test if the default value given by the function caller is changed, bms_get_singleton_member() returning a boolean status. The patch did not use arg1 at all. Also, if arg0 is NULL, let's return the input value. > A few branches were really tricky, my (least) favorite was the > bms_subset_compare(). To get to the first BMS_DIFFERENT return you have > to have a test case where the shorter bitmap is long enough to force two > or more iterations of the loop and in the first iteration the first word > in a needs to be a subset of b and then for the second word the opposite > where the second word of b is a subset of a and yet there can be no > overlapping bits in the two sets and the same number of words in both sets. > > test_bms_subset_compare('(b 1 64 65)', '(b 1 2 64)') -> BMS_DIFFERENT > > Technically the '(b 64 200)', '(b 1 201)' test case is also > BMS_DIFFERENT and so redundant, but I was so happy to find the test case > for this branch I kept them both. I can see that you have expanded on these quite a bit, impressive. One thing that was confusing is that some of the tests already existed, so I've slightly reworded things to reduce the diffs. Per my measurements, we are at more than 99% in the module and bitmapset.c when only running make check in the module, which is nice -- Michael
Вложения
On Sep 29 2025, at 2:20 am, Michael Paquier <michael@paquier.xyz> wrote: > On Sat, Sep 27, 2025 at 10:03:14AM -0400, Greg Burd wrote: >> >> On Sep 24 2025, at 7:28 pm, Michael Paquier <michael@paquier.xyz> wrote: >> >> > On Wed, Sep 24, 2025 at 07:39:59AM -0400, Greg Burd wrote: >> >> Thanks Michael, Tom, for the help getting this into shape and in >> the tree. >> > >> > By the way, Greg, do you think that we should aim for a state where we >> > are closer to completion? >> >> Hi Michael, >> >> We've come this far, why not go all the way? :) >> >> > We have the module up to the point where we >> > are in pretty good shape, with most things and the infrastructure done >> > but it can be improved a bit more, as well. >> >> Agreed. >> >> > Based on the information provided by the coverage report at >> > https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html, >> > we still have the following things: >> > - bms_equal for different word counts >> > - bms_union, bms_nonempty_difference, bms_is_subset and bms_intersect >> > with shorter word counts. >> > - bms_different with different word counts >> > - A couple more cases with bms_subset_compare >> > - bms_member_index and word counts >> > - bms_overlap_list with negative number in input list. >> > - bms_singleton_member ERROR with empty input. >> > - bms_get_singleton_member with NULL input >> > - bms_del_member with word counts >> > - bms_replace_members and repalloc case >> > - bms_add_range, bms_join and bms_del_members, more word count cases >> > - bms_prev_member and the prevbit business >> >> Thanks for the detailed analysis, I tried to address each of these in >> the patch. >> >> > There is not much we can do with the random function, still we could >> > do something about the NULL paths in the internal functions: >> > https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html >> >> Agreed. >> >> > The coverage of the latter matters less than the coverage of the >> > former, of course. >> >> Agreed. >> test_bitmapset.c >> lines......: 98.9% >> functions..: 100% >> branches...: 82.6% >> >> Here I removed a bit of code I thought was unnecessary and added a few >> test cases. The coverage is better overall, not much to say about it. >> >> >> bitmapset.c >> lines......: 99.8% >> functions..: 100% >> branches...: 89.4% Hello Michael, > No need for three tests for the word count case in bms_del_member(), > only one is enough, but I've let them as they you have proposed, as > you wanted to check more positioning with the members. These are > cheap, that's fine. Yes, redundant. Thanks for leaving them. > Added a comment for bms_del_members() for the two cases that trigger > word count changes. Thanks, I considered adding comments to cases that were more subtle but didn't find the time to review/add them. I appreciate that you added that. > The test changed with test_bms_difference() is indeed long, but not > that bad with 140-ish characters, so left it as-is. Thanks. > Good catch with bms_del_members() that was missing. There are so many > APIs that it's easy to miss one. Indeed, but it's covered now. > The changes in test_bms_get_singleton_member() don't seem adapted, > because we also want to test if the default value given by the > function caller is changed, bms_get_singleton_member() returning a > boolean status. The patch did not use arg1 at all. Also, if arg0 is > NULL, let's return the input value. Doh, I see what you mean. Thanks for the adaptation. >> A few branches were really tricky, my (least) favorite was the >> bms_subset_compare(). To get to the first BMS_DIFFERENT return you have >> to have a test case where the shorter bitmap is long enough to force two >> or more iterations of the loop and in the first iteration the first word >> in a needs to be a subset of b and then for the second word the opposite >> where the second word of b is a subset of a and yet there can be no >> overlapping bits in the two sets and the same number of words in both sets. >> >> test_bms_subset_compare('(b 1 64 65)', '(b 1 2 64)') -> BMS_DIFFERENT >> >> Technically the '(b 64 200)', '(b 1 201)' test case is also >> BMS_DIFFERENT and so redundant, but I was so happy to find the test case >> for this branch I kept them both. > > I can see that you have expanded on these quite a bit, impressive. > One thing that was confusing is that some of the tests already > existed, so I've slightly reworded things to reduce the diffs. Excellent, thanks. > Per my > measurements, we are at more than 99% in the module and bitmapset.c > when only running make check in the module, which is nice Yes, I think that does it. Thanks so much for the guidance and help taking this to its logical end. I greatly appreciate your guidance and time. > -- > Michael best. -greg
On Mon, Sep 29, 2025 at 06:27:01AM -0400, Greg Burd wrote: > Yes, I think that does it. Thanks so much for the guidance and help > taking this to its logical end. I greatly appreciate your guidance and time. The latest coverage report confirms 100% for bitmapset.c The only two code paths I'm seeing as not covered are: - test_bms_add_range() for a bms_free() case. - test_bms_make_singleton() on NULL input, because the function is created as strict: =# select proisstrict from pg_proc where proname = 'test_bms_make_singleton'; proisstrict ------------- t (1 row) I'm not planning to bother about these, so we are good here. -- Michael
Вложения
On Tue, 30 Sept 2025 at 12:40, Michael Paquier <michael@paquier.xyz> wrote: > The only two code paths I'm seeing as not covered are: > - test_bms_add_range() for a bms_free() case. One question about that: In cases like [1], what's the reason that many of the bms_free() calls check if the set is NULL before calling the function? NULL is a valid Bitmapset, so I don't really see the need to check for an empty set before calling bms_free(). If those were removed, then you'd not have to care about the coverage of that line. David [1] https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html#699
On Tue, Sep 30, 2025 at 01:19:05PM +1300, David Rowley wrote: > NULL is a valid Bitmapset, so I don't really see the need to check for > an empty set before calling bms_free(). If those were removed, then > you'd not have to care about the coverage of that line. Yeah, we could just remove them as you are suggesting, as well. -- Michael
Вложения
On Sep 29 2025, at 8:51 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Sep 30, 2025 at 01:19:05PM +1300, David Rowley wrote: >> NULL is a valid Bitmapset, so I don't really see the need to check for >> an empty set before calling bms_free(). If those were removed, then >> you'd not have to care about the coverage of that line. Good point, I agree. Thanks for reporting/reviewing David. > Yeah, we could just remove them as you are suggesting, as well. > -- > Michael Thank you both, patch attached. best. -greg
Вложения
On Tue, 30 Sept 2025 at 22:30, Greg Burd <greg@burd.me> wrote: > Thank you both, patch attached. Patch looks good to me. Just while reviewing, I was confused at the following code in test_bms_add_range(). /* Check for invalid range */ if (upper < lower) { bms_free(bms); PG_RETURN_NULL(); } That seems wrong. You should just return the input set in that case, not NULL. This results in: postgres=# select test_bms_add_range('(b 1)', 2, 5); -- ok. test_bms_add_range -------------------- (b 1 2 3 4 5) (1 row) postgres=# select test_bms_add_range('(b 1)', 5, 2); -- wrong results test_bms_add_range -------------------- (1 row) I'd expect the last one to return '(b 1)', effectively the input set unmodified. If I remove the code quoted above, I also see something else unexpected: postgres=# select test_bms_add_range('(b)', 5, 2); test_bms_add_range -------------------- <> (1 row) Now, you might blame me for that as I see you've done things like: if (bms_is_empty(bms)) PG_RETURN_NULL(); in other places, but it's not very consistent as I can get the <> in other locations: postgres=# select test_bms_add_members('(b)', '(b)'); test_bms_add_members ---------------------- <> (1 row) It seems to me that you could save some code and all this inconsistency by just making <> the standard way to represent an empty Bitmapset. Or if you really want to keep the SQL NULLs, you could make a helper macro that handles that consistently. For me, I think stripping as much logic out of the test functions as possible is a good way of doing things. I expect you really want to witness the behaviour of bitmapset.c, not some anomaly of test_bitmapset.c. David
On Sep 30 2025, at 6:29 am, David Rowley <dgrowleyml@gmail.com> wrote: > On Tue, 30 Sept 2025 at 22:30, Greg Burd <greg@burd.me> wrote: >> Thank you both, patch attached. > > Patch looks good to me. > > Just while reviewing, I was confused at the following code in > test_bms_add_range(). > > /* Check for invalid range */ > if (upper < lower) > { > bms_free(bms); > PG_RETURN_NULL(); > } > > That seems wrong. You should just return the input set in that case, > not NULL. Agreed, I should have caught this inconsistency earlier on. > This results in: > > postgres=# select test_bms_add_range('(b 1)', 2, 5); -- ok. > test_bms_add_range > -------------------- > (b 1 2 3 4 5) > (1 row) > > postgres=# select test_bms_add_range('(b 1)', 5, 2); -- wrong results > test_bms_add_range > -------------------- > > (1 row) > > I'd expect the last one to return '(b 1)', effectively the input set unmodified. > > If I remove the code quoted above, I also see something else unexpected: > > postgres=# select test_bms_add_range('(b)', 5, 2); > test_bms_add_range > -------------------- > <> > (1 row) > > Now, you might blame me for that as I see you've done things like: > > if (bms_is_empty(bms)) > PG_RETURN_NULL(); > > in other places, but it's not very consistent as I can get the <> in > other locations: > > postgres=# select test_bms_add_members('(b)', '(b)'); > test_bms_add_members > ---------------------- > <> > (1 row) > > It seems to me that you could save some code and all this > inconsistency by just making <> the standard way to represent an empty > Bitmapset. Or if you really want to keep the SQL NULLs, you could make > a helper macro that handles that consistently. nodeToString() will turn NULL into '<>' because it has no way to know that we're referring to a NULL/empty Bitmapset, but that's not what we'd ideally like to return in those cases because NULL Bitmapsets are in fact a thing. The standard/consistent representation for a NULL Bitmapset encoded by nodeToString() is '(b)' so I'll update to be more consistent across tests. > For me, I think stripping as much logic out of the test functions as > possible is a good way of doing things. I expect you really want to > witness the behaviour of bitmapset.c, not some anomaly of > test_bitmapset.c. You are 100% correct, I've stripped away as much as possible so as to push the responsibility for results down to the bitmapset.c code as it should be. > David v2 attached. best. -greg
Вложения
On Wed, 1 Oct 2025 at 01:38, Greg Burd <greg@burd.me> wrote: > v2 attached. Thanks. I looked at this and I think we can do a bit better to simplify the code and standardise what we do when faced with bogus inputs. Here's what I've done: 1. Added helper macros PG_ARG_GETBITMAPSET() and PG_RETURN_BITMAPSET_AS_TEXT(). These themselves save a few hundred lines of code and make the functions easier to read. 2. Standardised that the test function returns NULL when it receives invalid input. e.g if we aim to pass an "int" to the underlying Bitmapset function, if someone calls the test function with an SQL NULL, just return NULL rather than trying to come up with some logic on what we should do about that. 3. I couldn't quite understand the extra parameter to test_bms_get_singleton_member() so I ended up removing it and making the test function return -1 when the set isn't a singleton. I didn't search for discussion about that so I might have missed the specific reason it was done that way. 4. Wrote some comments to try to communicate the standards defined. This aims to offer guidance to callers of these functions and anyone hacking on the module itself in the future. git diff --stat looks like: 4 files changed, 233 insertions(+), 505 deletions(-) Things not done: 1. I didn't update the expected test results. 2. I didn't rationalise the tests. I feel that there's no point in the tests that purposefully pass invalid input to the module's test function. It appears that's aiming to increase coverage on the test module's C code, which I don't quite understand. In any case, due to the helper macros, I don't think removing these will lower the per-source-file line coverage of the module itself. I've not checked my work very thoroughly. I can do that, or you can if you and Michael are ok with the proposed ideas. I'm also not aiming to commandeer anything here. Writing the patch was just the easiest way to communicate the ideas I had. I also wonder how worthwhile it is to try to free the Bitmapsets. I'd have expected these allocations to be in a pretty short-lived memory context. I count 44 calls to bms_free(). David
Вложения
On Wed, Oct 01, 2025 at 12:00:39PM +1300, David Rowley wrote: > 1. Added helper macros PG_ARG_GETBITMAPSET() and > PG_RETURN_BITMAPSET_AS_TEXT(). These themselves save a few hundred > lines of code and make the functions easier to read. +#define PG_RETURN_BITMAPSET_AS_TEXT(bms) \ + do { \ + text *result = BITMAPSET_TO_TEXT(bms); \ + bms_free(bms); \ + PG_RETURN_TEXT_P(result); \ + } while (0); This is also something I've considered as embedding in a macro, and discarded it. The reason was code readability, because it gives to the reader a quick knowledge of what can be freed once a call to a specific bitmapset.c routine is done, acting as a kind of reference template. Here, the free call knowledge gets hidden. test_bms_copy() becomes more confusing to me. This argument comes down to one's taste and philosophy on the matter of test modules. These are for me tools of coverage, but also can work as reference templates when calling an internal routine because we also document some of its implied behaviors, like the inputs recycled. > 2. Standardised that the test function returns NULL when it receives > invalid input. e.g if we aim to pass an "int" to the underlying > Bitmapset function, if someone calls the test function with an SQL > NULL, just return NULL rather than trying to come up with some logic > on what we should do about that. +static const char *emptyset = "(b)"; Not sure that I'm on board with hardcoding this knowledge in the test module. But well, if you feel strongly about it, I'm not going to put a fight for it. > 3. I couldn't quite understand the extra parameter to > test_bms_get_singleton_member() so I ended up removing it and making > the test function return -1 when the set isn't a singleton. I didn't > search for discussion about that so I might have missed the specific > reason it was done that way. It is not very far. In short, I am caring about the status returned by bms_get_singleton_member(), then translated in the output of the SQL function based on the default given in input: https://www.postgresql.org/message-id/aNolInqSzt1d1338@paquier.xyz Perhaps not the most elegant solution, but simpler than returning a setof without changing the meaning of the tests. So we could return the status directly or use the default value from the input, not doing any of these actions reduces what's reported back at SQL level. > 4. Wrote some comments to try to communicate the standards defined. > This aims to offer guidance to callers of these functions and anyone > hacking on the module itself in the future. +/* Helper macro to fetch Text parameters as Bitmapsets. SQL-NULL means empty set */ +#define PG_ARG_GETBITMAPSET(n) \ + (PG_ARGISNULL(n) ? NULL : TEXT_TO_BITMAPSET(PG_GETARG_TEXT_PP(n))) True that embedding the PG_ARGISNULL() call may be better in a macro, as passing NULL values to the internal routines is OK. > I also wonder how worthwhile it is to try to free the Bitmapsets. I'd > have expected these allocations to be in a pretty short-lived memory > context. I count 44 calls to bms_free(). I've wondered about that as well. However, I have concluded to leave the free calls arounds, for the memory context argument if someone plays more with this module. I don't mind the removal of the NULL checks when calling bms_free() to cut lines in the module, though. I was also seeing an argument with cross-checking the frees after a REALLOCATE_BITMAPSETS copy, as well, for sanity checks. So I would rather leave them as they are, not remove them. Again, that may come up to one's view on the matter and one's philosophy around these modules. -- Michael
Вложения
On Wed, 1 Oct 2025 at 12:37, Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Oct 01, 2025 at 12:00:39PM +1300, David Rowley wrote: > > 1. Added helper macros PG_ARG_GETBITMAPSET() and > > PG_RETURN_BITMAPSET_AS_TEXT(). These themselves save a few hundred > > lines of code and make the functions easier to read. > > +#define PG_RETURN_BITMAPSET_AS_TEXT(bms) \ > + do { \ > + text *result = BITMAPSET_TO_TEXT(bms); \ > + bms_free(bms); \ > + PG_RETURN_TEXT_P(result); \ > + } while (0); > > This is also something I've considered as embedding in a macro, and > discarded it. The reason was code readability, because it gives to > the reader a quick knowledge of what can be freed once a call to a > specific bitmapset.c routine is done, acting as a kind of reference > template. Here, the free call knowledge gets hidden. test_bms_copy() > becomes more confusing to me. This argument comes down to one's taste > and philosophy on the matter of test modules. These are for me tools > of coverage, but also can work as reference templates when calling an > internal routine because we also document some of its implied > behaviors, like the inputs recycled. This sounds like another argument towards getting rid of the bms_frees from the module. > > 2. Standardised that the test function returns NULL when it receives > > invalid input. e.g if we aim to pass an "int" to the underlying > > Bitmapset function, if someone calls the test function with an SQL > > NULL, just return NULL rather than trying to come up with some logic > > on what we should do about that. > > +static const char *emptyset = "(b)"; > > Not sure that I'm on board with hardcoding this knowledge in the test > module. But well, if you feel strongly about it, I'm not going to put > a fight for it. I'm not a huge fan of that part either. I didn't feel strongly enough to change it when adjusting Greg's v2. For me, I don't see the problem with leaving it up to nodeToString(), which will return "<>" for an empty set. > > 3. I couldn't quite understand the extra parameter to > > test_bms_get_singleton_member() so I ended up removing it and making > > the test function return -1 when the set isn't a singleton. I didn't > > search for discussion about that so I might have missed the specific > > reason it was done that way. > > It is not very far. In short, I am caring about the status returned > by bms_get_singleton_member(), then translated in the output of the > SQL function based on the default given in input: > https://www.postgresql.org/message-id/aNolInqSzt1d1338@paquier.xyz > > Perhaps not the most elegant solution, but simpler than returning a > setof without changing the meaning of the tests. So we could return > the status directly or use the default value from the input, not doing > any of these actions reduces what's reported back at SQL level. Thanks for explaining. Is anything being lost with the -1 return? Since that's not a valid member for a set, -1 can only be returned if bms_get_singleton_member() returned false. > > 4. Wrote some comments to try to communicate the standards defined. > > This aims to offer guidance to callers of these functions and anyone > > hacking on the module itself in the future. > > +/* Helper macro to fetch Text parameters as Bitmapsets. SQL-NULL means empty set */ > +#define PG_ARG_GETBITMAPSET(n) \ > + (PG_ARGISNULL(n) ? NULL : TEXT_TO_BITMAPSET(PG_GETARG_TEXT_PP(n))) > > True that embedding the PG_ARGISNULL() call may be better in a macro, > as passing NULL values to the internal routines is OK. OK good. There's a decent code reduction from this one. > > I also wonder how worthwhile it is to try to free the Bitmapsets. I'd > > have expected these allocations to be in a pretty short-lived memory > > context. I count 44 calls to bms_free(). > > I've wondered about that as well. However, I have concluded to leave > the free calls arounds, for the memory context argument if someone > plays more with this module. I don't mind the removal of the NULL > checks when calling bms_free() to cut lines in the module, though. I > was also seeing an argument with cross-checking the frees after a > REALLOCATE_BITMAPSETS copy, as well, for sanity checks. So I would > rather leave them as they are, not remove them. Again, that may come > up to one's view on the matter and one's philosophy around these > modules. Because you're not happy with embedding the bms_free() in with PG_RETURN_BITMAPSET_AS_TEXT(), why don't we just get rid of all the bms_free's? It's just a test module. If anyone ever comes complaining about memory usage then we can reconsider. I think it's nice to keep these wrappers as thin as possible in terms of the amount of C code. Having conditional logic in them, IMO, risk hiding the true behaviour of the underlying code that's being tested. I know I've just come in here like a bull in a china shop. It's not my intention to imply that anything is wrong with the code. I just put together my thoughts in the hopes that it can be made even better. David
On Wed, Oct 01, 2025 at 01:07:10PM +1300, David Rowley wrote: > On Wed, 1 Oct 2025 at 12:37, Michael Paquier <michael@paquier.xyz> wrote: >> This is also something I've considered as embedding in a macro, and >> discarded it. The reason was code readability, because it gives to >> the reader a quick knowledge of what can be freed once a call to a >> specific bitmapset.c routine is done, acting as a kind of reference >> template. Here, the free call knowledge gets hidden. test_bms_copy() >> becomes more confusing to me. This argument comes down to one's taste >> and philosophy on the matter of test modules. These are for me tools >> of coverage, but also can work as reference templates when calling an >> internal routine because we also document some of its implied >> behaviors, like the inputs recycled. > > This sounds like another argument towards getting rid of the bms_frees > from the module. With the RETURN macro in mind, which itself cuts quite a little bit of code, I'd rather do as you are suggesting and remove the free calls, then. Embedding this knowledge in the return macro seems not much adapted to me, but I don't object to this macro idea per-se. > I'm not a huge fan of that part either. I didn't feel strongly enough > to change it when adjusting Greg's v2. For me, I don't see the problem > with leaving it up to nodeToString(), which will return "<>" for an > empty set. I think it's fine to leave it to nodeToString(). It is able to handle a NULL input on its own. > Thanks for explaining. Is anything being lost with the -1 return? > Since that's not a valid member for a set, -1 can only be returned if > bms_get_singleton_member() returned false. + if (!bms_get_singleton_member(bms, &member)) + member = -1; Hmm. On second look, if you do that, there's no knowledge lost, I guess. > Because you're not happy with embedding the bms_free() in with > PG_RETURN_BITMAPSET_AS_TEXT(), why don't we just get rid of all the > bms_free's? It's just a test module. If anyone ever comes complaining > about memory usage then we can reconsider. I think it's nice to keep > these wrappers as thin as possible in terms of the amount of C code. > Having conditional logic in them, IMO, risk hiding the true behaviour > of the underlying code that's being tested. Okay. Let's just remove as much code as possible, then. This is roughly what you have sent, minus the hardcoded empty set and the free calls. I'll just go do that in a bit. -- Michael
Вложения
On Wed, 1 Oct 2025 at 16:41, Michael Paquier <michael@paquier.xyz> wrote: > Okay. Let's just remove as much code as possible, then. This is > roughly what you have sent, minus the hardcoded empty set and the free > calls. I'll just go do that in a bit. Thanks. Just for reference, you may not need it if you're mostly done already, but I had to finish this off in my head before moving on. Feel free to ignore, but there are a couple of typos fixed at the very least "varrying" should be "varying". I didn't fully rationalise the tests in test_bitmapset.sql. There are still a few which are duplicates due to '(b)' and NULL meaning the same thing. git diff --stat .../test_bitmapset/expected/test_bitmapset.out | 400 ++------------ .../modules/test_bitmapset/sql/test_bitmapset.sql | 101 +--- .../modules/test_bitmapset/test_bitmapset--1.0.sql | 2 +- src/test/modules/test_bitmapset/test_bitmapset.c | 605 +++++---------------- 4 files changed, 222 insertions(+), 886 deletions(-) I didn't look at the coverage report, but on thinking about it, test_bitmapset.c's "PG_RETURN_NULL(); /* invalid input */" won't be covered as I stripped out all the tests which purposefully pass invalid input. You may not want it that way if you're keen to keep the coverage of the test module high as well as bitmapset.c David
On Wed, 1 Oct 2025 at 18:00, David Rowley <dgrowleyml@gmail.com> wrote: > git diff --stat > .../test_bitmapset/expected/test_bitmapset.out | 400 ++------------ > .../modules/test_bitmapset/sql/test_bitmapset.sql | 101 +--- > .../modules/test_bitmapset/test_bitmapset--1.0.sql | 2 +- > src/test/modules/test_bitmapset/test_bitmapset.c | 605 +++++---------------- > 4 files changed, 222 insertions(+), 886 deletions(-) And once more, with the patch this time around... David
Вложения
On Wed, Oct 01, 2025 at 06:00:59PM +1300, David Rowley wrote: > I didn't look at the coverage report, but on thinking about it, > test_bitmapset.c's "PG_RETURN_NULL(); /* invalid input */" won't be > covered as I stripped out all the tests which purposefully pass > invalid input. You may not want it that way if you're keen to keep the > coverage of the test module high as well as bitmapset.c Yes, that's something I was just going through, with a result pretty close to what you have sent, including the removal and replacement of most of tests that were specific to the test module. The prev/next member coverage is still looking OK from here, though. Thanks for the typos, I've missed these. -- Michael
Вложения
On Oct 1 2025, at 1:14 am, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Oct 01, 2025 at 06:00:59PM +1300, David Rowley wrote: >> I didn't look at the coverage report, but on thinking about it, >> test_bitmapset.c's "PG_RETURN_NULL(); /* invalid input */" won't be >> covered as I stripped out all the tests which purposefully pass >> invalid input. You may not want it that way if you're keen to keep the >> coverage of the test module high as well as bitmapset.c > > Yes, that's something I was just going through, with a result pretty > close to what you have sent, including the removal and replacement of > most of tests that were specific to the test module. The prev/next > member coverage is still looking OK from here, though. > > Thanks for the typos, I've missed these. > -- > Michael Michael, David, Thank you. :) The coverage[1] looks to be 100% for everything in bitmapset.c now and 98.6% of the test code[2] for Bitmapset -- more than enough and much better than before. I do like the cleanup/simplification that you two jointly applied. I think it's easier to maintain and reason about and looking at the final result I can see the logic in the simplified approach. I've learned a lot from this exercise not only about Bitmapset and various other areas in the code but also the approach/model for a task like this. I'll keep that in mind going forward. best. -greg [1] https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html [2] https://coverage.postgresql.org/src/test/modules/test_bitmapset/test_bitmapset.c.gcov.html
> On 1 Oct 2025, at 07:14, Michael Paquier <michael@paquier.xyz> wrote: > Thanks for the typos, I've missed these. Doing post-commit review I didn't see any sharp edges, but found one more of these so will push this shortly: --- Substraction to empty +-- Subtraction to empty -- Daniel Gustafsson
On Oct 2 2025, at 6:00 am, Daniel Gustafsson <daniel@yesql.se> wrote: >> On 1 Oct 2025, at 07:14, Michael Paquier <michael@paquier.xyz> wrote: > >> Thanks for the typos, I've missed these. > > Doing post-commit review I didn't see any sharp edges, but found one > more of > these so will push this shortly: > > --- Substraction to empty > +-- Subtraction to empty +1, thanks. I'll be more careful in the future. > -- > Daniel Gustafsson best. -greg
On Thu, 2 Oct 2025 at 23:00, Daniel Gustafsson <daniel@yesql.se> wrote: > Doing post-commit review I didn't see any sharp edges, but found one more of > these so will push this shortly: Any chance you could also delete the "/* memory cleanup seems more tricky than it's worth here */" line? That comment seemed relevant when we were actually doing bms_free() in the functions. Now that we're not, it's a bit out of place. David
> On 2 Oct 2025, at 14:11, David Rowley <dgrowleyml@gmail.com> wrote: > > On Thu, 2 Oct 2025 at 23:00, Daniel Gustafsson <daniel@yesql.se> wrote: >> Doing post-commit review I didn't see any sharp edges, but found one more of >> these so will push this shortly: > > Any chance you could also delete the "/* memory cleanup seems more > tricky than it's worth here */" line? That comment seemed relevant > when we were actually doing bms_free() in the functions. Now that > we're not, it's a bit out of place. Sure I can take care of that while in there. Another nitpick would be to remove the test for NULL in test_bms_make_singleton since that is a STRICT function, making the test for NULL superfluous code: diff --git a/src/test/modules/test_bitmapset/test_bitmapset.c b/src/test/modules/test_bitmapset/test_bitmapset.c index 0d6c2e7aa1b..acaa93d2f11 100644 --- a/src/test/modules/test_bitmapset/test_bitmapset.c +++ b/src/test/modules/test_bitmapset/test_bitmapset.c @@ -201,9 +201,6 @@ test_bms_make_singleton(PG_FUNCTION_ARGS) Bitmapset *bms; int32 member; - if (PG_ARGISNULL(0)) - PG_RETURN_NULL(); /* invalid input */ - member = PG_GETARG_INT32(0); bms = bms_make_singleton(member); I'll include all of these three tiny improvements in a commit later today. -- Daniel Gustafsson
On Fri, 3 Oct 2025 at 01:33, Daniel Gustafsson <daniel@yesql.se> wrote: > Another nitpick would be to remove the test for NULL in test_bms_make_singleton > since that is a STRICT function, making the test for NULL superfluous code: I didn't notice that was marked as strict. It would be good to get rid of those lines in that case. Thanks David
> On 2 Oct 2025, at 23:00, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 3 Oct 2025 at 01:33, Daniel Gustafsson <daniel@yesql.se> wrote: >> Another nitpick would be to remove the test for NULL in test_bms_make_singleton >> since that is a STRICT function, making the test for NULL superfluous code: > > I didn't notice that was marked as strict. It would be good to get rid > of those lines in that case. Done, comment cleanup and nullness check pushed. -- Daniel Gustafsson
On Fri, 3 Oct 2025 at 01:33, Daniel Gustafsson <daniel@yesql.se> wrote: > Another nitpick would be to remove the test for NULL in test_bms_make_singleton > since that is a STRICT function, making the test for NULL superfluous code: I see test_random_operations() is also strict. Is it worth getting rid of the SQL NULL checks on the inputs there too? Aka, the attached. David
Вложения
On Fri, Oct 03, 2025 at 12:36:57PM +1300, David Rowley wrote: > I see test_random_operations() is also strict. Is it worth getting rid > of the SQL NULL checks on the inputs there too? Aka, the attached. Yes, this was not bothering me much. Anyway, agreed about removing these as well. I cannot find an argument in favor of making the function callable on NULL input with default values that are hardcoded internally in the function. Thanks, David and Daniel. -- Michael