Обсуждение: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
Hello, I've been working on Bitmapset and while creating a test suite for it I found that there is a missing bounds check in bms_prev_member(). The function takes the prevbit argument and converts it to an index into the words array using WORDNUM() without checking to ensure that prevbit is within the bounds of the possible values (e.g. nwords * BITS_PER_BITMAPWORD) in the set. This means that $subject resulting in a confusing return value when the expected value should be the highest bit set. The patch attached adds a bounds check preventing this. -greg
Вложения
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 01:21, Greg Burd <greg@burd.me> wrote: > I've been working on Bitmapset and while creating a test suite for it I > found that there is a missing bounds check in bms_prev_member(). The > function takes the prevbit argument and converts it to an index into the > words array using WORDNUM() without checking to ensure that prevbit is > within the bounds of the possible values (e.g. nwords * > BITS_PER_BITMAPWORD) in the set. This means that $subject resulting in > a confusing return value when the expected value should be the highest > bit set. There's a comment saying: * "prevbit" must NOT be more than one above the highest possible bit that can * be set at the Bitmapset at its current size. So looks like it's the fault of the calling code and not an issue with bms_prev_member(). David
David Rowley <dgrowleyml@gmail.com> writes: > There's a comment saying: > * "prevbit" must NOT be more than one above the highest possible bit that can > * be set at the Bitmapset at its current size. > So looks like it's the fault of the calling code and not an issue with > bms_prev_member(). Yeah. But perhaps it'd be reasonable to put in an Assert checking that? Grammar nitpick: comment should be more like "set in the Bitmapset". regards, tom lane
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 02:06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <dgrowleyml@gmail.com> writes: > > So looks like it's the fault of the calling code and not an issue with > > bms_prev_member(). > > Yeah. But perhaps it'd be reasonable to put in an Assert checking > that? Seems like a good idea. > Grammar nitpick: comment should be more like "set in the Bitmapset". Agreed. David
On Aug 14 2025, at 9:46 am, David Rowley <dgrowleyml@gmail.com> wrote: > On Fri, 15 Aug 2025 at 01:21, Greg Burd <greg@burd.me> wrote: >> I've been working on Bitmapset and while creating a test suite for it I >> found that there is a missing bounds check in bms_prev_member(). The >> function takes the prevbit argument and converts it to an index into the >> words array using WORDNUM() without checking to ensure that prevbit is >> within the bounds of the possible values (e.g. nwords * >> BITS_PER_BITMAPWORD) in the set. This means that $subject resulting in >> a confusing return value when the expected value should be the highest >> bit set. > > There's a comment saying: > > * "prevbit" must NOT be more than one above the highest possible bit > that can > * be set at the Bitmapset at its current size. > > So looks like it's the fault of the calling code and not an issue with > bms_prev_member(). > > David Hi David, thanks for your feedback. You're right, there is that statement in the comment. There are three paths that I can see: 1) forget about it and let the comment do the work, caller beware 2) add an assert that ensures we catch errant callers and leave the comment as is 3) add the test and adjust the value of prevbit to be in bounds and remove the comment I'm not in favor of (1) because I think adding either (2) or (3) prevent, or help identify, unintentional bugs. I guess I'm a bit overly cautious when I find code that can read outside the bounds of its intended memory returning seemingly valid results or in other cases potentially reading invalid memory addresses and raise a signal. That just feels wrong. My vote now is (2) in the attached patch with a bit fix also in the comment. my $0.02. :) -greg
Вложения
On Aug 14 2025, at 10:06 am, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> There's a comment saying: >> * "prevbit" must NOT be more than one above the highest possible bit >> that can >> * be set at the Bitmapset at its current size. >> So looks like it's the fault of the calling code and not an issue with >> bms_prev_member(). > > Yeah. But perhaps it'd be reasonable to put in an Assert checking > that? Hey Tom, thanks for chiming in on this one. I had the same idea and sent an updated patch. -greg > Grammar nitpick: comment should be more like "set in the Bitmapset". I found this too, and also the "one above" part seemed wrong to me as well. > > regards, tom lane best. -greg
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 02:21, Greg Burd <greg@burd.me> wrote: > I found this too, and also the "one above" part seemed wrong to me as well. It is valid to pass prevbit as a->nwords * BITS_PER_BITMAPWORD as the code does "prevbit--;". Maybe it would be less confusing if it were written as: * "prevbit" must be less than or equal to "a->nwords * BITS_PER_BITMAPWORD". The Assert should be using <= rather than <. David
David Rowley <dgrowleyml@gmail.com> writes: > It is valid to pass prevbit as a->nwords * BITS_PER_BITMAPWORD as the > code does "prevbit--;". Maybe it would be less confusing if it were > written as: > * "prevbit" must be less than or equal to "a->nwords * BITS_PER_BITMAPWORD". > The Assert should be using <= rather than <. Actually, I don't agree with that. It's true that it wouldn't fail, but a caller doing that is exhibiting undue intimacy with the innards of Bitmapsets. The expected usage is that the argument is initially -1 and after that the result of the previous call (which'll necessarily be less than a->nwords * BITS_PER_BITMAPWORD). We don't have any state with which we can verify the chain of calls, but it seems totally reasonable to me to disallow an outside caller providing an argument >= a->nwords * BITS_PER_BITMAPWORD. regards, tom lane
On Aug 14 2025, at 11:14 am, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> It is valid to pass prevbit as a->nwords * BITS_PER_BITMAPWORD as the >> code does "prevbit--;". Maybe it would be less confusing if it were >> written as: >> * "prevbit" must be less than or equal to "a->nwords * BITS_PER_BITMAPWORD". >> The Assert should be using <= rather than <. > > Actually, I don't agree with that. It's true that it wouldn't fail, > but a caller doing that is exhibiting undue intimacy with the innards > of Bitmapsets. The expected usage is that the argument is initially > -1 and after that the result of the previous call (which'll > necessarily be less than a->nwords * BITS_PER_BITMAPWORD). We don't > have any state with which we can verify the chain of calls, but it > seems totally reasonable to me to disallow an outside caller > providing an argument >= a->nwords * BITS_PER_BITMAPWORD. > > regards, tom lane Thanks Tom, David, Seems I also forgot about the case where the Bitmapset passed is NULL. The new assert needs to handle that as well. -greg
Вложения
On Aug 14 2025, at 11:37 am, Greg Burd <greg@burd.me> wrote: > > On Aug 14 2025, at 11:14 am, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> David Rowley <dgrowleyml@gmail.com> writes: >>> It is valid to pass prevbit as a->nwords * BITS_PER_BITMAPWORD as the >>> code does "prevbit--;". Maybe it would be less confusing if it were >>> written as: >>> * "prevbit" must be less than or equal to "a->nwords * BITS_PER_BITMAPWORD". >>> The Assert should be using <= rather than <. >> >> Actually, I don't agree with that. It's true that it wouldn't fail, >> but a caller doing that is exhibiting undue intimacy with the innards >> of Bitmapsets. The expected usage is that the argument is initially >> -1 and after that the result of the previous call (which'll >> necessarily be less than a->nwords * BITS_PER_BITMAPWORD). We don't >> have any state with which we can verify the chain of calls, but it >> seems totally reasonable to me to disallow an outside caller >> providing an argument >= a->nwords * BITS_PER_BITMAPWORD. >> >> regards, tom lane > > > Thanks Tom, David, > > Seems I also forgot about the case where the Bitmapset passed is NULL. > The new assert needs to handle that as well. > > -greg Well, that was rushed. Apologies. -greg
Вложения
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 03:43, Greg Burd <greg@burd.me> wrote: > Well, that was rushed. Apologies. Would you be ok with adding the Assert after the "a == NULL" check?, i.e: if (a == NULL || prevbit == 0) return -2; /* Validate callers didn't give us something out of range */ Assert(prevbit <= a->nwords * BITS_PER_BITMAPWORD); David
Greg Burd <greg@burd.me> writes: > Well, that was rushed. Apologies. I was thinking something more like /* transform -1 to the highest possible bit we could have set */ if (prevbit == -1) prevbit = a->nwords * BITS_PER_BITMAPWORD - 1; else + { + Assert(prevbit > 0 && prevbit < a->nwords * BITS_PER_BITMAPWORD); prevbit--; + } Admittedly, this doesn't bother to check sanity of prevbit when a == NULL, but I don't think doing so is useful enough to contort the logic for it. regards, tom lane
On Aug 14 2025, at 11:49 am, David Rowley <dgrowleyml@gmail.com> wrote: > On Fri, 15 Aug 2025 at 03:43, Greg Burd <greg@burd.me> wrote: >> Well, that was rushed. Apologies. > > Would you be ok with adding the Assert after the "a == NULL" check?, i.e: > > if (a == NULL || prevbit == 0) > return -2; > > /* Validate callers didn't give us something out of range */ > Assert(prevbit <= a->nwords * BITS_PER_BITMAPWORD); > > David Good thinking, less contorted logic and a more obvious check for NULL and the result to expect in that case. -greg
Вложения
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 03:14, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > It is valid to pass prevbit as a->nwords * BITS_PER_BITMAPWORD as the > > code does "prevbit--;". Maybe it would be less confusing if it were > > written as: > > * "prevbit" must be less than or equal to "a->nwords * BITS_PER_BITMAPWORD". > > The Assert should be using <= rather than <. > > Actually, I don't agree with that. It's true that it wouldn't fail, > but a caller doing that is exhibiting undue intimacy with the innards > of Bitmapsets. The expected usage is that the argument is initially > -1 and after that the result of the previous call (which'll > necessarily be less than a->nwords * BITS_PER_BITMAPWORD). We don't > have any state with which we can verify the chain of calls, but it > seems totally reasonable to me to disallow an outside caller > providing an argument >= a->nwords * BITS_PER_BITMAPWORD. I can't get on board with this way of thinking. I'm happy enough that we add an Assert, but if we're going to do that then the Assert should be coded in a way that's aligned to the actual restriction that we're trying to protect against. I've heard people arguing before that Asserts can act as documentation of what are valid values for a given function parameter. If that's true, then we should code the Assert so it checks for *valid* values, not 1 less than a valid value. IMO, doing it any other way is likely to cause confusion in people who want to use the function, or bug reports that we're off by 1 in the Assert. David
Tom, David, I really appreciate the time spent and the thoughtful replies on this thread today. Honestly, I thought this would be a nice and easy single line fix ahead of a proposal for the addition of tests for Bitmapset[1] (which identified the issue) I think, with v5, we're fixing this small issue in a reasonable and understandable manner. Any chance we can agree on it and get back to the bigger fish in the frying pan? :) Spoiler alert, I have larger plans[2] for Bitmapset which is why I started down this road to begin with as I wanted to capture the existing API/contract for it before I proposed changes to it. best, -greg [1] https://github.com/gburd/postgres/pull/11/commits/0bd90e6ce7954d4324748f18717508cce64afc22 [2] https://codeberg.org/gregburd/sparsemap - a compressed bitmap index
Greg Burd <greg@burd.me> writes: > Spoiler alert, I have larger plans[2] for Bitmapset which is why I > started down this road to begin with as I wanted to capture the existing > API/contract for it before I proposed changes to it. FWIW, I'd be seriously resistant to simply replacing Bitmapset with something like that, as I think the present implementation is about right for the planner's uses and complicating it would just slow things down. There may be specific places where a compressed implementation could win, though. Anyway, that's off-topic for the present thread. I believe it's middle-of-the-night in Rowley's time zone, so I was waiting for further comment from him before taking any action. regards, tom lane
> On Aug 14, 2025, at 3:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Greg Burd <greg@burd.me> writes: >> Spoiler alert, I have larger plans[2] for Bitmapset which is why I >> started down this road to begin with as I wanted to capture the existing >> API/contract for it before I proposed changes to it. > > FWIW, I'd be seriously resistant to simply replacing Bitmapset with > something like that, as I think the present implementation is about > right for the planner's uses and complicating it would just slow > things down. There may be specific places where a compressed > implementation could win, though. Well, I get that and I know I’m battling uphill. :) Can’t hurt to measure and see if there is a benefit or not. If not, I’ll just propose a different Sparsemapset (or other reasonable name) in addition to the existing Bitmapset for the use case that pushed me in this direction. No reason not to explore options, right? > Anyway, that's off-topic for the present thread. I believe it's > middle-of-the-night in Rowley's time zone, so I was waiting for > further comment from him before taking any action. Makes perfect sense, sorry to rush the process. > regards, tom lane best, -greg
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 07:45, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, that's off-topic for the present thread. I believe it's > middle-of-the-night in Rowley's time zone, so I was waiting for > further comment from him before taking any action. The v5 patch looks good to me. FWIW, after sleeping, I'm now very much against using < rather than <= for the Assert. The reason being that it makes it impossible to build bms_prev_member() loops with a dynamic start point. Right now we document that we expect the loop to be started with -1, but if someone wants to start at some arbitrary point in the set, then they need to be able to pass some_member + 1. If some_member happens to be the highest bit in the last word then your Assert will fail for no good reason. I'm happy to push Greg's v5 patch if you have no counterarguments. David
David Rowley <dgrowleyml@gmail.com> writes: > FWIW, after sleeping, I'm now very much against using < rather than <= > for the Assert. The reason being that it makes it impossible to build > bms_prev_member() loops with a dynamic start point. Right now we > document that we expect the loop to be started with -1, but if someone > wants to start at some arbitrary point in the set, then they need to > be able to pass some_member + 1. If some_member happens to be the > highest bit in the last word then your Assert will fail for no good > reason. Hm. So the use-case you're imagining is "I know that N is a member of this set, and I want to iterate through N as well as all smaller members of this set"? I guess it's arguably possible, but I'm dubious. We have exactly one caller of this function at the moment, so it's hard to construct any sweeping arguments about what people might want to do with it. But I'd be inclined to read the use-cases narrowly not broadly. > I'm happy to push Greg's v5 patch if you have no counterarguments. In the end this isn't something I find worth arguing about. If you prefer v5, sure. I do suggest though that if we're installing Asserts at all, defending against prevbit < -1 is worth doing. regards, tom lane
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
David Rowley
Дата:
On Fri, 15 Aug 2025 at 15:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I'm happy to push Greg's v5 patch if you have no counterarguments. > > In the end this isn't something I find worth arguing about. If > you prefer v5, sure. I do suggest though that if we're installing > Asserts at all, defending against prevbit < -1 is worth doing. Agreed about defending against prevbit < -1. I added an Assert for that. Technically, that Assert could be up above the if (a == NULL) check, but I didn't think it mattered that much and opted to keep both Asserts together. The difference being that bms_prev_member(NULL, -2) will return -2 rather than Assert fail. I'm not too worried about that, but if you feel strongly differently, I can adjust what I just pushed. David
Re: [PATCH] bms_prev_member() can read beyond the end of the array of allocated words
От
"Burd, Greg"
Дата:
> On Aug 15, 2025, at 12:37 AM, David Rowley <dgrowleyml@gmail.com> wrote: > > On Fri, 15 Aug 2025 at 15:24, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> David Rowley <dgrowleyml@gmail.com> writes: >>> I'm happy to push Greg's v5 patch if you have no counterarguments. >> >> In the end this isn't something I find worth arguing about. If >> you prefer v5, sure. I do suggest though that if we're installing >> Asserts at all, defending against prevbit < -1 is worth doing. Agreed, that's a reasonable addition. > Agreed about defending against prevbit < -1. I added an Assert for > that. Technically, that Assert could be up above the if (a == NULL) > check, but I didn't think it mattered that much and opted to keep both > Asserts together. The difference being that bms_prev_member(NULL, -2) > will return -2 rather than Assert fail. I'm not too worried about > that, but if you feel strongly differently, I can adjust what I just > pushed. Thanks for adding that Assert. I don't think it's worth relocating at this time. I'll post the addition of tests in a new thread. I hope you both have interest and time to review that as well. > David Thanks David for pushing the commit! :) best. -greg