Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
От | Heikki Linnakangas |
---|---|
Тема | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Дата | |
Msg-id | 69851f03-d44d-1d90-b174-836ad308e3c9@iki.fi обсуждение исходный текст |
Ответ на | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Alexander Korotkov <aekorotkov@gmail.com>) |
Список | pgsql-bugs |
On 11/07/2021 00:56, Alexander Korotkov wrote: > I've closer look at shimTriConsistentFn() function. It looks to me > like the function itself has inconsistencies. But the way it's > currently used in GIN shouldn't produce the wrong query answers. > > shimTriConsistentFn() is one of implementation of > GinScanKey.triConsistentFn. In turn, GinScanKey.triConsistentFn have > 3 callers: > 1) startScanKey() to determine required keys > 2) keyGetItem() for lossy item check > 3) keyGetItem() for normal item check > > Let's see shimTriConsistentFn() inconsistencies and how callers handle them. > 1) shimTriConsistentFn() returns result of directBoolConsistentFn() > for zero maybe entries without examining key->recheckCurItem. That > may result in returning GIN_TRUE instead of GIN_MAYBE > 1.1) startScanKey() doesn't care about recheck, just looking for > GIN_FALSE result. > 1.2) keyGetItem() for lossy item always implies recheck. > 1.3) keyGetItem() for a normal item does the trick. Whether a current > item is rechecked is controlled by key->recheckCurItem variable (the > same which is set by directBoolConsistentFn(). The following switch > sets key->recheckCurItem for GIN_MAYBE, but leaves it untouched for > GIN_TRUE. Thus, GIN_TRUE with key->recheckCurItem works here just > like GIN_MAYBE. I think this is inconsistent by itself, but this > inconsistency compensates for inconsistency in shimTriConsistentFn(). > 2) shimTriConsistentFn() might call directBoolConsistentFn() with all > false inputs for GIN_SEARCH_MODE_DEFAULT. The result is intended to > be false, but opclass consistent method isn't intended to handle this > situation correctly. For instance, ginint4_consistent() returns true > without checking the input array. That could make > shimTriConsistentFn() return GIN_TRUE instead of GIN_MAYBE, or > GIN_MAYBE instead of GIN_FALSE. > 2.1) In principle, this could lead startScanKey() to select less > required entries than possible. But this is definitely not the case > of ginint4_consistent() when meeting any of entries is enough for > match. > 2.2) In principle, keyGetItem() could get false positives for lossy > items. But that wouldn't lead to wrong query answers. Again, this is > not the case of ginint4_consistent(). > 2.3) keyGetItem() for a normal item doesn't call triConsistentFn() > with any GIN_MAYBE or all GIN_FALSE. > > To sum up, I don't see how inconsistencies in shimTriConsistentFn() > could lead to wrong query answers, especially in intarray GIN index. Agreed, I came to the same conclusion looking at the code. Which means that we still don't have an explanation for the original bug report :-(. > Nevertheless, these inconsistencies should be fixed. I'm going to > propose a patch soon. Thanks! I came up with the attached patch. I changed directBoolConsistentFn() to return a GinTernaryValue rather than bool, I think that makes the logic in shimTriConsistentFn() more clear. I also tried to write a test case to expose issue 2.1 above, but couldn't come up with an example. - Heikki
Вложения
В списке pgsql-bugs по дате отправления: