Re: Bug in GiST paring heap comparator
От | Heikki Linnakangas |
---|---|
Тема | Re: Bug in GiST paring heap comparator |
Дата | |
Msg-id | 7362fcac-3664-d960-f0fa-77c24e62a7b3@iki.fi обсуждение исходный текст |
Ответ на | Bug in GiST paring heap comparator (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Ответы |
Re: Bug in GiST paring heap comparator
|
Список | pgsql-hackers |
On 02/09/2019 07:54, Alexander Korotkov wrote: > Andrey Borodin noticed me that results of some KNN-GIST tests are > obviously wrong and don't match results of sort node. > > SELECT * FROM point_tbl ORDER BY f1 <-> '0,1'; > f1 > ------------------- > (10,10) > (NaN,NaN) > (0,0) > (1e-300,-1e-300) > (-3,4) > (-10,0) > (-5,-12) > (5.1,34.5) > > (1e+300,Infinity) > (10 rows) So we've memorized incorrect results in the expected output. Ouch! > It appears to be related to implementation of comparison function in > pairing heap used as priority queue for KNN. It used plain float > comparison, which doesn't handle Inf and Nan values well. Attached > patch replaced it with float8_cmp_internal(). > > Also, note that with patch KNN results still don't fully match results > of sort node. > > SELECT * FROM point_tbl ORDER BY f1 <-> '0,1'; > f1 > ------------------- > (0,0) > (1e-300,-1e-300) > (-3,4) > (-10,0) > (10,10) > (-5,-12) > (5.1,34.5) > (1e+300,Infinity) > > (NaN,NaN) > (10 rows) > > NULL and '(NaN,NaN)' are swapped. It happens so, because we assume > distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to > be greater than NULL. If even we would assume distance to NULL to be > Nan, it doesn't guarantee that NULLs would be last. It looks like we > can handle this only by introduction array of "distance is null" flags > to GISTSearchItem. But does it worth it? I don't think we have much choice, returning wrong results is not an option. At first I thought we could set the "recheck" flag if we see NULLs or NaNs, but that won't work because rechecking is not supported with Index-Only Scans. Looking at the corresponding SP-GiST code, pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest copy-pasting the implementation from there, so that they would be as similar as possible. (SP-GiST gets away with just one isNull flag, because it doesn't support multi-column indexes. In GiST, you need an array, as you said.) - Heikki
В списке pgsql-hackers по дате отправления: