Обсуждение: BUG #6278: Index scans on '>' condition on field with many NULLS
The following bug has been logged online: Bug reference: 6278 Logged by: Maksym Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 9.0 Operating system: Linux Description: Index scans on '>' condition on field with many NULLS Details: I not sure it is missing feature or actual bug. However very selective index scan on '>' condition can work pretty inefficient on column with many nulls. (in the same time '<' work well). Seems index scan on '>' condition going through all nulls in index. Test case (tested on 8.4 and 9.0 with same effect): postgres=# CREATE table test as select (case when random()>0.1 then NULL else random() end) as value from generate_series(1,10000000); SELECT 10000000 postgres=# CREATE INDEX test_value_key on test(value); CREATE INDEX postgres=# SELECT count(*) from test; count ---------- 10000000 (1 row) postgres=# VACUUM ANALYZE test; VACUUM postgres=# EXPLAIN ANALYZE select * from test where value>0.9999; QUERY PLAN ---------------------------------------------------------------------------- ----------------------------------------------- Index Scan using test_value_key on test (cost=0.00..13.78 rows=105 width=8) (actual time=0.010..155.318 rows=91 loops=1) Index Cond: (value > 0.9999::double precision) Total runtime: 155.346 ms (3 rows) Oops... 160ms to return 90 rows from memory. In the same time 100 rows from other index side: postgres=# EXPLAIN ANALYZE select * from test where value<0.0001; QUERY PLAN ---------------------------------------------------------------------------- ---------------------------------------------- Index Scan using test_value_key on test (cost=0.00..15.69 rows=120 width=8) (actual time=0.006..0.158 rows=103 loops=1) Index Cond: (value < 0.0001::double precision) Total runtime: 0.175 ms (3 rows) That is good result (1000 faster then other way around). For sure that can be fixed via create index with NOT NULL predicated. But may be that problem worth small investigation. Seems index scan cannot stop after finding first NULL during scan on '>' condition, and doing scan through all 90% nulls in table.
On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> wrot= e: > However very selective index scan on '>' condition can work pretty > inefficient on column with many nulls. > (in the same time '<' work well). > > Seems index scan on '>' condition going through all nulls in index. > > Test case (tested on 8.4 and 9.0 with same effect): > > postgres=3D# =A0CREATE table test as select (case when random()>0.1 then = NULL > else random() end) as value from generate_series(1,10000000); > SELECT 10000000 > postgres=3D# CREATE INDEX test_value_key on test(value); > CREATE INDEX > postgres=3D# SELECT count(*) from test; > =A0count > ---------- > =A010000000 > (1 row) > > postgres=3D# VACUUM ANALYZE test; > VACUUM > > postgres=3D# EXPLAIN ANALYZE select * from test where value>0.9999; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY PLAN > -------------------------------------------------------------------------= --- > ----------------------------------------------- > =A0Index Scan using test_value_key on test =A0(cost=3D0.00..13.78 rows=3D= 105 > width=3D8) (actual time=3D0.010..155.318 rows=3D91 loops=3D1) > =A0 Index Cond: (value > 0.9999::double precision) > =A0Total runtime: 155.346 ms > (3 rows) > > Oops... 160ms to return 90 rows from memory. > > In the same time 100 rows from other index side: > > postgres=3D# EXPLAIN ANALYZE select * from test where value<0.0001; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0QUERY PLAN > -------------------------------------------------------------------------= --- > ---------------------------------------------- > =A0Index Scan using test_value_key on test =A0(cost=3D0.00..15.69 rows=3D= 120 > width=3D8) (actual time=3D0.006..0.158 rows=3D103 loops=3D1) > =A0 Index Cond: (value < 0.0001::double precision) > =A0Total runtime: 0.175 ms > (3 rows) > > That is good result (1000 faster then other way around). > > For sure that can be fixed via create index with NOT NULL predicated. But > may be that problem worth small investigation. > > Seems index scan cannot stop after finding first NULL during scan on '>' > condition, and doing scan through all 90% nulls in table. I can reproduce this. I'm not sure whether it's a bug either, but it sure seems less than ideal. I suppose the problem is that we are generating an index scan that starts at 0.9999 and runs through the end of the index, rather than stopping when it hits the first NULL. Not sure how much work it would be to make that happen, but I guess we'd need a second branch to the index condition to stop the scan, just as we already do for: EXPLAIN (ANALYZE, BUFFERS) select * from test where value>0.9993 and value <0.9999; --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> wrote: >> Seems index scan cannot stop after finding first NULL during scan on '>' >> condition, and doing scan through all 90% nulls in table. > I can reproduce this. I'm not sure whether it's a bug either, but it > sure seems less than ideal. I suppose the problem is that we are > generating an index scan that starts at 0.9999 and runs through the > end of the index, rather than stopping when it hits the first NULL. I poked at this a bit last night. The reason it's happening is that the ">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys doesn't think it can stop when it hits the NULLs. Right at the moment it seems like we could mark that key with both flags, which leads to the conclusion that two flags are unnecessary and we could get by with only one direction-independent flag. Which, if memory serves, is how it used to be ... until I split the flag into two to fix some bug or other. But the regression tests still pass if you make _bt_mark_scankey_required mark any required key with both flags (which is the zeroth-order version of recombining them). So either my analysis was wrong at the time, or some later change has eliminated the need for two flags, or the regression tests aren't covering the problematic case. Will investigate further once I've absorbed some caffeine. regards, tom lane
On Mon, Oct 31, 2011 at 10:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Oct 30, 2011 at 11:39 PM, Maksym Boguk <maxim.boguk@gmail.com> w= rote: >>> Seems index scan cannot stop after finding first NULL during scan on '>' >>> condition, and doing scan through all 90% nulls in table. > >> I can reproduce this. =A0I'm not sure whether it's a bug either, but it >> sure seems less than ideal. =A0I suppose the problem is that we are >> generating an index scan that starts at 0.9999 and runs through the >> end of the index, rather than stopping when it hits the first NULL. > > I poked at this a bit last night. =A0The reason it's happening is that the > ">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys > doesn't think it can stop when it hits the NULLs. =A0Right at the moment > it seems like we could mark that key with both flags, which leads to the > conclusion that two flags are unnecessary and we could get by with only > one direction-independent flag. =A0Which, if memory serves, is how it used > to be ... until I split the flag into two to fix some bug or other. =A0But > the regression tests still pass if you make _bt_mark_scankey_required > mark any required key with both flags (which is the zeroth-order version > of recombining them). =A0So either my analysis was wrong at the time, > or some later change has eliminated the need for two flags, or the > regression tests aren't covering the problematic case. =A0Will investigate > further once I've absorbed some caffeine. I know almost nothing about this code, but it seems both flags were introduced in commit 7ccaf13a06b8e1f70b26ab049fdb4f8c8dece3f8. --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I wrote: > I poked at this a bit last night. The reason it's happening is that the > ">" key is only marked SK_BT_REQBKWD, not SK_BT_REQFWD, so _bt_checkkeys > doesn't think it can stop when it hits the NULLs. Right at the moment > it seems like we could mark that key with both flags, which leads to the > conclusion that two flags are unnecessary and we could get by with only > one direction-independent flag. OK, I swapped some of this knowledge back in. The above idea would work in simple cases with non-redundant scan keys, but in general not. Consider an indexscan on x with WHERE x > 4 We'll start scanning at the x>4 boundary point and run forward. The scankey condition can only fail once we reach NULL index entries, at which point we can stop, so in this case marking this key SK_BT_REQFWD would be OK. However, consider something like WHERE x > 4 AND x > 10 where _bt_preprocess_keys is unable to determine which key is more restrictive. (This could only happen if the comparison constants are of different types and we don't have a suitable cross-type comparison operator in the opfamily. None of the standard opfamilies have such omissions, which is why there's no regression test covering this.) In such a case, _bt_first arbitrarily picks one of the keys to do the initial positioning with. If it picks x>4, then the x>10 scankey will fail until we reach index entries > 10 ... but we can't abandon the scan just because x>10 is failing. This is why we have to have direction-sensitive required-key flags. (If we were to start backing up, failure of either of the redundant keys would indeed be justification for stopping.) We could possibly do something with marking only non-redundant >/>= scankeys as SK_BT_REQFWD, and conversely for </<= keys. But this looks like it'd be a pain to manage in _bt_preprocess_keys, and given the difficulty of testing cases where it matters, I'm not excited about going that direction. What I'm currently thinking is that we could hack _bt_checkkeys to stop the scan when it reaches NULLs if the current scankey is not an ISNULL test and is marked *either* SK_BT_REQFWD or SK_BT_REQBKWD. In such a case we know that the scankey cannot be satisfied by a NULL, so it will fail until we reach the next range of this index column, ie move to the next value of the next higher-order column (if any). And that value can't pass the scankeys, because it must have an equality constraint, else this key wouldn't be marked required. BTW, I had thought that Maksym could work around this with something like WHERE value > 0.9999 and value < 1.1 or WHERE value > 0.9999 and value is not null but it turns out that neither of those work. The second key will be marked SK_BT_REQFWD, but because _bt_checkkeys stops testing as soon as any key has failed, it doesn't reach the second key and doesn't realize that there's a failing SK_BT_REQFWD key available. That's a little bit annoying, but the only clear fix would be to keep testing keys after we already know the index entry has failed, and I think that's probably going to be a net loss. If we fix the NULL case to accept either flag, then the only case where this will matter is redundant scan keys, and that's not a case that I think we should optimize at the cost of pessimizing normal cases. A patch along these lines should be pretty localized. Has anyone got an opinion on whether to back-patch or not? This seems like a performance bug, but given the lack of prior complaints, maybe we shouldn't take any risks for it. regards, tom lane
On Mon, Oct 31, 2011 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > A patch along these lines should be pretty localized. =A0Has anyone > got an opinion on whether to back-patch or not? =A0This seems like a > performance bug, but given the lack of prior complaints, maybe we > shouldn't take any risks for it. I think my vote would be to just fix it in master. --=20 Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Oct 31, 2011 at 12:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A patch along these lines should be pretty localized. Has anyone >> got an opinion on whether to back-patch or not? This seems like a >> performance bug, but given the lack of prior complaints, maybe we >> shouldn't take any risks for it. > I think my vote would be to just fix it in master. After researching the issue, my patch looks like the attached. This seems to apply cleanly back to about 8.3, so I'm inclined to back-patch that far. It's pretty simple, and I think the reason we (I) got it wrong to begin with is just lack of thought about the NULLs case. regards, tom lane diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c index 134abe6d596413a1ad54c0c7f378248251295d4a..c00255567116e3b6f7bd8426eb32b2398c8d95d5 100644 *** a/src/backend/access/nbtree/nbtutils.c --- b/src/backend/access/nbtree/nbtutils.c *************** _bt_checkkeys(IndexScanDesc scan, *** 1384,1398 **** } /* ! * Tuple fails this qual. If it's a required qual for the current ! * scan direction, then we can conclude no further tuples will ! * pass, either. */ ! if ((key->sk_flags & SK_BT_REQFWD) && ! ScanDirectionIsForward(dir)) ! *continuescan = false; ! else if ((key->sk_flags & SK_BT_REQBKWD) && ! ScanDirectionIsBackward(dir)) *continuescan = false; /* --- 1394,1409 ---- } /* ! * Tuple fails this qual. If it's a required qual, then we can ! * conclude no further tuples will pass, either. We can stop ! * regardless of the scan direction, because we know that NULLs ! * sort to one end or the other of the range of values. If this ! * tuple doesn't pass, then no future ones will either, until we ! * reach the next set of values of the higher-order index attrs ! * (if any) ... and those attrs must have equality quals, else ! * this one wouldn't be marked required. */ ! if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) *continuescan = false; /* *************** _bt_checkkeys(IndexScanDesc scan, *** 1403,1434 **** if (isNull) { ! if (key->sk_flags & SK_BT_NULLS_FIRST) ! { ! /* ! * Since NULLs are sorted before non-NULLs, we know we have ! * reached the lower limit of the range of values for this ! * index attr. On a backward scan, we can stop if this qual ! * is one of the "must match" subset. On a forward scan, ! * however, we should keep going. ! */ ! if ((key->sk_flags & SK_BT_REQBKWD) && ! ScanDirectionIsBackward(dir)) ! *continuescan = false; ! } ! else ! { ! /* ! * Since NULLs are sorted after non-NULLs, we know we have ! * reached the upper limit of the range of values for this ! * index attr. On a forward scan, we can stop if this qual is ! * one of the "must match" subset. On a backward scan, ! * however, we should keep going. ! */ ! if ((key->sk_flags & SK_BT_REQFWD) && ! ScanDirectionIsForward(dir)) ! *continuescan = false; ! } /* * In any case, this indextuple doesn't match the qual. --- 1414,1428 ---- if (isNull) { ! /* ! * The index entry is NULL, so it must fail this qual (we assume ! * all btree operators are strict). Furthermore, we know that ! * all remaining entries with the same higher-order index attr ! * values must be NULLs too. So, just as above, we can stop the ! * scan regardless of direction, if the qual is required. ! */ ! if (key->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) ! *continuescan = false; /* * In any case, this indextuple doesn't match the qual. *************** _bt_check_rowcompare(ScanKey skey, Index *** 1508,1539 **** if (isNull) { ! if (subkey->sk_flags & SK_BT_NULLS_FIRST) ! { ! /* ! * Since NULLs are sorted before non-NULLs, we know we have ! * reached the lower limit of the range of values for this ! * index attr. On a backward scan, we can stop if this qual is ! * one of the "must match" subset. On a forward scan, ! * however, we should keep going. ! */ ! if ((subkey->sk_flags & SK_BT_REQBKWD) && ! ScanDirectionIsBackward(dir)) ! *continuescan = false; ! } ! else ! { ! /* ! * Since NULLs are sorted after non-NULLs, we know we have ! * reached the upper limit of the range of values for this ! * index attr. On a forward scan, we can stop if this qual is ! * one of the "must match" subset. On a backward scan, ! * however, we should keep going. ! */ ! if ((subkey->sk_flags & SK_BT_REQFWD) && ! ScanDirectionIsForward(dir)) ! *continuescan = false; ! } /* * In any case, this indextuple doesn't match the qual. --- 1502,1516 ---- if (isNull) { ! /* ! * The index entry is NULL, so it must fail this qual (we assume ! * all btree operators are strict). Furthermore, we know that ! * all remaining entries with the same higher-order index attr ! * values must be NULLs too. So, just as above, we can stop the ! * scan regardless of direction, if the qual is required. ! */ ! if (subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)) ! *continuescan = false; /* * In any case, this indextuple doesn't match the qual.