Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans
От | Jacob Champion |
---|---|
Тема | Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans |
Дата | |
Msg-id | CAAWbhmjmjqgmaHHcu5rmx3sgdXBJT_t5bt1zW8rjMyeis=5mtg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans (Heikki Linnakangas <hlinnaka@iki.fi>) |
Ответы |
Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans
|
Список | pgsql-hackers |
On Mon, Feb 27, 2023 at 12:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 22/02/2023 15:03, Aleksander Alekseev wrote: > > If memory serves I noticed that WHERE ... IS NULL queries don't even > > hit HeapKeyTest() and I was curious where the check for NULLs is > > actually made. As I understand, SeqNext() in nodeSeqscan.c simply > > iterates over all the tuples it can find and pushes them to the parent > > node. We could get a slightly better performance for certain queries > > if SeqNext() did the check internally. > > Right, it might be faster to perform the NULL-checks before checking > visibility, for example. Arbitrary quals cannot be evaluated before > checking visibility, but NULL checks could be. Hi Heikki, There's quite a bit of work left to do, but I wanted to check if the attached patch (0002, based on top of Aleks' 0001 from upthread) was going in the direction you were thinking. This patch pushes down any forced-null and not-null Vars as ScanKeys. It doesn't remove the redundant quals after turning them into ScanKeys, so it's needlessly inefficient, but there's still a decent speedup for some of the basic benchmarks in 0003. Plans look something like this: # EXPLAIN SELECT * FROM t WHERE i IS NULL; QUERY PLAN ------------------------------------------------------------ Seq Scan on t (cost=0.00..1393.00 rows=49530 width=4) Scan Cond: (i IS NULL) Filter: (i IS NULL) (3 rows) # EXPLAIN SELECT * FROM t WHERE i = 3; QUERY PLAN -------------------------------------------------------- Seq Scan on t (cost=0.00..1643.00 rows=1 width=4) Scan Cond: (i IS NOT NULL) Filter: (i = 3) (3 rows) The non-nullable case worries me a bit because so many things imply IS NOT NULL. I think I need to do some sort of cost analysis using the null_frac statistics -- it probably only makes sense to push an implicit SK_SEARCHNOTNULL down to the AM layer if some fraction of rows would actually be filtered out -- but I'm not really sure how to choose a threshold. It would also be neat if `COUNT(col)` could push down SK_SEARCHNOTNULL, but I think that would require a new support function to rewrite the plan for an aggregate. Am I on the right track? Thanks, --Jacob
Вложения
В списке pgsql-hackers по дате отправления: