Обсуждение: Problem getting query to use index inside a function
I’m not sure if I should send this to pgsql-performance or pqsql-general so hopefully I’m sending to the correct one.
Table definition:
shared=> \d request
Table "public.request"
Column | Type | Collation | Nullable | Default
------------------+-----------------------------+-----------+----------+----------------
objectid | character(36) | | not null |
data | jsonb | | not null | '{}'::jsonb
clientid | character(3) | | not null |
active | integer | | not null |
productid | integer | | not null |
checkoutbyuid | character(100) | | |
checkoutdatetime | timestamp without time zone | | |
metadata | jsonb | | not null | '{}'::jsonb
search_vector | tsvector | | |
requeststate | text | | not null | 'Active'::text
Indexes:
"requestkey" PRIMARY KEY, btree (objectid, productid)
Here is the function I’m having difficulties with:
CREATE OR REPLACE FUNCTION public.steve1(param_requestid text[], param_productid integer DEFAULT 1)
RETURNS TABLE(objectid text, n text, v text, vt integer)
LANGUAGE sql
AS $function$
SELECT objectid::text
, i->>'n'::text
, i->>'v'::text
, (i->>'vt') :: INT as vt
FROM request r
, jsonb_array_elements(data -> 'i') i
WHERE objectid = ANY($1)
AND productid=$2
$function$
;
Query:
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text, n::text, v::text, vt::int FROM steve1(ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251-6723-48a8-8385-55133fab704a'], 1);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100) (actual time=42.694..42.942 rows=3368 loops=1)
Buffers: shared hit=16612 ß-----------------
Planning Time: 0.034 ms
Execution Time: 43.279 ms
(4 rows)
The query is doing 16612 logical reads which implies it full scanning the table. I can also run the query providing the input values where you can clearly see it’s full scanning the table:
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text
shared-> , i->>'n'::text
shared-> , i->>'v'::text
shared-> , (i->>'vt') :: INT as vt
shared-> FROM request r
shared-> , jsonb_array_elements(data -> 'i') i
shared-> WHERE objectid = ANY( ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251-6723-48a8-8385-55133fab704a'])
shared-> AND productid=1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.01..19561.37 rows=57000 width=100) (actual time=0.351..41.354 rows=3368 loops=1)
Buffers: shared hit=16586 ß-----------
-> Seq Scan on request r (cost=0.00..17566.36 rows=570 width=67) (actual time=0.010..38.341 rows=2 loops=1)
Filter: ((productid = 1) AND ((objectid)::text = ANY ('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}'::text[])))
Rows Removed by Filter: 57077
Buffers: shared hit=16569
-> Function Scan on jsonb_array_elements i (cost=0.01..1.00 rows=100 width=32) (actual time=0.291..0.406 rows=1684 loops=2)
Buffers: shared hit=17
Planning Time: 0.093 ms
Execution Time: 41.607 ms
(10 rows)
I can get the query to use the PK index if I change the query from ARRAY [] to {}:
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text
shared-> , i->>'n'::text
shared-> , i->>'v'::text
shared-> , (i->>'vt') :: INT as vt
shared-> FROM request r
shared-> , jsonb_array_elements(data -> 'i') i
shared-> WHERE objectid = ANY('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}')
shared-> AND productid=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.42..23.87 rows=200 width=100) (actual time=0.346..3.007 rows=3368 loops=1)
Buffers: shared hit=27 ß--------------
-> Index Scan using requestkey on request r (cost=0.41..16.87 rows=2 width=67) (actual time=0.036..0.049 rows=2 loops=1)
Index Cond: ((objectid = ANY ('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}'::bpchar[])) AND (productid = 1))
Buffers: shared hit=10
-> Function Scan on jsonb_array_elements i (cost=0.01..1.00 rows=100 width=32) (actual time=0.274..0.388 rows=1684 loops=2)
Buffers: shared hit=17
Planning:
Buffers: shared hit=27
Planning Time: 0.176 ms
Execution Time: 3.286 ms
(11 rows)
But if I change the call to the function to use similar syntax the shared buffer hits of 16,586 is telling me it’s still full scanning the table:
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text, n::text, v::text, vt::int FROM steve1('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}', 1);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100) (actual time=56.708..56.970 rows=3368 loops=1)
Buffers: shared hit=16586 ß-------------
Planning Time: 0.024 ms
Execution Time: 57.316 ms
(4 rows)
What do I need to change to get the query to use the PK index when executed inside the function?
Thanks in advance
Steve
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
> Here is the function I'm having difficulties with:
> CREATE OR REPLACE FUNCTION public.steve1(param_requestid text[], param_productid integer DEFAULT 1)
> RETURNS TABLE(objectid text, n text, v text, vt integer)
> LANGUAGE sql
> AS $function$
> SELECT objectid::text
> , i->>'n'::text
> , i->>'v'::text
> , (i->>'vt') :: INT as vt
> FROM request r
> , jsonb_array_elements(data -> 'i') i
> WHERE objectid = ANY($1)
> AND productid=$2
> $function$
> ;
> Query:
> shared=> explain (analyze, buffers)
> shared-> SELECT objectid::text, n::text, v::text, vt::int FROM
steve1(ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251-6723-48a8-8385-55133fab704a'],1);
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
> Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100) (actual time=42.694..42.942 rows=3368 loops=1)
I think you would have better luck if the planner were "inlining"
this function, which we can see it's not since you get a Function Scan
on steve1 rather than the contained query.
I think the only thing stopping that from happening is that the
function is (by default) VOLATILE. Try marking it STABLE so that
it can share the calling query's snapshot.
(v18 should handle such cases better than previous versions, BTW.
But you'd still be better off marking the function STABLE.)
regards, tom lane
> > Here is the function I'm having difficulties with:
> > CREATE OR REPLACE FUNCTION public.steve1(param_requestid text[],
> > param_productid integer DEFAULT 1) RETURNS TABLE(objectid text, n
> > text, v text, vt integer) LANGUAGE sql AS $function$
> > SELECT objectid::text
> > , i->>'n'::text
> > , i->>'v'::text
> > , (i->>'vt') :: INT as vt
> > FROM request r
> > , jsonb_array_elements(data -> 'i') i WHERE objectid =
> > ANY($1)
> > AND productid=$2
> > $function$
> > ;
> > Query:
> > shared=> explain (analyze, buffers)
> > shared-> SELECT objectid::text, n::text, v::text, vt::int FROM
> > shared-> steve1(ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251
> > shared-> -6723-48a8-8385-55133fab704a'], 1);
> > QUERY PLAN
> > ----------------------------------------------------------------------
> > ------------------------------------------
> > Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100)
> > (actual time=42.694..42.942 rows=3368 loops=1)
> I think you would have better luck if the planner were "inlining"
> this function, which we can see it's not since you get a Function Scan on steve1 rather than the contained query.
> I think the only thing stopping that from happening is that the function is (by default) VOLATILE. Try marking it
STABLEso that it can share the calling query's snapshot.
> (v18 should handle such cases better than previous versions, BTW.
> But you'd still be better off marking the function STABLE.)
> regards, tom lane
Thanks for the reply, but that did not seem to help. I tried using both the {} and ARRAY[] syntax but both are still
fullscanning based on the shared buffers. Any other ideas?
shared=> ALTER FUNCTION public.steve1 (param_requestid text[], param_productid integer) STABLE;
ALTER FUNCTION
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text, n::text, v::text, vt::int FROM steve1('{83e3326a-62fe-45bc-81e5-1e9fb9a84d31}',
'PJJ',1, 1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100) (actual time=64.465..64.465 rows=0 loops=1)
Buffers: shared hit=16572
Planning Time: 0.033 ms
Execution Time: 64.485 ms
(4 rows)
shared=>
shared=> explain (analyze, buffers)
shared-> SELECT objectid::text, n::text, v::text, vt::int FROM steve1(ARRAY['83e3326a-62fe-45bc-81e5-1e9fb9a84d31'],
'PJJ',1, 1);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Function Scan on steve1 (cost=0.25..10.25 rows=1000 width=100) (actual time=63.749..63.749 rows=0 loops=1)
Buffers: shared hit=16569
Planning Time: 0.043 ms
Execution Time: 63.766 ms
(4 rows)
"Dirschel, Steve" <steve.dirschel@thomsonreuters.com> writes:
>> I think you would have better luck if the planner were "inlining"
>> this function, which we can see it's not since you get a Function Scan on steve1 rather than the contained query.
>> I think the only thing stopping that from happening is that the function is (by default) VOLATILE. Try marking it
STABLEso that it can share the calling query's snapshot.
> Thanks for the reply, but that did not seem to help.
I tried to replicate this as follows:
--- CUT ---
create table request(objectid text, productid int, data jsonb);
create index on request(objectid, productid);
CREATE OR REPLACE FUNCTION public.steve1(param_requestid text[],
param_productid integer DEFAULT 1) RETURNS TABLE(objectid text, n
text, v text, vt integer) LANGUAGE sql AS $function$
SELECT objectid::text
, i->>'n'::text
, i->>'v'::text
, (i->>'vt') :: INT as vt
FROM request r
, jsonb_array_elements(data -> 'i') i WHERE objectid =
ANY($1)
AND productid=$2
$function$
stable ;
explain
SELECT objectid::text, n::text, v::text, vt::int FROM
steve1(ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251-6723-48a8-8385-55133fab704a'], 1);
--- CUT ---
and I got:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.15..11.42 rows=100 width=100)
-> Index Scan using request_objectid_productid_idx on request r (cost=0.15..8.17 rows=1 width=64)
Index Cond: ((objectid = ANY
('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}'::text[]))AND (productid = 1))
-> Function Scan on jsonb_array_elements i (cost=0.01..1.00 rows=100 width=32)
(4 rows)
which is what I expected from successful inlining of the function.
So there are some moving parts in your situation that you've not
told us about.
regards, tom lane
>> Thanks for the reply, but that did not seem to help.
> I tried to replicate this as follows:
> --- CUT ---
> create table request(objectid text, productid int, data jsonb); create index on request(objectid, productid);
> CREATE OR REPLACE FUNCTION public.steve1(param_requestid text[], param_productid integer DEFAULT 1) RETURNS
TABLE(objectidtext, n text, v text, vt integer) LANGUAGE sql AS $function$
> SELECT objectid::text
> , i->>'n'::text
> , i->>'v'::text
> , (i->>'vt') :: INT as vt
> FROM request r
> , jsonb_array_elements(data -> 'i') i WHERE objectid =
> ANY($1)
> AND productid=$2
> $function$
> stable ;
> explain
> SELECT objectid::text, n::text, v::text, vt::int FROM
steve1(ARRAY['5ab8e0ca-abb5-48c9-a95e-2bb2a375d903','adcbe251-6723-48a8-8385-55133fab704a'],1);
> --- CUT ---
> and I got:
> QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.15..11.42 rows=100 width=100)
> -> Index Scan using request_objectid_productid_idx on request r (cost=0.15..8.17 rows=1 width=64)
> Index Cond: ((objectid = ANY
('{5ab8e0ca-abb5-48c9-a95e-2bb2a375d903,adcbe251-6723-48a8-8385-55133fab704a}'::text[]))AND (productid = 1))
> -> Function Scan on jsonb_array_elements i (cost=0.01..1.00 rows=100 width=32)
> (4 rows)
> which is what I expected from successful inlining of the function.
> So there are some moving parts in your situation that you've not told us about.
> regards, tom lane
Hi Tom,
I ran your code and got similar results so I agree there is more for me to dig into. I see STABLE has a limit of not
allowingDML in the function. I am testing against a very simple function here but we have many other functions with
thesame problem but those also contain DML so even if I got the STABLE to work in this one test case it does not appear
Icould use in all of my functions with this problem. Are there other options here besides the STABLE option that would
workfor functions that also contain DML?
Thanks
Steve