Re: Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)
Дата
Msg-id e189a2ea-8612-339b-8de5-1ed567118670@2ndquadrant.com
обсуждение исходный текст
Ответ на Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)  (Hackety Man <hacketyman@gmail.com>)
Ответы Re: Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)  (Hackety Man <hacketyman@gmail.com>)
Список pgsql-performance

On 04/16/2018 10:42 PM, Hackety Man wrote:
> ...
>     The first thing I did was to run some baseline tests using the basic
>     queries inside of the IF() checks found in each of the functions to
>     see how the query planner handled them.  I ran the following two
>     queries.
> 
>         EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE
>         LOWER(text_distinct) = LOWER('Test5000001');
>         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(text_distinct) = LOWER('Test5000001');
> 

Those are not the interesting plans, though. The EXISTS only cares about 
the first row, so you should be looking at

     EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
     LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;

>     I moved on to test the other function with the following query...
> 
>         EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
>         zz_spx_ifcount_noidx('Test5000001');
> 
>     and I got the following "auto_explain" results...
> 
>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms 
>         plan:
>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>           Result  (cost=4.08..4.09 rows=1 width=1) (actual
>         time=426.274..426.274 rows=1 loops=1)
>             Buffers: shared read=5406
>             InitPlan 1 (returns $0)
>            ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000
>         width=0) (actual time=426.273..426.273 rows=0 loops=1)
>               Filter: (lower(text_distinct) = 'test5000001'::text)
>               Rows Removed by Filter: 1000000
>               Buffers: shared read=5406
>         2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>         2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms 
>         plan:
>           Query Text: explain (analyze, buffers) select * from
>         zz_spx_ifexists_noidx('Test5000001')
>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>         rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)
>             Buffers: shared hit=30 read=5438
> 
>     Definitely not the execution time, or query plan, results I was
>     expecting.  As we can see, no workers were employed here and my
>     guess was that this was the reason or the large execution time
>     difference between these 2 tests?  199 milliseconds versus 428
>     milliseconds, which is a big difference.  Why are workers not being
>     employed here like they were when I tested the query found inside of
>     the IF() check in a standalone manner?  But then I ran another test
>     and the results made even less sense to me.
> 

The plan difference is due to not realizing the EXISTS essentially 
implies LIMIT 1. Secondly, it expects about 5000 rows matching the 
condition,  uniformly spread through the table. But it apparently takes 
much longer to find the first one, hence the increased duration.

How did you generate the data?

>     When I ran the above query the first 5 times after starting my
>     Postgres service, I got the same results each time (around 428
>     milliseconds), but when running the query 6 or more times, the
>     execution time jumps up to almost double that.  Here are the
>     "auto_explain" results running this query a 6th time...
> 

This is likely due to generating a generic plan after the fifth 
execution. There seems to be only small difference in costs, though.

>         --"auto_explain" results after running the same query 6 or more
>         times.
>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms 
>         plan:
>           Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
>           Result  (cost=4.58..4.59 rows=1 width=1) (actual
>         time=761.843..761.843 rows=1 loops=1)
>             Buffers: shared hit=160 read=5246
>             InitPlan 1 (returns $0)
>            ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000
>         width=0) (actual time=761.841..761.841 rows=0 loops=1)
>               Filter: (lower(text_distinct) = lower($1))
>               Rows Removed by Filter: 1000000
>               Buffers: shared hit=160 read=5246
>         2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
>         "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
>         LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
>           PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF
>         2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms 
>         plan:
>           Query Text: explain (analyze, buffers) select * from
>         zz_spx_ifexists_noidx('Test5000001')
>           Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
>         rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)
>             Buffers: shared hit=160 read=5246
> 
>     As you can see, the execution time jumps up to about 762
>     milliseonds.  I can see in the sequence scan node that the LOWER()
>     function shows up on the right side of the equal operator, whereas
>     in the first 5 runs of this test query the plan did not show this. 
>     Why is this?
> 

It doesn't really matter on which side it shows, it's more about a 
generic plan built without knowledge of the parameter value.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-performance по дате отправления:

Предыдущее
От: Dinesh Chandra 12108
Дата:
Сообщение: Installing PostgreSQL 9.5 in centos 6 using YUM
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Unexplainable execution time difference between two testfunctions...one using IF (SELECT COUNT(*) FROM...) and the other using IFEXISTS (SELECT 1 FROM...)