Обсуждение: Why search term results different query plan?
Hi,
Could anybody give me hint, or reccomend a source about this:
Please note that two queries are different only in search terms (STAR and
A). Plan and performane difference between them confused me.
Thanks,
Erol
trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;
NOTICE: QUERY PLAN:
Unique (cost=1599.50..1599.58 rows=1 width=98)
-> Sort (cost=1599.50..1599.50 rows=2 width=98)
-> Nested Loop (cost=0.00..1599.48 rows=2 width=98)
-> Seq Scan on product_t p (cost=0.00..613.41 rows=1
width=94)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19
rows=231 width=4)
[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%STAR%' OR p.description LIKE '%STAR%' OR p.basesku LIKE
'%STAR%') order by p.name;" -d trollandtoad2
real 4m24.500s
user 0m0.020s
sys 0m0.010s
------------------------------------------
EXPLAIN
trollandtoad2=# explain SELECT DISTINCT p.* FROM product_t p,
product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;
NOTICE: QUERY PLAN:
Unique (cost=4456.37..5127.88 rows=1919 width=98)
-> Sort (cost=4456.37..4456.37 rows=19186 width=98)
-> Merge Join (cost=2389.21..2496.11 rows=19186 width=98)
-> Sort (cost=1396.97..1396.97 rows=8321 width=94)
-> Seq Scan on product_t p (cost=0.00..613.41
rows=8321 width=94)
-> Sort (cost=992.24..992.24 rows=231 width=4)
-> Seq Scan on product_detail_t pd (cost=0.00..983.19
rows=231 width=4)
EXPLAIN
[postgres@trollandtoad postgres]$ time psql -c "SELECT DISTINCT p.* FROM
product_t p, product_detail_t pd WHERE ((pd.productid=p.productid) AND
(pd.productconditionid<>'PL' or pd.quantityonsite>0) AND pd.active=1) and
(p.Name LIKE '%A%' OR p.description LIKE '%A%' OR p.basesku LIKE '%A%')
order by p.name;" -d trollandtoad2
real 0m6.284s
user 0m0.270s
sys 0m0.030s
On Sun, Sep 30, 2001 at 04:18:53AM +0300, Erol ?z wrote: > Hi, > Could anybody give me hint, or reccomend a source about this: > Please note that two queries are different only in search terms (STAR and > A). Plan and performane difference between them confused me. > Thanks, > Erol %STAR%: > Unique (cost=1599.50..1599.58 rows=1 width=98) > -> Sort (cost=1599.50..1599.50 rows=2 width=98) > -> Nested Loop (cost=0.00..1599.48 rows=2 width=98) > -> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94) > -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) %A%: > Unique (cost=4456.37..5127.88 rows=1919 width=98) > -> Sort (cost=4456.37..4456.37 rows=19186 width=98) > -> Merge Join (cost=2389.21..2496.11 rows=19186 width=98) > -> Sort (cost=1396.97..1396.97 rows=8321 width=94) > -> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94) > -> Sort (cost=992.24..992.24 rows=231 width=4) > -> Seq Scan on product_detail_t pd (cost=0.00..983.19 rows=231 width=4) Note how in the first one it assumes that there is only one match in p whereas in the second there are 8321. Are either of those correct? I don't really see how you can do any real estimates on %X% type queries, although maybe it's assuming longer string => less matches. Not sure how to fix it though... HTH, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
"=?iso-8859-9?B?RXJvbCDWeg==?=" <eroloz@esg.com.tr> writes:
> Could anybody give me hint, or reccomend a source about this:
> Please note that two queries are different only in search terms (STAR and
> A). Plan and performane difference between them confused me.
Note the difference in the estimated number of rows matched in
product_t:
> -> Seq Scan on product_t p (cost=0.00..613.41 rows=1 width=94)
> -> Seq Scan on product_t p (cost=0.00..613.41 rows=8321 width=94)
The planner thinks --- not unreasonably, IMHO --- that LIKE '%A%' will
match many more rows than LIKE '%STAR%'. Accordingly, it prepares
different plans for the two cases.
Since you're complaining, I suppose that LIKE '%STAR%' matches many more
rows than random chance would suggest, and so that plan turns out to be
poorly chosen.
regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes:
> I don't really see how you can do any real estimates on %X% type queries,
> although maybe it's assuming longer string => less matches.
More exactly, it's assuming more fixed characters in the pattern ->
less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c.
While the specific numbers it's using are made from whole cloth, I think
the principle should hold good.
I don't see any way to accumulate actual statistics that would improve
the estimate, do you?
regards, tom lane
On Sat, Sep 29, 2001 at 10:37:24PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > I don't really see how you can do any real estimates on %X% type queries, > > although maybe it's assuming longer string => less matches. > > More exactly, it's assuming more fixed characters in the pattern -> > less matches; see like_selectivity in src/backend/utils/adt/selfuncs.c. > While the specific numbers it's using are made from whole cloth, I think > the principle should hold good. > > I don't see any way to accumulate actual statistics that would improve > the estimate, do you? It is difficult, although obviously the results he is getting are bunk. The only problem I can see is that it doesn't appear to be using any of the gathered statistics at all. For example, if %STAR% matched the most common value in the column, the selectivity would still be very low. What I find most interesting about his case is that the query with %A% ran 40 times faster (260s to 6s) than the one where it thought only 1 row in p was going to match. What that tells me is that the plan used for %A% is more appropriate even though %STAR% is more selective. How can you convince the planner of that? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.