Re: [GENERAL] Problem (bug?) with like
От | Tom Lane |
---|---|
Тема | Re: [GENERAL] Problem (bug?) with like |
Дата | |
Msg-id | 20011204171627.GB450@lorien.net обсуждение исходный текст |
Ответы |
Re: [GENERAL] Problem (bug?) with like
Re: [GENERAL] Problem (bug?) with like |
Список | pgsql-hackers |
bombadil@wanadoo.es writes: > 1) # explain SELECT * from v_cliente where nombre like '%DA%'; > Merge Join (cost=54763.50..62874.36 rows=413980 width=183) > -> Sort (cost=16238.44..16238.44 rows=54 width=131) > -> Seq Scan on cliente c (cost=0.00..16236.88 rows=54 width=131) > -> Sort (cost=38525.06..38525.06 rows=20097 width=74) > -> Subquery Scan d (cost=891.91..37088.66 rows=20097 width=74) > -> Hash Join (cost=891.91..37088.66 rows=20097 width=74) > ... > 2) explain SELECT * from v_cliente where nombre like '%DAVID%'; > Nested Loop (cost=891.91..61414.58 rows=7638 width=183) > -> Seq Scan on cliente c (cost=0.00..16236.88 rows=1 width=131) > -> Subquery Scan d (cost=891.91..37088.66 rows=20097 width=74) > -> Hash Join (cost=891.91..37088.66 rows=20097 width=74) > ... [same subplan as above] The problem here is that the planner is being way too optimistic about the selectivity of LIKE '%DAVID%' --- notice the estimate that only one matching row will be found in cliente, rather than 54 as with '%DA%'. So it chooses a plan that avoids the sort overhead needed for an efficient merge join with the other tables. That would be a win if there were only one matching row, but as soon as there are lots, it's a big loss, because the subquery to join the other tables gets redone for every matching row :-( >> Also, how many rows are there really that match '%DA%' and '%DAVID%'? > 1) 2672 rows -> 3.59 sec. > 2) 257 rows -> 364.69 sec. I am thinking that the rules for selectivity of LIKE patterns probably need to be modified. Presently the code assumes that a long constant string has probability of occurrence proportional to the product of the probabilities of the individual letters. That might be true in a random world, but people don't search for random strings. I think we need to back off the selectivity estimate by some large factor to account for the fact that the pattern being searched for is probably not random. Anyone have ideas how to do that? regards, tom lane
В списке pgsql-hackers по дате отправления: