New planner for like was -- Problem (bug?) with like
От | David Walter |
---|---|
Тема | New planner for like was -- Problem (bug?) with like |
Дата | |
Msg-id | Pine.SOL.4.10.10112041105540.6144-100000@boreas.ecs.syr.edu обсуждение исходный текст |
Ответ на | Re: Problem (bug?) with like (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-general |
On Tue, 4 Dec 2001, Tom Lane wrote: > bombadil@wanadoo.es writes: > > 1) # explain SELECT * from v_cliente where nombre like '%DA%'; > > > Merge Join (cost=54763.50..62874.36 rows=413980 width=183) > > -> Sort (cost=16238.44..16238.44 rows=54 width=131) > The problem here is that the planner is being way too optimistic about > the selectivity of LIKE '%DAVID%' --- notice the estimate that only > one matching row will be found in cliente, rather than 54 as with '%DA%'. > So it chooses a plan that avoids the sort overhead needed for an > efficient merge join with the other tables. That would be a win if > there were only one matching row, but as soon as there are lots, it's > a big loss, because the subquery to join the other tables gets redone > for every matching row :-( > > >> Also, how many rows are there really that match '%DA%' and '%DAVID%'? > > > 1) 2672 rows -> 3.59 sec. > > 2) 257 rows -> 364.69 sec. > > I am thinking that the rules for selectivity of LIKE patterns probably > need to be modified. Presently the code assumes that a long constant > string has probability of occurrence proportional to the product of the > probabilities of the individual letters. That might be true in a random > world, but people don't search for random strings. I think we need to > back off the selectivity estimate by some large factor to account for > the fact that the pattern being searched for is probably not random. > Anyone have ideas how to do that? > Is there any statistic being kept by partial index? for instance #occurrences of A B A fairly small table/index could track these couldn't it? If it were a btree itself, then statistics could be split appropriately into sub-branches when the # of occurrences exceeds some threshold. david
В списке pgsql-general по дате отправления: