Re: estimates for nested loop very wrong?
От | Antti Haapala |
---|---|
Тема | Re: estimates for nested loop very wrong? |
Дата | |
Msg-id | Pine.GSO.4.44.0304102335170.10376-100000@paju.oulu.fi обсуждение исходный текст |
Ответ на | Re: estimates for nested loop very wrong? (joostje@komputilo.org) |
Список | pgsql-sql |
On Thu, 10 Apr 2003 joostje@komputilo.org wrote: > Je 2003/04/10(4)/12:04, Tom Lane skribis: > > joostje@komputilo.org writes: > > > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as > > > count(distinct(id)), but it obviously isn't. Also the most_common_freqs > > > values are about a 100 times higher than in reality, and, even tough about > > > 900 values of id occur more often than 40 times, in the 'most_common_vals' > > > list are 7 (of the 10) vals that occur less than 40 times, and the real > > > top two isn't even represented. > > > > Please try increasing the statistics target (see ALTER TABLE) for db.id, then > > re-analyze and see if the estimates get better. The default setting is > > 10 --- try 20, 50, 100 to see what happens. > > Well, the n_distinct estimates get better, but the cost estimates still > don't quite add up: `actual cost' is 23.24, cost estimate never gets > below 49930. > > stat.targ n_distinct| correlation cost estimate > 5 1917 | 0.43189 3621794.92 > 10 1998 | 0.3909 3618363.33 > 20 4330 | -0.247617 1981594.38 > 50 9708 | 0.0762642 975847.15 > 100 14604 | 0.030706 657631.41 > 200 21855 | 0.0446929 204335.70 > 500 39980 | -0.0497829 121000.31 > 1000 29468 | 0.0366528 49930.08 > 1000 29453 | 0.0367673 49954.08 > Table 1: various estimates as a function of statistical target > actual distinct values: 42226 > actual cost: varies from 5.0 to 27.8 > > So, the planner still prefers the mergejoin and hashjoin plans, causing > the select to take tens of seconds (60 for the mergejoin, I beleve), wheras > the Nested Loop takes only 0.024 seconds: > > For example, for the stat.targ=500 run: > => explain analyse SELECT id from db, tmp0 WHERE valida AND poseda='uea' AND tab='pers' AND tmp0.v0=id ; > NOTICE: QUERY PLAN: > > Nested Loop (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1) > -> Seq Scan on tmp0 (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1) > -> Index Scan using db_id_idx on db (cost=0.00..120.63 rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29) > Total runtime: 23.92 msec > > In the above example, tmp0 had 29 values, that correspond to 415 rows in table db. > Table db has 586157 rows. Just a thought: is tmp0 analyzed? (1000 rows vs 29). :) Wouldn't this divide cost estimate by at least 30? (ok... it's still high...). -- Antti Haapala
В списке pgsql-sql по дате отправления: