Re: Force a merge join?
От | Doug Fields |
---|---|
Тема | Re: Force a merge join? |
Дата | |
Msg-id | 5.1.0.14.2.20020518193752.01f3dc00@mail.pexicom.com обсуждение исходный текст |
Ответ на | Re: Force a merge join? (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Force a merge join?
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-general |
> > 308 > >As opposed to the pg_stats estimate of 189 ... not too bad, really. >Is the most-common-values distribution shown in the pg_stats output >reasonably correct? (Specifically, 38 has nearly 40% of the entries, >192 another 12%, and everything else 1.5% or less) The output is actually pretty in terms of which should be in the top 10, but not which exactly: pexitest=# select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC; list_id | count ---------+-------- 192 | 173290 330 | 16174 501 | 14054 369 | 12659 229 | 12654 332 | 11429 342 | 10982 454 | 10404 493 | 9835 424 | 9778 460 | 9707 38 | 9454 331 | 9355 323 | 9232 319 | 9164 So that correlates pretty well with it's guesses as to the top 10: {38,192,369,330,332,501,229,493,319,424} But not so well as to their relative distributions: {0.389667,0.123667,0.0156667,0.013,0.00933333,0.00933333,0.009,0.00866667,0.00833333,0.00833333} Some other stats: select avg(count) from (select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC) as a; avg ----------------- 2599.2727272727 (1 row) pexitest=# select stddev(count) from (select list_id, count(*) from test_list_entries group by list_id order by count(*) DESC) as a; stddev ------------------ 10160.4314402693 (1 row) >Another question is exactly what version you are running. I tried >plugging the stats values you gave into pg_statistic by hand, and >got this plan from current sources: I'm running 7.2.1 as packaged and distributed in the Debian/woody 7.2.1-2 distribution. >ie, it's estimating about 1800 matches per list_id value, which seems >pretty reasonable given that it knows none of these values are in the >most_common list. Now I don't see anything in the CVS logs to suggest >that the estimation of this query would've changed since late in 7.2 >beta cycle, so I'm confused why you don't get similar results. It would be nice if I could get it to use a query such as the ones I gave above to put exact values into the analysis. Or, if I could tell it to do a more detailed sampling during ANALYZE. I could also tell it to keep more than the top 10 in the statistics table (SET STATISTICS), but I'm not sure what it would buy me, other than forcing a larger sample (but not knowing how much larger). How much would I slow the ANALYZE statement, and more importantly, the query optimizer, if I told it to keep statistics on the top 200 instead of the default 10 values? In the mean time, I've surrounded my code with SET ENABLE_NESTLOOP=OFF and ON blocks, which should force merge joins. I appreciate your insight on this matter and others Tom. Thanks! Cheers, Doug
В списке pgsql-general по дате отправления: