Re: osdl-dbt3 run results - puzzled by the execution
От | Jenny Zhang |
---|---|
Тема | Re: osdl-dbt3 run results - puzzled by the execution |
Дата | |
Msg-id | 1064007341.442.45.camel@ibm-a.pdx.osdl.net обсуждение исходный текст |
Ответ на | Re: osdl-dbt3 run results - puzzled by the execution plans (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
On Thu, 2003-09-18 at 20:20, Tom Lane wrote: > Jenny Zhang <jenny@osdl.org> writes: > > ... It seems to me that small > > effective_cache_size favors the choice of nested loop joins (NLJ) > > while the big effective_cache_size is in favor of merge joins (MJ). > > No, I wouldn't think that, because a nestloop plan will involve repeated > fetches of the same tuples whereas a merge join doesn't (at least not > when it sorts its inner input, as this plan does). Larger cache > improves the odds of a repeated fetch not having to do I/O. In practice > a larger cache area would also have some effects on access costs for the > sort's temp file, but I don't think the planner's cost model for sorting > takes that into account. I think there is some misunderstanding here. What I meant to say is: From the plans we got, the optimizer favors the choice of nested loop joins (NLJ) while the big effective_cache_size is in favor of merge joins (MJ). Which we think is not appropriate. We verified that sort_mem has no impact on the plans. Though it would be nice to take that into account. > > As Matt Clark points out nearby, the real question is whether these > planner estimates have anything to do with reality. EXPLAIN ANALYZE > results would be far more interesting than plain EXPLAIN. > > > However, within the same run set consist of 6 runs, we see 2-3% > > standard deviation for the run metrics associated with the multiple > > stream part of the test (as opposed to the single stream part). > > <python> Och, laddie, we useta *dream* of 2-3% variation </python> > BTW, I am a she :-) > > We would like to reduce the variation to be less than 1% so that a > > 2% change between two different kernels would be significant. > > I think this is a pipe dream. Variation in where the data gets laid > down on your disk drive would alone create more than that kind of delta. > I'm frankly amazed you could get repeatability within 2-3%. > Greg is right. The repeatability is due to the aggregate results for a whole test run. As for individual query, the power test(single stream) is very consistent, and the throughput test(multiple streams), any given query execution time varies up to 15% if no swapping. If we set sort_mem too high and swapping occurs, the variation is bigger. Jenny
В списке pgsql-performance по дате отправления: