Re: Suggestions wanted for 7.2.4 query
От | Josh Berkus |
---|---|
Тема | Re: Suggestions wanted for 7.2.4 query |
Дата | |
Msg-id | 200305041059.41468.josh@agliodbs.com обсуждение исходный текст |
Ответ на | Re: Suggestions wanted for 7.2.4 query (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Suggestions wanted for 7.2.4 query
|
Список | pgsql-performance |
Tom, > I have to take that back (must have been out too late last night ;-)). > The EXISTS subquery *is* getting pushed down to become a restriction on > events alone; that's what the "SubPlan" is. However, it'd still be > worth looking for another way to express it, because the planner is > pretty clueless about the selectivity of EXISTS restrictions. That's > what's causing it to drastically overestimate the number of rows taken > from "events" (14812 vs 1919), which in turn drives it away from using > the nestloop-with-inner-indexscan join style for joining to "cases". That may be solvable without forcing a seq scan on "events", simply by overdetermining the criteria on date. That is, I can't apply the date criteria to "events" because that would require running date calucations on each row forcing a seq scan ( i.e. (event_date + duration) between date_one and date_two would require a seq scan), but I can apply a broadend version of the criteria to "events" ( i.e. event_date between (date_one - 1 month) and (date_two + 1 day)) which would give the planner the idea that it is returning a minority of rows from "events". Someday, we have to come up with a way of indexing simple multi-column calculations. Unless someone did that in current source while I was behind on -hackers? > Right. The nestloop/indexscan style only wins if there are not too many > outer rows. If the EXISTS constraint actually did succeed for 14812 > "events" rows, the planner would probably be making the right choice to > use a hash join. Hmm. Any hope of improving this in the future? Like the IN() functionality improvements in 7.4? > BTW, have you tried lowering the value of "random_page_cost"? Looking > at the relative costs in these examples makes me think most of your > tables are cached in memory. Of course, if that's not true during > day-to-day production then you need to be wary about reducing the setting. No, we're probably cached ... the machine has 1gb of RAM. Also it has a really fast RAID array, at least for block disk reads, although random seek times suck. I can tweak a little. The problem is that it's a production machine in use 70 hours a week, so there isn't a lot of time we can test performance settings that might cause problems. Thanks for the advice! -- Josh Berkus Aglio Database Solutions San Francisco
В списке pgsql-performance по дате отправления: