Re: Removing unneeded self joins
| From | Robert Haas |
|---|---|
| Subject | Re: Removing unneeded self joins |
| Date | |
| Msg-id | CA+TgmoZ_HbxXopVvwtrne7kh-hz+Smp1LwjrFzBrq_cGPXL_NA@mail.gmail.com |
| In reply to | Re: Removing unneeded self joins (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
On Wed, May 16, 2018 at 6:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 17 May 2018 at 08:44, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> What I was advocating was an approach that varies according to the
>>> query cost, so we don't waste time trying to tune the heck out of OLTP
>>> queries, but for larger queries we might take a more considered
>>> approach.
>
>> That's tricky. If we do this, it should be done before Path
>> generation, so not much is known about the costs in those case.
>
> Yeah. It'd have to be a very heuristic thing that doesn't account
> for much beyond the number of relations in the query, and maybe their
> sizes --- although I don't think we even know the latter at the
> point where join removal would be desirable. (And note that one of
> the desirable benefits of join removal is not having to find out the
> sizes of removed rels ... so just swapping that around doesn't appeal.)

As I've mentioned before, the problem we're talking about here is also highly relevant to parallel query. We only want to bother generating partial paths for queries that are expensive enough to justify considering parallelism, but we don't know how expensive they are until we finish planning. The only way I could think of to tackle that problem was to drive it off the relation sizes, but the presence or absence of expensive functions in the target list can wildly change the point at which parallelism potentially becomes useful. So we end up sometimes wasting effort generating partial paths that are totally useless, and at other times failing to generate partial paths that would have been useful. (Really, I'd like to generate a lot more partial paths than we do, trying out various numbers of workers, but that would just make the existing problem worse.)
I have wondered about doing a preliminary pass over the tree where we try to make a crude estimate of the amount of effort involved, and then planning for real with that number in hand. But it seems like there's so little information you could get at that early stage that it would be hard to decide anything useful on that basis. You've got to at least have relation sizes, and really you need some estimate of the result cardinality as well. It seems like we currently can't figure out cardinality without also computing paths, and I've wondered if we could split those into two separate phases. But every time I think about trying to do that I realize that my pain tolerance isn't that high.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company