Proposed cleanup of index-related planner estimation procedures
От | Tom Lane |
---|---|
Тема | Proposed cleanup of index-related planner estimation procedures |
Дата | |
Msg-id | 20065.947132708@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
RE: [HACKERS] Proposed cleanup of index-related planner estimation procedures
|
Список | pgsql-hackers |
I am thinking about redefining and simplifying the planner's interface to index-type-dependent estimation routines. Currently, each defined index type is supposed to supply two routines: an "amopselect" routine and an "amopnpages" routine. (The existing actual routines of this kind are btreesel, btreenpages, etc in src/backend/utils/adt/selfuncs.c.) These things are called by index_selectivity() in src/backend/optimizer/util/plancat.c. amopselect tries to determine the selectivity of an indexqual (the fraction of main-table tuples it will select) and amopnpages tries to determine the number of index pages that will be read to do it. Now, this collection of code is largely redundant with optimizer/path/clausesel.c, which also tries to estimate the selectivity of qualification conditions. Furthermore, the interface to these routines is fundamentally misdesigned, because there is no way to deal with interrelated qualification conditions --- for example, if we have a range query like "... WHERE x > 10 AND x < 20", the code estimates the selectivity as the product of the selectivities of the two terms independently, but the actual selectivity is very different from that. I am working on fixing clausesel.c to be smarter about correlated conditions, but it won't do much good to fix that code without fixing the index-related code. What I'm thinking about doing is replacing these two per-index-type routines with a single routine, which is called once per proposed indexscan rather than once per qual clause. It would receive the whole indexqual list as a parameter, instead of just one qual. A typical implementation would just call clausesel.c's general-purpose code to estimate the selectivity, and then do a little bit of extra work to derive the estimated number of index pages from that number. I suppose the original reason for having amopselect at all was to allow exploitation of index-specific knowledge during selectivity estimation --- but none of the existing index types actually provide any such knowledge in their amopselect routines. Still, this redesign preserves the flexibility for an index type to do something specialized. A possible objection to this scheme is that the inputs and outputs of these routines would be structs that aren't full-fledged SQL types (and no, I'm not willing to promote parser expression trees into an SQL type ;-)). But I don't think that's a real problem. No one is going to be inventing new index types without doing a lot of C coding, so having to write the amopselect routines in C doesn't seem like a big drawback. Comments, objections, better ideas? regards, tom lane
В списке pgsql-hackers по дате отправления: