Re: POC: GROUP BY optimization
From: Dmitry Dolgov
Subject: Re: POC: GROUP BY optimization
Date:
Msg-id: CA+q6zcVRrd-z4YZ4M43ccst7aGL9==w5r1fionRWhP9ot6mybQ@mail.gmail.com
In reply to: Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Replies: Re: POC: GROUP BY optimization
List: pgsql-hackers
> On Tue, Apr 9, 2019 at 5:21 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> So I personally would suggest to treat those patches as independent until
> the very last moment, develop the costing improvements needed by each
> of them, and then decide which of them are committable / in what order.

I had the same idea, but judging from the questions raised in this thread, it's quite hard to go with a reordering based only on the frequency of values. I hoped that the cost_sort improvement patch would be simple enough to incorporate here, but of course it wasn't. Working from the assumption that the amount of work required to perform a sort depends only on the number of distinct groups and on how costly it is to compare values of the data type, I've ended up extracting the get_width_multiplier and get_func_cost parts from the cost_sort patch and including them in 0003-Reorder-by-values-distribution. This makes it possible to account for situations where we compare e.g. long strings, or a custom data type with a high procost for its comparison function (though I've used these values directly, without any adjusting coefficients yet).

> On Wed, Jun 13, 2018 at 6:41 PM Teodor Sigaev <teodor@sigaev.ru> wrote:
>
> > So that's a nice improvement, although I think we should also consider
> > non-uniform distributions (using the per-column MCV lists).
>
> Could you clarify how to do that?

Since I'm not familiar with this topic, I would like to ask the same question: how to do that, and what are the advantages?

> On Sat, Jun 16, 2018 at 5:59 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> I still think we need to be careful when introducing new optimizations
> in this area - reordering the grouping keys by ndistinct, ORDER BY or
> whatever. In particular I don't think we should commit these patches
> that may quite easily cause regressions, and then hope some hypothetical
> future patch fixes the costing.

I'm a bit concerned about this part of the discussion.
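To make the assumption above concrete, here is a minimal back-of-the-envelope sketch (in Python, not the actual C implementation) of the cost ingredients described: comparisons scale roughly with n * log2(number of distinct groups), and each comparison is scaled up for wide values and for comparison functions with a high procost. All names, the base constants, and the exact shape of the formula are hypothetical illustrations, not the patch's code:

```python
import math

def estimate_group_compare_cost(n_tuples, n_distinct_groups,
                                avg_width, procost,
                                base_width=8, base_procost=1.0):
    """Hypothetical per-key sort cost heuristic.

    Assumes the sort work depends only on the number of distinct
    groups (which drives how many comparisons actually resolve on
    this key) and on how expensive each comparison is: wider values
    (e.g. long strings) and comparison functions with a higher
    procost are scaled up proportionally, with no adjusting
    coefficients applied.
    """
    if n_distinct_groups < 2:
        # A single group never needs a comparison on this key.
        return 0.0
    comparisons = n_tuples * math.log2(n_distinct_groups)
    width_multiplier = max(1.0, avg_width / base_width)
    func_multiplier = max(1.0, procost / base_procost)
    return comparisons * width_multiplier * func_multiplier
```

Under this toy model, a key with more distinct groups, a larger average width, or a costlier comparison function gets a higher estimate, which is the signal the reordering would use when ranking candidate GROUP BY orders.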
There is an idea running through the whole thread about avoiding the situation where a user knows which order is better and we generate a different one by mistake. From what I see, even if all the objections were addressed, there is still a chance that some of the coefficients will not be good enough (e.g. the width multiplier is based on an average width, or it may happen that all the compared strings differ right at the beginning), and the chosen order will not be optimal. Does that mean that in any case the implementation of such an optimization should provide a way to override it?