Re: queries with DISTINCT / GROUP BY giving different plans
От | Tom Lane |
---|---|
Тема | Re: queries with DISTINCT / GROUP BY giving different plans |
Дата | |
Msg-id | 7059.1377015896@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: queries with DISTINCT / GROUP BY giving different plans (Tomas Vondra <tv@fuzzy.cz>) |
Ответы |
Re: queries with DISTINCT / GROUP BY giving different plans
|
Список | pgsql-performance |
Tomas Vondra <tv@fuzzy.cz> writes: > I've managed to get the data to a different machine, and I've spent some > time on debugging it. Great, thanks for looking into it! > It seems that the difference is in evaluating hashentrysize > [ choose_hashed_distinct omits hash_agg_entry_size() ] > but the hashentrysize size is 24 (choose_hashed_distinct) or 56 > (choose_hashed_grouping). This causes that while _distinct evaluates the > condition as false, and _grouping as true (and thus returns false). Hah. > Now, the difference between 24 and 56 is caused by hash_agg_entry_size. > It's called with numAggs=0 but returns 32. I'm wondering if it should > return 0 in such cases, i.e. something like this: No, I don't think so. I'm pretty sure the reason choose_hashed_distinct is like that is that I subconsciously assumed hash_agg_entry_size would produce zero for numAggs = 0; but in fact it does not and should not, because there's still some overhead for the per-group hash entry whether or not there's any aggregates. So the right fix is that choose_hashed_distinct should add hash_agg_entry_size(0) onto its hashentrysize estimate. A separate issue is that the use of numAggs-1 in hash_agg_entry_size's calculations seems a bit risky if numAggs can be zero - I'm not sure we can rely on compilers to get that right. I'm inclined to replace that with use of offsetof. Likewise in build_hash_table. > I've tested that after this both queries use HashAggregate (which is the > right choice), but I haven't done any extensive checking so maybe I'm > missing something. It might be the preferable choice in this example, but you're looking at an edge case. If you want the thing to be using a hash aggregate for this size of problem, you should increase work_mem. regards, tom lane
В списке pgsql-performance по дате отправления: