Re: enable_incremental_sort changes query behavior
От | Tomas Vondra |
---|---|
Тема | Re: enable_incremental_sort changes query behavior |
Дата | |
Msg-id | 20201007223438.npgyo4qqhzkztdnd@development обсуждение исходный текст |
Ответ на | Re: enable_incremental_sort changes query behavior (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Wed, Oct 07, 2020 at 03:52:04PM -0400, Robert Haas wrote: >On Thu, Oct 1, 2020 at 9:03 AM James Coleman <jtc331@gmail.com> wrote: >> The plan (obtained by replacing the volatile function with a stable one): >> >> Unique >> -> Nested Loop >> -> Gather Merge >> Workers Planned: 2 >> -> Sort >> Sort Key: ref_0.i, (md5(ref_0.t)) >> -> Parallel Index Scan using ref_0_i_idx on ref_0 >> -> Function Scan on generate_series ref_1 >> >> Changing `md5(ref_0.t)` to `random()::text || ref_0.t` causes the error. > >As far as I remember this stuff, target lists for any nodes below the >top of the join tree were previously always just Var nodes. Any >expressions that needed to be computed were computed at the level of >the topmost join, before adding upper planner steps like Agg, >WindowAgg, Limit, etc. But, here you've got a complex expression -- >md5(ref_0.t) -- begin computed somewhere in the middle of the plan >tree so that the sort can be performed at that point. However, that >seems likely to break a bunch of assumptions all over the planner, >because, unless a lot of work and surgery has been done since I looked >at this last, that md5(ref_0.t) value is not going to "bubble up" to >higher levels of the plan through the tlists of the individual nodes. > >That poses a risk that it will be recomputed multiple times, which is >particularly bad for volatile functions because you might not always >get the same answer, but it seems like it might be bad even for >non-volatile functions, because it could be slow if the value is used >at multiple places in the query. It feels like the sort of thing that >might have broken some other assumptions, too, although I can't say >exactly what. I agree this might be a clear correctness issue for volatile functions. Not sure about stable functions, though - it's easy to construct cases where pushing the function down results in much faster execution. Consider this: create table t (id int, val text); insert into t select 1, repeat('x', 1000) from generate_series(1,100) s(i); -- no pushdown select id, md5(t1.val) from t t1 join t t2 using (id) join t t3 using (id); Time: 5222.895 ms (00:05.223) -- manual pushdown with x as (select id, md5(t.val) from t) select id, t1.md5 from x t1 join x t2 using (id) join x t3 using (id); Time: 486.455 ms Of course, this is independent of what the planner assumes correctly, which is what's being discussed in this thread. And I'm sure there are plenty counter-examples too, so we'd need to cost this properly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: