Re: Parallel Aggregates for string_agg and array_agg
From | Mark Dilger |
---|---|
Subject | Re: Parallel Aggregates for string_agg and array_agg |
Date | |
Msg-id | 1C2959D0-56F2-4067-B2AC-DF9A3B1D0FB5@gmail.com |
In reply to | Re: Parallel Aggregates for string_agg and array_agg (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Parallel Aggregates for string_agg and array_agg |
List | pgsql-hackers |
> On Mar 27, 2018, at 7:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point. I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I frequently care about the order, but only to the extent that the order
is stable between aggregates of several different columns, along the lines of:

    select array_agg(a) AS x, array_agg(b) AS y
        from generate_a_b_func(foo);

I don't care which order the data is in, as long as x[i] and y[i] are
matched correctly. It sounds like this patch would force me to write
that as, for example:

    select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
        from generate_a_b_func(foo);

which I did not need to do before. I would expect a performance
regression from the two newly required sorts. So in that case I agree
with Tom.

But I also agree with others that I want the parallel aggregation
functionality. Could we perhaps introduce some option for the aggregate
to force it to be stable? Something like:

    select array_agg(a order undisturbed) AS x, array_agg(b order undisturbed) AS y
        from generate_a_b_func(foo);

which would not perform an extra sort operation but would guarantee
to not disturb the pre-existing sort order coming from
generate_a_b_func(foo)?

I don't care about the syntax / keywords in the example above. I'm
just looking to get the benefits of the parallel aggregation when I
don't care about ordering while preserving the order for these cases
where it matters.

mark
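
The property asked for above (x[i] and y[i] stay matched even if the overall order changes) can be illustrated with a small toy model. This is only a sketch under stated assumptions and not PostgreSQL's implementation: it assumes hypothetical workers that each aggregate a contiguous chunk of rows, and a hypothetical combine step that concatenates the partial arrays in some worker order. It makes no claim about which of the two behaviours the patch actually has; the names `partial_aggregate` and `combine` are invented for illustration.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, str]
Partial = Tuple[List[int], List[str]]


def partial_aggregate(rows: Sequence[Row], n_workers: int) -> List[Partial]:
    """Split rows into contiguous chunks, one per hypothetical worker, and
    return each worker's partial arrays for columns a and b."""
    chunk = max(1, -(-len(rows) // n_workers))  # ceiling division, at least 1
    partials: List[Partial] = []
    for start in range(0, len(rows), chunk):
        part = rows[start:start + chunk]
        partials.append(([a for a, _ in part], [b for _, b in part]))
    return partials


def combine(partials: Sequence[Partial],
            order_x: Sequence[int],
            order_y: Sequence[int]) -> Tuple[List[int], List[str]]:
    """Concatenate partial arrays; x and y may use different worker orders."""
    x = [a for i in order_x for a in partials[i][0]]
    y = [b for i in order_y for b in partials[i][1]]
    return x, y


# Stand-in for the rows produced by generate_a_b_func(foo).
rows: List[Row] = [(1, "one"), (2, "two"), (3, "three"),
                   (4, "four"), (5, "five"), (6, "six")]
partials = partial_aggregate(rows, n_workers=3)

# Both aggregates combined in the same (arbitrary) worker order:
# the overall order differs from the input, but pairs stay matched.
x, y = combine(partials, order_x=[2, 0, 1], order_y=[2, 0, 1])
pairs_matched = set(zip(x, y)) == set(rows)

# Aggregates combined in different worker orders: pairing is lost.
x2, y2 = combine(partials, order_x=[2, 0, 1], order_y=[0, 1, 2])
pairs_broken = set(zip(x2, y2)) != set(rows)

print(x, y, pairs_matched)
print(x2, y2, pairs_broken)
```

In this model the pairing survives whenever both aggregates see the same rows per worker and are combined in the same worker order, without any sort; it breaks only if the two aggregates are combined independently.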