Re: Parallel Aggregates for string_agg and array_agg

From	Mark Dilger
Subject	Re: Parallel Aggregates for string_agg and array_agg
Date	May 2, 2018, 00:09:39
Msg-id	1C2959D0-56F2-4067-B2AC-DF9A3B1D0FB5@gmail.com
In reply to	Re: Parallel Aggregates for string_agg and array_agg (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Parallel Aggregates for string_agg and array_agg
List	pgsql-hackers

> On Mar 27, 2018, at 7:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> On 27 March 2018 at 13:26, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>> synchronized_seqscans is another piece of precedent in the area, FWIW.
>
>> This is true. I guess the order of aggregation could be made more
>> certain if we remove the cost based optimiser completely, and just
>> rely on a syntax based optimiser.
>
> None of this is responding to my point.  I think the number of people
> who actually don't care about aggregation order for these aggregates
> is negligible, and none of you have argued against that; you've instead
> selected straw men to attack.

I frequently care about the order, but only to the extent that the order
is stable between aggregates of several different columns, along the lines
of:

select array_agg(a) AS x, array_agg(b) AS y
  from generate_a_b_func(foo);

I don't care which order the data is in, as long as x[i] and y[i] are
matched correctly.  It sounds like this patch would force me to write
that as, for example:

select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
  from generate_a_b_func(foo);

which I did not need to do before.  I would expect a performance regression
from the two newly required sorts.  So in that case I agree with Tom.
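
For concreteness, here is a minimal, self-contained sketch of that explicitly ordered form. The CTE named pairs and its three rows are hypothetical stand-ins for generate_a_b_func(foo), which is not shown in this thread; the only point is that giving both aggregates the same ORDER BY keeps x[i] and y[i] paired, at the cost of the sorts.

```sql
-- "pairs" is a made-up stand-in for generate_a_b_func(foo).
WITH pairs(a, b) AS (
    VALUES (3, 'three'), (1, 'one'), (2, 'two')
)
SELECT array_agg(a ORDER BY a, b) AS x,  -- {1,2,3}
       array_agg(b ORDER BY a, b) AS y   -- {one,two,three}
  FROM pairs;
```

Because both aggregates use the identical sort key (a, b), element i of x and element i of y always come from the same input row, whatever order the rows arrive in.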

But I also agree with others that I want the parallel aggregation functionality.
Could we perhaps introduce some option for the aggregate to force it to be
stable?  Something like:

select array_agg(a order undisturbed) AS x, array_agg(b order undisturbed) AS y
  from generate_a_b_func(foo);

which would not perform an extra sort operation but would guarantee not to
disturb the pre-existing sort order coming from generate_a_b_func(foo)?

I don't care about the syntax / keywords in the example above.  I'm just
looking to get the benefits of parallel aggregation when I don't care
about ordering, while preserving the order for those cases where it matters.

mark
