Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg |
Date | |
Msg-id | 269bca9e-9248-2d22-82be-6e82bbc101b3@2ndquadrant.com |
In reply to | Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg ("Regina Obe" <lr@pcorp.us>) |
List | pgsql-hackers |

Hi,

On 6/7/17 5:52 AM, Regina Obe wrote:
>> On 6/6/17 13:52, Regina Obe wrote:
>>> It seems CREATE AGGREGATE was expanded in 9.6 to support
>>> parallelization of aggregate functions using transitions, with the
>>> addition of serialfunc and deserialfunc to the aggregate definitions.
>>>
>>> https://www.postgresql.org/docs/10/static/sql-createaggregate.html
>>>
>>> I was looking at the PostgreSQL 10 source code for some example usages
>>> of this and was hoping that array_agg and string_agg would support the feature.
>
>> I'm not sure how you would parallelize these, since in most uses
>> you want to have a deterministic output order.
>
>> --
>> Peter Eisentraut              http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> Good point. If that's the reason it wasn't done, that's good just wasn't sure.
>
> But if you didn't have an ORDER BY in your aggregate usage, and you
> did have those transition functions, it shouldn't be any different from
> any other use case right?
> I imagine you are right that most folks who use array_agg and
> string_agg usually combine it with array_agg(... ORDER BY ..)
>

I think what TL had in mind is something like

    SELECT array_agg(x) FROM (
        SELECT x FROM bar ORDER BY y
    ) foo;

i.e. a subquery producing the data in predictable order.

>
> My main reason for asking is that most of the PostGIS geometry and
> raster aggregate functions use transitions and were patterned after
> array agg.
>
> In the case of PostGIS the sorting is done internally and really
> only to expedite take advantage of things like cascaded union
> algorithms.
> That is always done though (so even if each worker does it on just its
> batch that's still better than having only one worker).
> So I think it's still very beneficial to break into separate jobs
> since in the end the gather, will have say 2 biggish geometries or 2
> biggish rasters to union if you have 2 workers which is still better
> than having a million smallish geometries/rasters to union

I'm not sure I got your point correctly, but if you can (for example)
sort the per-worker results as part of the "serialize" function, and
benefit from that while combining that in the gather, then sure, that
should be a huge win.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
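
For reference, here is a minimal sketch of what a parallel-capable, order-agnostic array aggregate could look like using only built-in functions. This is not how array_agg is implemented. The name unordered_array_agg is hypothetical, and the polymorphic signatures assume PostgreSQL 9.6 through 13 (later releases declare array_append and array_cat with anycompatible types, so the definition would need adjusting). Because the state type is a plain array rather than internal, no serialfunc or deserialfunc is needed; only a combinefunc to merge the per-worker partial states.

```sql
-- Hypothetical aggregate name; a sketch, not the built-in array_agg.
CREATE AGGREGATE unordered_array_agg(anyelement) (
    sfunc       = array_append,  -- per-row transition: append the element to the state
    stype       = anyarray,      -- plain (non-internal) state, so no serialfunc/deserialfunc
    combinefunc = array_cat,     -- merge two partial states coming from different workers
    initcond    = '{}',
    parallel    = safe
);

-- Usage without ORDER BY: element order in the result is not deterministic
-- when the planner chooses a parallel plan.
SELECT unordered_array_agg(x) FROM bar;
```

Whether the planner actually picks a parallel plan depends on table size and the parallel cost settings, so the result may still come from a single process on small tables.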