Re: Combining Aggregates
От | Tomas Vondra |
---|---|
Тема | Re: Combining Aggregates |
Дата | |
Msg-id | 4c881a76-9682-6871-ddaa-4f58e0fd25b5@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: Combining Aggregates (David Rowley <david.rowley@2ndquadrant.com>) |
Список | pgsql-hackers |
Hi, On 03/20/2016 04:48 AM, David Rowley wrote: > On 17 March 2016 at 14:25, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> On 03/16/2016 12:03 PM, David Rowley wrote: >>> Patch 2: >>> This adds the serial/deserial aggregate infrastructure, pg_dump >>> support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it >>> serialise and deserialise aggregate states when instructed to do so. >>> >>> Patch 3: >>> This adds a boat load of serial/deserial functions, and combine >>> functions for most of the built-in numerical aggregate functions. It >>> also contains some regression tests which should really be in patch 2, >>> but I with patch 2 there's no suitable serialisation or >>> de-serialisation functions to test CREATE AGGREGATE with. I think >>> having them here is ok, as patch 2 is quite useless without patch 3 >>> anyway. >> >> I don't see how you could move the tests into #2 when the functions are >> defined in #3? IMHO this is the right place for the regression tests. > > Yeah, but the CREATE AGGREGATE changes which are being tested in 0003 > were actually added in 0002. I think 0002 is the right place to test > the changes to CREATE AGGREGATE, but since there's a complete lack of > functions to use, then I've just delayed until 0003. > >>> Another thing to note about this patch is that I've gone and created >>> serial/de-serial functions for when PolyNumAggState both require >>> sumX2, and don't require sumX2. I had thought about perhaps putting an >>> extra byte in the serial format to indicate if a sumX2 is included, >>> but I ended up not doing it this way. I don't really want these serial >>> formats getting too complex as we might like to do fun things like >>> pass them along to sharded servers one day, so it might be nice to >>> keep them simple. >> >> >> Hmmm, I've noticed that while eyeballing the diffs, and I'm not >> sure if it's worth the additional complexity at this point. I mean, >> one byte is hardly going to make a measurable difference - we're >> probably wasting more than that due to alignment, for example. > > I don't think any alignment gets done here. Look at pq_sendint(). Did > you mean the complexity of having extra functions, or having the > extra byte to check in the deserial func? I haven't realized alignment does not apply here, but I still think the extra byte would be negligible in the bigger scheme of things. For example we're serializing the state into a bytea, which adds up to 4B header. So I don't think adding 1B would make any measurable difference. But that assumes adding the 1B header would actually make the code simpler. Now that I think about it, that may not the case - in the end you'd probably get a function with two branches, with about the same size as the two functions combined. So not really an improvement. >> As you've mentioned sharded servers - how stable should the >> serialized format be? I've assumed it to be transient, i.e. in the >> extreme case it might change after restarting a database - for the >> parallel aggregate that's clearly sufficient. >> >> But if we expect this to eventually work in a sharded environment, >> that's going to be way more complicated. I guess in these cases we >> could rely on implicit format versioning, via the major the >> version (doubt we'd tweak the format in a minor version anyway). >> >> I'm not sure the sharding is particularly useful argument at this >> point, considering we don't really know if the current format is >> actually a good match for that. > > Perhaps you're right. At this stage I've no idea if we'd want to > support shards on varying major versions. I think probably not, > since the node write functions might not be compatible with the node > read functions on the other server. OK, so let's ignore the sharded setup for now. > I've attached another series of patches: > > 0001: This is the latest Parallel Aggregate Patch, not intended for > review here, but is required for the remaining patches. This patch has > changed quite a bit from the previous one that I posted here, and the > remaining patches needed re-based due to those changes. > > 0002: Adds serial/de-serial function support to CREATE AGGREGATE, > contains minor fix-ups from last version. > > 0003: Adds various combine/serial/de-serial functions for the standard > set of aggregates, apart from most float8 aggregates. > > 0004: Adds regression tests for 0003 pg_aggregate.h changes. > > 0005: Documents to mention which aggregate functions support partial mode. > > 0006: Re-based version of Haribabu's floating point aggregate support, > containing some fixes by me. I went through the changes and found nothing suspicious, except maybe for the wording in one of the doc patches: All types apart from floating-point types It may not be entirely clear for the readers, but this does not include "numeric" data type, only float4 and float8. But I don't think this really matters unless we fail to commit the last patch adding functions even for those data types. Also, I think it's probably the right time to fix the failures in opr_sanity regression tests. After all, we'll only whitelist a bunch of serialize functions, so the tests will still detect any unexpected functions dealing with 'internal' data type. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: