Use int8 for int4/int2 aggregate accumulators?
От | Tom Lane |
---|---|
Тема | Use int8 for int4/int2 aggregate accumulators? |
Дата | |
Msg-id | 24638.997133222@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Use int8 for int4/int2 aggregate accumulators?
|
Список | pgsql-hackers |
This was discussed on pgsql-general a little bit on 21-July, but the discussion died off without reaching a conclusion. I'd like to put out a concrete proposal and see if anyone has objections. 1. SUM() and AVG() for int2 and int4 inputs should accumulate the running sum as an INT8, not a NUMERIC, for speed reasons. INT8 seems large enough to avoid overflow in practical situations. The final output datatype of AVG() will still be NUMERIC, but the final output of SUM() will become INT8 for these two input types. 2. STDDEV() and VARIANCE() for int2 and int4 inputs will continue to use NUMERIC for accuracy and overflow reasons (accumulating sum(x^2) is much more prone to overflow than sum(x)). So will all these aggregates for INT8. 3. As a separate proposal, we could change COUNT()'s running counter and output datatype from INT4 to INT8. This would make it a little slower but effectively overflow-proof. All of these changes are within the latitude that the SQL92 spec affords (it just says that the output values are exact numeric with implementation-defined precision and scale). Issues to consider are: * On machines with no 8-byte-int C datatype, the accumulator would effectively be int4. This would make the behavior no worse than currently for COUNT(), and no worse than it was in 7.0 for SUM() and AVG(), so that doesn't bother me a whole lot. But it would be a new source of cross-platform behavioral differences. * Changing the output datatype of these operations --- especially COUNT --- might affect or even break applications. We got a few complaints, not many, about changing SUM() and AVG() from integer to NUMERIC output in 7.1. Changing SUM() to INT8 isn't likely to hurt anyone who survived that transition. But COUNT() is much more widely used and is more likely to affect people. Should we keep it at INT4 output to avoid compatibility problems? regards, tom lane
В списке pgsql-hackers по дате отправления: