NUMERIC type efficiency problem
От | Mark Butler |
---|---|
Тема | NUMERIC type efficiency problem |
Дата | |
Msg-id | 3AD68AEC.2E1613A9@middle.net обсуждение исходный текст |
Ответы |
Re: NUMERIC type efficiency problem
|
Список | pgsql-hackers |
I noticed the storage format for the numeric type is rather inefficient: typedef struct NumericData { int32 varlen; /* Variable size */ int16 n_weight; /* Weight of 1st digit */ uint16 n_rscale; /* Result scale */ uint16 n_sign_dscale; /* Sign + display scale */ unsignedchar n_data[1]; /* Digit data (2 decimal digits/byte) */ } NumericData; typedef NumericData *Numeric; Oracle uses a similar variable length format for all numeric types, and they document its storage requirement as 2 + (sig digits/2) bytes. One byte is used for the column length, one byte for the exponent, a variable number of bytes for the significant digits. A zero value uses two bytes total in Oracle, where in the current version of PostgreSQL it uses ten bytes. Given the pending demise of the money type, the remaining alternative is rather wasteful for use in large financial applications. Is there a reason why varlen has to be an int32? uint8 would be more than enough. The other three fields could be int8 as well. I do not understand why we need four header fields - a much more efficient decimal type could be implemented as follows: typedef struct DecimalData {int8 varlen; /* variable size */int8 d_sign_exponent; /* 1 bit sign, 7 bit exponent */int8 d_mantissa[1]; /* variable precision binary integer mantissa */ }; Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent). This would be more space efficient than Oracle and would support precisions up to DECIMAL(63). Having a reasonable maximum precision would allow a fixed length internal representation which make processing *much* faster* by using binary arithmetic and eliminating the necessity to palloc() buffers for every temporary result. (Aside: Doesn't the current numeric type use up memory in a hurry in a large sum(numeric_column) query? - Or are all those digitbuf_free()'s actually being cleaned up? And shouldn't the type operator calling convention be changed to pass a result buffer so these palloc()'s could be mostly avoided? ) As an even faster, lower max precision alternative: typedef struct FastDecimalData {int64 fd_mantissa; int8 fd_sign;int8 fd_exponent; }; Value represented is (-1 ^ sign)*(mantissa)*(10 ^ exponent). This would support precisions up to DECIMAL(18). Intermediate results could be stored using a 128 bit format to avoid loss of precision. Any comments? - Mark Butler
В списке pgsql-hackers по дате отправления: