Discussion: Problems on NUMERIC
Hi,
sometimes it's good not to spend too much effort implementing
the final solution first. That applies to NUMERIC as well.
First I wonder why the can_coerce... stuff is #if'd out of
parse_relation.c? For the NUMERIC type the
numeric(num,typmod) must be called if someone does an
INSERT INTO ... SELECT * FROM ...
But it isn't. It is only called when there are calculations
done on the columns. I also checked that for BPCHAR type and
it simply throws an ERROR if the target's length doesn't
match.
This might be easy to fix, but the other problem I have is a
bit more difficult.
When binary operators (add, sub, mul, div) are called, the
required precision of the result isn't known. And the coerce
function numeric(num,typmod) will only be called for the
final result. Now consider the following situation:
CREATE TABLE t1 (id int4, annual_val numeric(20,4));
CREATE TABLE t2 (id int4, monthly_val numeric(24,8));
INSERT INTO t2 SELECT id, annual_val / '12' FROM t1;
A multiplication has a maximum number of digits that
can appear after the decimal point: the sum of the numbers
of fractional digits present in the two operands. Not so for a
division.
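The difference can be illustrated with Python's decimal module, used here only as a stand-in for an exact numeric type (it is not the NUMERIC code discussed in this thread). The operand values and the working precision of 50 digits are made up for the illustration.

```python
from decimal import Decimal, Inexact, getcontext

ctx = getcontext()
ctx.prec = 50  # generous working precision, chosen only for this illustration


def scale(d: Decimal) -> int:
    """Number of digits after the decimal point."""
    return max(0, -d.as_tuple().exponent)


a = Decimal("1234.5678")    # scale 4
b = Decimal("0.00000012")   # scale 8

# Multiplication: the exact result needs at most scale(a) + scale(b) digits.
ctx.clear_flags()
product = a * b
product_exact = not ctx.flags[Inexact]

# Division: 1000 / 12 has no finite decimal expansion, so the result
# must be cut off somewhere, whatever the scales of the operands are.
ctx.clear_flags()
quotient = Decimal("1000.0000") / Decimal(12)
quotient_exact = not ctx.flags[Inexact]

print(scale(product), product_exact)
print(quotient_exact)
```

The product is exact at scale 4 + 8, while the quotient is flagged as inexact at any finite precision.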
If we want to implement NUMERIC with really high precision
(maybe 4000 or more digits), there would currently be no
choice other than to do the division with the full
possible precision and then throw away most of the digits
when the result is assigned to the target column. Wasted
effort and, more importantly, MUCH WASTED CPU.
I can think of something like this:
On add/subtract the result's precision after the decimal point
is the higher of the two operands.
On multiply the result's precision after the decimal point is
the sum of the precisions of the two operands.
On divide the result's precision after the decimal point is
either the same as for multiply or double the higher precision
of the two operands. Any other suggestions?
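The proposal above can be written down as a small function. This is only a sketch of the rules as stated in this message; the function name and the `div_rule` switch between the two division candidates are hypothetical.

```python
def proposed_result_scale(op: str, s1: int, s2: int, div_rule: str = "sum") -> int:
    """Digits after the decimal point for the result of `s1 op s2`.

    s1 and s2 are the scales of the two operands. For division the message
    offers two candidates: "sum" (same as multiply) or "double" (twice the
    higher operand scale).
    """
    if op in ("add", "sub"):
        return max(s1, s2)
    if op == "mul":
        return s1 + s2
    if op == "div":
        if div_rule == "sum":
            return s1 + s2
        if div_rule == "double":
            return 2 * max(s1, s2)
        raise ValueError(f"unknown div_rule: {div_rule!r}")
    raise ValueError(f"unknown operator: {op!r}")


# numeric(20,4) combined with numeric(24,8)
for op in ("add", "mul", "div"):
    print(op, proposed_result_scale(op, 4, 8))
print("div (double)", proposed_result_scale("div", 4, 8, div_rule="double"))
```

Since twice the higher scale is never smaller than the sum of the two scales, the "double" rule is the more generous of the two division candidates.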
On the other hand it is possible to do it as
INSERT INTO t2 SELECT id, ROUND(annual_val,8) / '12' FROM t1;
How do other databases handle this problem? How is the
precision of a numeric result defined?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
> First I wonder why the can_coerce... stuff is #if'd out of
> parse_relation.c?
Oh! That looks like my style of #if FALSE, but I can't recall why it is
that way. Will look at it. Does it work to just substitute an #if TRUE?
Perhaps I had it disabled during debugging, but...
> How do other databases handle this problem? How is the
> precision of a numeric result defined?
I've enclosed some snippets from my SQL92 2nd Draft Standard doc. It
gives you a lot of latitude :)
- Tom
Syntax Rules
1) If the data type of both operands of a dyadic arithmetic operator is exact numeric, then the data type of the
result is exact numeric, with precision and scale determined as follows:
a) Let S1 and S2 be the scale of the first and second operands respectively.
b) The precision of the result of addition and subtraction is implementation-defined, and the scale is the maximum
of S1 and S2.
c) The precision of the result of multiplication is implementation-defined, and the scale is S1 + S2.
d) The precision and scale of the result of division is implementation-defined.
<snip large amounts>
Whenever an exact or approximate numeric value is assigned to a data item or parameter representing an exact numeric
value, an approximation of its value that preserves leading significant digits after rounding or truncating is
represented in the data type of the target. The value is converted to have the precision and scale of the target. The
choice of whether to truncate or round is implementation-defined.
An approximation obtained by truncation of a numerical value N for an <exact numeric type> T is a value V
representable in T such that N is not closer to zero than the numerical value of V and such that the absolute value of
the difference between N and the numerical value of V is less than the absolute value of the difference between two
successive numerical values representable in T.
An approximation obtained by rounding of a numerical value N for an <exact numeric type> T is a value V representable
in T such that the absolute value of the difference between N and the numerical value of V is not greater than half
the absolute value of the difference between two successive numerical values representable in T. If there are more
than one such values V, then it is implementation-defined which one is taken.
All numerical values between the smallest and the largest value, inclusive, representable in a given exact numeric
type have an approximation obtained by rounding or truncation for that type; it is implementation-defined which other
numerical values have such approximations.
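The two kinds of approximation defined above can be sketched with Python's decimal module. The helper name `approximate` is hypothetical, and the tie-breaking rule for rounding (half away from zero) is an assumption, since the standard text leaves that choice implementation-defined.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def approximate(value: Decimal, scale: int, mode: str) -> Decimal:
    """Approximate `value` in an exact numeric type with `scale` fractional digits.

    mode "truncate": the result is never farther from zero than `value`.
    mode "round":    the result is at most half a unit in the last place away;
                     ties go away from zero (an assumed choice).
    """
    quantum = Decimal(1).scaleb(-scale)   # scale 2 -> 0.01
    if mode == "truncate":
        rounding = ROUND_DOWN
    elif mode == "round":
        rounding = ROUND_HALF_UP
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return value.quantize(quantum, rounding=rounding)


print(approximate(Decimal("2.345"), 2, "truncate"))
print(approximate(Decimal("2.345"), 2, "round"))
print(approximate(Decimal("-2.349"), 2, "truncate"))
```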
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c?
It looks like the routine where the code appears cannot return a
modified expression, so I just placed this code in there as a marker.
Should be possible to add some code to get this feature working.
- Tom
>
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c?
>
> Oh! That looks like my style of #if FALSE, but I can't recall why it is
> that way. Will look at it. Does it work to just substitute an #if TRUE?
> Perhaps I had it disabled during debugging, but...
Hmmm - elog(ERROR, "Type %s(%d) can be coerced to... looks
like debugging code to me. Maybe you wanted elog(DEBUG...
?
>
> > How do other databases handle this problem? How is the
> > precision of a numeric result defined?
>
> I've enclosed some snippets from my SQL92 2nd Draft Standard doc. It
> gives you a lot of latitude :)
>
Thanks! That helps a lot!
>
> d) The precision and scale of the result of division is
> implementation-defined.
I love those definitions :-)
So I'll make the display scale of a division
min( max(S1, S2), SLIMIT)
and the internal result scale
min( max(RMINIMUM, max(R1, R2) + 2), RLIMIT)
where S1 and S2 are the display scales of the two operands,
R1 and R2 are the internal present scales, and SLIMIT and RLIMIT
are the implementation-defined maximum allowed scales
(what about 4000 for SLIMIT?). RMINIMUM is 8, so that
everything is computed internally with at least 8 digits after the
decimal point (because the defaults for NUMERIC are precision
30 scale 6).
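A minimal sketch of these two scale rules, reading RMINIMUM as a lower bound on the internal scale, which is how the description ("at least 8 digits") puts it. The function name is hypothetical. SLIMIT = 4000 is the value floated above; no value for RLIMIT is given in this message, so the default used here is a placeholder only.

```python
def division_scales(s1: int, s2: int, r1: int, r2: int,
                    slimit: int = 4000,   # tentatively suggested in the message
                    rlimit: int = 4000,   # placeholder, not stated in the message
                    rminimum: int = 8) -> tuple[int, int]:
    """Return (display_scale, internal_scale) for a division.

    s1, s2: display scales of the operands
    r1, r2: internal scales currently present in the operands
    """
    display = min(max(s1, s2), slimit)
    # At least rminimum digits, two guard digits beyond the operands,
    # and never more than rlimit.
    internal = min(max(rminimum, max(r1, r2) + 2), rlimit)
    return display, internal


# annual_val numeric(20,4) divided by the constant 12 (scale 0)
print(division_scales(4, 0, 4, 0))
# operands that already carry more internal digits
print(division_scales(4, 8, 12, 8))
```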
If the result is then assigned to another tuple's attribute,
numeric(num,typmod) will be called and do the rounding with
the scale defined in typmod. If numeric_out(num) is called
for it, it will be output rounded to the above display scale.
With 'round(att1, 500) / att2' someone can then get the
result with 500 digits scale.
This way it is flexible enough, but not too much computing
is wasted.
Jan
> First I wonder why the can_coerce... stuff is #if'd out of
> parse_relation.c? For the NUMERIC type the
> numeric(num,typmod) must be called if someone does an
>
> INSERT INTO ... SELECT * FROM ...
>
> But it isn't. It is only called when there are calculations
> done on the columns. I also checked that for BPCHAR type and
> it simply throws an ERROR if the target's length doesn't
> match.
Sorry, I'm having trouble thinking of a case which does not behave
properly with the existing types. I've tried inserting varchar(10)
columns into a varchar(1) column, I've tried inserting int columns into
float columns, etc etc. How are you getting handleTargetColname() /
checkTargetTypes() called where it is rejecting things?
It may be that splitting that attribute field into two pieces for
NUMERIC is opening a can of worms, since there are specific assumptions
about what that field means throughout the code :(
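To make "splitting that attribute field into two pieces" concrete, here is one possible way of packing a precision and a scale into a single integer. The bit layout and the function names are assumptions for illustration; the thread does not say which layout the NUMERIC code uses.

```python
def pack_typmod(precision: int, scale: int) -> int:
    """Pack precision and scale into one integer (assumed 16/16-bit layout)."""
    if not (0 <= scale <= precision <= 0xFFFF):
        raise ValueError("need 0 <= scale <= precision <= 65535")
    return (precision << 16) | scale


def unpack_typmod(typmod: int) -> tuple[int, int]:
    """Recover (precision, scale) from a value built by pack_typmod."""
    return typmod >> 16, typmod & 0xFFFF


typmod = pack_typmod(20, 4)       # numeric(20,4)
print(typmod, unpack_typmod(typmod))
```

Any code that treats the field as a plain length would misread such a packed value, which is the kind of assumption the paragraph above is worried about.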
Maybe we should think about how to isolate the type-specific
interpretation of that attribute field into a type-specific handler
routine? Ooh, that sounds like a pain...
- Tom
>
> > First I wonder why the can_coerce... stuff is #if'd out of
> > parse_relation.c? For the NUMERIC type the
> > numeric(num,typmod) must be called if someone does an
> >
> > INSERT INTO ... SELECT * FROM ...
> >
> > But it isn't. It is only called when there are calculations
> > done on the columns. I also checked that for BPCHAR type and
> > it simply throws an ERROR if the target's length doesn't
> > match.
>
> Sorry, I'm having trouble thinking of a case which does not behave
> properly with the existing types. I've tried inserting varchar(10)
> columns into a varchar(1) column, I've tried inserting int columns into
> float columns, etc etc. How are you getting handleTargetColname() /
> checkTargetTypes() called where it is rejecting things?
pgsql=> create table t1 (a char(10));
CREATE
pgsql=> create table t2 (a char(4));
CREATE
pgsql=> insert into t2 select * from t1;
ERROR: Length of a is not equal to the length of target column a
pgsql=>
>
> It may be that splitting that attribute field into two pieces for
> NUMERIC is opening a can of worms, since there are specific assumptions
> about what that field means throughout the code :(
It doesn't produce any problems so far, only that the
function numeric(num,typmod) isn't called when doing a plain
INSERT ... SELECT. It is only called when comparisons were
performed in the SELECT clause of the INSERT on the numeric
attributes. But I need that call to force the rounding and
range check at INSERT time. Otherwise, the values in the
target table will later be output with the scale of their
original source table, and that's wrong. Also it would be
possible to insert 1000.0 into a numeric(5,2) attribute, and
that shouldn't be.
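What the call to numeric(num,typmod) is expected to do at INSERT time can be sketched as follows. This is a Python model, not the actual function: the name `apply_typmod` is hypothetical, and rounding half away from zero is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def apply_typmod(value: Decimal, precision: int, scale: int) -> Decimal:
    """Round `value` to `scale` fractional digits and check that it fits
    into `precision` total digits, as a numeric(precision, scale) column
    would require."""
    rounded = value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # With `scale` digits after the point, only precision - scale digits
    # remain before it.
    limit = Decimal(10) ** (precision - scale)
    if abs(rounded) >= limit:
        raise ValueError(
            f"{value} does not fit into numeric({precision},{scale})")
    return rounded


print(apply_typmod(Decimal("83.33333333"), 5, 2))   # rounded to the target scale
try:
    apply_typmod(Decimal("1000.0"), 5, 2)           # the case from the message
except ValueError as exc:
    print("rejected:", exc)
```

Without such a step the first value would keep its eight source digits and the second would be stored although numeric(5,2) leaves room for only three digits before the decimal point.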
Maybe I have to hook for NUMERIC there in parse_relation too.
Up to now I'm compiling the whole thing as a loadable module.
I'll check if that's possible when moving it to the builtins.
But in general I think if there is a function with the same
name as a type, that takes this type plus another int4
argument, this must be a range checker/padder/truncator or
the like and it should be called before values are assigned
to attributes.
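The convention described here could be modelled like this. Everything in the sketch is hypothetical (the registry, the function names, and the simplified `bpchar` behaviour that treats the int4 argument as a plain length); it only shows the idea of looking up a same-named function and calling it before the value is stored.

```python
def bpchar(value: str, length: int) -> str:
    """Simplified padder/truncator for a blank-padded char(length) value."""
    return value[:length].ljust(length)


# Hypothetical registry: type name -> function(value, int4) of the same name.
LENGTH_COERCERS = {"bpchar": bpchar}


def assign_to_attribute(type_name: str, value, typmod: int):
    """Run the type's same-named (value, int4) function, if there is one,
    before the value is assigned to the attribute."""
    coercer = LENGTH_COERCERS.get(type_name)
    if coercer is None:
        return value            # no such function: store the value unchanged
    return coercer(value, typmod)


print(repr(assign_to_attribute("bpchar", "abcdefghij", 4)))
print(repr(assign_to_attribute("bpchar", "ab", 4)))
print(assign_to_attribute("int4", 7, -1))
```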
>
> Maybe we should think about how to isolate the type-specific
> interpretation of that attribute field into a type-specific handler
> routine? Ooh, that sounds like a pain...
Noooooooooo
Jan
> > How are you getting handleTargetColname() /
> > checkTargetTypes() called where it is rejecting things?
OK, I'm not sure why the behavior is different if I explicitly specify
the columns (which of course I had for testing):
postgres=> insert into t2 select a from t1;
INSERT 0 0
postgres=> insert into t2 select * from t1;
ERROR: Length of 'a' is not equal to the length of target column 'a'
> It doesn't produce any problems so far, only that the
> function numeric(num,typmod) isn't called when doing a plain
> INSERT ... SELECT.
Hmm. Even when you explicitly specify the columns as I did in my example
above? I should be able to get the wildcard example to work sometime
before v6.5, and I *think* that the explicit cases should do what you
want. As a loadable module, your data type will only match itself for
type coercion, but that's what you want for now. When it is built in,
then you will be able to specify that it is higher or lower in a
hierarchy with, for example, int4 and float8.
> Maybe I have to hook for NUMERIC there in parse_relation too.
> Up to now I'm compiling the whole thing as a loadable module.
> I'll check if that's possible when moving it to the builtins.
Unless you can't find any test case that works for you, don't bother
looking at it; I'll pick it up some time soon.
> But in general I think if there is a function with the same
> name as a type, that takes this type plus another int4
> argument, this must be a range checker/padder/truncator or
> the like and it should be called before values are assigned
> to attributes.
That's how it should work afaik, at least for variable-length types. Not
all types are checked for this conversion function...
- Tom