Обсуждение: Typmod associated with multi-row VALUES constructs

Поиск
Список
Период
Сортировка

Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
I looked into the issue reported in bug #14448,
https://www.postgresql.org/message-id/20161205143037.4377.60754%40wrigleys.postgresql.org

The core of it seems to be that expandRTE() will report the type and
typmod of a column of a VALUES construct as being exprType() and
exprTypmod() of the corresponding expression in the first row of
the VALUES.  It's okay to handle data type that way, because we've
coerced all the expressions for the column to the same type; but
we have *not* coerced them to the same typmod.  So some of the values
from later rows may fail to meet the claimed typmod.  This is not good.

In order to fix this, we first have to decide what the semantics ought
to be.  I think there are two plausible definitions:

1. If all the expressions in the VALUES column share the same typmod,
use that typmod, else use -1.

2. Use -1 whenever there is more than one VALUES row.

#1 is what we do for some comparable cases such as UNION and CASE.
However, it's potentially quite expensive for large VALUES constructs.
#2 would be a lot cheaper, and given that this is the first complaint
we've gotten in all the years we've had multi-row-VALUES support, it's
not clear that deriving a precise typmod is really all that useful
for VALUES.

I have no strong preference between these two, but I think whatever
we do needs to be back-patched.  The behavior described in the bug
report is definitely broken.

Thoughts?
        regards, tom lane



Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I looked into the issue reported in bug #14448,
https://www.postgresql.org/message-id/20161205143037.4377.60754%40wrigleys.postgresql.org

The core of it seems to be that expandRTE() will report the type and
typmod of a column of a VALUES construct as being exprType() and
exprTypmod() of the corresponding expression in the first row of
the VALUES.  It's okay to handle data type that way, because we've
coerced all the expressions for the column to the same type; but
we have *not* coerced them to the same typmod.  So some of the values
from later rows may fail to meet the claimed typmod.  This is not good.

In order to fix this, we first have to decide what the semantics ought
to be.  I think there are two plausible definitions:

1. If all the expressions in the VALUES column share the same typmod,
use that typmod, else use -1.

2. Use -1 whenever there is more than one VALUES row.

#1 is what we do for some comparable cases such as UNION and CASE.
However, it's potentially quite expensive for large VALUES constructs.
#2 would be a lot cheaper, and given that this is the first complaint
we've gotten in all the years we've had multi-row-VALUES support, it's
not clear that deriving a precise typmod is really all that useful
for VALUES.

I have no strong preference between these two, but I think whatever
we do needs to be back-patched.  The behavior described in the bug
report is definitely broken.

Thoughts?

​Can we be precise enough to perform #2 if the top-level (or immediate parent) command is an INSERT - the existing table is going to enforce its own typemod anyway, otherwise go with #1?

​Lacking that possibility I'd say that documenting that our treatment of typemod in VALUES is similar to our treatment of typemod in function arguments would be acceptable. This suggests a #3 - simply use "-1" regardless of the number of rows in the VALUES expression.

David J.

Re: Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In order to fix this, we first have to decide what the semantics ought
>> to be.  I think there are two plausible definitions:
>> 1. If all the expressions in the VALUES column share the same typmod,
>> use that typmod, else use -1.
>> 2. Use -1 whenever there is more than one VALUES row.

> ​Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

I dunno if that's "precise" or just "randomly inconsistent" ;-)

> ​Lacking that possibility I'd say that documenting that our treatment of
> typemod in VALUES is similar to our treatment of typemod in function
> arguments would be acceptable. This suggests a #3 - simply use "-1"
> regardless of the number of rows in the VALUES expression.

I'm a bit concerned about whether that would introduce overhead that we
avoid today, in particular for something like

insert into foo (varchar20col) values ('bar'::varchar(20));

I think if we throw away the knowledge that the VALUES row produces the
right typmod already, we'd end up adding an unnecessary runtime coercion
step.
        regards, tom lane



Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> In order to fix this, we first have to decide what the semantics ought
>> to be.  I think there are two plausible definitions:
>> 1. If all the expressions in the VALUES column share the same typmod,
>> use that typmod, else use -1.
>> 2. Use -1 whenever there is more than one VALUES row.

> ​Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

I dunno if that's "precise" or just "randomly inconsistent" ;-)

:) 

How does "targeted optimization" sound?


> ​Lacking that possibility I'd say that documenting that our treatment of
> typemod in VALUES is similar to our treatment of typemod in function
> arguments would be acceptable. This suggests a #3 - simply use "-1"
> regardless of the number of rows in the VALUES expression.

I'm a bit concerned about whether that would introduce overhead that we
avoid today, in particular for something like

insert into foo (varchar20col) values ('bar'::varchar(20));

I think if we throw away the knowledge that the VALUES row produces the
right typmod already, we'd end up adding an unnecessary runtime coercion
step.

​Unnecessary maybe, but wouldn't it be immaterial given we are only able to be efficient when inserting exactly one row.

There is also a #4 here to consider - if the first (or any) row is not type unknown, and the remaining rows are all unknown, use the type and typemod of the known row AND attempt coerce all of the unknowns to that same type.  I'd suggest this is probably the most user-friendly option (do as I mean, not as I say).  The OP query would then fail since the second literal is too long to fit in a varchar(20) - I would not want the value truncated so an actual cast wouldn't work.

David J.


David J.

Re: Typmod associated with multi-row VALUES constructs

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in
<CAKFQuwZXyyPLaO0wyn94WihcjZCUsv8nr0FsCFrQ=oO1DkpBuA@mail.gmail.com>
> On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> In order to fix this, we first have to decide what the semantics ought
> > >> to be.  I think there are two plausible definitions:
> > >> 1. If all the expressions in the VALUES column share the same typmod,
> > >> use that typmod, else use -1.
> > >> 2. Use -1 whenever there is more than one VALUES row.
> >
> > > ​Can we be precise enough to perform #2 if the top-level (or immediate
> > > parent) command is an INSERT - the existing table is going to enforce its
> > > own typemod anyway, otherwise go with #1?
> >
> > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> >
> 
> :)
> 
> How does "targeted optimization" sound?

(sorry I don't understand what the "targetted optimization" is..)

FWIW, different from the UNION case, I don't see a reason that
every row in a VALUES clause shares anything common with any
other rows. Of course typmod and even type are not to be
shared. (Type is shared, though.)

On the other hand, if we make all values to be strictly typed (I
mean that every value brings its own type information along
with), values also can consider strict type. But currently the
following command is ignoring the type of the first value.

=# select 'bar'::varchar(4) || 'eeee';?column? 
----------bareeee


> > > ​Lacking that possibility I'd say that documenting that our treatment of
> > > typemod in VALUES is similar to our treatment of typemod in function
> > > arguments would be acceptable. This suggests a #3 - simply use "-1"
> > > regardless of the number of rows in the VALUES expression.
> >
> > I'm a bit concerned about whether that would introduce overhead that we
> > avoid today, in particular for something like
> >
> > insert into foo (varchar20col) values ('bar'::varchar(20));
> >
> > I think if we throw away the knowledge that the VALUES row produces the
> > right typmod already, we'd end up adding an unnecessary runtime coercion
> > step.

Is it means that something like this?
insert into foo (varchar20col)   select a::varchar(20) from (values ('barrrrrrrrrrrrrrrrrrrrr')) as a;


Even though I'm not sure about SQL standard here but my
feeling is something like the following.

| FROM (
|   VALUES (row 1), .. (row n))
| AS foo (colname *type*, ..)

for this case, 

| create temporary table product_codes  as select *
| from (
|     values
|     ('abcdefg'),
|     ('012345678901234567ABCDEFGHIJKLMN')
| ) csv_data (product_code character varying(20));

Myself have gotten errors for this false syntax several times:(

> ​Unnecessary maybe, but wouldn't it be immaterial given we are only able to
> be efficient when inserting exactly one row.
> 
> There is also a #4 here to consider - if the first (or any) row is not type
> unknown, and the remaining rows are all unknown, use the type and typemod
> of the known row AND attempt coerce all of the unknowns to that same type.
> I'd suggest this is probably the most user-friendly option (do as I mean,
> not as I say).  The OP query would then fail since the second literal is
> too long to fit in a varchar(20) - I would not want the value truncated so
> an actual cast wouldn't work.

1 has a type of int, 1.0 has a type of float and '1' has a type
of text. So I don't see a situation where only the first row is
detectably typed. Or is it means that only the first row is
explicitly typed? I agree that it would be an option but I prefer
the above syntax (#5?) instead for the same purpose.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Typmod associated with multi-row VALUES constructs

От
Craig Ringer
Дата:
On 6 December 2016 at 04:52, David G. Johnston
<david.g.johnston@gmail.com> wrote:

> Can we be precise enough to perform #2 if the top-level (or immediate
> parent) command is an INSERT - the existing table is going to enforce its
> own typemod anyway, otherwise go with #1?

We already routinely throw away typmod information in other places,
and can only truly rely on it when working directly with data in
tables. So it sounds sensible to me.

I see semi-regular complaints about this from JDBC users, who want to
be able to know things like "number of digits in this numeric" and
"can this column be null" even through various projections and
transformations of results. But I don't think your suggested change
will make the existing situation any worse.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in <CAKFQuwZXyyPLaO0wyn94WihcjZCUsv8nr0FsCFrQ=oO1DkpBuA@mail.gmail.com>
> On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> In order to fix this, we first have to decide what the semantics ought
> > >> to be.  I think there are two plausible definitions:
> > >> 1. If all the expressions in the VALUES column share the same typmod,
> > >> use that typmod, else use -1.
> > >> 2. Use -1 whenever there is more than one VALUES row.
> >
> > > ​Can we be precise enough to perform #2 if the top-level (or immediate
> > > parent) command is an INSERT - the existing table is going to enforce its
> > > own typemod anyway, otherwise go with #1?
> >
> > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> >
>
> :)
>
> How does "targeted optimization" sound?

(sorry I don't understand what the "targetted optimization" is..)

I guess its a bit misleading depending on the implementation chosen.  The general thought is that we simply ignore typemod information in VALUES if we have been told otherwise what that typemod will be (in this case an insert into column will use the typemod of the existing column regardless of the input data's typemod).


FWIW, different from the UNION case, I don't see a reason that
every row in a VALUES clause shares anything common with any
other rows. Of course typmod and even type are not to be
shared. (Type is shared, though.)

​You have a typo here somewhere..."type not to be shared. (Type is shared, though)" doesn't work.​

On the other hand, if we make all values to be strictly typed (I
mean that every value brings its own type information along
with), values also can consider strict type. But currently the
following command is ignoring the type of the first value.

=# select 'bar'::varchar(4) || 'eeee';
 ?column?
----------
 bareeee


​Its not ignored - is discarded during coercion to make the || operator work on the same type (text).  ​

​SELECT '12345'::varchar(3) || '678'​


Even though I'm not sure about SQL standard here but my
feeling is something like the following.

| FROM (
|   VALUES (row 1), .. (row n))
| AS foo (colname *type*, ..)

for this case,

| create temporary table product_codes  as select *
| from (
|       values
|       ('abcdefg'),
|       ('012345678901234567ABCDEFGHIJKLMN')
| ) csv_data (product_code character varying(20));

Myself have gotten errors for this false syntax several times:(

​Only the function in from form of this accepts a column definition in lieu of a simple alias.  Regardless of the merits of this idea it would not be backpatch-able and wouldn't resolve the current valid syntax problem.​

I suppose my option #4 has the same problem...


> ​Unnecessary maybe, but wouldn't it be immaterial given we are only able to
> be efficient when inserting exactly one row.
>
> There is also a #4 here to consider - if the first (or any) row is not type
> unknown, and the remaining rows are all unknown, use the type and typemod
> of the known row AND attempt coerce all of the unknowns to that same type.
> I'd suggest this is probably the most user-friendly option (do as I mean,
> not as I say).  The OP query would then fail since the second literal is
> too long to fit in a varchar(20) - I would not want the value truncated so
> an actual cast wouldn't work.

1 has a type of int,
 
​true​

1.0 has a type of float
 
​false - its numeric​
​select pg_typeof(1.0) -> numeric

and '1' has a typ
​e ​
of text.
 
​false - its unknown​ (but implicitly cast-able)
select pgtype_of('1') -> error, pg_typeof(unknown) does not exist

So I don't see a situation where only the first row is
detectably typed. Or is it means that only the first row is
explicitly typed?

​I do indeed mean explicitly typed here.​  Using 1 or 1.0 in a values would be a form of explicit typing in this sense.

David J.

Re: Typmod associated with multi-row VALUES constructs

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in
<CAKFQuwYXzBQNpH5L=AHJzOjOZCZSzRvF9qiA0wwt_QZmAuYmEA@mail.gmail.com>
> On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > Hello,
> >
> > At Mon, 5 Dec 2016 14:42:39 -0700, "David G. Johnston" <
> > david.g.johnston@gmail.com> wrote in <CAKFQuwZXyyPLaO0wyn94WihcjZCUs
> > v8nr0FsCFrQ=oO1DkpBuA@mail.gmail.com>
> > > On Mon, Dec 5, 2016 at 2:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > > > On Mon, Dec 5, 2016 at 1:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > >> In order to fix this, we first have to decide what the semantics
> > ought
> > > > >> to be.  I think there are two plausible definitions:
> > > > >> 1. If all the expressions in the VALUES column share the same
> > typmod,
> > > > >> use that typmod, else use -1.
> > > > >> 2. Use -1 whenever there is more than one VALUES row.
> > > >
> > > > > ​Can we be precise enough to perform #2 if the top-level (or
> > immediate
> > > > > parent) command is an INSERT - the existing table is going to
> > enforce its
> > > > > own typemod anyway, otherwise go with #1?
> > > >
> > > > I dunno if that's "precise" or just "randomly inconsistent" ;-)
> > > >
> > >
> > > :)
> > >
> > > How does "targeted optimization" sound?
> >
> > (sorry I don't understand what the "targetted optimization" is..)
> >
> 
> I guess its a bit misleading depending on the implementation chosen.  The
> general thought is that we simply ignore typemod information in VALUES if
> we have been told otherwise what that typemod will be (in this case an
> insert into column will use the typemod of the existing column regardless
> of the input data's typemod).

I think I understand that. Fitting the source in VALUES into the
destination column types. That sounds reasonable.

> > FWIW, different from the UNION case, I don't see a reason that
> > every row in a VALUES clause shares anything common with any
> > other rows. Of course typmod and even type are not to be
> > shared. (Type is shared, though.)
> >
> 
> ​You have a typo here somewhere..."type not to be shared. (Type is shared,
> though)" doesn't work.​

(It's not typo but my poor wording... Sorry.)
Mmm. I think the typemod of a row should not be applied onto
other rows, and the same can be said for types. But type of the
first row is applied to all of the rest rows, though... Does it
make sense?

> > On the other hand, if we make all values to be strictly typed (I
> > mean that every value brings its own type information along
> > with), values also can consider strict type. But currently the
> > following command is ignoring the type of the first value.
> >
> > =# select 'bar'::varchar(4) || 'eeee';
> >  ?column?
> > ----------
> >  bareeee
> >
> >
> ​Its not ignored - is discarded during coercion to make the || operator
> work on the same type (text).  ​
> 
> ​SELECT '12345'::varchar(3) || '678'​

Hmm, sorry. Coercion seems happen in the direction to the more
"free" (or robust?) side. The results of both int + float and
float + int are double precision.

But what I wanted to say was not that but the something like the
following.

select pg_typeof('12345'::varchar(1));    pg_typeof     
-------------------character varying

or 

=# select pg_typeof('12345'::numeric(6, 1));pg_typeof 
-----------numeric

A value seemingly doesn't have typmod. So it seems reasonable
that VALUES cannot handle typmod. It seems reasonable regardless
of how it is acutually implemented.

> > Even though I'm not sure about SQL standard here but my
> > feeling is something like the following.
> >
> > | FROM (
> > |   VALUES (row 1), .. (row n))
> > | AS foo (colname *type*, ..)
> >
> > for this case,
> >
> > | create temporary table product_codes  as select *
> > | from (
> > |       values
> > |       ('abcdefg'),
> > |       ('012345678901234567ABCDEFGHIJKLMN')
> > | ) csv_data (product_code character varying(20));
> >
> > Myself have gotten errors for this false syntax several times:(
> >
> 
> ​Only the function in from form of this accepts a column definition in lieu
> of a simple alias.  Regardless of the merits of this idea it would not be
> backpatch-able and wouldn't resolve the current valid syntax problem.​

Yeah.. It wouldn't be back-patchable. I personally think that it
is not necessary to be "solve"d, since a value doesn't rigged
with typmod.

But I undetstand what we want to solve here is how to change
VALUES's behavior to perfectly coerce the row values to the types
explicity applied in the first row. Right?

> I suppose my option #4 has the same problem...
> 
> 
> > > ​Unnecessary maybe, but wouldn't it be immaterial given we are only able
> > to
> > > be efficient when inserting exactly one row.
> > >
> > > There is also a #4 here to consider - if the first (or any) row is not
> > type
> > > unknown, and the remaining rows are all unknown, use the type and typemod
> > > of the known row AND attempt coerce all of the unknowns to that same
> > type.
> > > I'd suggest this is probably the most user-friendly option (do as I mean,
> > > not as I say).  The OP query would then fail since the second literal is
> > > too long to fit in a varchar(20) - I would not want the value truncated
> > so
> > > an actual cast wouldn't work.
> >
> and '1' has a typ
> > ​e ​
> > of text.
> 
> ​false - its unknown​ (but implicitly cast-able)

Sorry, I wrote it uncarefully.

> ​I do indeed mean explicitly typed here.​  Using 1 or 1.0 in a values would
> be a form of explicit typing in this sense.

Thanks, I understand that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e"

On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in <CAKFQuwYXzBQNpH5L=AHJzOjOZCZSzRvF9qiA0wwt_QZmAuYmEA@mail.gmail.com>
> On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>

(It's not typo but my poor wording... Sorry.)
Mmm. I think the typemod of a row should not be applied onto
other rows, and the same can be said for types. But type of the
first row is applied to all of the rest rows, though... Does it
make sense?

Yes.  ​All rows in a given relation must end up with the exact same type and it isn't friendly to fail when a implicit conversion from unknown is possible.


But what I wanted to say was not that but the something like the
following.

select pg_typeof('12345'::varchar(1));
     pg_typeof
-------------------
 character varying

A value seemingly doesn't have typmod. So it seems reasonable
that VALUES cannot handle typmod. It seems reasonable regardless
of how it is acutually implemented.

​This is an artifact of functions - the typemod associated with the value '12345' is lost when that value is passed into the function pg_typeof.  Thus it is impossible to write a SQL query the reports the typemod of a literal or column reference.  Nonetheless it is there in reality.  Just see the original CREATE TABLE AS example for proof.  The created table's column is shown (using direct catalog queries) to contain typemod value of ​20 - which it could only have gotten from the first values rows which contained a casted literal.


> > Even though I'm not sure about SQL standard here but my
> > feeling is something like the following.
> >
> > | FROM (
> > |   VALUES (row 1), .. (row n))
> > | AS foo (colname *type*, ..)
> >
> > for this case,
> >
> > | create temporary table product_codes  as select *
> > | from (
> > |       values
> > |       ('abcdefg'),
> > |       ('012345678901234567ABCDEFGHIJKLMN')
> > | ) csv_data (product_code character varying(20));
> >
> > Myself have gotten errors for this false syntax several times:(
> >
>
> ​Only the function in from form of this accepts a column definition in lieu
> of a simple alias.  Regardless of the merits of this idea it would not be
> backpatch-able and wouldn't resolve the current valid syntax problem.​

Yeah.. It wouldn't be back-patchable. I personally think that it
is not necessary to be "solve"d, since a value doesn't rigged
with typmod.

But I undetstand what we want to solve here is how to change
VALUES's behavior to perfectly coerce the row values to the types
explicity applied in the first row. Right?

​Actually, no, since it is not possible to coerce "perfectly".  Since any attempt at doing so could fail it is only possible to scan every row and compare its typemod to the first row's typemod.  Ideally (but maybe not in reality) as soon as a discrepancy is found stop and discard the typemod.  If every row passes you can retain the typemod.  That is arguably the ​perfect solution.  The concern is that "scan every row" could be very expensive - though in writing this I'm thinking that you'd quickly find a non-match even in a large dataset - and so a less perfect but still valid solution is to simply discard the typemod if there is more than one row.  My thought was that if you are going to discard typemod in the n > 1 case for consistency you should discard the typemod in the n = 1 case as well.

That is, in a nutshell, options 1, 2, and 3 in order.

The "fault" in #1 that #4 attempted to fix was that VALUES are often hand entered and so the inexperienced would like to only type in a cast one the first row and have it will apply to all subsequent rows. That would be a feature request, though.  Your capability to add type information to any FROM alias is likewise a feature request - solving the overall problem by giving the author a place to specify the desired type explicitly without having to pollute the query with excessive casts or subqueries.

David J.


Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
On Mon, Dec 5, 2016 at 9:54 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
 The concern is that "scan every row" could be very expensive - though in writing this I'm thinking that you'd quickly find a non-match even in a large dataset - and so a less perfect but still valid solution is to simply discard the typemod if there is more than one row. 

​My folly here - and the actual question to ask - is if you are faced with large dataset that does have consistent typmods - is the benefit of knowing what it is and carrying it to the next layer worth the cost of scanning every single row to prove it is consistent?​

My vote would be no - and the only question to ask is whether n = 1 and n > 1 behaviors should differ - to which I'd say no as well - at least in master.  In the back branches the current behavior would be retained if the n = 1 behavior is kept different than the n > 1 behavior which is a worthy trade-off.

David J.

Re: Typmod associated with multi-row VALUES constructs

От
Kyotaro HORIGUCHI
Дата:
Hello,

At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in
<CAKFQuwYOeesXJ1bH31b2MKx1UStEzrpYe=tSAO2-eg1Ai4=Eww@mail.gmail.com>
> feel free to s/typemod/typmod/ ... my fingers don't want to drop the "e"
> 
> On Mon, Dec 5, 2016 at 9:17 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> > Hello,
> >
> > At Mon, 5 Dec 2016 18:59:42 -0700, "David G. Johnston" <
> > david.g.johnston@gmail.com> wrote in <CAKFQuwYXzBQNpH5L=AHJzOjOZCZS
> > zRvF9qiA0wwt_QZmAuYmEA@mail.gmail.com>
> > > On Mon, Dec 5, 2016 at 6:36 PM, Kyotaro HORIGUCHI <
> > > horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >
> >
> > (It's not typo but my poor wording... Sorry.)
> > Mmm. I think the typemod of a row should not be applied onto
> > other rows, and the same can be said for types. But type of the
> > first row is applied to all of the rest rows, though... Does it
> > make sense?
> >
> 
> Yes.  ​All rows in a given relation must end up with the exact same type
> and it isn't friendly to fail when a implicit conversion from unknown is
> possible.

I'm not sure how clearly users are conscious of the type unkown,
my feeling is opposed to implicity applying of the type of the
first value in VALUES to all other values of the type "unknown".

On the other hand, such behavior doesn't harm anything when we
don't explicitly type the values in VALUES. (But it would be a
new feature as you wrote below)

Anyway the real behavior of the current parser for VALUES is
scanning whole the list and extract common-coerceable type, then
applying the type on the whole list. (transformValuesClause)

> > But what I wanted to say was not that but the something like the
> > following.
> >
> > select pg_typeof('12345'::varchar(1));
> >      pg_typeof
> > -------------------
> >  character varying
> >
> > A value seemingly doesn't have typmod. So it seems reasonable
> > that VALUES cannot handle typmod. It seems reasonable regardless
> > of how it is acutually implemented.
> >
> 
> ​This is an artifact of functions - the typemod associated with the value
> '12345' is lost when that value is passed into the function pg_typeof.

It seems to me what is occuring in VALUES.

> Thus it is impossible to write a SQL query the reports the typemod of a
> literal or column reference.  Nonetheless it is there in reality.  Just see
> the original CREATE TABLE AS example for proof.  The created table's column
> is shown (using direct catalog queries) to contain typemod value of ​20 -
> which it could only have gotten from the first values rows which contained
> a casted literal.

Ok, I found myself have read the original problem wrongly.

=# create table t2 (a) as select * from (values ('abcdee'::varchar(3)), ('defghij'::varchar(5))) x;
postgres=# \d t2;            Table "public.t2"Column |         Type         | Modifiers 
--------+----------------------+-----------a      | character varying(3) | 
postgres=# select * from t2;  a   
-------abcdefgh
(2 rows)

The problem to be resolved here seems to be that CREATE TABLE AS
creates a broken in-a-sense table. Not a coercion of VALUES.

#6 - raise an error if a subquery's result doesn't fit the newly    created table. Or, create a new table so that all
thevalue    given from subqery fit to it. (I havent' understand how the    source typmod affects the new table and I
dont'seehow to do    that, though)
 


> > But I undetstand what we want to solve here is how to change
> > VALUES's behavior to perfectly coerce the row values to the types
> > explicity applied in the first row. Right?
> >
> ​
> ​Actually, no, since it is not possible to coerce "perfectly".  Since any
> attempt at doing so could fail it is only possible to scan every row and
> compare its typemod to the first row's typemod.  Ideally (but maybe not in
> reality) as soon as a discrepancy is found stop and discard the typemod.


> If every row passes you can retain the typemod.  That is arguably the
> ​perfect solution.  The concern is that "scan every row" could be very
> expensive - though in writing this I'm thinking that you'd quickly find a

I found that it already scans the whole list. select_common_type
does that. And it returns the type that is found to be applicable
on all values. Handling typmod there has a small
impact. (Though, I don't assert that we should do this.)

Even the value itself doesn't convey typmod, VALUES can take
expressions. (It is obvious because the first value was already a
coercing expression). They are scanned already.

> non-match even in a large dataset - and so a less perfect but still valid
> solution is to simply discard the typemod if there is more than one row.
> My thought was that if you are going to discard typemod in the n > 1 case
> for consistency you should discard the typemod in the n = 1 case as well.
> 
> That is, in a nutshell, options 1, 2, and 3 in order.
> 
> The "fault" in #1 that #4 attempted to fix was that VALUES are often hand
> entered and so the inexperienced would like to only type in a cast one the
> first row and have it will apply to all subsequent rows. That would be a
> feature request, though.  Your capability to add type information to any
> FROM alias is likewise a feature request - solving the overall problem by
> giving the author a place to specify the desired type explicitly without
> having to pollute the query with excessive casts or subqueries.

Ok, I think all of the #1 to #5 are the change of behavior in
this criteria. What we shold resolve here is the inconsistent
table generated by create table as select from (values...

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> At Mon, 5 Dec 2016 21:54:08 -0700, "David G. Johnston" <david.g.johnston@gmail.com> wrote in
<CAKFQuwYOeesXJ1bH31b2MKx1UStEzrpYe=tSAO2-eg1Ai4=Eww@mail.gmail.com>
>> If every row passes you can retain the typemod.  That is arguably the
>> ​perfect solution.  The concern is that "scan every row" could be very
>> expensive - though in writing this I'm thinking that you'd quickly find a

> I found that it already scans the whole list. select_common_type
> does that. And it returns the type that is found to be applicable
> on all values. Handling typmod there has a small
> impact. (Though, I don't assert that we should do this.)

The problem is that there is not where the impact is.  It would be really
easy for transformValuesClause to check for a common typmod while it's
transforming the construct, but it has noplace to put the information.
The place where the info is needed is in expandRTE and
get_rte_attribute_type (which is called by make_var).  So basically,
parsing of each Var reference to a VALUES subquery would potentially have
to scan all rows of the VALUES list to identify the right typmod to give
to the Var.  The repetitiveness of that is what's bothering me.

In HEAD, we could change the RTE data structure so that
transformValuesClause could save the typmod information in the RTE,
keeping the lookups cheap.  But that option is not available to us in
released branches.

On the other hand, we might be worrying over nothing.  I think that
in typical use, expandRTE() will be done just once per query, and it
could amortize the cost by considering all the rows in one scan.
get_rte_attribute_type would be a problem except that I believe it's
rarely used for VALUES RTEs --- our code coverage report shows that that
switch branch isn't reached at all in the standard regression tests.
(The reason for the above statements is that we convert a bare VALUES to
SELECT * FROM VALUES, and the * is expanded by expandRTE, not by retail
applications of make_var.)

> Ok, I think all of the #1 to #5 are the change of behavior in
> this criteria. What we shold resolve here is the inconsistent
> table generated by create table as select from (values...

Well, that's one symptom, but not the only one; consider

# select x::varchar(4) from (values ('z'::varchar(4)), ('0123456789')) v(x);    x
------------z0123456789
(2 rows)

The Var representing v.x is (mis)labeled as already having the right typmod
for varchar(4), so transformTypeCast concludes that no run-time work is
needed, and you get clearly-wrong answers out.

Still, things have been like this since 8.2 when we implemented multi-row
VALUES, and nobody's noticed up to now.  Maybe the right answer is to
change the data structure in HEAD and decree that we won't fix it in back
branches.  I don't really like that answer though ...
        regards, tom lane



Re: Typmod associated with multi-row VALUES constructs

От
"David G. Johnston"
Дата:
On Wed, Dec 7, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Still, things have been like this since 8.2 when we implemented multi-row
VALUES, and nobody's noticed up to now.  Maybe the right answer is to
change the data structure in HEAD and decree that we won't fix it in back
branches.  I don't really like that answer though ...

​The merit of "won't back-patch"​ is that at least you don't punish those who are being lazy (in a good sense) but generating values in subsequent lines that conform to the type specification of the first record.  We already implicitly accept such behavior elsewhere - though probably with better validation - so keeping it here is defense-able.  We dislike changing query plans in back branches and this really isn't that different.

The concern, especially since this can propagate to a CREATE TABLE AS, is whether there is some kind of fundamental storage risk being introduced that we do not want to have happen no matter how rare.  /me feels deja-vu...

David J.


Re: Typmod associated with multi-row VALUES constructs

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> In HEAD, we could change the RTE data structure so that
> transformValuesClause could save the typmod information in the RTE,
> keeping the lookups cheap.

Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
a bit about it at
https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

The patch has evolved quite a bit since then, but the tupdesc continues
to be a problem.  Latest patch is at
https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> In HEAD, we could change the RTE data structure so that
>> transformValuesClause could save the typmod information in the RTE,
>> keeping the lookups cheap.

> Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> a bit about it at
> https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

I dunno.  If your example there is correct that XMLTABLE can be called as
a plain function in a SELECT list, then I doubt that we want to tie
anything about it to the RTE data structure.  If anything, the case where
it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
case.

I've been trying to avoid getting involved in the XMLTABLE patch, mainly
because I know zip about XML, but maybe I need to take a look.
        regards, tom lane



Re: Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
Attached is a patch that fixes this by storing typmod info in the RTE.
This turned out to be straightforward, and I think it's definitely
what we should do in HEAD.  I have mixed emotions about whether it's
worth doing anything about it in the back branches.

I chose to redefine the existing coltypes/coltypmods/colcollations
lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
little space in the RangeTblEntry nodes and allows sharing code
in a couple of places.  It's tempting to consider making that apply
to all RTE types, which would permit collapsing expandRTE() and
get_rte_attribute_type() into a single case.  But AFAICS there would
be no benefit elsewhere, so I'm not sure the extra code churn is
justified.

BTW, I noticed that the CTE case of expandRTE() fails to assign the
specified location to the generated Vars, which is clearly a bug
though a very minor one; it would result in failing to display a
parse error location in some cases where we would do so for Vars from
other RTE types.  That part might be worth back-patching, not sure.

            regards, tom lane

diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e30c57e..d973225 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
*************** _copyRangeTblEntry(const RangeTblEntry *
*** 2149,2161 ****
      COPY_NODE_FIELD(functions);
      COPY_SCALAR_FIELD(funcordinality);
      COPY_NODE_FIELD(values_lists);
-     COPY_NODE_FIELD(values_collations);
      COPY_STRING_FIELD(ctename);
      COPY_SCALAR_FIELD(ctelevelsup);
      COPY_SCALAR_FIELD(self_reference);
!     COPY_NODE_FIELD(ctecoltypes);
!     COPY_NODE_FIELD(ctecoltypmods);
!     COPY_NODE_FIELD(ctecolcollations);
      COPY_NODE_FIELD(alias);
      COPY_NODE_FIELD(eref);
      COPY_SCALAR_FIELD(lateral);
--- 2149,2160 ----
      COPY_NODE_FIELD(functions);
      COPY_SCALAR_FIELD(funcordinality);
      COPY_NODE_FIELD(values_lists);
      COPY_STRING_FIELD(ctename);
      COPY_SCALAR_FIELD(ctelevelsup);
      COPY_SCALAR_FIELD(self_reference);
!     COPY_NODE_FIELD(coltypes);
!     COPY_NODE_FIELD(coltypmods);
!     COPY_NODE_FIELD(colcollations);
      COPY_NODE_FIELD(alias);
      COPY_NODE_FIELD(eref);
      COPY_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index b7a109c..edc1797 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
*************** _equalRangeTblEntry(const RangeTblEntry
*** 2460,2472 ****
      COMPARE_NODE_FIELD(functions);
      COMPARE_SCALAR_FIELD(funcordinality);
      COMPARE_NODE_FIELD(values_lists);
-     COMPARE_NODE_FIELD(values_collations);
      COMPARE_STRING_FIELD(ctename);
      COMPARE_SCALAR_FIELD(ctelevelsup);
      COMPARE_SCALAR_FIELD(self_reference);
!     COMPARE_NODE_FIELD(ctecoltypes);
!     COMPARE_NODE_FIELD(ctecoltypmods);
!     COMPARE_NODE_FIELD(ctecolcollations);
      COMPARE_NODE_FIELD(alias);
      COMPARE_NODE_FIELD(eref);
      COMPARE_SCALAR_FIELD(lateral);
--- 2460,2471 ----
      COMPARE_NODE_FIELD(functions);
      COMPARE_SCALAR_FIELD(funcordinality);
      COMPARE_NODE_FIELD(values_lists);
      COMPARE_STRING_FIELD(ctename);
      COMPARE_SCALAR_FIELD(ctelevelsup);
      COMPARE_SCALAR_FIELD(self_reference);
!     COMPARE_NODE_FIELD(coltypes);
!     COMPARE_NODE_FIELD(coltypmods);
!     COMPARE_NODE_FIELD(colcollations);
      COMPARE_NODE_FIELD(alias);
      COMPARE_NODE_FIELD(eref);
      COMPARE_SCALAR_FIELD(lateral);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 0d858f5..7258c03 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outRangeTblEntry(StringInfo str, const
*** 2841,2855 ****
              break;
          case RTE_VALUES:
              WRITE_NODE_FIELD(values_lists);
!             WRITE_NODE_FIELD(values_collations);
              break;
          case RTE_CTE:
              WRITE_STRING_FIELD(ctename);
              WRITE_UINT_FIELD(ctelevelsup);
              WRITE_BOOL_FIELD(self_reference);
!             WRITE_NODE_FIELD(ctecoltypes);
!             WRITE_NODE_FIELD(ctecoltypmods);
!             WRITE_NODE_FIELD(ctecolcollations);
              break;
          default:
              elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
--- 2841,2857 ----
              break;
          case RTE_VALUES:
              WRITE_NODE_FIELD(values_lists);
!             WRITE_NODE_FIELD(coltypes);
!             WRITE_NODE_FIELD(coltypmods);
!             WRITE_NODE_FIELD(colcollations);
              break;
          case RTE_CTE:
              WRITE_STRING_FIELD(ctename);
              WRITE_UINT_FIELD(ctelevelsup);
              WRITE_BOOL_FIELD(self_reference);
!             WRITE_NODE_FIELD(coltypes);
!             WRITE_NODE_FIELD(coltypmods);
!             WRITE_NODE_FIELD(colcollations);
              break;
          default:
              elog(ERROR, "unrecognized RTE kind: %d", (int) node->rtekind);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c587d4e..d608530 100644
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
*************** _readRangeTblEntry(void)
*** 1314,1328 ****
              break;
          case RTE_VALUES:
              READ_NODE_FIELD(values_lists);
!             READ_NODE_FIELD(values_collations);
              break;
          case RTE_CTE:
              READ_STRING_FIELD(ctename);
              READ_UINT_FIELD(ctelevelsup);
              READ_BOOL_FIELD(self_reference);
!             READ_NODE_FIELD(ctecoltypes);
!             READ_NODE_FIELD(ctecoltypmods);
!             READ_NODE_FIELD(ctecolcollations);
              break;
          default:
              elog(ERROR, "unrecognized RTE kind: %d",
--- 1314,1330 ----
              break;
          case RTE_VALUES:
              READ_NODE_FIELD(values_lists);
!             READ_NODE_FIELD(coltypes);
!             READ_NODE_FIELD(coltypmods);
!             READ_NODE_FIELD(colcollations);
              break;
          case RTE_CTE:
              READ_STRING_FIELD(ctename);
              READ_UINT_FIELD(ctelevelsup);
              READ_BOOL_FIELD(self_reference);
!             READ_NODE_FIELD(coltypes);
!             READ_NODE_FIELD(coltypmods);
!             READ_NODE_FIELD(colcollations);
              break;
          default:
              elog(ERROR, "unrecognized RTE kind: %d",
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d91bc3b..2fe1c8c 100644
*** a/src/backend/optimizer/plan/setrefs.c
--- b/src/backend/optimizer/plan/setrefs.c
*************** add_rte_to_flat_rtable(PlannerGlobal *gl
*** 396,405 ****
      newrte->joinaliasvars = NIL;
      newrte->functions = NIL;
      newrte->values_lists = NIL;
!     newrte->values_collations = NIL;
!     newrte->ctecoltypes = NIL;
!     newrte->ctecoltypmods = NIL;
!     newrte->ctecolcollations = NIL;
      newrte->securityQuals = NIL;

      glob->finalrtable = lappend(glob->finalrtable, newrte);
--- 396,404 ----
      newrte->joinaliasvars = NIL;
      newrte->functions = NIL;
      newrte->values_lists = NIL;
!     newrte->coltypes = NIL;
!     newrte->coltypmods = NIL;
!     newrte->colcollations = NIL;
      newrte->securityQuals = NIL;

      glob->finalrtable = lappend(glob->finalrtable, newrte);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7364346..5e65fe7 100644
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
*************** transformInsertStmt(ParseState *pstate,
*** 633,642 ****
           * RTE.
           */
          List       *exprsLists = NIL;
!         List       *collations = NIL;
          int            sublist_length = -1;
          bool        lateral = false;
-         int            i;

          Assert(selectStmt->intoClause == NULL);

--- 633,643 ----
           * RTE.
           */
          List       *exprsLists = NIL;
!         List       *coltypes = NIL;
!         List       *coltypmods = NIL;
!         List       *colcollations = NIL;
          int            sublist_length = -1;
          bool        lateral = false;

          Assert(selectStmt->intoClause == NULL);

*************** transformInsertStmt(ParseState *pstate,
*** 703,713 ****
          }

          /*
!          * Although we don't really need collation info, let's just make sure
!          * we provide a correctly-sized list in the VALUES RTE.
           */
!         for (i = 0; i < sublist_length; i++)
!             collations = lappend_oid(collations, InvalidOid);

          /*
           * Ordinarily there can't be any current-level Vars in the expression
--- 704,723 ----
          }

          /*
!          * Construct column type/typmod/collation lists for the VALUES RTE.
!          * Every expression in each column has been coerced to the type/typmod
!          * of the corresponding target column or subfield, so it's sufficient
!          * to look at the exprType/exprTypmod of the first row.  We don't care
!          * about the collation labeling, so just fill in InvalidOid for that.
           */
!         foreach(lc, (List *) linitial(exprsLists))
!         {
!             Node       *val = (Node *) lfirst(lc);
!
!             coltypes = lappend_oid(coltypes, exprType(val));
!             coltypmods = lappend_int(coltypmods, exprTypmod(val));
!             colcollations = lappend_oid(colcollations, InvalidOid);
!         }

          /*
           * Ordinarily there can't be any current-level Vars in the expression
*************** transformInsertStmt(ParseState *pstate,
*** 722,728 ****
          /*
           * Generate the VALUES RTE
           */
!         rte = addRangeTableEntryForValues(pstate, exprsLists, collations,
                                            NULL, lateral, true);
          rtr = makeNode(RangeTblRef);
          /* assume new rte is at end */
--- 732,739 ----
          /*
           * Generate the VALUES RTE
           */
!         rte = addRangeTableEntryForValues(pstate, exprsLists,
!                                           coltypes, coltypmods, colcollations,
                                            NULL, lateral, true);
          rtr = makeNode(RangeTblRef);
          /* assume new rte is at end */
*************** transformValuesClause(ParseState *pstate
*** 1274,1280 ****
  {
      Query       *qry = makeNode(Query);
      List       *exprsLists;
!     List       *collations;
      List      **colexprs = NULL;
      int            sublist_length = -1;
      bool        lateral = false;
--- 1285,1293 ----
  {
      Query       *qry = makeNode(Query);
      List       *exprsLists;
!     List       *coltypes = NIL;
!     List       *coltypmods = NIL;
!     List       *colcollations = NIL;
      List      **colexprs = NULL;
      int            sublist_length = -1;
      bool        lateral = false;
*************** transformValuesClause(ParseState *pstate
*** 1360,1367 ****

      /*
       * Now resolve the common types of the columns, and coerce everything to
!      * those types.  Then identify the common collation, if any, of each
!      * column.
       *
       * We must do collation processing now because (1) assign_query_collations
       * doesn't process rangetable entries, and (2) we need to label the VALUES
--- 1373,1380 ----

      /*
       * Now resolve the common types of the columns, and coerce everything to
!      * those types.  Then identify the common typmod and common collation, if
!      * any, of each column.
       *
       * We must do collation processing now because (1) assign_query_collations
       * doesn't process rangetable entries, and (2) we need to label the VALUES
*************** transformValuesClause(ParseState *pstate
*** 1372,1382 ****
       *
       * Note we modify the per-column expression lists in-place.
       */
-     collations = NIL;
      for (i = 0; i < sublist_length; i++)
      {
          Oid            coltype;
          Oid            colcoll;

          coltype = select_common_type(pstate, colexprs[i], "VALUES", NULL);

--- 1385,1396 ----
       *
       * Note we modify the per-column expression lists in-place.
       */
      for (i = 0; i < sublist_length; i++)
      {
          Oid            coltype;
+         int32        coltypmod = -1;
          Oid            colcoll;
+         bool        first = true;

          coltype = select_common_type(pstate, colexprs[i], "VALUES", NULL);

*************** transformValuesClause(ParseState *pstate
*** 1386,1396 ****

              col = coerce_to_common_type(pstate, col, coltype, "VALUES");
              lfirst(lc) = (void *) col;
          }

          colcoll = select_common_collation(pstate, colexprs[i], true);

!         collations = lappend_oid(collations, colcoll);
      }

      /*
--- 1400,1423 ----

              col = coerce_to_common_type(pstate, col, coltype, "VALUES");
              lfirst(lc) = (void *) col;
+             if (first)
+             {
+                 coltypmod = exprTypmod(col);
+                 first = false;
+             }
+             else
+             {
+                 /* As soon as we see a non-matching typmod, fall back to -1 */
+                 if (coltypmod >= 0 && coltypmod != exprTypmod(col))
+                     coltypmod = -1;
+             }
          }

          colcoll = select_common_collation(pstate, colexprs[i], true);

!         coltypes = lappend_oid(coltypes, coltype);
!         coltypmods = lappend_int(coltypmods, coltypmod);
!         colcollations = lappend_oid(colcollations, colcoll);
      }

      /*
*************** transformValuesClause(ParseState *pstate
*** 1432,1438 ****
      /*
       * Generate the VALUES RTE
       */
!     rte = addRangeTableEntryForValues(pstate, exprsLists, collations,
                                        NULL, lateral, true);
      addRTEtoQuery(pstate, rte, true, true, true);

--- 1459,1466 ----
      /*
       * Generate the VALUES RTE
       */
!     rte = addRangeTableEntryForValues(pstate, exprsLists,
!                                       coltypes, coltypmods, colcollations,
                                        NULL, lateral, true);
      addRTEtoQuery(pstate, rte, true, true, true);

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 1e3ecbc..58f7050 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
*************** addRangeTableEntryForFunction(ParseState
*** 1635,1641 ****
  RangeTblEntry *
  addRangeTableEntryForValues(ParseState *pstate,
                              List *exprs,
!                             List *collations,
                              Alias *alias,
                              bool lateral,
                              bool inFromCl)
--- 1635,1643 ----
  RangeTblEntry *
  addRangeTableEntryForValues(ParseState *pstate,
                              List *exprs,
!                             List *coltypes,
!                             List *coltypmods,
!                             List *colcollations,
                              Alias *alias,
                              bool lateral,
                              bool inFromCl)
*************** addRangeTableEntryForValues(ParseState *
*** 1652,1658 ****
      rte->relid = InvalidOid;
      rte->subquery = NULL;
      rte->values_lists = exprs;
!     rte->values_collations = collations;
      rte->alias = alias;

      eref = alias ? copyObject(alias) : makeAlias(refname, NIL);
--- 1654,1662 ----
      rte->relid = InvalidOid;
      rte->subquery = NULL;
      rte->values_lists = exprs;
!     rte->coltypes = coltypes;
!     rte->coltypmods = coltypmods;
!     rte->colcollations = colcollations;
      rte->alias = alias;

      eref = alias ? copyObject(alias) : makeAlias(refname, NIL);
*************** addRangeTableEntryForCTE(ParseState *pst
*** 1822,1830 ****
                       parser_errposition(pstate, rv->location)));
      }

!     rte->ctecoltypes = cte->ctecoltypes;
!     rte->ctecoltypmods = cte->ctecoltypmods;
!     rte->ctecolcollations = cte->ctecolcollations;

      rte->alias = alias;
      if (alias)
--- 1826,1834 ----
                       parser_errposition(pstate, rv->location)));
      }

!     rte->coltypes = cte->ctecoltypes;
!     rte->coltypmods = cte->ctecoltypmods;
!     rte->colcollations = cte->ctecolcollations;

      rte->alias = alias;
      if (alias)
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2153,2198 ****
                  }
              }
              break;
-         case RTE_VALUES:
-             {
-                 /* Values RTE */
-                 ListCell   *aliasp_item = list_head(rte->eref->colnames);
-                 ListCell   *lcv;
-                 ListCell   *lcc;
-
-                 varattno = 0;
-                 forboth(lcv, (List *) linitial(rte->values_lists),
-                         lcc, rte->values_collations)
-                 {
-                     Node       *col = (Node *) lfirst(lcv);
-                     Oid            colcollation = lfirst_oid(lcc);
-
-                     varattno++;
-                     if (colnames)
-                     {
-                         /* Assume there is one alias per column */
-                         char       *label = strVal(lfirst(aliasp_item));
-
-                         *colnames = lappend(*colnames,
-                                             makeString(pstrdup(label)));
-                         aliasp_item = lnext(aliasp_item);
-                     }
-
-                     if (colvars)
-                     {
-                         Var           *varnode;
-
-                         varnode = makeVar(rtindex, varattno,
-                                           exprType(col),
-                                           exprTypmod(col),
-                                           colcollation,
-                                           sublevels_up);
-                         varnode->location = location;
-                         *colvars = lappend(*colvars, varnode);
-                     }
-                 }
-             }
-             break;
          case RTE_JOIN:
              {
                  /* Join RTE */
--- 2157,2162 ----
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2262,2278 ****
                  }
              }
              break;
          case RTE_CTE:
              {
                  ListCell   *aliasp_item = list_head(rte->eref->colnames);
                  ListCell   *lct;
                  ListCell   *lcm;
                  ListCell   *lcc;

                  varattno = 0;
!                 forthree(lct, rte->ctecoltypes,
!                          lcm, rte->ctecoltypmods,
!                          lcc, rte->ctecolcollations)
                  {
                      Oid            coltype = lfirst_oid(lct);
                      int32        coltypmod = lfirst_int(lcm);
--- 2226,2244 ----
                  }
              }
              break;
+         case RTE_VALUES:
          case RTE_CTE:
              {
+                 /* Values or CTE RTE */
                  ListCell   *aliasp_item = list_head(rte->eref->colnames);
                  ListCell   *lct;
                  ListCell   *lcm;
                  ListCell   *lcc;

                  varattno = 0;
!                 forthree(lct, rte->coltypes,
!                          lcm, rte->coltypmods,
!                          lcc, rte->colcollations)
                  {
                      Oid            coltype = lfirst_oid(lct);
                      int32        coltypmod = lfirst_int(lcm);
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2285,2291 ****
                          /* Assume there is one alias per output column */
                          char       *label = strVal(lfirst(aliasp_item));

!                         *colnames = lappend(*colnames, makeString(pstrdup(label)));
                          aliasp_item = lnext(aliasp_item);
                      }

--- 2251,2258 ----
                          /* Assume there is one alias per output column */
                          char       *label = strVal(lfirst(aliasp_item));

!                         *colnames = lappend(*colnames,
!                                             makeString(pstrdup(label)));
                          aliasp_item = lnext(aliasp_item);
                      }

*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2296,2301 ****
--- 2263,2270 ----
                          varnode = makeVar(rtindex, varattno,
                                            coltype, coltypmod, colcoll,
                                            sublevels_up);
+                         varnode->location = location;
+
                          *colvars = lappend(*colvars, varnode);
                      }
                  }
*************** get_rte_attribute_type(RangeTblEntry *rt
*** 2654,2675 ****
                                  rte->eref->aliasname)));
              }
              break;
-         case RTE_VALUES:
-             {
-                 /* Values RTE --- get type info from first sublist */
-                 /* collation is stored separately, though */
-                 List       *collist = (List *) linitial(rte->values_lists);
-                 Node       *col;
-
-                 if (attnum < 1 || attnum > list_length(collist))
-                     elog(ERROR, "values list %s does not have attribute %d",
-                          rte->eref->aliasname, attnum);
-                 col = (Node *) list_nth(collist, attnum - 1);
-                 *vartype = exprType(col);
-                 *vartypmod = exprTypmod(col);
-                 *varcollid = list_nth_oid(rte->values_collations, attnum - 1);
-             }
-             break;
          case RTE_JOIN:
              {
                  /*
--- 2623,2628 ----
*************** get_rte_attribute_type(RangeTblEntry *rt
*** 2685,2697 ****
                  *varcollid = exprCollation(aliasvar);
              }
              break;
          case RTE_CTE:
              {
!                 /* CTE RTE --- get type info from lists in the RTE */
!                 Assert(attnum > 0 && attnum <= list_length(rte->ctecoltypes));
!                 *vartype = list_nth_oid(rte->ctecoltypes, attnum - 1);
!                 *vartypmod = list_nth_int(rte->ctecoltypmods, attnum - 1);
!                 *varcollid = list_nth_oid(rte->ctecolcollations, attnum - 1);
              }
              break;
          default:
--- 2638,2651 ----
                  *varcollid = exprCollation(aliasvar);
              }
              break;
+         case RTE_VALUES:
          case RTE_CTE:
              {
!                 /* VALUES or CTE RTE --- get type info from lists in the RTE */
!                 Assert(attnum > 0 && attnum <= list_length(rte->coltypes));
!                 *vartype = list_nth_oid(rte->coltypes, attnum - 1);
!                 *vartypmod = list_nth_int(rte->coltypmods, attnum - 1);
!                 *varcollid = list_nth_oid(rte->colcollations, attnum - 1);
              }
              break;
          default:
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 6b95c48..fc532fb 100644
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
*************** typedef struct RangeTblEntry
*** 927,933 ****
       * Fields valid for a values RTE (else NIL):
       */
      List       *values_lists;    /* list of expression lists */
-     List       *values_collations;        /* OID list of column collation OIDs */

      /*
       * Fields valid for a CTE RTE (else NULL/zero):
--- 927,932 ----
*************** typedef struct RangeTblEntry
*** 935,943 ****
      char       *ctename;        /* name of the WITH list item */
      Index        ctelevelsup;    /* number of query levels up */
      bool        self_reference; /* is this a recursive self-reference? */
!     List       *ctecoltypes;    /* OID list of column type OIDs */
!     List       *ctecoltypmods;    /* integer list of column typmods */
!     List       *ctecolcollations;        /* OID list of column collation OIDs */

      /*
       * Fields valid in all RTEs:
--- 934,950 ----
      char       *ctename;        /* name of the WITH list item */
      Index        ctelevelsup;    /* number of query levels up */
      bool        self_reference; /* is this a recursive self-reference? */
!
!     /*
!      * Fields valid for values and CTE RTEs (else NIL):
!      *
!      * We need these for CTE RTEs so that the types of self-referential
!      * columns are well-defined.  For VALUES RTEs, storing these explicitly
!      * saves having to re-determine the info by scanning the values_lists.
!      */
!     List       *coltypes;        /* OID list of column type OIDs */
!     List       *coltypmods;        /* integer list of column typmods */
!     List       *colcollations;    /* OID list of column collation OIDs */

      /*
       * Fields valid in all RTEs:
diff --git a/src/include/parser/parse_relation.h b/src/include/parser/parse_relation.h
index 3ef3d7b..9463f9d 100644
*** a/src/include/parser/parse_relation.h
--- b/src/include/parser/parse_relation.h
*************** extern RangeTblEntry *addRangeTableEntry
*** 85,91 ****
                                bool inFromCl);
  extern RangeTblEntry *addRangeTableEntryForValues(ParseState *pstate,
                              List *exprs,
!                             List *collations,
                              Alias *alias,
                              bool lateral,
                              bool inFromCl);
--- 85,93 ----
                                bool inFromCl);
  extern RangeTblEntry *addRangeTableEntryForValues(ParseState *pstate,
                              List *exprs,
!                             List *coltypes,
!                             List *coltypmods,
!                             List *colcollations,
                              Alias *alias,
                              bool lateral,
                              bool inFromCl);
diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out
index 66ed2c8..15ceefe 100644
*** a/src/test/regress/expected/create_view.out
--- b/src/test/regress/expected/create_view.out
*************** SELECT relname, relkind, reloptions FROM
*** 288,293 ****
--- 288,317 ----
   mysecview4 | v       | {security_barrier=false}
  (4 rows)

+ -- This test checks that proper typmods are assigned in a multi-row VALUES
+ CREATE TABLE tt1 AS
+   SELECT * FROM (
+     VALUES
+        ('abc'::varchar(3), '0123456789', 42, 'abcd'::varchar(4)),
+        ('0123456789', 'abc'::varchar(3), 42.12, 'abc'::varchar(4))
+   ) vv(a,b,c,d);
+ \d tt1
+                    Table "testviewschm2.tt1"
+  Column |         Type         | Collation | Nullable | Default
+ --------+----------------------+-----------+----------+---------
+  a      | character varying    |           |          |
+  b      | character varying    |           |          |
+  c      | numeric              |           |          |
+  d      | character varying(4) |           |          |
+
+ SELECT * FROM tt1;
+      a      |     b      |   c   |  d
+ ------------+------------+-------+------
+  abc        | 0123456789 |    42 | abcd
+  0123456789 | abc        | 42.12 | abc
+ (2 rows)
+
+ DROP TABLE tt1;
  -- Test view decompilation in the face of relation renaming conflicts
  CREATE TABLE tt1 (f1 int, f2 int, f3 text);
  CREATE TABLE tx1 (x1 int, x2 int, x3 text);
diff --git a/src/test/regress/sql/create_view.sql b/src/test/regress/sql/create_view.sql
index 8bed5a5..c3391f9 100644
*** a/src/test/regress/sql/create_view.sql
--- b/src/test/regress/sql/create_view.sql
*************** SELECT relname, relkind, reloptions FROM
*** 224,229 ****
--- 224,241 ----
                       'mysecview3'::regclass, 'mysecview4'::regclass)
         ORDER BY relname;

+ -- This test checks that proper typmods are assigned in a multi-row VALUES
+
+ CREATE TABLE tt1 AS
+   SELECT * FROM (
+     VALUES
+        ('abc'::varchar(3), '0123456789', 42, 'abcd'::varchar(4)),
+        ('0123456789', 'abc'::varchar(3), 42.12, 'abc'::varchar(4))
+   ) vv(a,b,c,d);
+ \d tt1
+ SELECT * FROM tt1;
+ DROP TABLE tt1;
+
  -- Test view decompilation in the face of relation renaming conflicts

  CREATE TABLE tt1 (f1 int, f2 int, f3 text);

Re: Typmod associated with multi-row VALUES constructs

От
Pavel Stehule
Дата:


2016-12-07 22:17 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:

> In HEAD, we could change the RTE data structure so that
> transformValuesClause could save the typmod information in the RTE,
> keeping the lookups cheap.

Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
a bit about it at
https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql

The patch has evolved quite a bit since then, but the tupdesc continues
to be a problem.  Latest patch is at
https://www.postgresql.org/message-id/CAFj8pRBsrhwR636-_3TPbqu%3DFo3_DDer6_yp_afzR7qzhW1T6Q%40mail.gmail.com

What do you dislike on tupdesc usage there?

regards

Pavel


--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Typmod associated with multi-row VALUES constructs

От
Kyotaro HORIGUCHI
Дата:
Mmm. I did the same thing in select_common_type and resulted in a
messier (a bit).

At Wed, 07 Dec 2016 23:44:19 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <15128.1481172259@sss.pgh.pa.us>
> Attached is a patch that fixes this by storing typmod info in the RTE.
> This turned out to be straightforward, and I think it's definitely
> what we should do in HEAD.  I have mixed emotions about whether it's
> worth doing anything about it in the back branches.

With it, VALUES works as the same as UNION, as documentation
says.

https://www.postgresql.org/docs/8.2/static/queries-values.html

Past versions have the same documentation so back-patching the
*behavior* seems to me worth doing. Instead of modifying RTE,
re-coercing the first row's value would enough (I'm not sure how
to do that now) for back-patching.

> I chose to redefine the existing coltypes/coltypmods/colcollations
> lists for CTE RTEs as also applying to VALUES RTEs.  That saves a
> little space in the RangeTblEntry nodes and allows sharing code
> in a couple of places.  It's tempting to consider making that apply
> to all RTE types, which would permit collapsing expandRTE() and
> get_rte_attribute_type() into a single case.  But AFAICS there would
> be no benefit elsewhere, so I'm not sure the extra code churn is
> justified.

Agreed.

> BTW, I noticed that the CTE case of expandRTE() fails to assign the
> specified location to the generated Vars, which is clearly a bug
> though a very minor one; it would result in failing to display a
> parse error location in some cases where we would do so for Vars from
> other RTE types.  That part might be worth back-patching, not sure.

If we do back-patching VALUES patch, involving it would
worth, I think.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Typmod associated with multi-row VALUES constructs

От
Alvaro Herrera
Дата:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> In HEAD, we could change the RTE data structure so that
> >> transformValuesClause could save the typmod information in the RTE,
> >> keeping the lookups cheap.
> 
> > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > a bit about it at
> > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
> 
> I dunno.  If your example there is correct that XMLTABLE can be called as
> a plain function in a SELECT list, then I doubt that we want to tie
> anything about it to the RTE data structure.  If anything, the case where
> it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> case.

Well, XMLTABLE is specified by the standard to be part of <table primary>,
which it turn is part of <table reference>.  I can't immediately tell
whether it allows XMLTABLE to be called like a regular function.  The
current patch allows it, but maybe that's not right, and it's probably
not that useful anyway.

> I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> because I know zip about XML, but maybe I need to take a look.

I think it'd be productive that you did so.  The XML part of it is
reasonably well isolated, so you could give your opinion on the core
parser / executor parts without looking at the XML part.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Typmod associated with multi-row VALUES constructs

От
Pavel Stehule
Дата:


2016-12-08 14:03 GMT+01:00 Alvaro Herrera <alvherre@2ndquadrant.com>:
Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Tom Lane wrote:
> >> In HEAD, we could change the RTE data structure so that
> >> transformValuesClause could save the typmod information in the RTE,
> >> keeping the lookups cheap.
>
> > Hmm, I think this would be useful for the XMLTABLE patch too.  I talked
> > a bit about it at
> > https://www.postgresql.org/message-id/20161122204730.dgipy6gxi25j4e6a@alvherre.pgsql
>
> I dunno.  If your example there is correct that XMLTABLE can be called as
> a plain function in a SELECT list, then I doubt that we want to tie
> anything about it to the RTE data structure.  If anything, the case where
> it appears in FROM seems to need to be treated as a generic RTE_FUNCTION
> case.

Well, XMLTABLE is specified by the standard to be part of <table primary>,
which it turn is part of <table reference>.  I can't immediately tell
whether it allows XMLTABLE to be called like a regular function.  The
current patch allows it, but maybe that's not right, and it's probably
not that useful anyway.

It looks like function, and we support on both sides, so I implemented both.

Probably, there is only 10 rows more related to this feature. Using this function in target list is not critical feature - now with LATERAL JOIN we can live without it. It is just some few steps forward to our user.

Again - implementation of this feature is probably few lines only.
 

> I've been trying to avoid getting involved in the XMLTABLE patch, mainly
> because I know zip about XML, but maybe I need to take a look.

I think it'd be productive that you did so.  The XML part of it is
reasonably well isolated, so you could give your opinion on the core
parser / executor parts without looking at the XML part.

The critical part has zero relation to XML. All is some game with tupledesc.

Regards

Pavel
 

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Typmod associated with multi-row VALUES constructs

От
Tom Lane
Дата:
I've pushed the previous patch to HEAD.  Attached is a proposed patch
(against 9.6) that we could use for the back branches; it takes the
brute force approach of just computing the correct value on-demand
in the two functions at issue.  The key question of course is whether
this is acceptable from a performance standpoint.  I did a simple test
using a 1000-entry VALUES list:

select count(a) from (
values
  ('0'::varchar(3), '0'::varchar(4)),
  ('1'::varchar(3), '1'::varchar(4)),
  ('2'::varchar(3), '2'::varchar(4)),
  ('3'::varchar(3), '3'::varchar(4)),
  ('4'::varchar(3), '4'::varchar(4)),
  ...
  ('996'::varchar(3), '996'::varchar(4)),
  ('997'::varchar(3), '997'::varchar(4)),
  ('998'::varchar(3), '998'::varchar(4)),
  ('999'::varchar(3), '999'::varchar(4))
) v(a,b);

Since all the rows do have the same typmod, this represents the worst
case where we have to scan all the way to the end to confirm the typmod,
and it has about as little overhead otherwise as I could think of doing.
I ran it like this:

pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression

and could not see any above-the-noise-level difference --- in fact,
it seemed like it was faster *with* the patch, which is obviously
impossible; I blame that on chance realignments of loops vs. cache
line boundaries.

So I think this is an okay candidate for back-patching.  If anyone
wants to do their own performance tests, please do.

            regards, tom lane

diff --git a/src/backend/parser/parse_relation.c b/src/backend/parser/parse_relation.c
index 1e3ecbc..c51fd81 100644
*** a/src/backend/parser/parse_relation.c
--- b/src/backend/parser/parse_relation.c
*************** static void expandTupleDesc(TupleDesc tu
*** 52,57 ****
--- 52,58 ----
                  int rtindex, int sublevels_up,
                  int location, bool include_dropped,
                  List **colnames, List **colvars);
+ static int32 *getValuesTypmods(RangeTblEntry *rte);
  static int    specialAttNum(const char *attname);
  static bool isQueryUsingTempRelation_walker(Node *node, void *context);

*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2157,2165 ****
--- 2158,2179 ----
              {
                  /* Values RTE */
                  ListCell   *aliasp_item = list_head(rte->eref->colnames);
+                 int32       *coltypmods;
                  ListCell   *lcv;
                  ListCell   *lcc;

+                 /*
+                  * It's okay to extract column types from the expressions in
+                  * the first row, since all rows will have been coerced to the
+                  * same types.  Their typmods might not be the same though, so
+                  * we potentially need to examine all rows to compute those.
+                  * Column collations are pre-computed in values_collations.
+                  */
+                 if (colvars)
+                     coltypmods = getValuesTypmods(rte);
+                 else
+                     coltypmods = NULL;
+
                  varattno = 0;
                  forboth(lcv, (List *) linitial(rte->values_lists),
                          lcc, rte->values_collations)
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2184,2196 ****

                          varnode = makeVar(rtindex, varattno,
                                            exprType(col),
!                                           exprTypmod(col),
                                            colcollation,
                                            sublevels_up);
                          varnode->location = location;
                          *colvars = lappend(*colvars, varnode);
                      }
                  }
              }
              break;
          case RTE_JOIN:
--- 2198,2212 ----

                          varnode = makeVar(rtindex, varattno,
                                            exprType(col),
!                                           coltypmods[varattno - 1],
                                            colcollation,
                                            sublevels_up);
                          varnode->location = location;
                          *colvars = lappend(*colvars, varnode);
                      }
                  }
+                 if (coltypmods)
+                     pfree(coltypmods);
              }
              break;
          case RTE_JOIN:
*************** expandRTE(RangeTblEntry *rte, int rtinde
*** 2296,2301 ****
--- 2312,2319 ----
                          varnode = makeVar(rtindex, varattno,
                                            coltype, coltypmod, colcoll,
                                            sublevels_up);
+                         varnode->location = location;
+
                          *colvars = lappend(*colvars, varnode);
                      }
                  }
*************** expandTupleDesc(TupleDesc tupdesc, Alias
*** 2413,2418 ****
--- 2431,2504 ----
  }

  /*
+  * getValuesTypmods -- expandRTE subroutine
+  *
+  * Identify per-column typmods for the given VALUES RTE.  Returns a
+  * palloc'd array.
+  */
+ static int32 *
+ getValuesTypmods(RangeTblEntry *rte)
+ {
+     int32       *coltypmods;
+     List       *firstrow;
+     int            ncolumns,
+                 nvalid,
+                 i;
+     ListCell   *lc;
+
+     Assert(rte->values_lists != NIL);
+     firstrow = (List *) linitial(rte->values_lists);
+     ncolumns = list_length(firstrow);
+     coltypmods = (int32 *) palloc(ncolumns * sizeof(int32));
+     nvalid = 0;
+
+     /* Collect the typmods from the first VALUES row */
+     i = 0;
+     foreach(lc, firstrow)
+     {
+         Node       *col = (Node *) lfirst(lc);
+
+         coltypmods[i] = exprTypmod(col);
+         if (coltypmods[i] >= 0)
+             nvalid++;
+         i++;
+     }
+
+     /*
+      * Scan remaining rows; as soon as we have a non-matching typmod for a
+      * column, reset that typmod to -1.  We can bail out early if all typmods
+      * become -1.
+      */
+     if (nvalid > 0)
+     {
+         for_each_cell(lc, lnext(list_head(rte->values_lists)))
+         {
+             List       *thisrow = (List *) lfirst(lc);
+             ListCell   *lc2;
+
+             Assert(list_length(thisrow) == ncolumns);
+             i = 0;
+             foreach(lc2, thisrow)
+             {
+                 Node       *col = (Node *) lfirst(lc2);
+
+                 if (coltypmods[i] >= 0 && coltypmods[i] != exprTypmod(col))
+                 {
+                     coltypmods[i] = -1;
+                     nvalid--;
+                 }
+                 i++;
+             }
+
+             if (nvalid <= 0)
+                 break;
+         }
+     }
+
+     return coltypmods;
+ }
+
+ /*
   * expandRelAttrs -
   *      Workhorse for "*" expansion: produce a list of targetentries
   *      for the attributes of the RTE
*************** get_rte_attribute_type(RangeTblEntry *rt
*** 2656,2673 ****
              break;
          case RTE_VALUES:
              {
!                 /* Values RTE --- get type info from first sublist */
!                 /* collation is stored separately, though */
                  List       *collist = (List *) linitial(rte->values_lists);
                  Node       *col;

                  if (attnum < 1 || attnum > list_length(collist))
                      elog(ERROR, "values list %s does not have attribute %d",
                           rte->eref->aliasname, attnum);
                  col = (Node *) list_nth(collist, attnum - 1);
                  *vartype = exprType(col);
!                 *vartypmod = exprTypmod(col);
                  *varcollid = list_nth_oid(rte->values_collations, attnum - 1);
              }
              break;
          case RTE_JOIN:
--- 2742,2766 ----
              break;
          case RTE_VALUES:
              {
!                 /*
!                  * Values RTE --- we can get type info from first sublist, but
!                  * typmod may require scanning all sublists, and collation is
!                  * stored separately.  Using getValuesTypmods() is overkill,
!                  * but this path is taken so seldom for VALUES that it's not
!                  * worth writing extra code.
!                  */
                  List       *collist = (List *) linitial(rte->values_lists);
                  Node       *col;
+                 int32       *coltypmods = getValuesTypmods(rte);

                  if (attnum < 1 || attnum > list_length(collist))
                      elog(ERROR, "values list %s does not have attribute %d",
                           rte->eref->aliasname, attnum);
                  col = (Node *) list_nth(collist, attnum - 1);
                  *vartype = exprType(col);
!                 *vartypmod = coltypmods[attnum - 1];
                  *varcollid = list_nth_oid(rte->values_collations, attnum - 1);
+                 pfree(coltypmods);
              }
              break;
          case RTE_JOIN:
diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out
index 81b4e8d..b1c3cff 100644
*** a/src/test/regress/expected/create_view.out
--- b/src/test/regress/expected/create_view.out
*************** SELECT relname, relkind, reloptions FROM
*** 288,293 ****
--- 288,330 ----
   mysecview4 | v       | {security_barrier=false}
  (4 rows)

+ -- This test checks that proper typmods are assigned in a multi-row VALUES
+ CREATE VIEW tt1 AS
+   SELECT * FROM (
+     VALUES
+        ('abc'::varchar(3), '0123456789', 42, 'abcd'::varchar(4)),
+        ('0123456789', 'abc'::varchar(3), 42.12, 'abc'::varchar(4))
+   ) vv(a,b,c,d);
+ \d+ tt1
+                       View "testviewschm2.tt1"
+  Column |         Type         | Modifiers | Storage  | Description
+ --------+----------------------+-----------+----------+-------------
+  a      | character varying    |           | extended |
+  b      | character varying    |           | extended |
+  c      | numeric              |           | main     |
+  d      | character varying(4) |           | extended |
+ View definition:
+  SELECT vv.a,
+     vv.b,
+     vv.c,
+     vv.d
+    FROM ( VALUES ('abc'::character varying(3),'0123456789'::character varying,42,'abcd'::character varying(4)),
('0123456789'::charactervarying,'abc'::character varying(3),42.12,'abc'::character varying(4))) vv(a, b, c, d); 
+
+ SELECT * FROM tt1;
+      a      |     b      |   c   |  d
+ ------------+------------+-------+------
+  abc        | 0123456789 |    42 | abcd
+  0123456789 | abc        | 42.12 | abc
+ (2 rows)
+
+ SELECT a::varchar(3) FROM tt1;
+   a
+ -----
+  abc
+  012
+ (2 rows)
+
+ DROP VIEW tt1;
  -- Test view decompilation in the face of relation renaming conflicts
  CREATE TABLE tt1 (f1 int, f2 int, f3 text);
  CREATE TABLE tx1 (x1 int, x2 int, x3 text);
diff --git a/src/test/regress/sql/create_view.sql b/src/test/regress/sql/create_view.sql
index 8bed5a5..5fe8b94 100644
*** a/src/test/regress/sql/create_view.sql
--- b/src/test/regress/sql/create_view.sql
*************** SELECT relname, relkind, reloptions FROM
*** 224,229 ****
--- 224,242 ----
                       'mysecview3'::regclass, 'mysecview4'::regclass)
         ORDER BY relname;

+ -- This test checks that proper typmods are assigned in a multi-row VALUES
+
+ CREATE VIEW tt1 AS
+   SELECT * FROM (
+     VALUES
+        ('abc'::varchar(3), '0123456789', 42, 'abcd'::varchar(4)),
+        ('0123456789', 'abc'::varchar(3), 42.12, 'abc'::varchar(4))
+   ) vv(a,b,c,d);
+ \d+ tt1
+ SELECT * FROM tt1;
+ SELECT a::varchar(3) FROM tt1;
+ DROP VIEW tt1;
+
  -- Test view decompilation in the face of relation renaming conflicts

  CREATE TABLE tt1 (f1 int, f2 int, f3 text);

Re: [HACKERS] Typmod associated with multi-row VALUES constructs

От
Kyotaro HORIGUCHI
Дата:
Hello, it seems to me to work as expected.

At Thu, 08 Dec 2016 15:58:10 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <7083.1481230690@sss.pgh.pa.us>
> I've pushed the previous patch to HEAD.  Attached is a proposed patch
> (against 9.6) that we could use for the back branches; it takes the
> brute force approach of just computing the correct value on-demand
> in the two functions at issue.  The key question of course is whether
> this is acceptable from a performance standpoint.  I did a simple test
> 
> using a 1000-entry VALUES list:
> 
> select count(a) from (
> values
>   ('0'::varchar(3), '0'::varchar(4)),
>   ('1'::varchar(3), '1'::varchar(4)),
>   ('2'::varchar(3), '2'::varchar(4)),
>   ('3'::varchar(3), '3'::varchar(4)),
>   ('4'::varchar(3), '4'::varchar(4)),
>   ...
>   ('996'::varchar(3), '996'::varchar(4)),
>   ('997'::varchar(3), '997'::varchar(4)),
>   ('998'::varchar(3), '998'::varchar(4)),
>   ('999'::varchar(3), '999'::varchar(4))
> ) v(a,b);
> 
> Since all the rows do have the same typmod, this represents the worst
> case where we have to scan all the way to the end to confirm the typmod,
> and it has about as little overhead otherwise as I could think of doing.
> I ran it like this:
> 
> pgbench -U postgres -n -c 1 -T 1000 -f bigvalues.sql regression
> 
> and could not see any above-the-noise-level difference --- in fact,
> it seemed like it was faster *with* the patch, which is obviously
> impossible; I blame that on chance realignments of loops vs. cache
> line boundaries.
> 
> So I think this is an okay candidate for back-patching.  If anyone
> wants to do their own performance tests, please do.

For all the additional computation, I had the same result on
9.6. The patch accelerates the processing around the noise rate..

9.6 without the patch 103.2 tps
9.6 with    the patch 108.3 tps

For the master branch,

master    102.9 tps
0b78106c^ 103.4 tps

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center