Обсуждение: Something seems weird inside tts_virtual_copyslot()

Поиск
Список
Период
Сортировка

Something seems weird inside tts_virtual_copyslot()

От
David Rowley
Дата:
Looking at the Assert inside tts_virtual_copyslot(), it does:

Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);

So, that seems to indicate that it's ok for the src slot to have fewer
attributes than the destination.  The code then calls
tts_virtual_clear(dstslot), then slot_getallattrs(srcslot); then does
the following loop:

for (int natt = 0; natt < srcdesc->natts; natt++)
{
    dstslot->tts_values[natt] = srcslot->tts_values[natt];
    dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt];
}

Seems ok so far.  If the srcslot has fewer attributes then that'll
leave the extra dstslot array elements untouched.

Where it gets weird is inside tts_virtual_materialize().  In that
function, we materialize *all* of the dstslot attributes, even the
extra ones that were left alone in the for loop shown above.  Why do
we need to materialize all of those attributes? We only need to
materialize up to srcslot->natts.

Per the following code, only up to the srcdesc->natts would be
accessible anyway:

dstslot->tts_nvalid = srcdesc->natts;

Virtual slots don't need any further deforming and
tts_virtual_getsomeattrs() is coded in a way that we'll find out if
anything tries to deform a virtual slot.

I changed the Assert in tts_virtual_copyslot() to check the natts
match in each of the slots and all of the regression tests still pass,
so it seems we have no tests where there's an attribute number
mismatch...

I wondered if there are any other cases that try to handle mismatching
attribute numbers.  On a quick scan of git grep -E
"^\s*Assert\(.*natts.*\);" I don't see any other Asserts that allow
mismatching attribute numbers.

I think if we are going to support copying slots where the source and
destination don't have the same number of attributes then the
following comment should explain what's allowed and what's not
allowed:

/*
* Copy the contents of the source slot into the destination slot's own
* context. Invoked using callback of the destination slot.
*/
void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

I also tried adding the following to ExecCopySlot() to see if there is
any other slot copying going on with other slot types where the natts
don't match.  All tests pass still.

Assert(srcslot->tts_tupleDescriptor->natts ==
dstslot->tts_tupleDescriptor->natts);

Is the Assert() in tts_virtual_copyslot() wrong?

David



Re: Something seems weird inside tts_virtual_copyslot()

От
Andres Freund
Дата:
Hi,

On 2023-11-01 11:35:50 +1300, David Rowley wrote:
> Looking at the Assert inside tts_virtual_copyslot(), it does:
> 
> Assert(srcdesc->natts <= dstslot->tts_tupleDescriptor->natts);

I think that assert was intended to be the other way round.


> So, that seems to indicate that it's ok for the src slot to have fewer
> attributes than the destination.  The code then calls
> tts_virtual_clear(dstslot), then slot_getallattrs(srcslot); then does
> the following loop:

> for (int natt = 0; natt < srcdesc->natts; natt++)
> {
>     dstslot->tts_values[natt] = srcslot->tts_values[natt];
>     dstslot->tts_isnull[natt] = srcslot->tts_isnull[natt];
> }
> 
> Seems ok so far.
>
> If the srcslot has fewer attributes then that'll leave the extra dstslot
> array elements untouched.

It is not ok even up to just here! Any access to dstslot->tts_{values,isnull}
for an attribute bigger than srcdesc->natts would now be stale, potentially
pointing to another attribute.


> Where it gets weird is inside tts_virtual_materialize().  In that
> function, we materialize *all* of the dstslot attributes, even the
> extra ones that were left alone in the for loop shown above.  Why do
> we need to materialize all of those attributes? We only need to
> materialize up to srcslot->natts.
> 
> Per the following code, only up to the srcdesc->natts would be
> accessible anyway:
> 
> dstslot->tts_nvalid = srcdesc->natts;
> 
> Virtual slots don't need any further deforming and
> tts_virtual_getsomeattrs() is coded in a way that we'll find out if
> anything tries to deform a virtual slot.
> 
> I changed the Assert in tts_virtual_copyslot() to check the natts
> match in each of the slots and all of the regression tests still pass,
> so it seems we have no tests where there's an attribute number
> mismatch...

If we want to prohibit that, I think we ought to assert this in
ExecCopySlot(), rather than just tts_virtual_copyslot.

Even that does survive the test - but I don't think it'd be really wrong to
copy from a slot with more columns into one with fewer. And it seems plausible
that that could happen somewhere, e.g. when copying from a slot in a child
partition with additional columns into a slot from the parent, where the
column types/order otherwise matches, so we don't have to use the attribute
mapping infrastructure.


> I think if we are going to support copying slots where the source and
> destination don't have the same number of attributes then the
> following comment should explain what's allowed and what's not
> allowed:
> 
> /*
> * Copy the contents of the source slot into the destination slot's own
> * context. Invoked using callback of the destination slot.
> */
> void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);

Arguably the more relevant place would be document this in ExecCopySlot(), as
that's what "users" of ExecCopySlot() would presumably would look at.  I dug a
bit in the history, and we used to say

The caller must ensure the slots have compatible tupdescs.

whatever that precisely means.


> Is the Assert() in tts_virtual_copyslot() wrong?

Yes, it's inverted.

Greetings,

Andres Freund



Re: Something seems weird inside tts_virtual_copyslot()

От
David Rowley
Дата:
On Sat, 4 Nov 2023 at 15:15, Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-11-01 11:35:50 +1300, David Rowley wrote:
> > I changed the Assert in tts_virtual_copyslot() to check the natts
> > match in each of the slots and all of the regression tests still pass,
> > so it seems we have no tests where there's an attribute number
> > mismatch...
>
> If we want to prohibit that, I think we ought to assert this in
> ExecCopySlot(), rather than just tts_virtual_copyslot.
>
> Even that does survive the test - but I don't think it'd be really wrong to
> copy from a slot with more columns into one with fewer. And it seems plausible
> that that could happen somewhere, e.g. when copying from a slot in a child
> partition with additional columns into a slot from the parent, where the
> column types/order otherwise matches, so we don't have to use the attribute
> mapping infrastructure.

Do you have any examples of when this could happen?

I played around with partitioned tables and partitions with various
combinations of dropped columns and can't see any cases of this. Given
the assert's condition has been backwards for 5 years now
(4da597edf1b), it just seems a bit unlikely that we have cases where
the source slot can have more attributes than the destination.

Given the Assert has been that way around for this long without any
complaints, I think we should just insist that the natts must match in
each slot.  If we later discover some reason that there's some corner
case where they don't match, we can adjust the code then.

I played around with the attached patch which removes the Assert and
puts some additional Assert checks inside ExecCopySlot() which
additionally checks the attribute types also match.  There are valid
cases where they don't match and that seems to be limited to cases
where we're performing DML on a table with a dropped column.
expand_insert_targetlist() will add NULL::int4 constants to the
targetlist in place of dropped columns but the tupledesc of the table
will have the atttypid set to InvalidOid per what that gets set to
when a column is dropped in RemoveAttributeById().

> > I think if we are going to support copying slots where the source and
> > destination don't have the same number of attributes then the
> > following comment should explain what's allowed and what's not
> > allowed:
> >
> > /*
> > * Copy the contents of the source slot into the destination slot's own
> > * context. Invoked using callback of the destination slot.
> > */
> > void (*copyslot) (TupleTableSlot *dstslot, TupleTableSlot *srcslot);
>
> Arguably the more relevant place would be document this in ExecCopySlot(), as
> that's what "users" of ExecCopySlot() would presumably would look at.  I dug a
> bit in the history, and we used to say

I think it depends on what you're documenting.  Writing comments above
the copyslot API function declaration is useful to define the API
standard for what new implementations of that interface must abide by.
Comments in ExecCopySlot() are useful to users of that function.  It
seems to me that both locations are relevant. New implementations of
copyslot need to know what they must handle, else they're left just to
look at what other implementations do and guess the rest.

David

Вложения

Re: Something seems weird inside tts_virtual_copyslot()

От
Andres Freund
Дата:
Hi,

On 2023-11-06 11:16:26 +1300, David Rowley wrote:
> On Sat, 4 Nov 2023 at 15:15, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2023-11-01 11:35:50 +1300, David Rowley wrote:
> > > I changed the Assert in tts_virtual_copyslot() to check the natts
> > > match in each of the slots and all of the regression tests still pass,
> > > so it seems we have no tests where there's an attribute number
> > > mismatch...
> >
> > If we want to prohibit that, I think we ought to assert this in
> > ExecCopySlot(), rather than just tts_virtual_copyslot.
> >
> > Even that does survive the test - but I don't think it'd be really wrong to
> > copy from a slot with more columns into one with fewer. And it seems plausible
> > that that could happen somewhere, e.g. when copying from a slot in a child
> > partition with additional columns into a slot from the parent, where the
> > column types/order otherwise matches, so we don't have to use the attribute
> > mapping infrastructure.
> 
> Do you have any examples of when this could happen?

> I played around with partitioned tables and partitions with various
> combinations of dropped columns and can't see any cases of this. Given
> the assert's condition has been backwards for 5 years now
> (4da597edf1b), it just seems a bit unlikely that we have cases where
> the source slot can have more attributes than the destination.

I think my concerns might be unfounded - I was worried about stuff like
attribute mapping deciding that it's safe to copy without an attribute
mapping, because all the types match. But it looks like we do check that the
attnums match as well. There's similar code in a bunch of other places,
e.g. ExecEvalWholeRowVar(), but that also verifies ->natts matches.

So I think adding an assert to ExecCopySlot(), perhaps with a comment saying
that the restriction could be lifted with a bit of work, would be fine.

Greetings,

Andres Freund



Re: Something seems weird inside tts_virtual_copyslot()

От
David Rowley
Дата:
On Fri, 1 Dec 2023 at 13:14, Andres Freund <andres@anarazel.de> wrote:
> So I think adding an assert to ExecCopySlot(), perhaps with a comment saying
> that the restriction could be lifted with a bit of work, would be fine.

Thanks for looking at this again.

How about the attached?  I wrote the comment you mentioned and also
removed the Assert from tts_virtual_copyslot().

I also noted in the copyslot callback declaration that implementers
can assume the number of attributes in the source and destination
slots match.

David

Вложения

Re: Something seems weird inside tts_virtual_copyslot()

От
David Rowley
Дата:
On Fri, 1 Dec 2023 at 14:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 1 Dec 2023 at 13:14, Andres Freund <andres@anarazel.de> wrote:
> > So I think adding an assert to ExecCopySlot(), perhaps with a comment saying
> > that the restriction could be lifted with a bit of work, would be fine.
>
> How about the attached?  I wrote the comment you mentioned and also
> removed the Assert from tts_virtual_copyslot().

I looked over this again and didn't see any issues, so pushed the patch.

Thanks for helping with this.

David