Обсуждение: BUG #15900: `executor could not find named tuplestore` in triggers with transition table and row locks

Поиск
Список
Период
Сортировка

BUG #15900: `executor could not find named tuplestore` in triggers with transition table and row locks

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15900
Logged by:          Alex Aktsipetrov
Email address:      alex.akts@gmail.com
PostgreSQL version: 12beta2
Operating system:   Ubuntu 16.04
Description:

SELECT FOR UPDATE query that references a transition table in AFTER
INSERT/UPDATE triggers produces an unexpected error. The same query with FOR
UPDATE omitted finishes without any error, which is my expectation for the
original one as well. AFTER DELETE triggers were not tested.

For example, the following query:

    create table testtr (a int, b text);

    create function testtr_trigger() returns trigger language plpgsql as
    $$begin
     perform(
       select array_agg(a) from
       (select testtr.a from testtr join new_table on testtr.a = new_table.a
for update)
       as tmp
     );
     return new;
    end$$;

    create trigger testtr_trigger
    after insert on testtr
    referencing new table as new_table
    for each statement execute procedure testtr_trigger();

    insert into testtr values (1, 'one'), (2, 'two');

produces the following error:

    ERROR:  executor could not find named tuplestore "new_table"
    CONTEXT:  SQL statement "SELECT (
       select array_agg(a) from
       (select testtr.a from testtr join new_table on testtr.a = new_table.a
for update)
       as tmp
    )"

I think the issue was introduced in ad0bda5d24ea2bcc72b5e50020e3c79bab10836b
as the query finishes successfully in its ancestor.


Re: BUG #15900: `executor could not find named tuplestore` intriggers with transition table and row locks

От
Александр Акципетров
Дата:

On Tue, 9 Jul 2019 at 01:22, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      15900
Logged by:          Alex Aktsipetrov
Email address:      alex.akts@gmail.com
PostgreSQL version: 12beta2
Operating system:   Ubuntu 16.04
Description:       

SELECT FOR UPDATE query that references a transition table in AFTER
INSERT/UPDATE triggers produces an unexpected error. The same query with FOR
UPDATE omitted finishes without any error, which is my expectation for the
original one as well. AFTER DELETE triggers were not tested.

For example, the following query:

        create table testtr (a int, b text);

        create function testtr_trigger() returns trigger language plpgsql as
        $$begin
         perform(
           select array_agg(a) from
           (select testtr.a from testtr join new_table on testtr.a = new_table.a
for update)
           as tmp
         );
         return new;
        end$$;

        create trigger testtr_trigger
        after insert on testtr
        referencing new table as new_table
        for each statement execute procedure testtr_trigger();

        insert into testtr values (1, 'one'), (2, 'two');

produces the following error:

        ERROR:  executor could not find named tuplestore "new_table"
        CONTEXT:  SQL statement "SELECT (
           select array_agg(a) from
           (select testtr.a from testtr join new_table on testtr.a = new_table.a
for update)
           as tmp
        )"

I think the issue was introduced in ad0bda5d24ea2bcc72b5e50020e3c79bab10836b
as the query finishes successfully in its ancestor.




Sorry, the previous email was a fat finger error.

Attaching the patch that fixes the issue, although I am not familiar with the codebase to be sure that it is the right idea.
Вложения
On Tue, Jul 9, 2019 at 11:39 AM Alex Aktsipetrov <alex.akts@gmail.com> wrote:
> Attaching the patch that fixes the issue, although I am not familiar with the codebase to be sure that it is the
rightidea.
 

Thanks for the report and the proposed fix!

Hmm, I wonder if EPQ might be involved in bug report #15720 (version 11.2):

https://www.postgresql.org/message-id/flat/15720-38c2b29e5d720187%40postgresql.org

-- 
Thomas Munro
https://enterprisedb.com



On Tue, Jul 9, 2019 at 11:49 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Jul 9, 2019 at 11:39 AM Alex Aktsipetrov <alex.akts@gmail.com> wrote:
> > Attaching the patch that fixes the issue, although I am not familiar with the codebase to be sure that it is the
rightidea.
 
>
> Thanks for the report and the proposed fix!

The repro works for me, and I think the patch is correct.  Here's a
version with that line moved to a more natural place IMHO.

> Hmm, I wonder if EPQ might be involved in bug report #15720 (version 11.2):
>
> https://www.postgresql.org/message-id/flat/15720-38c2b29e5d720187%40postgresql.org

I think it's highly likely that bug #15720 was a case of this bug and
would be fixed by this patch.  Alex's repro doesn't work on 11 though,
because EPQ is not entered at all.  Which raises the question: why do
we need to enter EPQ after commit ad0bda5d on 12/master, for a row
that hasn't been updated by anyone else?

-- 
Thomas Munro
https://enterprisedb.com

Вложения
On Tue, Jul 9, 2019 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Hmm, I wonder if EPQ might be involved in bug report #15720 (version 11.2):
> >
> > https://www.postgresql.org/message-id/flat/15720-38c2b29e5d720187%40postgresql.org
>
> I think it's highly likely that bug #15720 was a case of this bug and
> would be fixed by this patch.  Alex's repro doesn't work on 11 though,
> because EPQ is not entered at all.  Which raises the question: why do
> we need to enter EPQ after commit ad0bda5d on 12/master, for a row
> that hasn't been updated by anyone else?

Explanation: since ad0bda5d24ea, ExecLockRows() always calls
EvalPlanQualBegin() which initialises the plan state, and in this case
ExecInitNamedTuplestoreScan() errors out due to the bug.  Before, you
needed the right concurrency scenario (epq_needed) before we did that,
as the reporter of bug #15720 discovered.

I'm planning to commit that patch tomorrow.

-- 
Thomas Munro
https://enterprisedb.com



Thomas Munro <thomas.munro@gmail.com> writes:
> On Tue, Jul 9, 2019 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> I think it's highly likely that bug #15720 was a case of this bug and
>> would be fixed by this patch.

Agreed.  I think your version of the fix is good, and you should
mention #15720 too in the commit message.

>> Alex's repro doesn't work on 11 though,
>> because EPQ is not entered at all.  Which raises the question: why do
>> we need to enter EPQ after commit ad0bda5d on 12/master, for a row
>> that hasn't been updated by anyone else?

> Explanation: since ad0bda5d24ea, ExecLockRows() always calls
> EvalPlanQualBegin() which initialises the plan state, and in this case
> ExecInitNamedTuplestoreScan() errors out due to the bug.  Before, you
> needed the right concurrency scenario (epq_needed) before we did that,
> as the reporter of bug #15720 discovered.

I'm quite desperately unhappy about this observation, because
EvalPlanQualBegin is a *large* amount of overhead that is usually
unnecessary, and is now going to be paid for *every locked row*
whether there's any conflict on it or not.  I do not find that
acceptable.  Why is it necessary to do this before finding that
there's an update conflict?

            regards, tom lane



On Wed, Jul 10, 2019 at 3:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > On Tue, Jul 9, 2019 at 1:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> >> I think it's highly likely that bug #15720 was a case of this bug and
> >> would be fixed by this patch.
>
> Agreed.  I think your version of the fix is good, and you should
> mention #15720 too in the commit message.

Thanks.  Pushed.

> >> Alex's repro doesn't work on 11 though,
> >> because EPQ is not entered at all.  Which raises the question: why do
> >> we need to enter EPQ after commit ad0bda5d on 12/master, for a row
> >> that hasn't been updated by anyone else?
>
> > Explanation: since ad0bda5d24ea, ExecLockRows() always calls
> > EvalPlanQualBegin() which initialises the plan state, and in this case
> > ExecInitNamedTuplestoreScan() errors out due to the bug.  Before, you
> > needed the right concurrency scenario (epq_needed) before we did that,
> > as the reporter of bug #15720 discovered.
>
> I'm quite desperately unhappy about this observation, because
> EvalPlanQualBegin is a *large* amount of overhead that is usually
> unnecessary, and is now going to be paid for *every locked row*
> whether there's any conflict on it or not.  I do not find that
> acceptable.  Why is it necessary to do this before finding that
> there's an update conflict?

I haven't seriously looked into it and haven't succeeded in finding
the discussion of why this is absolutely necessary in the commit's
thread, but you'd think it should be possible to defer slot creation a
bit, or if not, do something cheaper than EPQBegin() that just
initialises the slots.  Andres, Haribabu, Ashutosh?

-- 
Thomas Munro
https://enterprisedb.com



Hi,

On 2019-07-09 11:38:13 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> >> Alex's repro doesn't work on 11 though,
> >> because EPQ is not entered at all.  Which raises the question: why do
> >> we need to enter EPQ after commit ad0bda5d on 12/master, for a row
> >> that hasn't been updated by anyone else?
> 
> > Explanation: since ad0bda5d24ea, ExecLockRows() always calls
> > EvalPlanQualBegin() which initialises the plan state, and in this case
> > ExecInitNamedTuplestoreScan() errors out due to the bug.  Before, you
> > needed the right concurrency scenario (epq_needed) before we did that,
> > as the reporter of bug #15720 discovered.
> 
> I'm quite desperately unhappy about this observation, because
> EvalPlanQualBegin is a *large* amount of overhead that is usually
> unnecessary, and is now going to be paid for *every locked row*
> whether there's any conflict on it or not.  I do not find that
> acceptable.  Why is it necessary to do this before finding that
> there's an update conflict?

Two main reasons:

Previously we referenced tuples from LockRowsState->lr_curtuples, and
the management of that was pretty tightly interlinked with EPQ, and only
worked for heap tuples.Keeping that scheme would have been somewhat
complicated to continue to maintain.

Secondly, previously the tuple fetched by heap_lock_tuple() was just
stored in a local variable - but now that happens via a slot (as we
otherwise cannot reasonably handle things like heap wanting to return a
pinned buffer, and others not).  As we potentially need to lock rows
from multiple tables, we'd need multiple slots suitable to lock/fetch those
rows.

EPQ already needed similar infrastructure internally - and those tuples
were fetched and retained for EPQ's benefit. So it seemed sensible to
just use the slots from EPQ.  Note that we don't need
EvalPlanQualFetch() anymore, as it's work is done inside the AM -
obviously there's no AM independent way to perform correct ctid chasing.

Which large overhead you mean is going to be paid for "*every locked
row*"? EvalPlanQualBegin() ought to be fairly fast for repeated calls in
the same node - although I do admit that it'd be a lot nicer if the
ExecSetParamPlanMulti() work wouldn't need be redone.

I think it might be reasonable to split EvalPlanQualBegin() (and perhaps
EvalPlanQualStart()) into two pieces: One to just have a minimal
EPQState, with valid slots, and one to get ready to actually run EPQ?

Unfortunately I'm going to be unreachable pretty soon till Monday
(hiking without any network access), so I won't be immediately able to respond.

Greetings,

Andres Freund