Discussion: WIP: push AFTER-trigger execution into ModifyTable node


WIP: push AFTER-trigger execution into ModifyTable node

From:
Tom Lane
Date:
In http://archives.postgresql.org/message-id/26545.1255140067@sss.pgh.pa.us
I suggested that we should push the actual execution (not just queuing)
of non-deferred AFTER triggers into the new ModifyTable plan node.
The attached patch does that, and seems like a nice improvement since it
removes knowledge of trigger handling from a number of other places.
However the original objective was to allow EXPLAIN to associate trigger
runtimes with ModifyTable nodes, and I realized that this patch doesn't
accomplish that --- the trigger stats are still accumulated in the
executor-wide EState, not in the ModifyTable node.  Right at the moment
we could cheat and have EXPLAIN print the trigger stats under
ModifyTable anyway, because there can be only one ModifyTable in any
plan tree.  But that will fall down as soon as we try to let INSERT
RETURNING and friends execute within WITH clauses.

After poking around a bit I think it should be possible to keep the
trigger instrumentation data in the ModifyTable node instead of in
EState, and thereby allow EXPLAIN to know which node to blame the
trigger time on.  This will require passing an extra parameter down
from nodeModifyTable into the trigger code, but none of those call paths
are very long.  Still, it'll be a significantly more invasive patch by
the time it's done than what you see here.

So, before I go off and do that work: anybody have an objection to this
line of development?  The main implication of changing to this approach
is that we'll be nailing down the assumption that each WITH (command
RETURNING) clause acts very much like a separate statement for
trigger purposes: it will fire BEFORE STATEMENT triggers at start,
and AFTER STATEMENT triggers at end, and actually execute all
non-deferred AFTER triggers, before we move on to executing the next
WITH clause or the main query.

            regards, tom lane

Index: src/backend/commands/explain.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/explain.c,v
retrieving revision 1.192
diff -c -r1.192 explain.c
*** src/backend/commands/explain.c    12 Oct 2009 18:10:41 -0000    1.192
--- src/backend/commands/explain.c    28 Oct 2009 22:24:42 -0000
***************
*** 19,25 ****
  #include "commands/defrem.h"
  #include "commands/explain.h"
  #include "commands/prepare.h"
- #include "commands/trigger.h"
  #include "executor/instrument.h"
  #include "optimizer/clauses.h"
  #include "optimizer/planner.h"
--- 19,24 ----
***************
*** 354,363 ****

      INSTR_TIME_SET_CURRENT(starttime);

-     /* If analyzing, we need to cope with queued triggers */
-     if (es->analyze)
-         AfterTriggerBeginQuery();
-
      /* Select execution options */
      if (es->analyze)
          eflags = 0;                /* default run-to-completion flags */
--- 353,358 ----
***************
*** 383,402 ****
      ExplainPrintPlan(es, queryDesc);

      /*
!      * If we ran the command, run any AFTER triggers it queued.  (Note this
!      * will not include DEFERRED triggers; since those don't run until end of
!      * transaction, we can't measure them.)  Include into total runtime.
       */
      if (es->analyze)
      {
-         INSTR_TIME_SET_CURRENT(starttime);
-         AfterTriggerEndQuery(queryDesc->estate);
-         totaltime += elapsed_time(&starttime);
-     }
-
-     /* Print info about runtime of triggers */
-     if (es->analyze)
-     {
          ResultRelInfo *rInfo;
          bool        show_relname;
          int            numrels = queryDesc->estate->es_num_result_relations;
--- 378,389 ----
      ExplainPrintPlan(es, queryDesc);

      /*
!      * Print info about runtime of triggers.  (Note this will not include
!      * DEFERRED triggers; since those don't run until end of transaction, we
!      * can't measure them.)
       */
      if (es->analyze)
      {
          ResultRelInfo *rInfo;
          bool        show_relname;
          int            numrels = queryDesc->estate->es_num_result_relations;
Index: src/backend/commands/portalcmds.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/portalcmds.c,v
retrieving revision 1.81
diff -c -r1.81 portalcmds.c
*** src/backend/commands/portalcmds.c    7 Oct 2009 16:27:18 -0000    1.81
--- src/backend/commands/portalcmds.c    28 Oct 2009 22:24:42 -0000
***************
*** 264,270 ****
              PG_TRY();
              {
                  CurrentResourceOwner = portal->resowner;
-                 /* we do not need AfterTriggerEndQuery() here */
                  ExecutorEnd(queryDesc);
                  FreeQueryDesc(queryDesc);
              }
--- 264,269 ----
***************
*** 371,377 ****
           * Now shut down the inner executor.
           */
          portal->queryDesc = NULL;        /* prevent double shutdown */
-         /* we do not need AfterTriggerEndQuery() here */
          ExecutorEnd(queryDesc);
          FreeQueryDesc(queryDesc);

--- 370,375 ----
Index: src/backend/commands/trigger.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/trigger.c,v
retrieving revision 1.256
diff -c -r1.256 trigger.c
*** src/backend/commands/trigger.c    27 Oct 2009 20:14:27 -0000    1.256
--- src/backend/commands/trigger.c    28 Oct 2009 22:24:42 -0000
***************
*** 3145,3151 ****
   *    we invoke all AFTER IMMEDIATE trigger events queued by the query, and
   *    transfer deferred trigger events to the global deferred-trigger list.
   *
!  *    Note that this should be called just BEFORE closing down the executor
   *    with ExecutorEnd, because we make use of the EState's info about
   *    target relations.
   * ----------
--- 3145,3151 ----
   *    we invoke all AFTER IMMEDIATE trigger events queued by the query, and
   *    transfer deferred trigger events to the global deferred-trigger list.
   *
!  *    Note that this must be called BEFORE closing down the executor
   *    with ExecutorEnd, because we make use of the EState's info about
   *    target relations.
   * ----------
Index: src/backend/executor/execMain.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execMain.c,v
retrieving revision 1.334
diff -c -r1.334 execMain.c
*** src/backend/executor/execMain.c    26 Oct 2009 02:26:29 -0000    1.334
--- src/backend/executor/execMain.c    28 Oct 2009 22:24:42 -0000
***************
*** 1886,1891 ****
--- 1886,1892 ----
      estate->es_result_relation_info = parentestate->es_result_relation_info;
      /* es_trig_target_relations must NOT be copied */
      estate->es_rowMarks = parentestate->es_rowMarks;
+     estate->es_after_triggers = false;    /* shouldn't be any anyway */
      estate->es_instrument = parentestate->es_instrument;
      estate->es_select_into = parentestate->es_select_into;
      estate->es_into_oids = parentestate->es_into_oids;
Index: src/backend/executor/execUtils.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/execUtils.c,v
retrieving revision 1.165
diff -c -r1.165 execUtils.c
*** src/backend/executor/execUtils.c    26 Oct 2009 02:26:29 -0000    1.165
--- src/backend/executor/execUtils.c    28 Oct 2009 22:24:43 -0000
***************
*** 130,135 ****
--- 130,136 ----
      estate->es_processed = 0;
      estate->es_lastoid = InvalidOid;

+     estate->es_after_triggers = true;    /* fire AFTER triggers by default */
      estate->es_instrument = false;
      estate->es_select_into = false;
      estate->es_into_oids = false;
Index: src/backend/executor/functions.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/functions.c,v
retrieving revision 1.135
diff -c -r1.135 functions.c
*** src/backend/executor/functions.c    11 Jun 2009 17:25:38 -0000    1.135
--- src/backend/executor/functions.c    28 Oct 2009 22:24:43 -0000
***************
*** 17,23 ****
  #include "access/xact.h"
  #include "catalog/pg_proc.h"
  #include "catalog/pg_type.h"
- #include "commands/trigger.h"
  #include "executor/functions.h"
  #include "funcapi.h"
  #include "miscadmin.h"
--- 17,22 ----
***************
*** 426,442 ****

      /* Utility commands don't need Executor. */
      if (es->qd->utilitystmt == NULL)
-     {
-         /*
-          * Only set up to collect queued triggers if it's not a SELECT. This
-          * isn't just an optimization, but is necessary in case a SELECT
-          * returns multiple rows to caller --- we mustn't exit from the
-          * function execution with a stacked AfterTrigger level still active.
-          */
-         if (es->qd->operation != CMD_SELECT)
-             AfterTriggerBeginQuery();
          ExecutorStart(es->qd, 0);
-     }

      es->status = F_EXEC_RUN;
  }
--- 425,431 ----
***************
*** 492,508 ****

      /* Utility commands don't need Executor. */
      if (es->qd->utilitystmt == NULL)
-     {
-         /* Make our snapshot the active one for any called functions */
-         PushActiveSnapshot(es->qd->snapshot);
-
-         if (es->qd->operation != CMD_SELECT)
-             AfterTriggerEndQuery(es->qd->estate);
          ExecutorEnd(es->qd);

-         PopActiveSnapshot();
-     }
-
      (*es->qd->dest->rDestroy) (es->qd->dest);

      FreeQueryDesc(es->qd);
--- 481,488 ----
Index: src/backend/executor/nodeModifyTable.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/nodeModifyTable.c,v
retrieving revision 1.2
diff -c -r1.2 nodeModifyTable.c
*** src/backend/executor/nodeModifyTable.c    26 Oct 2009 02:26:31 -0000    1.2
--- src/backend/executor/nodeModifyTable.c    28 Oct 2009 22:24:43 -0000
***************
*** 578,632 ****


  /*
!  * Process BEFORE EACH STATEMENT triggers
   */
  static void
! fireBSTriggers(ModifyTableState *node)
  {
      switch (node->operation)
      {
          case CMD_INSERT:
!             ExecBSInsertTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          case CMD_UPDATE:
!             ExecBSUpdateTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          case CMD_DELETE:
!             ExecBSDeleteTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          default:
              elog(ERROR, "unknown operation");
              break;
      }
  }

  /*
!  * Process AFTER EACH STATEMENT triggers
   */
  static void
! fireASTriggers(ModifyTableState *node)
  {
      switch (node->operation)
      {
          case CMD_INSERT:
!             ExecASInsertTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          case CMD_UPDATE:
!             ExecASUpdateTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          case CMD_DELETE:
!             ExecASDeleteTriggers(node->ps.state,
!                                  node->ps.state->es_result_relations);
              break;
          default:
              elog(ERROR, "unknown operation");
              break;
      }
  }


--- 578,653 ----


  /*
!  * Initialize trigger processing during first execution of node
   */
  static void
! begin_statement_triggers(ModifyTableState *node, EState *estate)
  {
+     /*
+      * Normally, we start a "query" to collect AFTER trigger events.
+      * However, if caller of Executor has cleared es_after_triggers, any
+      * AFTER triggers will be queued in the outer query instead (and
+      * there had better be one!).
+      */
+     if (estate->es_after_triggers)
+         AfterTriggerBeginQuery();
+
+     /*
+      * Process BEFORE EACH STATEMENT triggers
+      */
      switch (node->operation)
      {
          case CMD_INSERT:
!             ExecBSInsertTriggers(estate, estate->es_result_relations);
              break;
          case CMD_UPDATE:
!             ExecBSUpdateTriggers(estate, estate->es_result_relations);
              break;
          case CMD_DELETE:
!             ExecBSDeleteTriggers(estate, estate->es_result_relations);
              break;
          default:
              elog(ERROR, "unknown operation");
              break;
      }
+
+     node->triggers_ready = true;
  }

  /*
!  * Shut down trigger processing after final execution of node
   */
  static void
! end_statement_triggers(ModifyTableState *node, EState *estate)
  {
+     /*
+      * Process AFTER EACH STATEMENT triggers
+      */
      switch (node->operation)
      {
          case CMD_INSERT:
!             ExecASInsertTriggers(estate, estate->es_result_relations);
              break;
          case CMD_UPDATE:
!             ExecASUpdateTriggers(estate, estate->es_result_relations);
              break;
          case CMD_DELETE:
!             ExecASDeleteTriggers(estate, estate->es_result_relations);
              break;
          default:
              elog(ERROR, "unknown operation");
              break;
      }
+
+     /*
+      * Execute any queued AFTER triggers, or move them to the deferred
+      * list if they're deferred.  If not es_after_triggers, they stay
+      * in the outer query's queue.
+      */
+     if (estate->es_after_triggers)
+         AfterTriggerEndQuery(estate);
+
+     node->triggers_ready = false;
  }


***************
*** 650,662 ****
      ItemPointerData tuple_ctid;

      /*
!      * On first call, fire BEFORE STATEMENT triggers before proceeding.
       */
!     if (node->fireBSTriggers)
!     {
!         fireBSTriggers(node);
!         node->fireBSTriggers = false;
!     }

      /*
       * es_result_relation_info must point to the currently active result
--- 671,680 ----
      ItemPointerData tuple_ctid;

      /*
!      * On first call, initialize trigger processing before proceeding.
       */
!     if (!node->triggers_ready)
!         begin_statement_triggers(node, estate);

      /*
       * es_result_relation_info must point to the currently active result
***************
*** 758,766 ****
      estate->es_result_relation_info = NULL;

      /*
!      * We're done, but fire AFTER STATEMENT triggers before exiting.
       */
!     fireASTriggers(node);

      return NULL;
  }
--- 776,784 ----
      estate->es_result_relation_info = NULL;

      /*
!      * We're done, but finish trigger processing before exiting.
       */
!     end_statement_triggers(node, estate);

      return NULL;
  }
***************
*** 805,811 ****
      mtstate->operation = operation;
      /* set up epqstate with dummy subplan pointer for the moment */
      EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, node->epqParam);
!     mtstate->fireBSTriggers = true;

      /* For the moment, assume our targets are exactly the global result rels */

--- 823,829 ----
      mtstate->operation = operation;
      /* set up epqstate with dummy subplan pointer for the moment */
      EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, node->epqParam);
!     mtstate->triggers_ready = false;

      /* For the moment, assume our targets are exactly the global result rels */

Index: src/backend/executor/spi.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/executor/spi.c,v
retrieving revision 1.210
diff -c -r1.210 spi.c
*** src/backend/executor/spi.c    10 Oct 2009 01:43:47 -0000    1.210
--- src/backend/executor/spi.c    28 Oct 2009 22:24:43 -0000
***************
*** 19,25 ****
  #include "access/xact.h"
  #include "catalog/heap.h"
  #include "catalog/pg_type.h"
- #include "commands/trigger.h"
  #include "executor/executor.h"
  #include "executor/spi_priv.h"
  #include "tcop/pquery.h"
--- 19,24 ----
***************
*** 2001,2011 ****
          ResetUsage();
  #endif

-     if (fire_triggers)
-         AfterTriggerBeginQuery();
-
      ExecutorStart(queryDesc, 0);

      ExecutorRun(queryDesc, ForwardScanDirection, tcount);

      _SPI_current->processed = queryDesc->estate->es_processed;
--- 2000,2010 ----
          ResetUsage();
  #endif

      ExecutorStart(queryDesc, 0);

+     /* tell any ModifyTable nodes whether to handle AFTER triggers */
+     queryDesc->estate->es_after_triggers = fire_triggers;
+
      ExecutorRun(queryDesc, ForwardScanDirection, tcount);

      _SPI_current->processed = queryDesc->estate->es_processed;
***************
*** 2018,2027 ****
              elog(ERROR, "consistency check on SPI tuple count failed");
      }

-     /* Take care of any queued AFTER triggers */
-     if (fire_triggers)
-         AfterTriggerEndQuery(queryDesc->estate);
-
      ExecutorEnd(queryDesc);
      /* FreeQueryDesc is done by the caller */

--- 2017,2022 ----
Index: src/backend/tcop/pquery.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tcop/pquery.c,v
retrieving revision 1.132
diff -c -r1.132 pquery.c
*** src/backend/tcop/pquery.c    10 Oct 2009 01:43:49 -0000    1.132
--- src/backend/tcop/pquery.c    28 Oct 2009 22:24:43 -0000
***************
*** 181,191 ****
                                  dest, params, false);

      /*
-      * Set up to collect AFTER triggers
-      */
-     AfterTriggerBeginQuery();
-
-     /*
       * Call ExecutorStart to prepare the plan for execution
       */
      ExecutorStart(queryDesc, 0);
--- 181,186 ----
***************
*** 229,237 ****
          }
      }

-     /* Now take care of any queued AFTER triggers */
-     AfterTriggerEndQuery(queryDesc->estate);
-
      PopActiveSnapshot();

      /*
--- 224,229 ----
***************
*** 518,530 ****
                                              false);

                  /*
-                  * We do *not* call AfterTriggerBeginQuery() here.    We assume
-                  * that a SELECT cannot queue any triggers.  It would be messy
-                  * to support triggers since the execution of the portal may
-                  * be interleaved with other queries.
-                  */
-
-                 /*
                   * If it's a scrollable cursor, executor needs to support
                   * REWIND and backwards scan.
                   */
--- 510,515 ----
Index: src/include/nodes/execnodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/execnodes.h,v
retrieving revision 1.211
diff -c -r1.211 execnodes.h
*** src/include/nodes/execnodes.h    26 Oct 2009 02:26:41 -0000    1.211
--- src/include/nodes/execnodes.h    28 Oct 2009 22:24:43 -0000
***************
*** 361,366 ****
--- 361,367 ----
      uint32        es_processed;    /* # of tuples processed */
      Oid            es_lastoid;        /* last oid processed (by INSERT) */

+     bool        es_after_triggers;    /* true to execute AFTER triggers */
      bool        es_instrument;    /* true requests runtime instrumentation */
      bool        es_select_into; /* true if doing SELECT INTO */
      bool        es_into_oids;    /* true to generate OIDs in SELECT INTO */
***************
*** 1022,1028 ****
      int                mt_nplans;        /* number of plans in the array */
      int                mt_whichplan;    /* which one is being executed (0..n-1) */
      EPQState        mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
!     bool            fireBSTriggers;    /* do we need to fire stmt triggers? */
  } ModifyTableState;

  /* ----------------
--- 1023,1029 ----
      int                mt_nplans;        /* number of plans in the array */
      int                mt_whichplan;    /* which one is being executed (0..n-1) */
      EPQState        mt_epqstate;    /* for evaluating EvalPlanQual rechecks */
!     bool            triggers_ready;    /* are statement-level triggers done? */
  } ModifyTableState;

  /* ----------------

Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Marko Tiikkaja
Date:
Tom Lane wrote:
> So, before I go off and do that work: anybody have an objection to this
> line of development?  The main implication of changing to this approach
> is that we'll be nailing down the assumption that each WITH (command
> RETURNING) clause acts very much like a separate statement for
> trigger purposes: it will fire BEFORE STATEMENT triggers at start,
> and AFTER STATEMENT triggers at end, and actually execute all
> non-deferred AFTER triggers, before we move on to executing the next
> WITH clause or the main query.

Like we've discussed before, WITH (.. RETURNING ..) is probably most
useful for moving rows from one table to another.  When you're moving a
lot of rows around, there's some point where I believe this execution
strategy will be a lot slower than the traditional approach due to
storing the RETURNING results on disk.  I've been thinking that in some
cases we could inline the CTE, which could be a quite significant
performance benefit, so I'm not too keen on the approach you're
suggesting.


Regards,
Marko Tiikkaja



Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> Like we've discussed before, WITH (.. RETURNING ..) is probably most
> useful for moving rows from one table to another.  When you're moving a
> lot of rows around, there's some point where I believe this execution
> strategy will be a lot slower than the traditional approach due to
> storing the RETURNING results on disk.  I've been thinking that in some
> cases we could inline the CTE, which could be a quite significant
> performance benefit, so I'm not too keen on the approach you're
> suggesting.

Well, this is what we need to nail down *now*.  Are we going to say that
use of WITH(RETURNING) means you forfeit all guarantees about order of
trigger firing?  Short of that, I don't believe that it is sane to think
about pipelining such things.  And if we do do that, it sounds like a
security hole to me, because the owner of the trigger isn't the one who
agreed to forfeit predictability.
        regards, tom lane


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Robert Haas
Date:
On Wed, Oct 28, 2009 at 8:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
>> Like we've discussed before, WITH (.. RETURNING ..) is probably most
>> useful for moving rows from one table to another.  When you're moving a
>> lot of rows around, there's some point where I believe this execution
>> strategy will be a lot slower than the traditional approach due to
>> storing the RETURNING results on disk.  I've been thinking that in some
>> cases we could inline the CTE, which could be a quite significant
>> performance benefit, so I'm not too keen on the approach you're
>> suggesting.
>
> Well, this is what we need to nail down *now*.  Are we going to say that
> use of WITH(RETURNING) means you forfeit all guarantees about order of
> trigger firing?  Short of that, I don't believe that it is sane to think
> about pipelining such things.  And if we do do that, it sounds like a
> security hole to me, because the owner of the trigger isn't the one who
> agreed to forfeit predictability.

I don't see why either behavior would be a security hole; we get to
define how the system behaves, and users have to write their triggers
to cope with that behavior.  We don't want to throw random roadblocks
in the way of sanity, but users are not entitled to assume that no
future major release of PG will have semantics that are in any way
different from whichever release they're now running, especially for
features that don't even exist in the current release.  If you have a
specific concern here, maybe you could provide an example.

To be honest, I'm not entirely comfortable with either behavior.
Pipelining a delete out of one table into an insert into another table
seems VERY useful to me, and I'd like us to have a way to do that.  On
the other hand, in more complex cases, the fact that the effects of a
statement are normally not visible to that statement could lead to
some fairly confusing behavior, especially when triggers are involved.
So I don't really know what the right thing is.  What I really want
is to provide both behaviors, but I'm not sure there's any sensible
way to do that, and even if there were it's not clear to me that users
will know which one they want.

...Robert


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> To be honest, I'm not entirely comfortable with either behavior.
> Pipelining a delete out of one table into an insert into another table
> seems VERY useful to me, and I'd like us to have a way to do that.  On
> the other hand, in more complex cases, the fact that the effects of a
> statement are normally not visible to that statement could lead to
> some fairly confusing behavior, especially when triggers are involved.

Yup, that's right.  The trigger timing is really the least of the issues.
If you consider that the various WITH clauses run concurrently with each
other and the main statement, rather than sequentially, then

(1) you have implementation-dependent, and not very desirable, behaviors
if any of the statements modify the same table;

(2) none of the statements will see each others' effects, which is
certainly going to confuse some people;

(3) I'm afraid that there's no good way to ensure that the modifying
statements run to completion if the top-level SELECT stops short of
reading that CTE to its end.

Pipelined execution would be nice but I really doubt that it's worth
what we'd have to give up to have it.  The one-at-a-time behavior will
be simple to understand and reliable to use.  Concurrent execution won't
be either.
        regards, tom lane


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Greg Stark
Date:
On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pipelined execution would be nice but I really doubt that it's worth
> what we'd have to give up to have it.  The one-at-a-time behavior will
> be simple to understand and reliable to use.  Concurrent execution won't
> be either.

I think the ideal way to get pipelined execution will be to detect the
cases where it's safe, ie, no deletes, inserts, or updates, no
recursive calls, and only one call site, and inline the sql directly
into the callsite. Actually we could do that even if there are two
callsites if there are no volatile functions but estimating whether
that would be a win or a loss would be hard.

--
greg


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Marko Tiikkaja
Date:
Greg Stark wrote:
> On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Pipelined execution would be nice but I really doubt that it's worth
>> what we'd have to give up to have it.  The one-at-a-time behavior will
>> be simple to understand and reliable to use.  Concurrent execution won't
>> be either.
> 
> I think the ideal way to get pipelined execution will be to detect the
> cases where it's safe, ie, no deletes, inserts, or updates, no
> recursive calls, and only one call site, and inline the sql directly
> into the callsite. Actually we could do that even if there are two
> callsites if there are no volatile functions but estimating whether
> that would be a win or a loss would be hard.

What I've had in mind is pipelining the execution only when it doesn't
have *any* impact on the outcome.  This would mean only allowing it when
the top-level statement is either a SELECT or an INSERT.  Also, UPDATEs
and DELETEs inside CTEs can't have the same result relations.  Whether
or not we want to break the expected(?) behaviour for statement-level
triggers, I have no opinion one way or another.

Regards,
Marko Tiikkaja



Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Robert Haas
Date:
On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
<marko.tiikkaja@cs.helsinki.fi> wrote:
> Greg Stark wrote:
>>
>> On Thu, Oct 29, 2009 at 7:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Pipelined execution would be nice but I really doubt that it's worth
>>> what we'd have to give up to have it.  The one-at-a-time behavior will
>>> be simple to understand and reliable to use.  Concurrent execution won't
>>> be either.
>>
>> I think the ideal way to get pipelined execution will be to detect the
>> cases where it's safe, ie, no deletes, inserts, or updates, no
>> recursive calls, and only one call site, and inline the sql directly
>> into the callsite. Actually we could do that even if there are two
>> callsites if there are no volatile functions but estimating whether
>> that would be a win or a loss would be hard.
>
> What I've had in mind is pipelining the execution only when it doesn't
> have *any* impact on the outcome.  This would mean only allowing it when
> the top-level statement is either a SELECT or an INSERT.  Also, UPDATEs
> and DELETEs inside CTEs can't have the same result relations.  Whether
> or not we want to break the expected(?) behaviour for statement-level
> triggers, I have no opinion one way or another.

You'd also have to disallow the case when there are any triggers on
the INSERT, or where there are any triggers on anything else (because
they might access the target table of the INSERT).  This will end up
being so restricted as to be useless.

...Robert


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
> <marko.tiikkaja@cs.helsinki.fi> wrote:
>> What I've had in mind is pipelining the execution only when it doesn't
>> have *any* impact on the outcome.  This would mean only allowing it when
>> the top-level statement is either a SELECT or an INSERT.  Also, UPDATEs
>> and DELETEs inside CTEs can't have the same result relations.  Whether
>> or not we want to break the expected(?) behaviour for statement-level
>> triggers, I have no opinion one way or another.

> You'd also have to disallow the case when there are any triggers on
> the INSERT, or where there are any triggers on anything else (because
> they might access the target table of the INSERT).  This will end up
> being so restricted as to be useless.

Well, it's already the case that BEFORE triggers shouldn't count on
seeing any particular subset of a statement's results completed.
We could define AFTER triggers as all being fired after the entire
statement is complete (which is not the direction my patch was headed
in, but I have no allegiance to that).  So I think we could define our
way out of the trigger timing issue, and I don't see any big objection
to limiting pipelining to cases where the sub-statements' target tables
don't overlap.

However, this still doesn't address the problem of what happens when the
top-level select fails to read all of the CTE output (because it has a
LIMIT, or the client doesn't read all the output of a portal, etc etc).
Partially executing an update in such cases is no good.
        regards, tom lane


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Marko Tiikkaja
Date:
Robert Haas wrote:
> You'd also have to disallow the case when there are any triggers on
> the INSERT, or where there are any triggers on anything else (because
> they might access the target table of the INSERT).  This will end up
> being so restricted as to be useless.

I might be wrong here, but I don't think it matters because those
trigger calls would have a different snapshot, right?  Or am I missing
something?


Regards,
Marko Tiikkaja



Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Marko Tiikkaja
Date:
Tom Lane wrote:
> However, this still doesn't address the problem of what happens when the
> top-level select fails to read all of the CTE output (because it has a
> LIMIT, or the client doesn't read all the output of a portal, etc etc).
> Partially executing an update in such cases is no good.

I've previously thought about making the CTE aware of the LIMIT,
similarly to a top-N sort, but I don't think it's worth it.  If we have
a LIMIT, we could just fall back to statement-at-a-time execution.
I'm not sure which cases you mean by "the client doesn't read all
the output of a portal", but at least for cursors we'd have to
execute the ModifyTable nodes first.


Regards,
Marko Tiikkaja




Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> I've previously thought about making the CTE aware of the LIMIT,
> similarly to a top-N sort, but I don't think it's worth it.

That approach doesn't lead to a solution because then you could *never*
optimize it.  The protocol-level limit option is always present.

It's conceivable that we could have ExecutorEnd forcibly run any DML
CTEs to the end (and fire their triggers?) before shutting down the
executor, but it seems like a kluge.
        regards, tom lane


Re: WIP: push AFTER-trigger execution into ModifyTable node

From:
Robert Haas
Date:
On Nov 1, 2009, at 10:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Oct 31, 2009 at 5:00 PM, Marko Tiikkaja
>> <marko.tiikkaja@cs.helsinki.fi> wrote:
>>> What I've had in mind is pipelining the execution only when it  
>>> doesn't
>>> have *any* impact on the outcome.  This would mean only allowing  
>>> it when
>>> the top-level statement is either a SELECT or an INSERT.  Also,  
>>> UPDATEs
>>> and DELETEs inside CTEs can't have the same result relations.   
>>> Whether
>>> or not we want to break the expected(?) behaviour for statement- 
>>> level
>>> triggers, I have no opinion one way or another.
>
>> You'd also have to disallow the case when there are any triggers on
>> the INSERT, or where there are any triggers on anything else (because
>> they might access the target table of the INSERT).  This will end up
>> being so restricted as to be useless.
>
> Well, it's already the case that BEFORE triggers shouldn't count on
> seeing any particular subset of a statement's results completed.
> We could define AFTER triggers as all being fired after the entire
> statement is complete (which is not the direction my patch was headed
> in, but I have no allegiance to that).  So I think we could define our
> way out of the trigger timing issue, and I don't see any big objection
> to limiting pipelining to cases where the sub-statements' target  
> tables
> don't overlap.

Hmm... yeah. If we do that, then pipelining becomes a much more  
feasible optimization.  I think that definition is a bit more likely  
to result in POLA violations, but I'm not sure by how much.

> However, this still doesn't address the problem of what happens when  
> the
> top-level select fails to read all of the CTE output (because it has a
> LIMIT, or the client doesn't read all the output of a portal, etc  
> etc).
> Partially executing an update in such cases is no good.

Well, like you said elsewhere on this thread, you just have to finish  
out any remaining bits after the main query completes.

...Robert