Re: Patch to support SEMI and ANTI join removal

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Patch to support SEMI and ANTI join removal
Дата
Msg-id CAApHDvq5S41QAAr7_WtKyZvVuFkBfzi_+NuLJc80S9f5=Nqc1g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Patch to support SEMI and ANTI join removal  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Patch to support SEMI and ANTI join removal  (Andres Freund <andres@2ndquadrant.com>)
Re: Patch to support SEMI and ANTI join removal  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, Sep 26, 2014 at 12:36 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 09/16/2014 01:20 PM, David Rowley wrote:
+       /*
+        * We mustn't allow any joins to be removed if there are any pending
+        * foreign key triggers in the queue. This could happen if we are planning
+        * a query that has been executed from within a volatile function and the
+        * query which called this volatile function has made some changes to a
+        * table referenced by a foreign key. The reason for this is that any
+        * updates to a table which is referenced by a foreign key constraint will
+        * only have the referencing tables updated after the command is complete,
+        * so there is a window of time where records may violate the foreign key
+        * constraint.
+        *
+        * Currently this code is quite naive, as we won't even attempt to remove
+        * the join if there are *any* pending foreign key triggers, on any
+        * relation. It may be worthwhile to improve this to check if there's any
+        * pending triggers for the referencing relation in the join.
+        */
+       if (!AfterTriggerQueueIsEmpty())
+               return false;


Hi Heikki,

Thanks for having a look at the patch.

Hmm. This code runs when the query is planned. There is no guarantee that there won't be after triggers pending when the query is later *executed*.


Please correct anything that sounds wrong here, but my understanding is that we'll always plan a query right before we execute it, with the exception of PREPARE statements where PostgreSQL will cache the query plan when the prepare statement is first executed. So I think you may have a point here regarding PREPARE'd statements, but I think that it is isolated to those.

I think in all other cases we'll plan right before we execute. So if we happen to be planning an UPDATE statement which has a sub-query that perform some INNER JOINs, I think we're safe to remove INNER JOINs when possible, as the UPDATE statement won't get visibility of its own changes.

We can see that here:

create table updatetest (id int primary key, value int, value2 int);

create or replace function getvalue2(p_id int) returns int
as $$select value2 from updatetest where id = p_id$$
language sql volatile;


insert into updatetest values(0,0,0);
insert into updatetest values(1,10,10);
insert into updatetest values(2,20,20);
insert into updatetest values(3,30,30);

update updatetest set value = COALESCE((select value from updatetest u2 where updatetest.id - 1 = u2.id) + 1,0);

update updatetest set value2 = COALESCE(getvalue2(id - 1) + 1,0);

select * from updatetest;
 id | value | value2
----+-------+--------
  0 |     0 |      0
  1 |     1 |      1
  2 |    11 |      2
  3 |    21 |      3

The value column appears to have been set based on the value that was previously in the value column, and has not come from the newly set value. The behaviour is different for the value2 column as the value for this has been fetched from another query, which *does* see the newly updated value stored in the value2 column.

My understanding of foreign keys is that any pending foreign key triggers will be executed just before the query completes, so we should only ever encounter pending foreign key triggers during planning when we're planning a query that's being executed from somewhere like a volatile function or trigger function, if the outer query has updated or deleted some records which are referenced by a foreign key.

So I think with the check for pending triggers at planning time this is safe at least for queries being planned right before they're executed, but you've caused me to realise that I'll probably need to do some more work on this for when it comes to PREPARE'd queries, as it looks like if we executed a prepared query from inside a volatile function or trigger function that was called from a DELETE or UPDATE statement that caused foreign key triggers to be queued, and we'd happened to have removed some INNER JOINs when we originally planned that prepare statement, then that would be wrong.

The only thing that comes to mind to fix that right now is to tag something maybe in PlannerInfo to say if we've removed any INNER JOINs in planning, then when we execute a prepared statement we could void the cached plan we see that some INNER JOINs were removed, but only do this if the foreign key trigger queue has pending triggers. (which will hopefully not be very often).

Another thing that comes to mind which may be similar is how we handle something like:

PREPARE a AS SELECT * from tbl WHERE name LIKE $1;

Where, if $1 is 'foo' or 'foo%' we might want to use an index scan, but if $1 was '%foo' then we'd probably not. 
I've not yet looked into great detail of what happens here, but from some quick simple tests it seems to replan each time!? But perhaps that'd due to the parameter, where with my other tests the PREPARE statement had no parameters.

There was some other discussion relating to some of this over here-> http://www.postgresql.org/message-id/20140603235053.GA351732@tornado.leadboat.com 

Regards

David Rowley


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: json (b) and null fields
Следующее
От: Michael Paquier
Дата:
Сообщение: Missing newlines in verbose logs of pg_dump, introduced by RLS patch