Обсуждение: BUG #5798: Some weird error with pl/pgsql procedure

Поиск

Список

Период

Сортировка

BUG #5798: Some weird error with pl/pgsql procedure

От

"Maxim Boguk"

Дата:

21 декабря 2010 г., 01:48:10

The following bug has been logged online:

Bug reference:      5798
Logged by:          Maxim Boguk
Email address:      Maxim.Boguk@gmail.com
PostgreSQL version: 8.4.4
Operating system:   FreeBSD 7.2
Description:        Some weird error with pl/pgsql procedure
Details:

While I testing my table compactor utility
(http://code.google.com/p/compacttable/), I sometime getting exeptionally
weird errors from pl/pgsql procedure.

2010-12-21 05:17:19 MSK 17263 postgres@hh from 192.168.1.129
[vxid:22/1898656 txid:20671567] [SELECT] ERROR:  cannot extract system
attribute from virtual tuple
2010-12-21 05:17:19 MSK 17263 postgres@hh from 192.168.1.129
[vxid:22/1898656 txid:20671567] [SELECT] CONTEXT:  PL/pgSQL function
"__clear_table_tail" line 50 at FOR over EXECUTE statement
2010-12-21 05:17:19 MSK 17263 postgres@hh from 192.168.1.129
[vxid:22/1898656 txid:20671567] [SELECT] STATEMENT:
        SELECT * from __clear_table_tail('vacancy_view', 'public', 'count_',
25053, 25)

Now 50th line of the procedure is:
        --trying update all tuples in current page range
        FOR _new_tid IN EXECUTE 'UPDATE ONLY ' ||
quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname) || ' SET '
|| arg_field || ' = ' || arg_field || ' WHERE ctid=ANY($1) RETURNING ctid'
USING _current_tid_list LOOP

This error usually happen with another error from Slony side (this error
easy to understand and not an problem at all):
2010-12-21 05:15:08 MSK 24679 slony@hh from [local] [vxid:37/4336609
txid:20671079] [UPDATE] ERROR:  could not serialize access due to concurrent
update
2010-12-21 05:15:08 MSK 24679 slony@hh from [local] [vxid:37/4336609
txid:20671079] [UPDATE] STATEMENT:  update only "public"."vacancy_view" set
"count_"='1428' where "vacancy_id"='3568635';
...

I think there some strange concurrency issues with pl/pgsql and slony, and I
don't understand reason of that error.

This error not easy to reproduce but happens for me 3 times during tests on
130GB slave database run with my utility.

Anyone can enlighten me what happens here?

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Robert Haas

Дата:

22 декабря 2010 г., 03:22:58

On Mon, Dec 20, 2010 at 9:48 PM, Maxim Boguk <Maxim.Boguk@gmail.com> wrote:
> Anyone can enlighten me what happens here?

That does look weird, but without a simple test case I think it's
going to be hard to investigate this.  Installing Slony and your code
and building a 130GB is more effort than I'm willing to put in...

Any way you can get a stack backtrace at the point the error occurs?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

22 декабря 2010 г., 20:54:36

It happened again in much simpler case (no slony or deadlock message
involved... no slony at all on the database).

The same error
DBD::Pg::db selectall_arrayref failed: ERROR:  cannot extract system
attribute from virtual tuple
CONTEXT:  PL/pgSQL function "__clear_table_tail" line 50 at FOR over
EXECUTE statement

It happened when function worked with small table with only 33 pages
size and 600 live tuples.
What made situations even more easy to track it is I know this table
don't have any inserts or deletes for ages (only select and update)
and I have full log of database queries during problem.

Log of call of the procedure:
2010-12-23 00:21:04 MSK pgsql@billing 55717 ERROR:  cannot extract
system attribute from virtual tuple
2010-12-23 00:21:04 MSK pgsql@billing 55717 CONTEXT:  PL/pgSQL
function "__clear_table_tail" line 50 at FOR over EXECUTE statement
2010-12-23 00:21:04 MSK pgsql@billing 55717 STATEMENT:
        SELECT * from __clear_table_tail('ctrl_servers', 'bill', 'opt', 21,=
 1)

In the same time log all actions involving changes on table
ctrl_servers during +/- 3 second from ERROR:
2010-12-23 00:21:03 MSK bill@billing 42078 LOG:  duration: 0.736 ms
statement: UPDATE ctrl_servers SET last_connect =3D 'now' WHERE ( id =3D
'1514' )
Just one query.

Line where ERROR happens:
        FOR _new_tid IN EXECUTE 'UPDATE ONLY ' ||
quote_ident(arg_relnamespace) || '.' || quote_ident(arg_relname) || '
SET ' || arg_field || ' =3D ' || arg_field || ' WHERE ctid=3DANY($1)
RETURNING ctid' USING _current_tid_list LOOP

Need be said _current_tid_list at start of procedure call contained
all possible tids for 21 page of relation (not only for live tuples
but all possible (in range like {21,1} to {21,300+})).

I have an idea the error happens because some rare race conditions
with tid scan and concurrent updates in the same page (probably HOT
somehow involved... because both updateable fields in function and
concurrent update query updated non-indexed fields).

Currently I work on creating self contained test case (unfortunately
without much success until now).

About stack backtrace I not sure how to get it (and even worse the
database is heavily loaded production DB so I not sure creating core
dump would be good idea).
Still I would like to receive any help with creating a stack backtrace.

On Wed, Dec 22, 2010 at 5:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 20, 2010 at 9:48 PM, Maxim Boguk <Maxim.Boguk@gmail.com> wrot=
e:
>> Anyone can enlighten me what happens here?
>
> That does look weird, but without a simple test case I think it's
> going to be hard to investigate this. =C2=A0Installing Slony and your code
> and building a 130GB is more effort than I'm willing to put in...
>
> Any way you can get a stack backtrace at the point the error occurs?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/

=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Robert Haas

Дата:

05 января 2011 г., 02:28:23

On Wed, Dec 22, 2010 at 4:53 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> About stack backtrace I not sure how to get it (and even worse the
> database is heavily loaded production DB so I not sure creating core
> dump would be good idea).
> Still I would like to receive any help with creating a stack backtrace.

Well, if you can reproduce this problem in a given session, you can
use pg_backend_pid() at the start of the session to get the backend
ID, and then if your build has debugging symbols (or separate
debuginfos you can install), you could attach gdb to it, set a
breakpoint at the lines in heaptuple.c that can throw this error,
reproduce the problem, and then get a backtrace.  But if it's
happening randomly I don't have a lot of ideas.

I do wonder if this could be related to using TIDs that don't actually
exist in the table.  That might be worth some experimentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

19 февраля 2011 г., 11:14:26

Hi all and Hi Robert

I finally managed create working slave server with non-stripped Postgresql
8.4.7 and working gdb.
(sorry for such long delay).
And i can reproduce situation quite easy now, so i managed get stack
backtrace with that error.
What i need to do now? I have some expirience with gbd now but i have no
idea where to look next.

Full backtrace:

Breakpoint 1, slot_getattr (slot=3D0xdd09ea468, attnum=3D-1, isnull=3D0xdd0=
9b4a68
"") at heaptuple.c:1185
1185                            elog(ERROR, "cannot extract system attribute
from virtual tuple");
(gdb) bt
#0  slot_getattr (slot=3D0xdd09ea468, attnum=3D-1, isnull=3D0xdd09b4a68 "")=
 at
heaptuple.c:1185
#1  0x000000000058a5f8 in ExecEvalScalarVar (exprstate=3D0xdd09b4b60,
econtext=3D0xdd09b4a80, isNull=3D0xdd09b4a68 "", isDone=3D0xdd09b4ce0)
    at execQual.c:787
#2  0x00000000005933d7 in ExecTargetList (targetlist=3D0xdd09b4cb0,
econtext=3D0xdd09b4a80, values=3D0xdd09b4a50, isnull=3D0xdd09b4a68 "",
    itemIsDone=3D0xdd09b4ce0, isDone=3D0x0) at execQual.c:5118
#3  0x00000000005939a3 in ExecProject (projInfo=3D0xdd09b4be0, isDone=3D0x0=
) at
execQual.c:5333
#4  0x00000000005879c7 in ExecProcessReturning
(projectReturning=3D0xdd09b4be0, tupleSlot=3D0xdd09ea468, planSlot=3D0xdd09=
ea3a8,
    dest=3D0xdd09de5c8) at execMain.c:2273
#5  0x00000000005875fe in ExecUpdate (slot=3D0xdd09ea468,
tupleid=3D0x7fffffffd2e0, planSlot=3D0xdd09ea3a8, dest=3D0xdd09de5c8,
    estate=3D0xdd09e6030) at execMain.c:2142
#6  0x0000000000586c68 in ExecutePlan (estate=3D0xdd09e6030,
planstate=3D0xdd09ea550, operation=3DCMD_UPDATE, numberTuples=3D0,
    direction=3DForwardScanDirection, dest=3D0xdd09de5c8) at execMain.c:1681
#7  0x00000000005849a5 in standard_ExecutorRun (queryDesc=3D0xdd09de658,
direction=3DForwardScanDirection, count=3D0) at execMain.c:309
#8  0x00000000005848a5 in ExecutorRun (queryDesc=3D0xdd09de658,
direction=3DForwardScanDirection, count=3D0) at execMain.c:258
#9  0x0000000000673d43 in ProcessQuery (plan=3D0xdd0998890,
    sourceText=3D0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt =3D opt
WHERE ctid=3DANY($1) RETURNING ctid", params=3D0xdd09c58c0,
    dest=3D0xdd09de5c8, completionTag=3D0x7fffffffd4e0 "") at pquery.c:196
#10 0x00000000006754e5 in PortalRunMulti (portal=3D0xdd08c8140, isTopLevel=
=3D0
'\000', dest=3D0xdd09de5c8, altdest=3D0x9dfdc0,
    completionTag=3D0x7fffffffd4e0 "") at pquery.c:1269
#11 0x0000000000675127 in FillPortalStore (portal=3D0xdd08c8140, isTopLevel=
=3D0
'\000') at pquery.c:1061
#12 0x00000000006757f8 in PortalRunFetch (portal=3D0xdd08c8140,
fdirection=3DFETCH_FORWARD, count=3D10, dest=3D0x9dfe40) at pquery.c:1397
#13 0x00000000005b24e1 in _SPI_cursor_operation (portal=3D0xdd08c8140,
direction=3DFETCH_FORWARD, count=3D10, dest=3D0x9dfe40) at spi.c:2088
#14 0x00000000005b11a8 in SPI_cursor_fetch (portal=3D0xdd08c8140, forward=
=3D1
'\001', count=3D10) at spi.c:1258
#15 0x0000000dd0b1e199 in exec_for_query (estate=3D0x7fffffffda50,
stmt=3D0xdd098bfc8, portal=3D0xdd08c8140, prefetch_ok=3D1 '\001')
    at pl_exec.c:4262
#16 0x0000000dd0b1bc43 in exec_stmt_dynfors (estate=3D0x7fffffffda50,
stmt=3D0xdd098bfc8) at pl_exec.c:3107
#17 0x0000000dd0b1881a in exec_stmt (estate=3D0x7fffffffda50,
stmt=3D0xdd098bfc8) at pl_exec.c:1308
#18 0x0000000dd0b1859a in exec_stmts (estate=3D0x7fffffffda50,
stmts=3D0xdd098bc28) at pl_exec.c:1203
#19 0x0000000dd0b18f38 in exec_stmt_while (estate=3D0x7fffffffda50,
stmt=3D0xdd098d6f8) at pl_exec.c:1608
#20 0x0000000dd0b18733 in exec_stmt (estate=3D0x7fffffffda50,
stmt=3D0xdd098d6f8) at pl_exec.c:1264
#21 0x0000000dd0b1859a in exec_stmts (estate=3D0x7fffffffda50,
stmts=3D0xdd098a650) at pl_exec.c:1203
#22 0x0000000dd0b183df in exec_stmt_block (estate=3D0x7fffffffda50,
block=3D0xdd098d778) at pl_exec.c:1140
#23 0x0000000dd0b16a29 in plpgsql_exec_function (func=3D0xdd087abd8,
fcinfo=3D0x7fffffffdcf0) at pl_exec.c:315
#24 0x0000000dd0b1260f in plpgsql_call_handler (fcinfo=3D0x7fffffffdcf0) at
pl_handler.c:95
---Type <return> to continue, or q <return> to quit---
#25 0x000000000058c902 in ExecMakeTableFunctionResult (funcexpr=3D0xdd08ca7=
30,
econtext=3D0xdd08ca460, expectedDesc=3D0xdd08ca5f0,
    randomAccess=3D0 '\000') at execQual.c:2016
#26 0x00000000005a3e7d in FunctionNext (node=3D0xdd08ca350) at
nodeFunctionscan.c:64
#27 0x0000000000593a10 in ExecScan (node=3D0xdd08ca350, accessMtd=3D0x5a3e10
<FunctionNext>) at execScan.c:68
#28 0x00000000005a3eea in ExecFunctionScan (node=3D0xdd08ca350) at
nodeFunctionscan.c:98
#29 0x0000000000588fdf in ExecProcNode (node=3D0xdd08ca350) at
execProcnode.c:385
#30 0x0000000000586794 in ExecutePlan (estate=3D0xdd08ca030,
planstate=3D0xdd08ca350, operation=3DCMD_SELECT, numberTuples=3D0,
    direction=3DForwardScanDirection, dest=3D0xdd08996f0) at execMain.c:1504
#31 0x00000000005849a5 in standard_ExecutorRun (queryDesc=3D0xdd082f430,
direction=3DForwardScanDirection, count=3D0) at execMain.c:309
#32 0x00000000005848a5 in ExecutorRun (queryDesc=3D0xdd082f430,
direction=3DForwardScanDirection, count=3D0) at execMain.c:258
#33 0x0000000000674e18 in PortalRunSelect (portal=3D0xdd08c8030, forward=3D1
'\001', count=3D0, dest=3D0xdd08996f0) at pquery.c:953
#34 0x0000000000674ab9 in PortalRun (portal=3D0xdd08c8030,
count=3D9223372036854775807, isTopLevel=3D1 '\001', dest=3D0xdd08996f0,
    altdest=3D0xdd08996f0, completionTag=3D0x7fffffffe570 "") at pquery.c:7=
79
#35 0x000000000066f0e6 in exec_simple_query (
    query_string=3D0x8037e5030 "\nSELECT * from
__clear_table_tail('ctrl_servers', 'bill', 'opt', 13, 1)\n") at
postgres.c:997
#36 0x0000000000672f72 in PostgresMain (argc=3D4, argv=3D0x803732930,
username=3D0x803732900 "pgsql") at postgres.c:3676
#37 0x000000000063b5cd in BackendRun (port=3D0x803703c00) at postmaster.c:3=
467
#38 0x000000000063ab2a in BackendStartup (port=3D0x803703c00) at
postmaster.c:3081
#39 0x00000000006382dc in ServerLoop () at postmaster.c:1387
#40 0x0000000000637abb in PostmasterMain (argc=3D3, argv=3D0x7fffffffeb88) =
at
postmaster.c:1040
#41 0x00000000005c0b5a in main (argc=3D3, argv=3D0x7fffffffeb88) at main.c:=
188


On Wed, Jan 5, 2011 at 4:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> On Wed, Dec 22, 2010 at 4:53 PM, Maxim Boguk <maxim.boguk@gmail.com>
> wrote:
> > About stack backtrace I not sure how to get it (and even worse the
> > database is heavily loaded production DB so I not sure creating core
> > dump would be good idea).
> > Still I would like to receive any help with creating a stack backtrace.
>
> Well, if you can reproduce this problem in a given session, you can
> use pg_backend_pid() at the start of the session to get the backend
> ID, and then if your build has debugging symbols (or separate
> debuginfos you can install), you could attach gdb to it, set a
> breakpoint at the lines in heaptuple.c that can throw this error,
> reproduce the problem, and then get a backtrace.  But if it's
> happening randomly I don't have a lot of ideas.
>
> I do wonder if this could be related to using TIDs that don't actually
> exist in the table.  That might be worth some experimentation.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
=D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

19 февраля 2011 г., 12:06:21

Just an idea from code examination:

I think problem in
" RETURNING ctid"
part of query
and next comment in slot_getattr:
        /*
         * system attributes are handled by heap_getsysattr
         */

So, may be RETURNING implementation somehow incorrectely handle system
attributes in returing values list.

On Sun, Feb 20, 2011 at 1:13 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:

> Hi all and Hi Robert
>
> I finally managed create working slave server with non-stripped Postgresql
> 8.4.7 and working gdb.
> (sorry for such long delay).
> And i can reproduce situation quite easy now, so i managed get stack
> backtrace with that error.
> What i need to do now? I have some expirience with gbd now but i have no
> idea where to look next.
>
> Full backtrace:
>
> Breakpoint 1, slot_getattr (slot=3D0xdd09ea468, attnum=3D-1, isnull=3D0xd=
d09b4a68
> "") at heaptuple.c:1185
> 1185                            elog(ERROR, "cannot extract system
> attribute from virtual tuple");
> (gdb) bt
> #0  slot_getattr (slot=3D0xdd09ea468, attnum=3D-1, isnull=3D0xdd09b4a68 "=
") at
> heaptuple.c:1185
> #1  0x000000000058a5f8 in ExecEvalScalarVar (exprstate=3D0xdd09b4b60,
> econtext=3D0xdd09b4a80, isNull=3D0xdd09b4a68 "", isDone=3D0xdd09b4ce0)
>     at execQual.c:787
> #2  0x00000000005933d7 in ExecTargetList (targetlist=3D0xdd09b4cb0,
> econtext=3D0xdd09b4a80, values=3D0xdd09b4a50, isnull=3D0xdd09b4a68 "",
>     itemIsDone=3D0xdd09b4ce0, isDone=3D0x0) at execQual.c:5118
> #3  0x00000000005939a3 in ExecProject (projInfo=3D0xdd09b4be0, isDone=3D0=
x0) at
> execQual.c:5333
> #4  0x00000000005879c7 in ExecProcessReturning
> (projectReturning=3D0xdd09b4be0, tupleSlot=3D0xdd09ea468, planSlot=3D0xdd=
09ea3a8,
>     dest=3D0xdd09de5c8) at execMain.c:2273
> #5  0x00000000005875fe in ExecUpdate (slot=3D0xdd09ea468,
> tupleid=3D0x7fffffffd2e0, planSlot=3D0xdd09ea3a8, dest=3D0xdd09de5c8,
>     estate=3D0xdd09e6030) at execMain.c:2142
> #6  0x0000000000586c68 in ExecutePlan (estate=3D0xdd09e6030,
> planstate=3D0xdd09ea550, operation=3DCMD_UPDATE, numberTuples=3D0,
>     direction=3DForwardScanDirection, dest=3D0xdd09de5c8) at execMain.c:1=
681
> #7  0x00000000005849a5 in standard_ExecutorRun (queryDesc=3D0xdd09de658,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:309
> #8  0x00000000005848a5 in ExecutorRun (queryDesc=3D0xdd09de658,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:258
> #9  0x0000000000673d43 in ProcessQuery (plan=3D0xdd0998890,
>     sourceText=3D0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt =3D o=
pt
> WHERE ctid=3DANY($1) RETURNING ctid", params=3D0xdd09c58c0,
>     dest=3D0xdd09de5c8, completionTag=3D0x7fffffffd4e0 "") at pquery.c:196
> #10 0x00000000006754e5 in PortalRunMulti (portal=3D0xdd08c8140, isTopLeve=
l=3D0
> '\000', dest=3D0xdd09de5c8, altdest=3D0x9dfdc0,
>     completionTag=3D0x7fffffffd4e0 "") at pquery.c:1269
> #11 0x0000000000675127 in FillPortalStore (portal=3D0xdd08c8140, isTopLev=
el=3D0
> '\000') at pquery.c:1061
> #12 0x00000000006757f8 in PortalRunFetch (portal=3D0xdd08c8140,
> fdirection=3DFETCH_FORWARD, count=3D10, dest=3D0x9dfe40) at pquery.c:1397
> #13 0x00000000005b24e1 in _SPI_cursor_operation (portal=3D0xdd08c8140,
> direction=3DFETCH_FORWARD, count=3D10, dest=3D0x9dfe40) at spi.c:2088
> #14 0x00000000005b11a8 in SPI_cursor_fetch (portal=3D0xdd08c8140, forward=
=3D1
> '\001', count=3D10) at spi.c:1258
> #15 0x0000000dd0b1e199 in exec_for_query (estate=3D0x7fffffffda50,
> stmt=3D0xdd098bfc8, portal=3D0xdd08c8140, prefetch_ok=3D1 '\001')
>     at pl_exec.c:4262
> #16 0x0000000dd0b1bc43 in exec_stmt_dynfors (estate=3D0x7fffffffda50,
> stmt=3D0xdd098bfc8) at pl_exec.c:3107
> #17 0x0000000dd0b1881a in exec_stmt (estate=3D0x7fffffffda50,
> stmt=3D0xdd098bfc8) at pl_exec.c:1308
> #18 0x0000000dd0b1859a in exec_stmts (estate=3D0x7fffffffda50,
> stmts=3D0xdd098bc28) at pl_exec.c:1203
> #19 0x0000000dd0b18f38 in exec_stmt_while (estate=3D0x7fffffffda50,
> stmt=3D0xdd098d6f8) at pl_exec.c:1608
> #20 0x0000000dd0b18733 in exec_stmt (estate=3D0x7fffffffda50,
> stmt=3D0xdd098d6f8) at pl_exec.c:1264
> #21 0x0000000dd0b1859a in exec_stmts (estate=3D0x7fffffffda50,
> stmts=3D0xdd098a650) at pl_exec.c:1203
> #22 0x0000000dd0b183df in exec_stmt_block (estate=3D0x7fffffffda50,
> block=3D0xdd098d778) at pl_exec.c:1140
> #23 0x0000000dd0b16a29 in plpgsql_exec_function (func=3D0xdd087abd8,
> fcinfo=3D0x7fffffffdcf0) at pl_exec.c:315
> #24 0x0000000dd0b1260f in plpgsql_call_handler (fcinfo=3D0x7fffffffdcf0) =
at
> pl_handler.c:95
> ---Type <return> to continue, or q <return> to quit---
> #25 0x000000000058c902 in ExecMakeTableFunctionResult
> (funcexpr=3D0xdd08ca730, econtext=3D0xdd08ca460, expectedDesc=3D0xdd08ca5=
f0,
>     randomAccess=3D0 '\000') at execQual.c:2016
> #26 0x00000000005a3e7d in FunctionNext (node=3D0xdd08ca350) at
> nodeFunctionscan.c:64
> #27 0x0000000000593a10 in ExecScan (node=3D0xdd08ca350, accessMtd=3D0x5a3=
e10
> <FunctionNext>) at execScan.c:68
> #28 0x00000000005a3eea in ExecFunctionScan (node=3D0xdd08ca350) at
> nodeFunctionscan.c:98
> #29 0x0000000000588fdf in ExecProcNode (node=3D0xdd08ca350) at
> execProcnode.c:385
> #30 0x0000000000586794 in ExecutePlan (estate=3D0xdd08ca030,
> planstate=3D0xdd08ca350, operation=3DCMD_SELECT, numberTuples=3D0,
>     direction=3DForwardScanDirection, dest=3D0xdd08996f0) at execMain.c:1=
504
> #31 0x00000000005849a5 in standard_ExecutorRun (queryDesc=3D0xdd082f430,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:309
> #32 0x00000000005848a5 in ExecutorRun (queryDesc=3D0xdd082f430,
> direction=3DForwardScanDirection, count=3D0) at execMain.c:258
> #33 0x0000000000674e18 in PortalRunSelect (portal=3D0xdd08c8030, forward=
=3D1
> '\001', count=3D0, dest=3D0xdd08996f0) at pquery.c:953
> #34 0x0000000000674ab9 in PortalRun (portal=3D0xdd08c8030,
> count=3D9223372036854775807, isTopLevel=3D1 '\001', dest=3D0xdd08996f0,
>     altdest=3D0xdd08996f0, completionTag=3D0x7fffffffe570 "") at pquery.c=
:779
> #35 0x000000000066f0e6 in exec_simple_query (
>     query_string=3D0x8037e5030 "\nSELECT * from
> __clear_table_tail('ctrl_servers', 'bill', 'opt', 13, 1)\n") at
> postgres.c:997
> #36 0x0000000000672f72 in PostgresMain (argc=3D4, argv=3D0x803732930,
> username=3D0x803732900 "pgsql") at postgres.c:3676
> #37 0x000000000063b5cd in BackendRun (port=3D0x803703c00) at
> postmaster.c:3467
> #38 0x000000000063ab2a in BackendStartup (port=3D0x803703c00) at
> postmaster.c:3081
> #39 0x00000000006382dc in ServerLoop () at postmaster.c:1387
> #40 0x0000000000637abb in PostmasterMain (argc=3D3, argv=3D0x7fffffffeb88=
) at
> postmaster.c:1040
> #41 0x00000000005c0b5a in main (argc=3D3, argv=3D0x7fffffffeb88) at main.=
c:188
>
>
>
> On Wed, Jan 5, 2011 at 4:28 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> On Wed, Dec 22, 2010 at 4:53 PM, Maxim Boguk <maxim.boguk@gmail.com>
>> wrote:
>> > About stack backtrace I not sure how to get it (and even worse the
>> > database is heavily loaded production DB so I not sure creating core
>> > dump would be good idea).
>> > Still I would like to receive any help with creating a stack backtrace.
>>
>> Well, if you can reproduce this problem in a given session, you can
>> use pg_backend_pid() at the start of the session to get the backend
>> ID, and then if your build has debugging symbols (or separate
>> debuginfos you can install), you could attach gdb to it, set a
>> breakpoint at the lines in heaptuple.c that can throw this error,
>> reproduce the problem, and then get a backtrace.  But if it's
>> happening randomly I don't have a lot of ideas.
>>
>> I do wonder if this could be related to using TIDs that don't actually
>> exist in the table.  That might be worth some experimentation.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise PostgreSQL Company
>>
>
>
>
> --
> Maxim Boguk
> Senior Postgresql DBA.
>
> Skype: maxim.boguk
> Jabber: maxim.boguk@gmail.com
>
> LinkedIn profile: http://nz.linkedin.com/in/maximboguk
> If they can send one man to the moon... why can't they send them all?
>
>
> =D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
> =D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=
=BE=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =
=D0=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=
=BE=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =
=D0=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
> =D0=B2=D1=81=D0=B5.
>



--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
=D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

19 февраля 2011 г., 15:08:25

Maxim Boguk <maxim.boguk@gmail.com> writes:
> I finally managed create working slave server with non-stripped Postgresql
> 8.4.7 and working gdb.
> (sorry for such long delay).
> And i can reproduce situation quite easy now, so i managed get stack
> backtrace with that error.
> What i need to do now? I have some expirience with gbd now but i have no
> idea where to look next.

Now that you know what query is causing the problem:

> #9  0x0000000000673d43 in ProcessQuery (plan=0xdd0998890,
>     sourceText=0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt = opt
> WHERE ctid=ANY($1) RETURNING ctid", params=0xdd09c58c0,
>     dest=0xdd09de5c8, completionTag=0x7fffffffd4e0 "") at pquery.c:196

it shouldn't be too hard to prepare a self-contained test case so that
others can look at this.

            regards, tom lane

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

19 февраля 2011 г., 22:02:27

On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Maxim Boguk <maxim.boguk@gmail.com> writes:
> > I finally managed create working slave server with non-stripped
> Postgresql
> > 8.4.7 and working gdb.
> > (sorry for such long delay).
> > And i can reproduce situation quite easy now, so i managed get stack
> > backtrace with that error.
> > What i need to do now? I have some expirience with gbd now but i have no
> > idea where to look next.
>
> Now that you know what query is causing the problem:
>
> > #9  0x0000000000673d43 in ProcessQuery (plan=3D0xdd0998890,
> >     sourceText=3D0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt =3D=
 opt
> > WHERE ctid=3DANY($1) RETURNING ctid", params=3D0xdd09c58c0,
> >     dest=3D0xdd09de5c8, completionTag=3D0x7fffffffd4e0 "") at pquery.c:=
196
>
> it shouldn't be too hard to prepare a self-contained test case so that
> others can look at this.
>
>                        regards, tom lane
>

Sorry, but it's very unlikely I can create self-contained test case.
That error doesn't happen on stand alone idle server.
It's only happen at random time on table with very high update rate in
production environment, and happen over 1-30 minutes calling my storable
procedure in endless loop (e.g. once per 1000+ calls).
I was long ago sure about exact place where that error happens  (becuase my
storable procedure contains only one real sql query), but because I can't
create self contained test case I was asked use gdb and try to get backtrace
at point of error.

So now I can examine and provide variable values at that point to give a
clue about what happens here.

--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
=D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

20 февраля 2011 г., 11:26:24

Today I perfomed some more digging with code and gdb.

I not sure, but may be I have an idea what happens here:

RETURNING ctid fails when code implementing returning happened to get a
"Virtual" tuple as input.
And that ofcourse fail because "virtual tuple does not have any "system
columns" (such as tid).
(seems no one got such error before because no one tried to use system
columns in RETURNING list).

I tracked virtual tuple from heaptuple.c::slot_getattr down to
execMain.c::ExecUpdate and I think somewhere in this way virtual tuple
should be need to be "materialized" into physical tuple.

I have zero expirience with reporting such results so I will try my best:
path of that virtual tuple in problem case looks like:

*(slot_getattr::slot)
*(ExecEvalVar::econtext->ecxt_scantuple)
*(ExecTargetList::econtext->ecxt_scantuple)
*(ExecProject::econtext->ecxt_scantuple)
*(ExecProject::projInfo->pi_exprContext->ecxt_scantuple)
*(ExecProcessReturning::projectReturning->pi_exprContext->ecxt_scantuple)
*(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scan=
tuple)

All these variable contained the same TupleTableSlot structure:

(gdb) print
*(ExecUpdate::resultRelInfo->ri_projectReturning->pi_exprContext->ecxt_scan=
tuple)
$29 =3D {
  type =3D T_TupleTableSlot,
  tts_isempty =3D 0 '\000',
  tts_shouldFree =3D 0 '\000',
  tts_shouldFreeMin =3D 0 '\000',
  tts_slow =3D 0 '\000',
  tts_tuple =3D 0x0,
  tts_tupleDescriptor =3D 0xdd09d1030,
  tts_mcxt =3D 0xdd08da2e0,
  tts_buffer =3D 0,
  tts_nvalid =3D 11,
  tts_values =3D 0xdd09d1840,
  tts_isnull =3D 0xdd09d18d0 "",
  tts_mintuple =3D 0x0,
  tts_minhdr =3D {
    t_len =3D 0,
    t_self =3D {
      ip_blkid =3D {
        bi_hi =3D 0,
        bi_lo =3D 0
      },
      ip_posid =3D 44192
    },
    t_tableOid =3D 13,
    t_data =3D 0x10
  },
  tts_off =3D 59333634048
}

Again I not sure but ExecProcessReturning function seems good candidate to
perform tuple materialization.

PS: May be it seems just a lot handwaving but I just started study gdb and
postgresql source code.

PPS: I must say thanks to authors of http://doxygen.postgresql.org . It is
great tool and saved me lot time today.

On Sun, Feb 20, 2011 at 12:01 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote:

>
>
> On Sun, Feb 20, 2011 at 5:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Maxim Boguk <maxim.boguk@gmail.com> writes:
>> > I finally managed create working slave server with non-stripped
>> Postgresql
>> > 8.4.7 and working gdb.
>> > (sorry for such long delay).
>> > And i can reproduce situation quite easy now, so i managed get stack
>> > backtrace with that error.
>> > What i need to do now? I have some expirience with gbd now but i have =
no
>> > idea where to look next.
>>
>> Now that you know what query is causing the problem:
>>
>> > #9  0x0000000000673d43 in ProcessQuery (plan=3D0xdd0998890,
>> >     sourceText=3D0xdd09c5830 "UPDATE ONLY bill.ctrl_servers SET opt =
=3D opt
>> > WHERE ctid=3DANY($1) RETURNING ctid", params=3D0xdd09c58c0,
>> >     dest=3D0xdd09de5c8, completionTag=3D0x7fffffffd4e0 "") at pquery.c=
:196
>>
>> it shouldn't be too hard to prepare a self-contained test case so that
>> others can look at this.
>>
>>                        regards, tom lane
>>
>
>
> Sorry, but it's very unlikely I can create self-contained test case.
> That error doesn't happen on stand alone idle server.
> It's only happen at random time on table with very high update rate in
> production environment, and happen over 1-30 minutes calling my storable
> procedure in endless loop (e.g. once per 1000+ calls).
> I was long ago sure about exact place where that error happens  (becuase =
my
> storable procedure contains only one real sql query), but because I can't
> create self contained test case I was asked use gdb and try to get backtr=
ace
> at point of error.
>
> So now I can examine and provide variable values at that point to give a
> clue about what happens here.
>
>
> --
> Maxim Boguk
> Senior Postgresql DBA.
>
> Skype: maxim.boguk
> Jabber: maxim.boguk@gmail.com
>
> LinkedIn profile: http://nz.linkedin.com/in/maximboguk
> If they can send one man to the moon... why can't they send them all?
>
> =D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
> =D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=
=BE=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =
=D0=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=
=BE=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =
=D0=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
> =D0=B2=D1=81=D0=B5.
>



--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
=D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

20 февраля 2011 г., 20:07:55

Maxim Boguk <maxim.boguk@gmail.com> writes:
> I tracked virtual tuple from heaptuple.c::slot_getattr down to
> execMain.c::ExecUpdate and I think somewhere in this way virtual tuple
> should be need to be "materialized" into physical tuple.

That happens at the top of ExecUpdate:

    /*
     * get the heap tuple out of the tuple table slot, making sure we have a
     * writable copy
     */
    tuple = ExecMaterializeSlot(slot);

I don't see any code path there that could re-virtualize the slot.

Do you have any triggers or constraints on the target table?  Because
those are about the only things that influence this code path at all...

> Again I not sure but ExecProcessReturning function seems good candidate to
> perform tuple materialization.

If ExecProcessReturning were given a virtual slot, materializing it
would do no good at all for this problem, because the system columns in
the result would be garbage.  It has to receive an already materialized
slot that contains the right information beforehand.  The only reason
that retrieving ctid can work at all is that when heap_update is called,
"tuple" is pointing at the slot's contained tuple and so setting
tuple->t_self changes the contents of the slot.  That whole function is
completely dependent on the slot contents being physical not virtual,
and I don't see anything there that would break it.

            regards, tom lane

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

21 февраля 2011 г., 09:59:27

Hi

Yes here is one BEFORE UPDATE trigger on relation
(
Triggers:
    _billing_main_denyaccess_71 BEFORE INSERT OR DELETE OR UPDATE ON
ctrl_servers FOR EACH ROW EXECUTE PROCEDURE
_billing_main.denyaccess('_billing_main')
)

However that trigger should not fire at all because storable procedure work
with set local session_replication_role to 'replica';

But tuple become "virtual" in these lines:

   2027                 newtuple =3D ExecBRUpdateTriggers(estate,
resultRelInfo,

2028
tupleid, tuple);
   2029
   2030                 if (newtuple =3D=3D NULL)   /* "do nothing" */
   2031                         return;
   2032
   2033                 if (newtuple !=3D tuple)  /* modified by Trigger(s)=
 */
   2034                 {
   2035                         /*
   2036                          * Put the modified tuple into a slot for
convenience of routines
   2037                          * below.  We assume the tuple was allocated
in per-tuple memory
   2038                          * context, and therefore will go away by
itself. The tuple table
   2039                          * slot should not try to clear it.
   2040                          */
   2041                         TupleTableSlot *newslot =3D
estate->es_trig_tuple_slot;
   2042
   2043                         if (newslot->tts_tupleDescriptor !=3D
slot->tts_tupleDescriptor)
   2044                                 ExecSetSlotDescriptor(newslot,
slot->tts_tupleDescriptor);
   2045                         ExecStoreTuple(newtuple, newslot,
InvalidBuffer, false);
   2046                         slot =3D newslot;
   2047                         tuple =3D newtuple;
   2048                 }

If be more exact at line:
   2047                         tuple =3D newtuple;

What I don't understand fist it is why that code path fail only once per
1000-100000 calls only.

I have an idea that
   2033                 if (newtuple !=3D tuple)  /* modified by Trigger(s)=
 */
sometime can return true even if no triggers really had been done.

After more code digging I think I found reason
Look like that happens in this part of ExecBRUpdateTriggers code:

02273     /*
02274      * In READ COMMITTED isolation level it's possible that newtuple =
was
02275      * changed due to concurrent update.  In that case we have a
raw subplan
02276      * output tuple and need to run it through the junk filter.
02277      */
02278     if (newSlot !=3D NULL
<http://doxygen.postgresql.org/c_8h.html#070d2ce7b6bb7e5c05602aa8c308d0c4>)
02279         intuple =3D newtuple =3D ExecRemoveJunk
<http://doxygen.postgresql.org/execJunk_8c.html#9779200174530a361ea70e21639=
c2c72>(relinfo->ri_junkFilter
<http://doxygen.postgresql.org/structResultRelInfo.html#0d887ebe5510dd3e663=
e95888d72cd67>,
newSlot);


That rarity of that situation become understandable (that is very very rare
case when tuple got changed in such way) and really create virtual tuple.

Look like need put tuple materialization code here as well.

PS: again may be just handwaving... I just started learn postgresql code.
Any help from more skilled persons would be great.

On Mon, Feb 21, 2011 at 10:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Maxim Boguk <maxim.boguk@gmail.com> writes:
> > I tracked virtual tuple from heaptuple.c::slot_getattr down to
> > execMain.c::ExecUpdate and I think somewhere in this way virtual tuple
> > should be need to be "materialized" into physical tuple.
>
> That happens at the top of ExecUpdate:
>
>    /*
>     * get the heap tuple out of the tuple table slot, making sure we have=
 a
>     * writable copy
>     */
>    tuple =3D ExecMaterializeSlot(slot);
>
> I don't see any code path there that could re-virtualize the slot.
>
> Do you have any triggers or constraints on the target table?  Because
> those are about the only things that influence this code path at all...
>
> > Again I not sure but ExecProcessReturning function seems good candidate
> to
> > perform tuple materialization.
>
> If ExecProcessReturning were given a virtual slot, materializing it
> would do no good at all for this problem, because the system columns in
> the result would be garbage.  It has to receive an already materialized
> slot that contains the right information beforehand.  The only reason
> that retrieving ctid can work at all is that when heap_update is called,
> "tuple" is pointing at the slot's contained tuple and so setting
> tuple->t_self changes the contents of the slot.  That whole function is
> completely dependent on the slot contents being physical not virtual,
> and I don't see anything there that would break it.
>
>                        regards, tom lane
>



--=20
Maxim Boguk
Senior Postgresql DBA.

Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
If they can send one man to the moon... why can't they send them all?

=D0=9C=D0=BE=D0=B9=D0=9A=D1=80=D1=83=D0=B3: http://mboguk.moikrug.ru/
=D0=A1=D0=B8=D0=BB=D0=B0 =D1=81=D0=BE=D0=BB=D0=BE=D0=BC=D1=83 =D0=BB=D0=BE=
=D0=BC=D0=B8=D1=82, =D0=BD=D0=BE =D0=BD=D0=B5 =D0=B2=D1=81=D0=B5 =D0=B2 =D0=
=BD=D0=B0=D1=88=D0=B5=D0=B9 =D0=B6=D0=B8=D0=B7=D0=BD=D0=B8 - =D1=81=D0=BE=
=D0=BB=D0=BE=D0=BC=D0=B0, =D0=B4=D0=B0 =D0=B8 =D1=81=D0=B8=D0=BB=D0=B0 =D0=
=B4=D0=B0=D0=BB=D0=B5=D0=BA=D0=BE =D0=BD=D0=B5
=D0=B2=D1=81=D0=B5.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

21 февраля 2011 г., 17:10:48

Maxim Boguk <maxim.boguk@gmail.com> writes:
> Yes here is one BEFORE UPDATE trigger on relation
> ...
> However that trigger should not fire at all because storable procedure work
> with set local session_replication_role to 'replica';

Would be nice if you'd provided those sorts of details earlier, instead
of us having to resort to interrogation to obtain them.

> I have an idea that
>    2033                 if (newtuple != tuple)  /* modified by Trigger(s) */
> sometime can return true even if no triggers really had been done.

No, you've got that backwards: if that test did succeed, the if-body
would guarantee that we have a materialized slot.  I think though that
I see what is going on:

1.  The slot passed to ExecUpdate is always going to be
estate->es_junkFilter->jf_resultSlot, as a consequence of the fact that
ExecutePlan will always have invoked ExecFilterJunk just above.

2. ExecUpdate forces the slot into the materialized state and sets its
local "tuple" variable to the address of the physical tuple contained in
the slot.

3. ExecBRUpdateTriggers runs and tries to lock the target tuple.
Occasionally, there is a concurrent update that's just committed,
in which case GetTupleForTrigger runs EvalPlanQual and gives back
a new tuple-to-be-stored in a different slot.  When that happens,
ExecBRUpdateTriggers cleans the tuple by applying ExecRemoveJunk
... using estate->es_junkFilter.

4. execJunk.c does ExecClearTuple on its result slot.  At this point,
ExecUpdate's local tuple variable is a dangling pointer, because we
just freed the tuple it is pointing at.

5. ExecFilterJunk's result is a virtual tuple, but ExecRemoveJunk
does ExecCopySlotTuple -> heap_form_tuple to produce a copied tuple
that isn't owned by the slot.  However, if you're unlucky and the phase
of the moon is just right, that copied tuple may get palloc'd in the
same place where we just pfree'd the tuple that had been owned by the
jf_resultSlot.

6. ExecBRUpdateTriggers falls through (not finding any triggers that
will actually fire) and returns the copied tuple to ExecUpdate.
Normally, ExecUpdate would stuff the copied tuple into
es_trig_tuple_slot and all would be well, because that slot would surely
be in materialized state.  However, if you were sufficiently unlucky
above, the "newtuple != tuple" test fails because of pointer equality
between the dangling local variable and the live copied tuple.

7. At this point ExecUpdate has a physical tuple that is what it is
supposed to store, so that's fine, and furthermore the slot contains
the right data values for all the user columns.  So the only visible
problem is that if you try to extract ctid or other system columns from
the slot, that will fail, because the physical tuple isn't owned by the
slot.  I think though that worse problems could ensue if there were
active triggers that modified the row --- if you were really *seriously*
unlucky, there could be a chance pointer match between the dangling
tuple variable and a modified tuple returned by a trigger.  In that case
the modified tuple would be what was stored, but the index entries
generated for it would be from the pre-trigger slot values.

Ugh.  That quick little "ExecRemoveJunk" is a lot more dangerous than it
looks.  I had actually looked at this before, but concluded it was OK
because I couldn't reproduce the problem with a trigger in place.
I guess I wasn't unlucky enough to have the chance pointer equality
occur.

The narrowest way to fix this would be to add a condition to the
"newtuple != tuple" test in ExecUpdate to verify that "tuple" still
matches what is stored in the slot.  That's pretty ugly though.

What would likely be better is to rearrange things so that
ExecBRUpdateTriggers doesn't change the es_junkFilter's state.  Or we
could have ExecBRUpdateTriggers store the tuple into es_trig_tuple_slot
for itself and return that slot, thus eliminating the need for a
pointer-equality-dependent test to see if something had been done.
Either way seems like a significantly more invasive patch, but it might
be less likely to break again in future.

            regards, tom lane

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

21 февраля 2011 г., 20:07:53

I wrote:
> [ theory about cause of bug #5798 ]

Attached is a proposed patch against HEAD to fix this problem.  It's a
bit hard to *prove* that it fixes the problem, since there's no easy way
to force the collision of palloc addresses.  But I think it will.

The patch changes the signature of ExecBRUpdateTriggers so that it takes
and returns a TupleTableSlot instead of a HeapTuple.  This essentially
just moves the troublesome code (the "if (newtuple != tuple)" block)
from ExecUpdate to the bottom of ExecBRUpdateTriggers.  It solves the
problem because in the case where ExecBRUpdateTriggers clobbers the
contents of the junkfilter's output slot, it can immediately update the
tuple pointer that will later be compared to see if the trigger(s)
replaced the tuple.

I chose to change ExecBRInsertTriggers, ExecIRInsertTriggers, and
ExecIRUpdateTriggers to also take and return TupleTableSlots.   These
functions don't have a bug AFAICS, but it seems better that they all
retain similar APIs.

I'm not sure whether to back-patch all four changes or limit the patch
to changing ExecBRUpdateTriggers in the back branches.  Any thoughts
about that?

            regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cac11a6c64107a2f977d2d7e5dc0d29fce5e35c7..44f568f396414f69485ab0538b0086741e5068c9 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** CopyFrom(CopyState cstate)
*** 1836,1842 ****
      ResultRelInfo *resultRelInfo;
      EState       *estate = CreateExecutorState(); /* for ExecConstraints() */
      ExprContext *econtext;
!     TupleTableSlot *slot;
      MemoryContext oldcontext = CurrentMemoryContext;
      ErrorContextCallback errcontext;
      CommandId    mycid = GetCurrentCommandId(true);
--- 1836,1842 ----
      ResultRelInfo *resultRelInfo;
      EState       *estate = CreateExecutorState(); /* for ExecConstraints() */
      ExprContext *econtext;
!     TupleTableSlot *myslot;
      MemoryContext oldcontext = CurrentMemoryContext;
      ErrorContextCallback errcontext;
      CommandId    mycid = GetCurrentCommandId(true);
*************** CopyFrom(CopyState cstate)
*** 1932,1939 ****
      estate->es_result_relation_info = resultRelInfo;

      /* Set up a tuple slot too */
!     slot = ExecInitExtraTupleSlot(estate);
!     ExecSetSlotDescriptor(slot, tupDesc);

      /* Prepare to catch AFTER triggers. */
      AfterTriggerBeginQuery();
--- 1932,1941 ----
      estate->es_result_relation_info = resultRelInfo;

      /* Set up a tuple slot too */
!     myslot = ExecInitExtraTupleSlot(estate);
!     ExecSetSlotDescriptor(myslot, tupDesc);
!     /* Triggers might need a slot as well */
!     estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

      /* Prepare to catch AFTER triggers. */
      AfterTriggerBeginQuery();
*************** CopyFrom(CopyState cstate)
*** 1960,1965 ****
--- 1962,1968 ----

      for (;;)
      {
+         TupleTableSlot *slot;
          bool        skip_tuple;
          Oid            loaded_oid = InvalidOid;

*************** CopyFrom(CopyState cstate)
*** 1983,2014 ****
          /* Triggers and stuff need to be invoked in query context. */
          MemoryContextSwitchTo(oldcontext);

          skip_tuple = false;

          /* BEFORE ROW INSERT Triggers */
          if (resultRelInfo->ri_TrigDesc &&
              resultRelInfo->ri_TrigDesc->trig_insert_before_row)
          {
!             HeapTuple    newtuple;
!
!             newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple);

!             if (newtuple == NULL)        /* "do nothing" */
                  skip_tuple = true;
!             else if (newtuple != tuple) /* modified by Trigger(s) */
!             {
!                 heap_freetuple(tuple);
!                 tuple = newtuple;
!             }
          }

          if (!skip_tuple)
          {
              List       *recheckIndexes = NIL;

-             /* Place tuple in tuple slot */
-             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
-
              /* Check the constraints of the tuple */
              if (cstate->rel->rd_att->constr)
                  ExecConstraints(resultRelInfo, slot, estate);
--- 1986,2013 ----
          /* Triggers and stuff need to be invoked in query context. */
          MemoryContextSwitchTo(oldcontext);

+         /* Place tuple in tuple slot --- but slot shouldn't free it */
+         slot = myslot;
+         ExecStoreTuple(tuple, slot, InvalidBuffer, false);
+
          skip_tuple = false;

          /* BEFORE ROW INSERT Triggers */
          if (resultRelInfo->ri_TrigDesc &&
              resultRelInfo->ri_TrigDesc->trig_insert_before_row)
          {
!             slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);

!             if (slot == NULL)        /* "do nothing" */
                  skip_tuple = true;
!             else                    /* trigger might have changed tuple */
!                 tuple = ExecMaterializeSlot(slot);
          }

          if (!skip_tuple)
          {
              List       *recheckIndexes = NIL;

              /* Check the constraints of the tuple */
              if (cstate->rel->rd_att->constr)
                  ExecConstraints(resultRelInfo, slot, estate);
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 8d996a87c771d4210a97e449afdbe512420afa34..dc6ee9c266f2f9fc731a35d4ec18c23f283c477c 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** ExecASInsertTriggers(EState *estate, Res
*** 1909,1920 ****
                                false, NULL, NULL, NIL, NULL);
  }

! HeapTuple
  ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
!                      HeapTuple trigtuple)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
!     HeapTuple    newtuple = trigtuple;
      HeapTuple    oldtuple;
      TriggerData LocTriggerData;
      int            i;
--- 1909,1921 ----
                                false, NULL, NULL, NIL, NULL);
  }

! TupleTableSlot *
  ExecBRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
!                      TupleTableSlot *slot)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
!     HeapTuple    slottuple = ExecMaterializeSlot(slot);
!     HeapTuple    newtuple = slottuple;
      HeapTuple    oldtuple;
      TriggerData LocTriggerData;
      int            i;
*************** ExecBRInsertTriggers(EState *estate, Res
*** 1947,1958 ****
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != trigtuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             break;
      }
!     return newtuple;
  }

  void
--- 1948,1976 ----
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != slottuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             return NULL;        /* "do nothing" */
      }
!
!     if (newtuple != slottuple)
!     {
!         /*
!          * Return the modified tuple using the es_trig_tuple_slot.  We assume
!          * the tuple was allocated in per-tuple memory context, and therefore
!          * will go away by itself. The tuple table slot should not try to
!          * clear it.
!          */
!         TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!         TupleDesc    tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
!
!         if (newslot->tts_tupleDescriptor != tupdesc)
!             ExecSetSlotDescriptor(newslot, tupdesc);
!         ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!         slot = newslot;
!     }
!     return slot;
  }

  void
*************** ExecARInsertTriggers(EState *estate, Res
*** 1966,1977 ****
                                true, NULL, trigtuple, recheckIndexes, NULL);
  }

! HeapTuple
  ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
!                      HeapTuple trigtuple)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
!     HeapTuple    newtuple = trigtuple;
      HeapTuple    oldtuple;
      TriggerData LocTriggerData;
      int            i;
--- 1984,1996 ----
                                true, NULL, trigtuple, recheckIndexes, NULL);
  }

! TupleTableSlot *
  ExecIRInsertTriggers(EState *estate, ResultRelInfo *relinfo,
!                      TupleTableSlot *slot)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
!     HeapTuple    slottuple = ExecMaterializeSlot(slot);
!     HeapTuple    newtuple = slottuple;
      HeapTuple    oldtuple;
      TriggerData LocTriggerData;
      int            i;
*************** ExecIRInsertTriggers(EState *estate, Res
*** 2004,2015 ****
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != trigtuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             break;
      }
!     return newtuple;
  }

  void
--- 2023,2051 ----
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != slottuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             return NULL;        /* "do nothing" */
      }
!
!     if (newtuple != slottuple)
!     {
!         /*
!          * Return the modified tuple using the es_trig_tuple_slot.  We assume
!          * the tuple was allocated in per-tuple memory context, and therefore
!          * will go away by itself. The tuple table slot should not try to
!          * clear it.
!          */
!         TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!         TupleDesc    tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
!
!         if (newslot->tts_tupleDescriptor != tupdesc)
!             ExecSetSlotDescriptor(newslot, tupdesc);
!         ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!         slot = newslot;
!     }
!     return slot;
  }

  void
*************** ExecASUpdateTriggers(EState *estate, Res
*** 2257,2288 ****
                                GetModifiedColumns(relinfo, estate));
  }

! HeapTuple
  ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
                       ResultRelInfo *relinfo,
!                      ItemPointer tupleid, HeapTuple newtuple)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      TriggerData LocTriggerData;
      HeapTuple    trigtuple;
      HeapTuple    oldtuple;
-     HeapTuple    intuple = newtuple;
      TupleTableSlot *newSlot;
      int            i;
      Bitmapset  *modifiedCols;

      trigtuple = GetTupleForTrigger(estate, epqstate, relinfo, tupleid,
                                     &newSlot);
      if (trigtuple == NULL)
!         return NULL;

      /*
!      * In READ COMMITTED isolation level it's possible that newtuple was
       * changed due to concurrent update.  In that case we have a raw subplan
!      * output tuple and need to run it through the junk filter.
       */
      if (newSlot != NULL)
!         intuple = newtuple = ExecRemoveJunk(relinfo->ri_junkFilter, newSlot);

      modifiedCols = GetModifiedColumns(relinfo, estate);

--- 2293,2336 ----
                                GetModifiedColumns(relinfo, estate));
  }

! TupleTableSlot *
  ExecBRUpdateTriggers(EState *estate, EPQState *epqstate,
                       ResultRelInfo *relinfo,
!                      ItemPointer tupleid, TupleTableSlot *slot)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+     HeapTuple    slottuple = ExecMaterializeSlot(slot);
+     HeapTuple    newtuple = slottuple;
      TriggerData LocTriggerData;
      HeapTuple    trigtuple;
      HeapTuple    oldtuple;
      TupleTableSlot *newSlot;
      int            i;
      Bitmapset  *modifiedCols;

+     /* get a copy of the on-disk tuple we are planning to update */
      trigtuple = GetTupleForTrigger(estate, epqstate, relinfo, tupleid,
                                     &newSlot);
      if (trigtuple == NULL)
!         return NULL;            /* cancel the update action */

      /*
!      * In READ COMMITTED isolation level it's possible that target tuple was
       * changed due to concurrent update.  In that case we have a raw subplan
!      * output tuple in newSlot, and need to run it through the junk filter to
!      * produce an insertable tuple.
!      *
!      * Caution: more than likely, the passed-in slot is the same as the
!      * junkfilter's output slot, so we are clobbering the original value of
!      * slottuple by doing the filtering.  This is OK since neither we nor our
!      * caller have any more interest in the prior contents of that slot.
       */
      if (newSlot != NULL)
!     {
!         slot = ExecFilterJunk(relinfo->ri_junkFilter, newSlot);
!         slottuple = ExecMaterializeSlot(slot);
!         newtuple = slottuple;
!     }

      modifiedCols = GetModifiedColumns(relinfo, estate);

*************** ExecBRUpdateTriggers(EState *estate, EPQ
*** 2314,2326 ****
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != intuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             break;
      }
      heap_freetuple(trigtuple);
!     return newtuple;
  }

  void
--- 2362,2394 ----
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != slottuple)
              heap_freetuple(oldtuple);
          if (newtuple == NULL)
!         {
!             heap_freetuple(trigtuple);
!             return NULL;        /* "do nothing" */
!         }
      }
      heap_freetuple(trigtuple);
!
!     if (newtuple != slottuple)
!     {
!         /*
!          * Return the modified tuple using the es_trig_tuple_slot.  We assume
!          * the tuple was allocated in per-tuple memory context, and therefore
!          * will go away by itself. The tuple table slot should not try to
!          * clear it.
!          */
!         TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!         TupleDesc    tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
!
!         if (newslot->tts_tupleDescriptor != tupdesc)
!             ExecSetSlotDescriptor(newslot, tupdesc);
!         ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!         slot = newslot;
!     }
!     return slot;
  }

  void
*************** ExecARUpdateTriggers(EState *estate, Res
*** 2342,2355 ****
      }
  }

! HeapTuple
  ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
!                      HeapTuple oldtuple, HeapTuple newtuple)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      TriggerData LocTriggerData;
!     HeapTuple    intuple = newtuple;
!     HeapTuple    rettuple;
      int            i;

      LocTriggerData.type = T_TriggerData;
--- 2410,2424 ----
      }
  }

! TupleTableSlot *
  ExecIRUpdateTriggers(EState *estate, ResultRelInfo *relinfo,
!                      HeapTuple trigtuple, TupleTableSlot *slot)
  {
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
+     HeapTuple    slottuple = ExecMaterializeSlot(slot);
+     HeapTuple    newtuple = slottuple;
      TriggerData LocTriggerData;
!     HeapTuple    oldtuple;
      int            i;

      LocTriggerData.type = T_TriggerData;
*************** ExecIRUpdateTriggers(EState *estate, Res
*** 2367,2392 ****
                                    TRIGGER_TYPE_UPDATE))
              continue;
          if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
!                             NULL, oldtuple, newtuple))
              continue;

!         LocTriggerData.tg_trigtuple = oldtuple;
!         LocTriggerData.tg_newtuple = newtuple;
          LocTriggerData.tg_trigtuplebuf = InvalidBuffer;
          LocTriggerData.tg_newtuplebuf = InvalidBuffer;
          LocTriggerData.tg_trigger = trigger;
!         rettuple = ExecCallTriggerFunc(&LocTriggerData,
                                         i,
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (newtuple != rettuple && newtuple != intuple)
!             heap_freetuple(newtuple);
!         newtuple = rettuple;
          if (newtuple == NULL)
!             break;
      }
!     return newtuple;
  }

  void
--- 2436,2477 ----
                                    TRIGGER_TYPE_UPDATE))
              continue;
          if (!TriggerEnabled(estate, relinfo, trigger, LocTriggerData.tg_event,
!                             NULL, trigtuple, newtuple))
              continue;

!         LocTriggerData.tg_trigtuple = trigtuple;
!         LocTriggerData.tg_newtuple = oldtuple = newtuple;
          LocTriggerData.tg_trigtuplebuf = InvalidBuffer;
          LocTriggerData.tg_newtuplebuf = InvalidBuffer;
          LocTriggerData.tg_trigger = trigger;
!         newtuple = ExecCallTriggerFunc(&LocTriggerData,
                                         i,
                                         relinfo->ri_TrigFunctions,
                                         relinfo->ri_TrigInstrument,
                                         GetPerTupleMemoryContext(estate));
!         if (oldtuple != newtuple && oldtuple != slottuple)
!             heap_freetuple(oldtuple);
          if (newtuple == NULL)
!             return NULL;        /* "do nothing" */
      }
!
!     if (newtuple != slottuple)
!     {
!         /*
!          * Return the modified tuple using the es_trig_tuple_slot.  We assume
!          * the tuple was allocated in per-tuple memory context, and therefore
!          * will go away by itself. The tuple table slot should not try to
!          * clear it.
!          */
!         TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!         TupleDesc    tupdesc = RelationGetDescr(relinfo->ri_RelationDesc);
!
!         if (newslot->tts_tupleDescriptor != tupdesc)
!             ExecSetSlotDescriptor(newslot, tupdesc);
!         ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!         slot = newslot;
!     }
!     return slot;
  }

  void
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 42662bdc461edfd2f9313795865577270fd33736..12a5b2a8953e19ade74f9a1c4f668532096b76c9 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecInsert(TupleTableSlot *slot,
*** 199,258 ****
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_insert_before_row)
      {
!         HeapTuple    newtuple;
!
!         newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple);

!         if (newtuple == NULL)    /* "do nothing" */
              return NULL;

!         if (newtuple != tuple)    /* modified by Trigger(s) */
!         {
!             /*
!              * Put the modified tuple into a slot for convenience of routines
!              * below.  We assume the tuple was allocated in per-tuple memory
!              * context, and therefore will go away by itself. The tuple table
!              * slot should not try to clear it.
!              */
!             TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!             TupleDesc    tupdesc = RelationGetDescr(resultRelationDesc);
!
!             if (newslot->tts_tupleDescriptor != tupdesc)
!                 ExecSetSlotDescriptor(newslot, tupdesc);
!             ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!             slot = newslot;
!             tuple = newtuple;
!         }
      }

      /* INSTEAD OF ROW INSERT Triggers */
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_insert_instead_row)
      {
!         HeapTuple    newtuple;
!
!         newtuple = ExecIRInsertTriggers(estate, resultRelInfo, tuple);

!         if (newtuple == NULL)    /* "do nothing" */
              return NULL;

!         if (newtuple != tuple)    /* modified by Trigger(s) */
!         {
!             /*
!              * Put the modified tuple into a slot for convenience of routines
!              * below.  We assume the tuple was allocated in per-tuple memory
!              * context, and therefore will go away by itself. The tuple table
!              * slot should not try to clear it.
!              */
!             TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!             TupleDesc    tupdesc = RelationGetDescr(resultRelationDesc);
!
!             if (newslot->tts_tupleDescriptor != tupdesc)
!                 ExecSetSlotDescriptor(newslot, tupdesc);
!             ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!             slot = newslot;
!             tuple = newtuple;
!         }

          newId = InvalidOid;
      }
--- 199,224 ----
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_insert_before_row)
      {
!         slot = ExecBRInsertTriggers(estate, resultRelInfo, slot);

!         if (slot == NULL)        /* "do nothing" */
              return NULL;

!         /* trigger might have changed tuple */
!         tuple = ExecMaterializeSlot(slot);
      }

      /* INSTEAD OF ROW INSERT Triggers */
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_insert_instead_row)
      {
!         slot = ExecIRInsertTriggers(estate, resultRelInfo, slot);

!         if (slot == NULL)        /* "do nothing" */
              return NULL;

!         /* trigger might have changed tuple */
!         tuple = ExecMaterializeSlot(slot);

          newId = InvalidOid;
      }
*************** ExecUpdate(ItemPointer tupleid,
*** 533,563 ****
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_update_before_row)
      {
!         HeapTuple    newtuple;
!
!         newtuple = ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
!                                         tupleid, tuple);

!         if (newtuple == NULL)    /* "do nothing" */
              return NULL;

!         if (newtuple != tuple)    /* modified by Trigger(s) */
!         {
!             /*
!              * Put the modified tuple into a slot for convenience of routines
!              * below.  We assume the tuple was allocated in per-tuple memory
!              * context, and therefore will go away by itself. The tuple table
!              * slot should not try to clear it.
!              */
!             TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!             TupleDesc    tupdesc = RelationGetDescr(resultRelationDesc);
!
!             if (newslot->tts_tupleDescriptor != tupdesc)
!                 ExecSetSlotDescriptor(newslot, tupdesc);
!             ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!             slot = newslot;
!             tuple = newtuple;
!         }
      }

      /* INSTEAD OF ROW UPDATE Triggers */
--- 499,512 ----
      if (resultRelInfo->ri_TrigDesc &&
          resultRelInfo->ri_TrigDesc->trig_update_before_row)
      {
!         slot = ExecBRUpdateTriggers(estate, epqstate, resultRelInfo,
!                                     tupleid, slot);

!         if (slot == NULL)        /* "do nothing" */
              return NULL;

!         /* trigger might have changed tuple */
!         tuple = ExecMaterializeSlot(slot);
      }

      /* INSTEAD OF ROW UPDATE Triggers */
*************** ExecUpdate(ItemPointer tupleid,
*** 565,571 ****
          resultRelInfo->ri_TrigDesc->trig_update_instead_row)
      {
          HeapTupleData oldtup;
-         HeapTuple    newtuple;

          Assert(oldtuple != NULL);
          oldtup.t_data = oldtuple;
--- 514,519 ----
*************** ExecUpdate(ItemPointer tupleid,
*** 573,601 ****
          ItemPointerSetInvalid(&(oldtup.t_self));
          oldtup.t_tableOid = InvalidOid;

!         newtuple = ExecIRUpdateTriggers(estate, resultRelInfo,
!                                         &oldtup, tuple);

!         if (newtuple == NULL)    /* "do nothing" */
              return NULL;

!         if (newtuple != tuple)    /* modified by Trigger(s) */
!         {
!             /*
!              * Put the modified tuple into a slot for convenience of routines
!              * below.  We assume the tuple was allocated in per-tuple memory
!              * context, and therefore will go away by itself. The tuple table
!              * slot should not try to clear it.
!              */
!             TupleTableSlot *newslot = estate->es_trig_tuple_slot;
!             TupleDesc    tupdesc = RelationGetDescr(resultRelationDesc);
!
!             if (newslot->tts_tupleDescriptor != tupdesc)
!                 ExecSetSlotDescriptor(newslot, tupdesc);
!             ExecStoreTuple(newtuple, newslot, InvalidBuffer, false);
!             slot = newslot;
!             tuple = newtuple;
!         }
      }
      else
      {
--- 521,534 ----
          ItemPointerSetInvalid(&(oldtup.t_self));
          oldtup.t_tableOid = InvalidOid;

!         slot = ExecIRUpdateTriggers(estate, resultRelInfo,
!                                     &oldtup, slot);

!         if (slot == NULL)        /* "do nothing" */
              return NULL;

!         /* trigger might have changed tuple */
!         tuple = ExecMaterializeSlot(slot);
      }
      else
      {
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index c213ac7a4efa32ab79e80560a4a040fc4e666636..80a779ed0bdf143f058249843bebdf665925abe6 100644
*** a/src/include/commands/trigger.h
--- b/src/include/commands/trigger.h
*************** extern void ExecBSInsertTriggers(EState
*** 132,147 ****
                       ResultRelInfo *relinfo);
  extern void ExecASInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo);
! extern HeapTuple ExecBRInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      HeapTuple trigtuple);
  extern void ExecARInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
                       HeapTuple trigtuple,
                       List *recheckIndexes);
! extern HeapTuple ExecIRInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      HeapTuple trigtuple);
  extern void ExecBSDeleteTriggers(EState *estate,
                       ResultRelInfo *relinfo);
  extern void ExecASDeleteTriggers(EState *estate,
--- 132,147 ----
                       ResultRelInfo *relinfo);
  extern void ExecASInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo);
! extern TupleTableSlot *ExecBRInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      TupleTableSlot *slot);
  extern void ExecARInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
                       HeapTuple trigtuple,
                       List *recheckIndexes);
! extern TupleTableSlot *ExecIRInsertTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      TupleTableSlot *slot);
  extern void ExecBSDeleteTriggers(EState *estate,
                       ResultRelInfo *relinfo);
  extern void ExecASDeleteTriggers(EState *estate,
*************** extern void ExecBSUpdateTriggers(EState
*** 160,179 ****
                       ResultRelInfo *relinfo);
  extern void ExecASUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo);
! extern HeapTuple ExecBRUpdateTriggers(EState *estate,
                       EPQState *epqstate,
                       ResultRelInfo *relinfo,
                       ItemPointer tupleid,
!                      HeapTuple newtuple);
  extern void ExecARUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo,
                       ItemPointer tupleid,
                       HeapTuple newtuple,
                       List *recheckIndexes);
! extern HeapTuple ExecIRUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      HeapTuple oldtuple,
!                      HeapTuple newtuple);
  extern void ExecBSTruncateTriggers(EState *estate,
                         ResultRelInfo *relinfo);
  extern void ExecASTruncateTriggers(EState *estate,
--- 160,179 ----
                       ResultRelInfo *relinfo);
  extern void ExecASUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo);
! extern TupleTableSlot *ExecBRUpdateTriggers(EState *estate,
                       EPQState *epqstate,
                       ResultRelInfo *relinfo,
                       ItemPointer tupleid,
!                      TupleTableSlot *slot);
  extern void ExecARUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo,
                       ItemPointer tupleid,
                       HeapTuple newtuple,
                       List *recheckIndexes);
! extern TupleTableSlot *ExecIRUpdateTriggers(EState *estate,
                       ResultRelInfo *relinfo,
!                      HeapTuple trigtuple,
!                      TupleTableSlot *slot);
  extern void ExecBSTruncateTriggers(EState *estate,
                         ResultRelInfo *relinfo);
  extern void ExecASTruncateTriggers(EState *estate,

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

21 февраля 2011 г., 23:31:09

I wrote:
> Ugh.  That quick little "ExecRemoveJunk" is a lot more dangerous than it
> looks.  I had actually looked at this before, but concluded it was OK
> because I couldn't reproduce the problem with a trigger in place.
> I guess I wasn't unlucky enough to have the chance pointer equality
> occur.

On closer inspection, the palloc collision is actually extremely likely,
because what will happen is we'll pfree the old tuple and immediately
palloc the new one, and if the new one is of sufficiently similar size
that it needs the same size of alloc chunk, it's *guaranteed* to get
that same chunk back, because of the LIFO free-chunk chains in aset.c.

The reason that the problem is hard to reproduce is that most triggers
(certainly those written in plpgsql) will give back newly allocated
tuples even when you return the unmodified NEW tuple.  The only way to
expose the problem is for ExecBRUpdateTrigger's trigger-calling loop to
not replace the "newtuple", and the easiest way for that to happen is if
all the triggers are disabled.  So that's why you're seeing it when
fooling with the replication-role setting.  I was able to reproduce the
failure by creating a trigger with a false WHEN condition, and of course
there are other ways to prevent a trigger from being called too.

            regards, tom lane

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Maxim Boguk

Дата:

22 февраля 2011 г., 05:43:58

On Tue, Feb 22, 2011 at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I wrote:
> > Ugh.  That quick little "ExecRemoveJunk" is a lot more dangerous than it
> > looks.  I had actually looked at this before, but concluded it was OK
> > because I couldn't reproduce the problem with a trigger in place.
> > I guess I wasn't unlucky enough to have the chance pointer equality
> > occur.
>
> On closer inspection, the palloc collision is actually extremely likely,
> because what will happen is we'll pfree the old tuple and immediately
> palloc the new one, and if the new one is of sufficiently similar size
> that it needs the same size of alloc chunk, it's *guaranteed* to get
> that same chunk back, because of the LIFO free-chunk chains in aset.c.
>
> The reason that the problem is hard to reproduce is that most triggers
> (certainly those written in plpgsql) will give back newly allocated
> tuples even when you return the unmodified NEW tuple.  The only way to
> expose the problem is for ExecBRUpdateTrigger's trigger-calling loop to
> not replace the "newtuple", and the easiest way for that to happen is if
> all the triggers are disabled.  So that's why you're seeing it when
> fooling with the replication-role setting.  I was able to reproduce the
> failure by creating a trigger with a false WHEN condition, and of course
> there are other ways to prevent a trigger from being called too.
>
>                        regards, tom lane
>


Thank you very much for fixing that issue.

I apologize for not providing critical details at start of the discussion
(until yesterday they seems unimportant for me), I will try provide all
possible details in future if I find anything else like that.

Now for me start digging next strange problem described here:
http://archives.postgresql.org/pgsql-admin/2011-01/msg00055.php

--
Maxim Boguk
Senior Postgresql DBA.

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Merlin Moncure

Дата:

14 июня 2011 г., 14:39:35

On Mon, Feb 21, 2011 at 6:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Ugh. =A0That quick little "ExecRemoveJunk" is a lot more dangerous than =
it
>> looks. =A0I had actually looked at this before, but concluded it was OK
>> because I couldn't reproduce the problem with a trigger in place.
>> I guess I wasn't unlucky enough to have the chance pointer equality
>> occur.
>
> On closer inspection, the palloc collision is actually extremely likely,
> because what will happen is we'll pfree the old tuple and immediately
> palloc the new one, and if the new one is of sufficiently similar size
> that it needs the same size of alloc chunk, it's *guaranteed* to get
> that same chunk back, because of the LIFO free-chunk chains in aset.c.
>
> The reason that the problem is hard to reproduce is that most triggers
> (certainly those written in plpgsql) will give back newly allocated
> tuples even when you return the unmodified NEW tuple. =A0The only way to
> expose the problem is for ExecBRUpdateTrigger's trigger-calling loop to
> not replace the "newtuple", and the easiest way for that to happen is if
> all the triggers are disabled. =A0So that's why you're seeing it when
> fooling with the replication-role setting. =A0I was able to reproduce the
> failure by creating a trigger with a false WHEN condition, and of course
> there are other ways to prevent a trigger from being called too.

I bumped into the 'virtual tuple' error on 9.0.3 on a completely
unrelated query:

create table foo(a int, b int);
insert into foo select 1, 1 where not exists (select 1 from foo where
a =3D 1 for update);

9.0.4 got rid of the error, but as there are no triggers or anything
like that on the foo so I thought I'd post my findings here in case
anyone else bumped into it.

merlin

Re: BUG #5798: Some weird error with pl/pgsql procedure

От

Tom Lane

Дата:

14 июня 2011 г., 14:55:52

Merlin Moncure <mmoncure@gmail.com> writes:
> I bumped into the 'virtual tuple' error on 9.0.3 on a completely
> unrelated query:

> create table foo(a int, b int);
> insert into foo select 1, 1 where not exists (select 1 from foo where
> a = 1 for update);

> 9.0.4 got rid of the error, but as there are no triggers or anything
> like that on the foo so I thought I'd post my findings here in case
> anyone else bumped into it.

I think most likely this has to do with commit
d5478c3391f8f1a243abbc3d9253aac3d6d3538e, and is unrelated (except
in the error message) to that trigger issue.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: BUG #5798: Some weird error with pl/pgsql procedure