Обсуждение: Transaction ID overrun problem on greenplum

Поиск
Список
Период
Сортировка

Transaction ID overrun problem on greenplum

От
"Little, Douglas"
Дата:

Hello,

Apologies in advance for asking here and not on the greenplum forum, but  I’m asking here because I think the PG community understands pg better than Greenplum.

 

We run Greenplum 4.1 which is based on PG 8.2.15

We recently had a problem where the transaction ID overflowed an internal buffer,  causing queries to  hang and randomly fail. 

Greenplum provided a fix where they increased the size of the buffer.

Greenplum has disabled the Autovacuum features. I suspect because they have no method of coordinating the autovacuums across the nodes.

 

I’ve reviewed the PG doc on transaction id and

I’m trying to understand if there’s something we are doing wrong.

Practices concerning me

1.        We never vacuum the entire db – we’re doing the system catalog tables 3x/day and user tables at least 1x/w- although we don’t have a method of identifying tables that have not been vacuumed so it’s possible we’re missing some table.

2.       In the pg_clog directories we have 25 files dating back to aug 2011.  When the instances start, the logs are filled with twophase location message.  Are these expected?

3.       Is the overflow really being caused by transaction wrap around?

 

Would someone comment on these concerns.

Thanks

 

 

Guk setting

vacuum_freeze_min_age            100,000,000        Minimum age at which VACUUM should freeze a table row.

 

 

Log failure message

 

2012-05-10 16:26:24.425750 CDT,"cognos_cube","p1gp1",p25497,th2121975200,"172.28.8.250","18959",2012-05-10 16:25:03 CDT,100000245,con2170,,seg24,,,x100000245,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 99999948",,,,,,,0,,"twophase.c",1368,

2012-05-10 16:26:24.739536 CDT,"infa_mktldr","p1gp1",p27945,th2121975200,"172.28.8.250","27576",2012-05-10 16:26:15 CDT,100000246,con2240,,seg24,,,x100000246,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 100000230",,,,,,,0,,"twophase.c",1368,

2012-05-10 16:26:33.859255 CDT,"infa_read","p1gp1",p27981,th2121975200,"172.28.8.250","27866",2012-05-10 16:26:18 CDT,100000256,con2244,,seg24,,,x100000256,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 100000236",,,,,,,0,,"twophase.c",1368,

2012-05-10 16:26:38.564872 CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10 16:26:06 CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"LOG","00000","

Unexpected internal error: Segment process 27416 received signal SIGSEGV

 

",,,,,,,0,,,,

2012-05-10 16:26:41.359615 CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10 16:26:06 CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"PANIC","XX000","Unexpected internal error: Segment process received signal SIGSEGV (postgres.c:3384)",,,,,,,0,,"postgres.c",3384,"Stack trace:

1    0xa39555 postgres errstart (elog.c:454)

2    0x8e26e5 postgres <symbol not found> (postgres.c:3380)

3    0x38c70302d0 libc.so.6 <symbol not found> (??:0)

4    0x8b1578 postgres ProcArrayAdd (procarray.c:149)

5    0x5258a5 postgres EndPrepare (twophase.c:1240)

6    0x4f3462 postgres <symbol not found> (xact.c:3094)

7    0x4f3a75 postgres CommitTransactionCommand (xact.c:3523)

8    0xb944a5 postgres performDtxProtocolCommand (cdbtm.c:3891)

9    0x8e6552 postgres PostgresMain (postgres.c:1414)

10   0x8516b1 postgres <symbol not found> (postmaster.c:6443)

11   0x8589a5 postgres PostmasterMain (postmaster.c:2272)

12   0x76713a postgres main (main.c:212)

13   0x38c701d994 libc.so.6 __libc_start_main (??:0)

14   0x475089 postgres <symbol not found> (??:0)

"

2012-05-10 16:26:46.580196 CDT,,,p25872,th2121975200,,,,0,,,seg-1,,,,,"LOG","00000","server process (PID 27416) was terminated by signal 6: Aborted",,,,,,,0,,"postmaster.c",5275,

 

Log messages regarding twophase

2012-05-18 00:05:49.849381 CDT,"infa_write","p1gp1",p15767,th-1760766560,"172.28.8.250","33445",2012-05-18 00:05:29 CDT,104342361,con385826,,seg0,,,x104342361,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342242",,,,,,,0,,"twophase.c",1368,

2012-05-18 00:05:50.018299 CDT,"infa_write","p1gp1",p15887,th-1760766560,"172.28.8.250","34675",2012-05-18 00:05:32 CDT,104342362,con385833,,seg0,,,x104342362,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342257",,,,,,,0,,"twophase.c",1368,

2012-05-18 00:05:50.029302 CDT,"infa_write","p1gp1",p15839,th-1760766560,"172.28.8.250","34335",2012-05-18 00:05:30 CDT,104342363,con385830,,seg0,,,x104342363,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342250",,,,,,,0,,"twophase.c",1368,

2012-05-18 00:05:52.274591 CDT,"infa_write","p1gp1",p15959,th-1760766560,"172.28.8.250","35303",2012-05-18 00:05:36 CDT,104342369,con385840,,seg0,,,x104342369,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342277",,,,,,,0,,"twophase.c",1368,

2012-05-18 00:05:52.277127 CDT,"infa_write","p1gp1",p15478,th-1760766560,"172.28.8.250","31184",2012-05-18 00:05:24 CDT,104342370,con385813,,seg0,,,x104342370,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342346",,,,,,,0,,"twophase.c",1368,

2012-05-18 00:05:52.280285 CDT,"infa_write","p1gp1",p15535,th-1760766560,"172.28.8.250","31304",2012-05-18 00:05:25 CDT,104342371,con385816,,seg0,,,x104342371,sx1,"LOG","00000","FinishPreparedTransaction: found TwoPhaseState entry for 104342347",,,,,,,0,,"twophase.c",1368,

 

Thanks

Doug Little

 

 

Re: Transaction ID overrun problem on greenplum

От
Scott Marlowe
Дата:
Can you run db wide vacuums?  That's what I'd try first.

On Fri, May 18, 2012 at 10:41 AM, Little, Douglas
<DOUGLAS.LITTLE@orbitz.com> wrote:
> Hello,
>
> Apologies in advance for asking here and not on the greenplum forum, but
> I’m asking here because I think the PG community understands pg better than
> Greenplum.
>
>
>
> We run Greenplum 4.1 which is based on PG 8.2.15
>
> We recently had a problem where the transaction ID overflowed an internal
> buffer,  causing queries to  hang and randomly fail.
>
> Greenplum provided a fix where they increased the size of the buffer.
>
> Greenplum has disabled the Autovacuum features. I suspect because they have
> no method of coordinating the autovacuums across the nodes.
>
>
>
> I’ve reviewed the PG doc on transaction id and
>
> I’m trying to understand if there’s something we are doing wrong.
>
> Practices concerning me
>
> 1.        We never vacuum the entire db – we’re doing the system catalog
> tables 3x/day and user tables at least 1x/w- although we don’t have a method
> of identifying tables that have not been vacuumed so it’s possible we’re
> missing some table.
>
> 2.       In the pg_clog directories we have 25 files dating back to aug
> 2011.  When the instances start, the logs are filled with twophase location
> message.  Are these expected?
>
> 3.       Is the overflow really being caused by transaction wrap around?
>
>
>
> Would someone comment on these concerns.
>
> Thanks
>
>
>
>
>
> Guk setting
>
> vacuum_freeze_min_age            100,000,000        Minimum age at which
> VACUUM should freeze a table row.
>
>
>
>
>
> Log failure message
>
>
>
> 2012-05-10 16:26:24.425750
> CDT,"cognos_cube","p1gp1",p25497,th2121975200,"172.28.8.250","18959",2012-05-10
> 16:25:03
> CDT,100000245,con2170,,seg24,,,x100000245,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 99999948",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:24.739536
> CDT,"infa_mktldr","p1gp1",p27945,th2121975200,"172.28.8.250","27576",2012-05-10
> 16:26:15
> CDT,100000246,con2240,,seg24,,,x100000246,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 100000230",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:33.859255
> CDT,"infa_read","p1gp1",p27981,th2121975200,"172.28.8.250","27866",2012-05-10
> 16:26:18
> CDT,100000256,con2244,,seg24,,,x100000256,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 100000236",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-10 16:26:38.564872
> CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10
> 16:26:06
> CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"LOG","00000","
>
> Unexpected internal error: Segment process 27416 received signal SIGSEGV
>
>
>
> ",,,,,,,0,,,,
>
> 2012-05-10 16:26:41.359615
> CDT,"ods","p1gp1",p27416,th2121975200,"172.28.8.250","23337",2012-05-10
> 16:26:06
> CDT,100000257,con2212,,seg24,,dx4860103,x100000257,sx1,"PANIC","XX000","Unexpected
> internal error: Segment process received signal SIGSEGV
> (postgres.c:3384)",,,,,,,0,,"postgres.c",3384,"Stack trace:
>
> 1    0xa39555 postgres errstart (elog.c:454)
>
> 2    0x8e26e5 postgres <symbol not found> (postgres.c:3380)
>
> 3    0x38c70302d0 libc.so.6 <symbol not found> (??:0)
>
> 4    0x8b1578 postgres ProcArrayAdd (procarray.c:149)
>
> 5    0x5258a5 postgres EndPrepare (twophase.c:1240)
>
> 6    0x4f3462 postgres <symbol not found> (xact.c:3094)
>
> 7    0x4f3a75 postgres CommitTransactionCommand (xact.c:3523)
>
> 8    0xb944a5 postgres performDtxProtocolCommand (cdbtm.c:3891)
>
> 9    0x8e6552 postgres PostgresMain (postgres.c:1414)
>
> 10   0x8516b1 postgres <symbol not found> (postmaster.c:6443)
>
> 11   0x8589a5 postgres PostmasterMain (postmaster.c:2272)
>
> 12   0x76713a postgres main (main.c:212)
>
> 13   0x38c701d994 libc.so.6 __libc_start_main (??:0)
>
> 14   0x475089 postgres <symbol not found> (??:0)
>
> "
>
> 2012-05-10 16:26:46.580196
> CDT,,,p25872,th2121975200,,,,0,,,seg-1,,,,,"LOG","00000","server process
> (PID 27416) was terminated by signal 6:
> Aborted",,,,,,,0,,"postmaster.c",5275,
>
>
>
> Log messages regarding twophase
>
> 2012-05-18 00:05:49.849381
> CDT,"infa_write","p1gp1",p15767,th-1760766560,"172.28.8.250","33445",2012-05-18
> 00:05:29
> CDT,104342361,con385826,,seg0,,,x104342361,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342242",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:50.018299
> CDT,"infa_write","p1gp1",p15887,th-1760766560,"172.28.8.250","34675",2012-05-18
> 00:05:32
> CDT,104342362,con385833,,seg0,,,x104342362,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342257",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:50.029302
> CDT,"infa_write","p1gp1",p15839,th-1760766560,"172.28.8.250","34335",2012-05-18
> 00:05:30
> CDT,104342363,con385830,,seg0,,,x104342363,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342250",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.274591
> CDT,"infa_write","p1gp1",p15959,th-1760766560,"172.28.8.250","35303",2012-05-18
> 00:05:36
> CDT,104342369,con385840,,seg0,,,x104342369,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342277",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.277127
> CDT,"infa_write","p1gp1",p15478,th-1760766560,"172.28.8.250","31184",2012-05-18
> 00:05:24
> CDT,104342370,con385813,,seg0,,,x104342370,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342346",,,,,,,0,,"twophase.c",1368,
>
> 2012-05-18 00:05:52.280285
> CDT,"infa_write","p1gp1",p15535,th-1760766560,"172.28.8.250","31304",2012-05-18
> 00:05:25
> CDT,104342371,con385816,,seg0,,,x104342371,sx1,"LOG","00000","FinishPreparedTransaction:
> found TwoPhaseState entry for 104342347",,,,,,,0,,"twophase.c",1368,
>
>
>
> Thanks
>
> Doug Little
>
>
>
>



--
To understand recursion, one must first understand recursion.