Discussion: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
"ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
A customer of ours recently hit a problem where, after an autovacuum was cancelled on a table, the app started getting the message in $subject:

ERROR: could not read block 6 of relation 1663/35078/1761966: read only 0 of 8192 bytes

(block numbers vary from 1 to 6). Things remained in this state until another autovacuum came along and cleaned up the table, 4 minutes later (this is a high-traffic table; there are several inserts per second).

The log looks like this:

2009-10-20 04:02:07 PDT [27396]: [1-1] LOG: automatic vacuum of table "database.public.tabname": index scans: 1
    pages: 6 removed, 1 remain
    tuples: 755 removed, 2 remain
    system usage: CPU 0.00s/0.00u sec elapsed 1.42 sec
2009-10-20 04:02:07 PDT [27396]: [2-1] ERROR: canceling autovacuum task
2009-10-20 04:02:07 PDT [27396]: [3-1] CONTEXT: automatic vacuum of table "database.public.tabname"

What I thought could have happened is that the table was truncated, and then the sinval message telling that to other backends was not sent due to the rollback. When they tried to insert into the page they had recorded as rd_targblock, they tried to read the page, but it was no longer there.

I can reproduce this by adding a sleep and CHECK_FOR_INTERRUPTS after lazy_vacuum_rel() returns, and before CommitTransactionCommand.

So far as I can see, what we need is to make sure the sinval message is sent regardless of transaction commit/abort. How can that be done? It is quite ugly to have an untimely autovacuum cancel disrupt the ability to insert into a table.

Thoughts?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
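The failure mode hypothesized above can be sketched as a toy Python model. It is an illustration of the reasoning in the message, not PostgreSQL code: the classes `Relation`, `Backend` and `LazyVacuum` are hypothetical, `targblock` merely stands in for `rd_targblock`, and the model assumes (as the message suggests) that the truncation takes effect on disk immediately while the invalidation that would reset other backends' cached target is only sent at commit.

```python
# Toy model of the hypothesis in the report; none of these classes are
# PostgreSQL APIs. "targblock" plays the role of rd_targblock.
BLCKSZ = 8192


class Relation:
    """Only tracks how many 8 kB blocks currently exist on disk."""

    def __init__(self, nblocks):
        self.nblocks = nblocks

    def read_block(self, blkno):
        # Bytes read: a full block, or 0 if blkno is past the end of the file.
        return BLCKSZ if blkno < self.nblocks else 0


class Backend:
    """An inserting backend that caches the block it last inserted into."""

    def __init__(self, rel):
        self.rel = rel
        self.targblock = None  # None means "no cached target"

    def insert(self):
        blk = self.targblock if self.targblock is not None else self.rel.nblocks - 1
        nread = self.rel.read_block(blk)
        if nread != BLCKSZ:
            raise RuntimeError(
                f"could not read block {blk}: read only {nread} of {BLCKSZ} bytes")
        self.targblock = blk
        return blk


class LazyVacuum:
    """Truncation is immediate; the invalidation is queued until commit."""

    def __init__(self, rel, backends):
        self.rel = rel
        self.backends = backends
        self.inval_pending = False

    def truncate(self, new_nblocks):
        self.rel.nblocks = new_nblocks  # the file shrinks right away
        self.inval_pending = True       # message queued, not yet sent

    def finish(self, committed):
        if committed and self.inval_pending:
            for b in self.backends:
                b.targblock = None      # other backends forget the stale target
        self.inval_pending = False      # on abort the queued message is dropped


def scenario(committed):
    rel = Relation(nblocks=7)
    app = Backend(rel)
    app.insert()                        # caches block 6 as its target
    vac = LazyVacuum(rel, [app])
    vac.truncate(new_nblocks=1)         # "pages: 6 removed, 1 remain"
    vac.finish(committed)               # autovacuum cancelled -> committed=False
    try:
        return app.insert()
    except RuntimeError as e:
        return str(e)


print(scenario(committed=True))   # 0
print(scenario(committed=False))  # could not read block 6: read only 0 of 8192 bytes
```

With `committed=True` the backend drops its stale target and inserts into block 0; with `committed=False` it keeps pointing at block 6 and produces an error of the same shape as the one in the report.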
Alvaro Herrera <alvherre@commandprompt.com> writes:
> What I thought could have happened is that the table was truncated, and
> then the sinval message telling that to other backends was not sent due
> to the rollback.

Hmm.

> So far as I can see, what we need is to make sure the sinval message is
> sent regardless of transaction commit/abort. How can that be done?

I would argue that once we've truncated, it's too late to abort. The interrupt facility should be disabled from just before issuing the truncate till after commit. It would probably be relatively painless to do that with some manipulation of the interrupt holdoff stuff.

regards, tom lane
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > So far as I can see, what we need is to make sure the sinval message is
> > sent regardless of transaction commit/abort. How can that be done?
>
> I would argue that once we've truncated, it's too late to abort. The
> interrupt facility should be disabled from just before issuing the
> truncate till after commit. It would probably be relatively painless to
> do that with some manipulation of the interrupt holdoff stuff.

That cures my (admittedly simplistic) testcase. The patch is a bit ugly because the interrupts are held off in lazy_vacuum_rel and need to be released by its caller. I don't see any other way around the problem though.

The attached patch is for 8.4; the back branches all need a bit of editing.
Attachments
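Tom's suggestion, and the shape of the patch (interrupts held off inside lazy_vacuum_rel and released by its caller after commit), can be sketched with a toy model of an interrupt holdoff counter. PostgreSQL's real mechanism is the `HOLD_INTERRUPTS()` / `RESUME_INTERRUPTS()` macros together with `CHECK_FOR_INTERRUPTS()`; the Python names below (`InterruptState`, `toy_lazy_vacuum_rel`, `toy_caller`, `QueryCanceled`) are hypothetical stand-ins, and this is not the attached patch, which the thread does not reproduce. The sketch assumes a cancel request arrives right after the truncation.

```python
# Toy model of an interrupt holdoff counter, loosely mirroring PostgreSQL's
# HOLD_INTERRUPTS()/RESUME_INTERRUPTS()/CHECK_FOR_INTERRUPTS(). All names here
# are hypothetical; this is not the patch from the thread.


class QueryCanceled(Exception):
    pass


class InterruptState:
    def __init__(self):
        self.holdoff_count = 0
        self.cancel_pending = False

    def hold(self):
        self.holdoff_count += 1

    def resume(self):
        # Only decrements; a pending cancel is not serviced here.
        if self.holdoff_count == 0:
            raise RuntimeError("resume without matching hold")
        self.holdoff_count -= 1

    def check(self):
        # A pending cancel is serviced only when nothing is holding it off.
        if self.cancel_pending and self.holdoff_count == 0:
            self.cancel_pending = False
            raise QueryCanceled("canceling autovacuum task")


def toy_lazy_vacuum_rel(ints, log, use_holdoff):
    log.append("scan heap and indexes")
    ints.check()
    if use_holdoff:
        ints.hold()                # from just before the truncate ...
    log.append("truncate")
    ints.cancel_pending = True     # assume a cancel request arrives now
    ints.check()                   # raises only if interrupts are not held
    # Returns with interrupts still held; the caller must release them.


def toy_caller(use_holdoff):
    ints = InterruptState()
    log = []
    try:
        toy_lazy_vacuum_rel(ints, log, use_holdoff)
        log.append("commit")       # the invalidation goes out with the commit
        if use_holdoff:
            ints.resume()          # ... till after commit
        ints.check()               # a deferred cancel is serviced here
    except QueryCanceled as e:
        log.append(f"cancel: {e}")
    return log


print(toy_caller(use_holdoff=True))
print(toy_caller(use_holdoff=False))
```

With the holdoff, "commit" is logged before the cancel is serviced, so the invalidation would go out. Without it, the cancel fires between the truncation and the commit and "commit" never appears, which is the window the report describes.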
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I would argue that once we've truncated, it's too late to abort. The
>> interrupt facility should be disabled from just before issuing the
>> truncate till after commit. It would probably be relatively painless to
>> do that with some manipulation of the interrupt holdoff stuff.

> That cures my (admittedly simplistic) testcase. The patch is a bit ugly
> because the interrupts are held off in lazy_vacuum_rel and need to be
> released by its caller. I don't see any other way around the problem
> though.

I wonder whether we shouldn't extend this into VACUUM FULL too, to prevent cancel once it's done that internal commit. It would fix the "PANIC: can't abort a committed transaction" problem V.F. has.

regards, tom lane
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I would argue that once we've truncated, it's too late to abort. The
> >> interrupt facility should be disabled from just before issuing the
> >> truncate till after commit. It would probably be relatively painless to
> >> do that with some manipulation of the interrupt holdoff stuff.
> >
> > That cures my (admittedly simplistic) testcase. The patch is a bit ugly
> > because the interrupts are held off in lazy_vacuum_rel and need to be
> > released by its caller. I don't see any other way around the problem
> > though.
>
> I wonder whether we shouldn't extend this into VACUUM FULL too, to
> prevent cancel once it's done that internal commit. It would fix
> the "PANIC: can't abort a committed transaction" problem V.F. has.

Hmm, it seems to work. The attached is for 8.1.
Attachments
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> I wonder whether we shouldn't extend this into VACUUM FULL too, to
>> prevent cancel once it's done that internal commit. It would fix
>> the "PANIC: can't abort a committed transaction" problem V.F. has.

> Hmm, it seems to work. The attached is for 8.1.

Looks OK, but please update the comment right before the RecordTransactionCommit, along the lines of "We prevent cancel interrupts after this point to mitigate the problem that you can't abort the transaction now".

regards, tom lane
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Tom Lane wrote:
> >> I wonder whether we shouldn't extend this into VACUUM FULL too, to
> >> prevent cancel once it's done that internal commit. It would fix
> >> the "PANIC: can't abort a committed transaction" problem V.F. has.
> >
> > Hmm, it seems to work. The attached is for 8.1.
>
> Looks OK, but please update the comment right before the
> RecordTransactionCommit, along the lines of "We prevent cancel
> interrupts after this point to mitigate the problem that you
> can't abort the transaction now".

BTW I'm thinking of backpatching this all the way back to 7.4 -- are we agreed on that?
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Looks OK, but please update the comment right before the
>> RecordTransactionCommit, along the lines of "We prevent cancel
>> interrupts after this point to mitigate the problem that you
>> can't abort the transaction now".

> BTW I'm thinking of backpatching this all the way back to 7.4 -- are
> we agreed on that?

Yeah, I would think the problems can manifest all the way back.

regards, tom lane
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> Looks OK, but please update the comment right before the
> >> RecordTransactionCommit, along the lines of "We prevent cancel
> >> interrupts after this point to mitigate the problem that you
> >> can't abort the transaction now".
> >
> > BTW I'm thinking of backpatching this all the way back to 7.4 -- are
> > we agreed on that?
>
> Yeah, I would think the problems can manifest all the way back.

Done, thanks for the discussion.
2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> >> Looks OK, but please update the comment right before the
>> >> RecordTransactionCommit, along the lines of "We prevent cancel
>> >> interrupts after this point to mitigate the problem that you
>> >> can't abort the transaction now".
>>
>> > BTW I'm thinking of backpatching this all the way back to 7.4 -- are
>> > we agreed on that?
>>
>> Yeah, I would think the problems can manifest all the way back.
>
> Done, thanks for the discussion.

Hello

Do you have an idea about the lazy vacuum locking problem?

Any plans?

Regards
Pavel Stehule
Re: "ERROR: could not read block 6 ...: read only 0 of 8192 bytes" after autovacuum cancelled
From
Alvaro Herrera
Date:
Pavel Stehule wrote:
> Hello
>
> Do you have an idea about the lazy vacuum locking problem?
>
> Any plans?

Well, I understand the issue and we have an idea on how to attack it, but I have no concrete plans to fix it ATM ...
2009/11/10 Alvaro Herrera <alvherre@commandprompt.com>:
> Pavel Stehule wrote:
>
>> Hello
>>
>> Do you have an idea about the lazy vacuum locking problem?
>>
>> Any plans?
>
> Well, I understand the issue and we have an idea on how to attack it,
> but I have no concrete plans to fix it ATM ...

ok

Pavel