Обсуждение: 7.3b1 panic in vacuum

Поиск
Список
Период
Сортировка

7.3b1 panic in vacuum

От
Mats Lofkvist
Дата:
When doing a 'vacuum full verbose analyze' I got the following:

...
INFO:  --Relation public.attribute--
INFO:  Pages 11177: Changed 1868, reaped 5682, Empty 0, New 0; Tup 1140848: Vac 52622, Keep/VTL 7800/3800, UnUsed 9805,
MinLen61, MaxLen 111; Re-using: Free/Avail. Space 5540384/5361524; EndEmpty/Avail. Pages 0/4634. 
        CPU 2.13s/1.90u sec elapsed 8.53 sec.
INFO:  Index attributeindex: Pages 6433; Tuples 1140848: Deleted 52622.
        CPU 1.53s/6.48u sec elapsed 18.46 sec.
INFO:  Index attributeattridindex: Pages 4341; Tuples 1140848: Deleted 52622.
        CPU 1.37s/5.41u sec elapsed 44.25 sec.
INFO:  Index attributevalueindex: Pages 5848; Tuples 1140848: Deleted 52622.
        CPU 1.54s/5.43u sec elapsed 42.52 sec.
PANIC:  moving chain: failed to add item with len = 93 to page 9395
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
Failed.
!#


The log says:

...
2002-10-31 11:57:53 LOG:  statement: vacuum verbose full analyze;
2002-10-31 11:59:01 LOG:  recycled transaction log file 0000000000000079
2002-10-31 12:00:15 LOG:  recycled transaction log file 000000000000007A
2002-10-31 12:00:15 LOG:  recycled transaction log file 000000000000007B
2002-10-31 12:00:15 LOG:  recycled transaction log file 000000000000007C
2002-10-31 12:01:21 LOG:  recycled transaction log file 000000000000007F
2002-10-31 12:01:21 LOG:  recycled transaction log file 000000000000007D
2002-10-31 12:01:21 LOG:  recycled transaction log file 000000000000007E
2002-10-31 12:03:46 LOG:  recycled transaction log file 0000000000000082
2002-10-31 12:03:46 LOG:  recycled transaction log file 0000000000000080
2002-10-31 12:03:46 LOG:  recycled transaction log file 0000000000000081
2002-10-31 12:06:34 LOG:  recycled transaction log file 0000000000000085
2002-10-31 12:06:34 LOG:  recycled transaction log file 0000000000000083
2002-10-31 12:06:34 LOG:  recycled transaction log file 0000000000000084
2002-10-31 12:07:04 PANIC:  moving chain: failed to add item with len = 93 to pa
ge 9395
2002-10-31 12:07:04 LOG:  statement: vacuum full verbose analyze;
2002-10-31 12:07:04 LOG:  server process (pid 1943) exited with exit code 2
2002-10-31 12:07:04 LOG:  terminating any other active server processes
2002-10-31 12:07:04 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:04 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:04 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:04 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:05 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:05 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:05 WARNING:  Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
2002-10-31 12:07:05 LOG:  all server processes terminated; reinitializing shared memory and semaphores
2002-10-31 12:07:05 LOG:  database system was interrupted at 2002-10-31 12:06:51 MET
2002-10-31 12:07:05 LOG:  checkpoint record is at 0/8938FE28
2002-10-31 12:07:05 LOG:  redo record is at 0/8900AF48; undo record is at 0/0; shutdown FALSE
2002-10-31 12:07:05 LOG:  next transaction id: 1285860; next oid: 3871170
2002-10-31 12:07:05 LOG:  database system was not properly shut down; automatic recovery in progress
2002-10-31 12:07:05 LOG:  redo starts at 0/8900AF48
2002-10-31 12:07:16 LOG:  ReadRecord: unexpected pageaddr 0/856CC000 in log file 0, segment 138, offset 7127040
2002-10-31 12:07:16 LOG:  redo done at 0/8A6CBF60
2002-10-31 12:07:25 LOG:  recycled transaction log file 0000000000000088
2002-10-31 12:07:25 LOG:  recycled transaction log file 0000000000000086
2002-10-31 12:07:25 LOG:  recycled transaction log file 0000000000000087
2002-10-31 12:07:25 LOG:  database system is ready



I'm running it on a sparc/solaris box. There is no core dump.

When rerunning the vacuum on the table it failed on, it seems to work:

mats=# vacuum full verbose analyze attribute;
INFO:  --Relation public.attribute--
INFO:  Pages 11177: Changed 2479, reaped 5885, Empty 0, New 0; Tup 1133048: Vac 61518, Keep/VTL 0/0, UnUsed 8764,
MinLen61, MaxLen 111; Re-using: Free/Avail. Space 6284164/6105932; EndEmpty/Avail. Pages 0/4715. 
        CPU 3.06s/1.73u sec elapsed 19.10 sec.
INFO:  Index attributeindex: Pages 6483; Tuples 1133048: Deleted 22900.
        CPU 1.60s/8.52u sec elapsed 21.51 sec.
INFO:  Index attributeattridindex: Pages 4341; Tuples 1133048: Deleted 22900.
        CPU 1.45s/6.64u sec elapsed 45.20 sec.
INFO:  Index attributevalueindex: Pages 5906; Tuples 1133048: Deleted 22900.
        CPU 1.32s/6.99u sec elapsed 42.05 sec.
INFO:  Rel attribute: Pages: 11177 --> 10423; Tuple(s) moved: 35627.
        CPU 3.39s/15.59u sec elapsed 115.51 sec.
INFO:  Index attributeindex: Pages 6526; Tuples 1133048: Deleted 35627.
        CPU 1.13s/1.80u sec elapsed 12.66 sec.
INFO:  Index attributeattridindex: Pages 4355; Tuples 1133048: Deleted 35627.
        CPU 1.19s/2.01u sec elapsed 18.23 sec.
INFO:  Index attributevalueindex: Pages 5959; Tuples 1133048: Deleted 35627.
        CPU 1.34s/2.07u sec elapsed 28.98 sec.
INFO:  --Relation pg_toast.pg_toast_1601292--
INFO:  Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, UnUsed 0, MinLen 0, MaxLen 0;
Re-using:Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. 
        CPU 0.00s/0.00u sec elapsed 0.07 sec.
INFO:  Index pg_toast_1601292_index: Pages 1; Tuples 0.
        CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  Analyzing public.attribute
VACUUM
mats=#



      _
Mats Lofkvist
mal@algonet.se

Re: 7.3b1 panic in vacuum

От
Tom Lane
Дата:
Mats Lofkvist <mal@algonet.se> writes:
> PANIC:  moving chain: failed to add item with len = 93 to page 9395

Thanks for the report.  I see a likely cause of this, which seems to
have been there for awhile:

                    to_vacpage->free -= MAXALIGN(tlen);
                    if (to_vacpage->offsets_used >= to_vacpage->offsets_free)
                        to_vacpage->free -= MAXALIGN(sizeof(ItemIdData));

The second MAXALIGN() shouldn't be there.  Because it is,
to_vacpage->free is decremented too much, possibly causing it to wrap
around and look large and positive, possibly allowing the code to try
to move more stuff onto the page than will fit.

> When rerunning the vacuum on the table it failed on, it seems to work:

Yeah, it would; after a system restart there'd be no tuple-chain-moving
activity because no other open transactions, so the buggy code wouldn't
get executed.

> I'm running it on a sparc/solaris box. There is no core dump.

There wouldn't be that, either; elog(PANIC) doesn't induce a core dump.
Maybe it should.

            regards, tom lane