Re: BUG #15745: WAL References Invalid Pages...that eventually resolves
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #15745: WAL References Invalid Pages...that eventually resolves |
Date | |
Msg-id | CAH2-Wzmrx1Je1=hfqpvz22s+nP2uvR9mqQKTQP5hPSxbok=B7w@mail.gmail.com |
In reply to | BUG #15745: WAL References Invalid Pages...that eventually resolves (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #15745: WAL References Invalid Pages...that eventually resolves |
List | pgsql-bugs |
Hi Daniel,

On Tue, Apr 9, 2019 at 1:30 PM PG Bug reporting form <noreply@postgresql.org> wrote:
> But, for serendipitous reasons, I let this one run for a while. As it turns
> out, with each crash, it would make *slightly* more progress than the time
> before....and then eventually, it suffered no more faults and caught up
> normally. Included is a log that shows how sparse these faults were,
> relative to all the traffic going on....: roughly two per segment on this
> workload, with large gaps between problematic segments, and not necessarily
> repetition in a problematic relation or filenode.

That sounds weird.

> The fact the standby eventually came up made me suspicious, so I ran amcheck
> with a heap re-check, and, no tuples were in violation.
>
> Included is a log, which shows how the system recovered over and over,
> making slight progress each time. This is the entire inventory after such
> crashes: after these, the system passed amcheck and appears to work
> normally.

Did you try bt_index_parent_check('rel', true)? You might want to make sure that work_mem is set sufficiently high so that the downlink-block-is-present check is definitely effective; work_mem bounds the size of a Bloom filter used by the implementation (the heap verification option has its own Bloom filter, bound by maintenance_work_mem).

Suggest that you "set client_min_messages=debug1" before running amcheck this way, just in case that shows something interesting.

> postgresql-Mon.log-2019-04-08 00:08:22.619 UTC [3323][1/0] : [130-1]
> WARNING: page 162136064 of relation base/16385/21372 does not exist

These WARNING messages all reference block numbers that look like 32-bits of random garbage, but could be from a very large relation. The relevant WAL record is from B-Tree's opportunistic LP_DEAD garbage collection (not VACUUM). Note that Andres changed this mechanism for v12, so that latestRemovedXid was calculated on the primary, rather than on the standby.

I think that this error comes from btree_xlog_delete_get_latestRemovedXid(), which is in 11 but not master/12. I wonder, is "base/16385/21351" the index or the table? Is it possible to run pg_waldump? I think it's the table.

If the problem is in btree_xlog_delete_get_latestRemovedXid(), then it is perhaps unsurprising that there isn't evidence of any lasting corruption.

--
Peter Geoghegan
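
To put the block number from the quoted WARNING in perspective, here is a rough sketch of how large a relation would have to be to contain block 162136064, and which segment file that block would fall in. It assumes the default build constants (8 kB pages and 1 GB segment files), which the thread does not confirm for this installation; the helper name locate_block is made up for illustration.

```python
# Sketch only: BLCKSZ and RELSEG_SIZE are the PostgreSQL build defaults,
# assumed here because the thread does not say how the server was built.
BLCKSZ = 8192          # bytes per page (default)
RELSEG_SIZE = 131072   # pages per segment file (default, i.e. 1 GB segments)


def locate_block(block_number: int) -> dict:
    """Describe where a block would sit in a relation using the defaults above.

    locate_block is a hypothetical helper, not a PostgreSQL function.
    """
    # The relation must hold at least block_number + 1 pages to contain it.
    min_bytes = (block_number + 1) * BLCKSZ
    return {
        "min_relation_bytes": min_bytes,
        "min_relation_tib": min_bytes / 2**40,
        # Segment files are named <filenode>, <filenode>.1, <filenode>.2, ...
        "segment_number": block_number // RELSEG_SIZE,
        "block_in_segment": block_number % RELSEG_SIZE,
        # Block numbers are 32-bit values, so this should always hold.
        "fits_in_32_bits": 0 <= block_number < 2**32,
    }


if __name__ == "__main__":
    # Block number taken from the WARNING quoted in the message.
    for key, value in locate_block(162136064).items():
        print(f"{key}: {value}")
```

Under those assumptions the table would need to be a little over 1.2 TiB for that block to exist, so comparing this against the actual on-disk size of the relation is a quick way to tell a plausible block number from garbage.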