Re: BUG #16833: postgresql 13.1 process crash every hour
| От | Peter Geoghegan |
|---|---|
| Тема | Re: BUG #16833: postgresql 13.1 process crash every hour |
| Дата | |
| Msg-id | CAH2-Wz=Jr_d-dOYEEmwz0-ifojVNWho01eAqewfQXgKfoe114w@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: BUG #16833: postgresql 13.1 process crash every hour (Tom Lane <tgl@sss.pgh.pa.us>) |
| Ответы |
Re: BUG #16833: postgresql 13.1 process crash every hour
|
| Список | pgsql-bugs |
On Fri, May 14, 2021 at 7:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm, looks like it's time to rope Peter Geoghegan in on this discussion. I think that this is likely to be a fairly generic symptom of index corruption. Ockham's razor does not seem to point to a software bug because posting list splits are just not that complicated, and are fairly common in the grand scheme of things. Docker is the kind of thing that I wouldn't necessarily trust to not do something fishy with LVM snapshotting -- I tend to suspect that that is a factor. There was a very similar bug report and stack trace back in March. That case was tied back to generic index corruption using amcheck, with indexes corrupted that weren't implicated in the hard crash. There is a real problem for me to fix here in any case: _bt_swap_posting() is unnecessarily trusting of the state of the posting list tuple (compared to _bt_split(), say). I still plan on adding hardening to _bt_swap_posting() to avoid a hard crash. Unfortunately I missed the opportunity to get that into 13.3, but I'll get it into 13.4. Alex should probably run amcheck to see what that throws up. It should be possible to run amcheck on your database, which will detect corrupt posting list tuples on Postgres 13. It's a contrib extension, so you must first run "CREATE EXTENSION amcheck;". From there, you can run a query like the following (you may want to customize this): SELECT bt_index_parent_check(index => c.oid, heapallindexed => true), c.relname, c.relpages FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' -- Don't check temp tables, which may be from another session: AND c.relpersistence != 't' -- Function may throw an error when this is omitted: AND c.relkind = 'i' AND i.indisready AND i.indisvalid ORDER BY c.relpages DESC; If this query takes too long to complete you may find it useful to add something to limit the indexes check, such as: AND n.nspname = 'public' -- that change to the SQL will make the query just test indexes from the public schema. Do "SET client_min_messages=DEBUG1 " to get a kind of rudimentary progress indicator, if that seems useful to you. The docs have further information on what this bt_index_parent_check function does, should you need it: https://www.postgresql.org/docs/13/amcheck.html -- Peter Geoghegan
В списке pgsql-bugs по дате отправления: