Re: BUG #17245: Index corruption involving deduplicated entries
От | Peter Geoghegan |
---|---|
Тема | Re: BUG #17245: Index corruption involving deduplicated entries |
Дата | |
Msg-id | CAH2-WzkTd6wHXhxn=hZT3NYwgoKNHaC5sDUwswtUo7Ay-VdR4w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17245: Index corruption involving deduplicated entries (Kamigishi Rei <iijima.yun@koumakan.jp>) |
Ответы |
Re: BUG #17245: Index corruption involving deduplicated entries
Re: BUG #17245: Index corruption involving deduplicated entries Re: BUG #17245: Index corruption involving deduplicated entries |
Список | pgsql-bugs |
On Fri, Oct 29, 2021 at 11:36 AM Kamigishi Rei <iijima.yun@koumakan.jp> wrote: > The newly manifested issue is caught by pg_amcheck: > > btree index "azurlane_wiki.mediawiki.page_main_title": > ERROR: item order invariant violated for index "page_main_title" > DETAIL: Lower index tid=(17,157) (points to heap tid=(540,5)) > higher index tid=(17,158) (points to heap tid=(540,5)) page lsn=2/A019DD78. Great! I'm not surprised to see that it's the page table, once again. It's not particularly big, right? Are there other tables that are much larger? > The weird part about this is that the WAL archive does not seem to > contain any data for 157 and 158 above (in 1663/19243/274869 blk 17). > The last two entries are > > rmgr: Btree len (rec/tot): 53/ 4885, tx: 2085600, lsn: > 2/A0195AE0, prev 2/A01943F0, desc: INSERT_LEAF off 155, blkref #0: rel > 1663/19243/274869 blk 17 FPW > rmgr: Btree len (rec/tot): 72/ 72, tx: 2085602, lsn: > 2/A019DD30, prev 2/A019DCF0, desc: INSERT_LEAF off 156, blkref #0: rel > 1663/19243/274869 blk 17 > > The WAL file in data14/pg_wal does not have anything related to 157 and > 158 for this filenode/blk as well. If this was a heap relation then that would be true, because the offset number of a heap needs to be stable, at least within a "VACUUM cycle" (otherwise indexes will point to the wrong things, which would of course be wrong). However, this relation is a B-Tree index, where TIDs/page offset numbers are not stable at all. Almost all individual index tuple inserts onto a B-Tree page put the new index tuple "between" existing index tuples. This will "shift" whatever index tuples are to the right of the position of the new tuple. For example, with "INSERT_LEAF off 156", the insert atomic operation will shift any existing index tuple at page offset 156 go to page offset 157, plus any index tuple that was at page offset 157 is going to go to 158. And so on. We don't physically shift the index tuples themselves, but we do shift the item ID/line pointer array at the start of the page, so it's not too expensive. -- Peter Geoghegan
В списке pgsql-bugs по дате отправления: