Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

From Andrey M. Borodin
Subject Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal
Date
Msg-id 67EADE8F-AEA6-4B73-8E38-A69E5D48BAFE@yandex-team.ru
In response to Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal  (Robins Tharakan <tharakan@gmail.com>)
Responses Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

> On 29 Jun 2022, at 17:43, Robins Tharakan <tharakan@gmail.com> wrote:


Sorry to bump an ancient thread; I have some observations that might or might not be relevant.
Recently we noticed corruption on one of our clusters. The corruption at hand is not in the system catalog, but in user
indexes.
The cluster was correctly configured: checksums, fsync, FPI etc.
The cluster was never restored from a backup. It’s a single-node cluster, so it was never promoted, pg_rewind-ed
etc. The VM had never been rebooted.

But the cluster had been experiencing 10 OOMs a day. There were no torn pages and no checksum errors in the log at all.
Yet B-tree indexes became corrupted.


Sorry for this wall of text, I’m posting everything as-is in case there is some useful information.

$ /etc/cron.yandex/pg_corruption_check.py --index
2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009 table1_table1message_table1_team_identity_06a95642 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_table1_team_identity_06a95642" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62159608.
2024-03-01 11:54:05,100 ERROR : Corrupted index: 96008 table1_table1message_organization_id_66c18ed2 XX002 ERROR: posting list contains misplaced TID in index "table1_table1message_organization_id_66c18ed2" DETAIL: Index tid=(267,34) posting list offset=137 page lsn=31B/62158BC8.
2024-03-01 11:54:05,355 ERROR : Corrupted index: 95804 table2_aler_channel_81aeec_idx XX002 ERROR: posting list contains misplaced TID in index "table2_aler_channel_81aeec_idx" DETAIL: Index tid=(336,7) posting list offset=182 page lsn=314/9B794248.
2024-03-01 11:54:05,716 ERROR : Corrupted index: 95816 table2_table3_channel_id_91a1912f XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_id_91a1912f" DETAIL: Index tid=(384,2) posting list offset=72 page lsn=317/3F14F390.
2024-03-01 11:54:06,068 ERROR : Corrupted index: 95815 table2_table3_channel_filter_id_6706c8b6 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_channel_filter_id_6706c8b6" DETAIL: Index tid=(380,2) posting list offset=72 page lsn=317/3F0D8E30.
2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824 table2_table3_root_alert_group_id_f327f122 XX002 ERROR: item order invariant violated for index "table2_table3_root_alert_group_id_f327f122" DETAIL: Lower index tid=(368,204) (points to heap tid=(48901,2)) higher index tid=(368,205) (points to heap tid=(48901,2)) page lsn=319/3C234588.
2024-03-01 11:54:06,538 ERROR : Corrupted index: 95810 table2_table3_acknowledged_by_user_id_dd6723dc XX002 ERROR: posting list contains misplaced TID in index "table2_table3_acknowledged_by_user_id_dd6723dc" DETAIL: Index tid=(380,69) posting list offset=35 page lsn=317/C14E2D50.
2024-03-01 11:54:06,775 ERROR : Corrupted index: 95825 table2_table3_silenced_by_user_id_40a833a1 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_silenced_by_user_id_40a833a1" DETAIL: Index tid=(371,11) posting list offset=144 page lsn=318/61171918.
2024-03-01 11:54:07,009 ERROR : Corrupted index: 95829 table2_table3_wiped_by_id_4326ff61 XX002 ERROR: item order invariant violated for index "table2_table3_wiped_by_id_4326ff61" DETAIL: Lower index tid=(373,97) (points to heap tid=(48901,2)) higher index tid=(373,98) (points to heap tid=(48901,2)) page lsn=318/61172788.
2024-03-01 11:54:07,245 ERROR : Corrupted index: 95823 table2_table3_resolved_by_user_id_463cdf3d XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_user_id_463cdf3d" DETAIL: Index tid=(375,89) posting list offset=144 page lsn=319/3C1DCFC8.
2024-03-01 11:54:07,479 ERROR : Corrupted index: 95819 table2_table3_maintenance_uuid_9a7b8529_like XX002 ERROR: item order invariant violated for index "table2_table3_maintenance_uuid_9a7b8529_like" DETAIL: Lower index tid=(372,4) (points to heap tid=(48901,2)) higher index tid=(372,5) (points to heap tid=(48901,2)) page lsn=317/C1A210A8.
2024-03-01 11:54:07,717 ERROR : Corrupted index: 95827 table2_table3_table1_message_id_58a31784_like XX002 ERROR: posting list contains misplaced TID in index "table2_table3_table1_message_id_58a31784_like" DETAIL: Index tid=(373,89) posting list offset=144 page lsn=319/3C3EE660.
2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066 webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order invariant violated for index "webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25) (points to heap tid=(73981,1)) page lsn=31B/E522B640.
2024-03-01 11:54:08,646 ERROR : Corrupted index: 95822 table2_table3_resolved_by_alert_id_bbdf0a83 XX002 ERROR: posting list contains misplaced TID in index "table2_table3_resolved_by_alert_id_bbdf0a83" DETAIL: Index tid=(618,2) posting list offset=150 page lsn=317/C1DE74B8.
2024-03-01 11:54:08,873 ERROR : Corrupted index: 95427 table2_table3_table1_message_id_key XX002 ERROR: item order invariant violated for index "table2_table3_table1_message_id_key" DETAIL: Lower index tid=(369,134) (points to heap tid=(48901,2)) higher index tid=(369,135) (points to heap tid=(48901,2)) page lsn=319/3B629E58.
2024-03-01 11:54:09,108 ERROR : Corrupted index: 95417 table2_table3_maintenance_uuid_key XX002 ERROR: posting list contains misplaced TID in index "table2_table3_maintenance_uuid_key" DETAIL: Index tid=(371,42) posting list offset=47 page lsn=318/6116FC50.
2024-03-01 11:54:10,180 ERROR : Corrupted index: 95826 table2_table3_table1_log_message_id_587aaa8d_like XX002 ERROR: posting list contains misplaced TID in index "table2_table3_table1_log_message_id_587aaa8d_like" DETAIL: Index tid=(849,19) posting list offset=79 page lsn=319/3C389B60.
2024-03-01 11:54:10,689 ERROR : Corrupted index: 95820 table2_table3_mattermost_log_message_id_69bc2ae4_like XX002 ERROR: item order invariant violated for index "table2_table3_mattermost_log_message_id_69bc2ae4_like" DETAIL: Lower index tid=(559,4) (points to heap tid=(48901,2)) higher index tid=(559,5) (points to heap tid=(48901,2)) page lsn=317/C1A7BA50.
2024-03-01 11:54:11,760 ERROR : Corrupted index: 95425 table2_table3_table1_log_message_id_key XX002 ERROR: item order invariant violated for index "table2_table3_table1_log_message_id_key" DETAIL: Lower index tid=(849,22) (points to heap tid=(48901,2)) higher index tid=(849,23) (points to heap tid=(48901,2)) page lsn=317/3E7EC1F0.
2024-03-01 11:54:12,282 ERROR : Corrupted index: 95419 table2_table3_mattermost_log_message_id_key XX002 ERROR: posting list contains misplaced TID in index "table2_table3_mattermost_log_message_id_key" DETAIL: Index tid=(566,84) posting list offset=65 page lsn=319/3B1901F8.
2024-03-01 11:54:17,990 ERROR : Corrupted index: 95423 table2_table3_public_primary_key_key XX002 ERROR: cross page item order invariant violated for index "table2_table3_public_primary_key_key" DETAIL: Last item on page tid=(727,146) page lsn=31B/E104D660.


Most of these messages look similar, except the last one: “cross page item order invariant violated for index”. Indeed,
index scans were hanging in a cycle.
I have not been able to locate the problem in WAL yet, because a lot of other activity is going on. My only hypothesis
so far is that posting list redo corrupts the index after a crash.
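
For anyone who wants to slice the output above, here is a minimal Python sketch that tallies the messages by kind and counts how often each heap TID appears in the "item order invariant" details. The line layout is inferred from this log only (it assumes each entry sits on one line with its spaces intact), and the names LINE_RE, HEAP_TID_RE, SAMPLE and summarize are made up for illustration; this is not part of pg_corruption_check.py.

```python
import re
from collections import Counter

# Layout inferred from the log in this message; it is an assumption,
# not a documented format of the checking script.
LINE_RE = re.compile(
    r'Corrupted index: (?P<oid>\d+) (?P<name>\S+) (?P<sqlstate>\S+) '
    r'ERROR:\s*(?P<message>.+?) (?:in|for) index "[^"]+" DETAIL: (?P<detail>.*)'
)
HEAP_TID_RE = re.compile(r'points to heap tid=\((\d+),(\d+)\)')

# Two entries copied from the log above, used as sample input.
SAMPLE = [
    '2024-03-01 11:54:05,075 ERROR : Corrupted index: 96009 '
    'table1_table1message_table1_team_identity_06a95642 XX002 ERROR: '
    'posting list contains misplaced TID in index '
    '"table1_table1message_table1_team_identity_06a95642" DETAIL: '
    'Index tid=(267,34) posting list offset=137 page lsn=31B/62159608.',
    '2024-03-01 11:54:06,302 ERROR : Corrupted index: 95824 '
    'table2_table3_root_alert_group_id_f327f122 XX002 ERROR: '
    'item order invariant violated for index '
    '"table2_table3_root_alert_group_id_f327f122" DETAIL: '
    'Lower index tid=(368,204) (points to heap tid=(48901,2)) '
    'higher index tid=(368,205) (points to heap tid=(48901,2)) '
    'page lsn=319/3C234588.',
]


def summarize(lines):
    """Return (message kinds, heap TIDs), each as a Counter of log entries."""
    kinds = Counter()
    heap_tids = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip anything that does not look like a checker entry
        kinds[m.group("message")] += 1
        # Count each heap TID once per entry, even if both tuples point to it.
        for block, offset in set(HEAP_TID_RE.findall(m.group("detail"))):
            heap_tids[(int(block), int(offset))] += 1
    return kinds, heap_tids


if __name__ == "__main__":
    kinds, heap_tids = summarize(SAMPLE)
    for message, count in kinds.most_common():
        print(count, message)
    for tid, count in heap_tids.most_common():
        print(count, "heap tid", tid)
```

Feeding the whole log to summarize() instead of SAMPLE should show at a glance which heap TIDs recur across different indexes.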

Thanks!


Best regards, Andrey Borodin.

