Обсуждение: ERROR: posting list tuple with 20 items cannot be split at offset 168

Поиск
Список
Период
Сортировка

ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Herman verschooten
Дата:
Hi,

I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
I traced it back to an index issue on a boolean column.

This is the table:

tranman_production=# \d freights
                                         Table "public.freights"
    Column    |            Type             | Collation | Nullable |               Default
--------------+-----------------------------+-----------+----------+--------------------------------------
 id           | integer                     |           | not null | nextval('freights_id_seq'::regclass)
 order_id     | integer                     |           |          |
 product_id   | integer                     |           |          |
 amount       | integer                     |           |          |
 reference_1  | character varying           |           |          |
 reference_2  | character varying           |           |          |
 reference_3  | character varying           |           |          |
 reference_4  | character varying           |           |          |
 status       | integer                     |           |          | 0
 created_at   | timestamp without time zone |           | not null |
 updated_at   | timestamp without time zone |           | not null |
 cmr          | bigint                      |           |          |
 amount_out   | integer                     |           | not null | 0
 paused       | boolean                     |           |          | false
 cmr_received | boolean                     |           |          | false
Indexes:
    "freights_pkey" PRIMARY KEY, btree (id)
    "index_freights_on_cmr" btree (cmr)
    "index_freights_on_cmr_received" btree (cmr_received)
    "index_freights_on_order_id" btree (order_id)
    "index_freights_on_product_id" btree (product_id)
    "index_freights_on_reference_1" btree (reference_1)
    "index_freights_on_reference_2" btree (reference_2)
    "index_freights_on_reference_3" btree (reference_3)
    "index_freights_on_reference_4" btree (reference_4)
    "index_freights_on_status" btree (status)
Foreign-key constraints:
    "fk_rails_21fdf332ee" FOREIGN KEY (product_id) REFERENCES products(id)
    "fk_rails_af9f7b0831" FOREIGN KEY (order_id) REFERENCES orders(id)
Referenced by:
    TABLE "freight_docs" CONSTRAINT "fk_rails_10448cfa4f" FOREIGN KEY (freight_id) REFERENCES freights(id)
    TABLE "freight_events" CONSTRAINT "fk_rails_20c83ae774" FOREIGN KEY (freight_id) REFERENCES freights(id)
    TABLE "rides" CONSTRAINT "fk_rails_a50483d9c3" FOREIGN KEY (freight_id) REFERENCES freights(id)

And this is what happens:

tranman_production=# update freights set cmr_received=false where id=49632;
ERROR:  XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION:  _bt_swap_posting, nbtdedup.c:1037

If I drop the index index_freights_on_cmr_received, then the update succeeds.

If I recreate the index, the error resurfaces.

Note that it does not happen for every row in the database.

Sincerely,

Herman verschooten

Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Dmitry Dolgov
Дата:
> On Mon, Oct 25, 2021 at 11:58:48AM +0200, Herman verschooten wrote:
> Hi,
>
> I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.
>
> this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
> I traced it back to an index issue on a boolean column.

Hi,

thanks for reporting the issue.

> tranman_production=# update freights set cmr_received=false where id=49632;
> ERROR:  XX000: posting list tuple with 20 items cannot be split at offset 168
> LOCATION:  _bt_swap_posting, nbtdedup.c:1037
>
> If I drop the index index_freights_on_cmr_received, then the update succeeds.
>
> If I recreate the index, the error resurfaces.
>
> Note that it does not happen for every row in the database.

Interesting, during posting list split the postingoff is bigger than the
number of elements in the posting list. Have you tried to run amcheck
[1] on this index? As in the neighbour thread probably heapallindexed is
important, as the index recreation doesn't help. Otherwise, is there a
chance to use pageinspect [2] to take a look at the structure if the
index?

[1]: https://www.postgresql.org/docs/13/amcheck.html
[2]: https://www.postgresql.org/docs/13/pageinspect.html



Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Dmitry Dolgov
Дата:
> On Mon, Oct 25, 2021 at 02:41:43PM +0200, Herman verschooten wrote:
> I made a mistake apparently on checking the wrong thing.
> someone on slack helped me.
>
> tranman_production=#  SELECT bt_index_check(c.oid), c.relname, c.relpages
> tranman_production-# FROM pg_index i
> tranman_production-# JOIN pg_opclass op ON i.indclass[0] = op.oid
> tranman_production-# JOIN pg_am am ON op.opcmethod = am.oid
> tranman_production-# JOIN pg_class c ON i.indexrelid = c.oid
> tranman_production-# JOIN pg_namespace n ON c.relnamespace = n.oid
> tranman_production-# WHERE am.amname = 'btree' AND n.nspname = 'public'
> tranman_production-# -- Don't check temp tables, which may be from another session:
> tranman_production-# AND c.relpersistence != 't'
> tranman_production-# -- Function may throw an error when this is omitted:
> tranman_production-# AND i.indisready AND i.indisvalid
> tranman_production-# ORDER BY c.relpages;
> ERROR:  item order invariant violated for index "index_products_on_group"
> DETAIL:  Lower index tid=(1,353) (points to heap tid=(19,8)) higher index tid=(1,354) (points to heap tid=(14,14))
pagelsn=5/CEC25DA8.
 
> tranman_production=#  SELECT bt_index_check(c.oid), c.relname, c.relpages
> FROM pg_index i
> JOIN pg_opclass op ON i.indclass[0] = op.oid
> JOIN pg_am am ON op.opcmethod = am.oid
> JOIN pg_class c ON i.indexrelid = c.oid
> JOIN pg_namespace n ON c.relnamespace = n.oid
> WHERE am.amname = 'btree' AND n.nspname = 'public'
> -- Don't check temp tables, which may be from another session:
> AND c.relpersistence != 't'
> -- Function may throw an error when this is omitted:
> AND i.indisready AND i.indisvalid
> ORDER BY c.relpages;
> ERROR:  item order invariant violated for index "index_products_on_group"
> DETAIL:  Lower index tid=(1,353) (points to heap tid=(19,8)) higher index tid=(1,354) (points to heap tid=(14,14))
pagelsn=5/CEC25DA8.
 
>
> I dropped that index and ran it again, and kept on dropping indexes
>
> ERROR:  item order invariant violated for index "index_cleanings_on_receipt"
> DETAIL:  Lower index tid=(39,21) (points to heap tid=(238,98)) higher index tid=(39,22) (points to heap tid=(243,7))
pagelsn=1/B853F98.
 
>
> ERROR:  item order invariant violated for index "index_freights_on_reference_4"
> DETAIL:  Lower index tid=(34,51) (points to heap tid=(67,7)) higher index tid=(34,52) (points to heap tid=(84,2))
pagelsn=6/7EFB68.
 
>
> ERROR:  item order invariant violated for index "index_freights_on_reference_3"
> DETAIL:  Lower index tid=(38,82) (points to heap tid=(362,75)) higher index tid=(38,83) (points to heap tid=(194,21))
pagelsn=4/FEA79090.
 
>
> ERROR:  item order invariant violated for index "index_freights_on_reference_2"
> DETAIL:  Lower index tid=(10,44) (points to heap tid=(176,26)) higher index tid=(10,45) (points to heap tid=(215,60))
pagelsn=5/46A4D90.
 
>
> ERROR:  item order invariant violated for index "index_freights_on_reference_1"
> DETAIL:  Lower index tid=(43,214) (points to heap tid=(112,46)) higher index tid=(43,215) (points to heap
tid=(112,43))page lsn=5/C3999990.
 
>
>
> Now no more errors are displayed, and I can run my initial update again, with the index created.
>
> I recreated the 6 indexes and everyhting still works.
>
> Herman

Does it mean that everything is fine after recreating
index_freights_on_reference_<d> indexes? Well, that would mean they were
somehow corrupted, and reindexing is the way to go. Although it's not
clear for me, why the problem was temporarily avoided by dropping
another index on cmr_received.

PS: One small tip, please reply to all in the mailing list, otherwise it
would be harder for others to understand what happened with the discussion.



Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Peter Geoghegan
Дата:
On Mon, Oct 25, 2021 at 2:59 AM Herman verschooten
<Herman@verschooten.net> wrote:
> tranman_production=# update freights set cmr_received=false where id=49632;
> ERROR:  XX000: posting list tuple with 20 items cannot be split at offset 168
> LOCATION:  _bt_swap_posting, nbtdedup.c:1037
>
> If I drop the index index_freights_on_cmr_received, then the update succeeds.

What you see here is a defensive "can't happen" error that I added in
commit 8f72bbac, and backpatched to Postgres 13.4, which came out on
2021-08-12. The goal of that error is to make a possible hard crash
due to corruption into a slightly friendlier kind of failure (the
error that you see here). Were you running 13.4 before the upgrade?

If you were on 13.3 or earlier before the upgrade to 14, then it's
possible that the problem was there all along, but is only now visible
for the first time.

-- 
Peter Geoghegan



Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Herman verschooten
Дата:
Hi Peter,

Thanks for the info.
I don’t recall the exact 13 version we were on.

I can confirm that after dropping and recreating the 6 indexes everything is working fine.

Thanks for all the help, both here and on slack,

Herman


> Op 25 okt. 2021, om 17:29 heeft Peter Geoghegan <pg@bowt.ie> het volgende geschreven:
>
> On Mon, Oct 25, 2021 at 2:59 AM Herman verschooten
> <Herman@verschooten.net> wrote:
>> tranman_production=# update freights set cmr_received=false where id=49632;
>> ERROR:  XX000: posting list tuple with 20 items cannot be split at offset 168
>> LOCATION:  _bt_swap_posting, nbtdedup.c:1037
>>
>> If I drop the index index_freights_on_cmr_received, then the update succeeds.
>
> What you see here is a defensive "can't happen" error that I added in
> commit 8f72bbac, and backpatched to Postgres 13.4, which came out on
> 2021-08-12. The goal of that error is to make a possible hard crash
> due to corruption into a slightly friendlier kind of failure (the
> error that you see here). Were you running 13.4 before the upgrade?
>
> If you were on 13.3 or earlier before the upgrade to 14, then it's
> possible that the problem was there all along, but is only now visible
> for the first time.
>
> --
> Peter Geoghegan




Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Peter Geoghegan
Дата:
On Mon, Oct 25, 2021 at 10:35 PM Herman verschooten
<Herman@verschooten.net> wrote:
> I can confirm that after dropping and recreating the 6 indexes everything is working fine.

Did you also run VACUUM?

You said that not all rows seem to be affected. Is there any pattern
that you notice about the rows that are affected?

I understand (from your remarks onSlack) that you found that the
problem seemed to go away once you dropped the index named
index_freights_on_cmr_received. But once you recreated the index from
scratch, the problem came back.

In other words, rebuilding the index didn't seem to make the "posting
list tuple with 20 items cannot be split at offset 168" error go away,
but dropping the index appeared to *temporarily* resolve the issue?
Suggesting that the problem was either in the table, or in the code
that builds a new index structure by scanning the table?

> Thanks for all the help, both here and on slack,

Thanks for the report!

--
Peter Geoghegan



Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Herman verschooten
Дата:
Peter,

No we just noticed that we were unable to update that 1 record, and only that Boolean field.  We did not search for
otherrecords that might have been affected. 

I did as yet not run VACUUM, I’ll look into that later today.

Herman

Verstuurd vanaf mijn iPad

> Op 27 okt. 2021 om 03:45 heeft Peter Geoghegan <pg@bowt.ie> het volgende geschreven:
>
> On Mon, Oct 25, 2021 at 10:35 PM Herman verschooten
> <Herman@verschooten.net> wrote:
>> I can confirm that after dropping and recreating the 6 indexes everything is working fine.
>
> Did you also run VACUUM?
>
> You said that not all rows seem to be affected. Is there any pattern
> that you notice about the rows that are affected?
>
> I understand (from your remarks onSlack) that you found that the
> problem seemed to go away once you dropped the index named
> index_freights_on_cmr_received. But once you recreated the index from
> scratch, the problem came back.
>
> In other words, rebuilding the index didn't seem to make the "posting
> list tuple with 20 items cannot be split at offset 168" error go away,
> but dropping the index appeared to *temporarily* resolve the issue?
> Suggesting that the problem was either in the table, or in the code
> that builds a new index structure by scanning the table?
>
>> Thanks for all the help, both here and on slack,
>
> Thanks for the report!
>
> --
> Peter Geoghegan




Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Andres Freund
Дата:
Hi,

On 2021-10-25 11:58:48 +0200, Herman verschooten wrote:
> I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

What filesystem is the postgres data stored on?


> this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
> I traced it back to an index issue on a boolean column.
> 
> If I drop the index index_freights_on_cmr_received, then the update succeeds.
> 
> If I recreate the index, the error resurfaces.

Any chance you still have the WAL around from since the update? If we could
scan the WAL to see what type of activity lead to the corruption, we might be
able to narrow the cause down more easily.

Greetings,

Andres Freund



Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Herman verschooten
Дата:
Hi,

The filesystem is EXT4 on a Proxmox container using LVM-thin.

I have to check, but I believe there may still be a complete backup of the instance.

Herman

> Op 28 okt. 2021, om 23:47 heeft Andres Freund <andres@anarazel.de> het volgende geschreven:
>
> Hi,
>
> On 2021-10-25 11:58:48 +0200, Herman verschooten wrote:
>> I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.
>
> What filesystem is the postgres data stored on?
>
>
>> this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
>> I traced it back to an index issue on a boolean column.
>>
>> If I drop the index index_freights_on_cmr_received, then the update succeeds.
>>
>> If I recreate the index, the error resurfaces.
>
> Any chance you still have the WAL around from since the update? If we could
> scan the WAL to see what type of activity lead to the corruption, we might be
> able to narrow the cause down more easily.
>
> Greetings,
>
> Andres Freund




Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

От
Peter Geoghegan
Дата:
On Fri, Oct 29, 2021 at 1:47 AM Herman verschooten
<Herman@verschooten.net> wrote:
> The filesystem is EXT4 on a Proxmox container using LVM-thin.
>
> I have to check, but I believe there may still be a complete backup of the instance.

Thanks for getting back about this.

A fix for this bug will appear in Postgres 14.1. In the meantime I
suggest that you avoid using parallel VACUUM. For example, any scripts
that run VACUUM could be temporarily changed to make sure that
parallelism won't be used. If the script says "vacuum my_table;", then
that could be changed to "vacuum (parallel 0) my_table;", for example.
Just until you can upgrade to the next point release.

-- 
Peter Geoghegan