BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17268
Logged by: Alexey Ermakov
Email address: alexey.ermakov@dataegret.com
PostgreSQL version: 13.4
Operating system: Ubuntu 18.04.5 LTS

Description:

Hello, after running this query:

2021-11-02 13:15:45.469 UTC 2379 postgres@xx from [local] [vxid:24/0 txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index

we got the following errors (here is the first one; the others look exactly the same, with the same "toast value"):

2021-11-02 20:37:12.305 UTC 27935 yy@xx from [local] [vxid:41/199242673 txid:0] [SELECT] ERROR: missing chunk number 0 for toast value 4040061139 in pg_toast_2624976286

It turns out one entry in the toast index (?) was corrupted:

select md5(body) from zz where id = ...;
ERROR: missing chunk number 0 for toast value 4040061139 in pg_toast_2624976286

According to the "created_at" column in the linked table, the row was created at "2021-11-02 13:04:22.192125", i.e. during the concurrent reindex.

I'm wondering whether this is a known bug, and how risky it is to reindex toast indexes. The reindex was run automatically by a tool which monitors index bloat, and the index size was reduced severalfold in this case.

It's possible to get additional details from the backup and WAL archive, but it will take some time to restore (the row with the corrupted data has since been removed).

--
Thanks,
Alexey Ermakov
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Peter Geoghegan
Date:
On Wed, Nov 3, 2021 at 3:26 AM PG Bug reporting form <noreply@postgresql.org> wrote:
> it turns out one entry in toast index (?) was corrupted:
> select md5(body) from zz where id = ...;
> ERROR: missing chunk number 0 for toast value 4040061139 in
> pg_toast_2624976286
>
> According to "created_at" column in linked table row was created at
> "2021-11-02 13:04:22.192125", i.e. during reindex concurrently.

I wonder if it's a coincidence that that number (~4.04 billion) is not that far from 2^32-1 (~4.294 billion).

Can you run amcheck? Perhaps the output of the following will be interesting:

create extension amcheck;
set client_min_messages=debug1;
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);

(Couldn't hurt to try it, at least.)

> I'm wondering if it's known bug and how risky could it be to reindex toast's
> indexes. It was done automatically with tool which monitors indexes' bloat
> and index size reduced several times in this case.

If I had to guess, I'd guess that this is a new and unknown bug.

--
Peter Geoghegan
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Maxim Boguk
Date:
On Thu, Nov 4, 2021 at 3:07 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Nov 3, 2021 at 3:26 AM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > it turns out one entry in toast index (?) was corrupted:
> > select md5(body) from zz where id = ...;
> > ERROR: missing chunk number 0 for toast value 4040061139 in
> > pg_toast_2624976286
> >
> > According to "created_at" column in linked table row was created at
> > "2021-11-02 13:04:22.192125", i.e. during reindex concurrently.
>
> I wonder if it's a coincidence that that number (~4.04 billion) is not
> that far from 2^32-1 (~4.294 billion).
>
> Can you run amcheck? Perhaps the output of the following will be interesting:
>
> create extension amcheck;
> set client_min_messages=debug1;
> select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
>
> (Couldn't hurt to try it, at least.)

Hi Peter,

Please see the result of amcheck:

select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
DEBUG: verifying consistency of tree structure for index "pg_toast_2624976286_index"
DEBUG: verifying level 3 (true root level)
DEBUG: verifying level 2
DEBUG: verifying level 1
DEBUG: verifying level 0 (leaf level)
DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no first data item
DEBUG: verifying that tuples from index "pg_toast_2624976286_index" are present in "pg_toast_2624976286"
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function bt_index_parent_check() might provide a more specific error.

--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk

"Doctor, you told me not to do that, but why does it still hurt when I do it again?"
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Peter Geoghegan
Date:
On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
> DEBUG: verifying consistency of tree structure for index
> "pg_toast_2624976286_index"
> DEBUG: verifying level 3 (true root level)
> DEBUG: verifying level 2
> DEBUG: verifying level 1
> DEBUG: verifying level 0 (leaf level)
> DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no
> first data item
> DEBUG: verifying that tuples from index "pg_toast_2624976286_index"
> are present in "pg_toast_2624976286"
> ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
> matching index tuple within index "pg_toast_2624976286_index"
> HINT: Retrying verification using the function
> bt_index_parent_check() might provide a more specific error.

That's an unusually large TOAST table. It's at least ~454.42GiB, based on this error. Is block number 59561917 near the end of the table?

Structural verification of the B-Tree itself doesn't throw any errors here. That leaves little doubt that the problem is that the index is missing entries it really should have. That's not really new information, though -- I was pretty sure of that much already.

--
Peter Geoghegan
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Maxim Boguk
Date:
On Thu, Nov 4, 2021 at 8:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> > select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
> > [amcheck output as above]
> > ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
> > matching index tuple within index "pg_toast_2624976286_index"
>
> That's an unusually large TOAST table. It's at least ~454.42GiB, based
> on this error. Is the block number 59561917 near the end of the table?

select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_2624976286'));
 pg_size_pretty
----------------
 473 GB

now... and yes, at the time of the error page 59561917 was very close to the end of the table.

There is a high chance (but not 100%) that the corresponding main table entry was inserted while the REINDEX CONCURRENTLY of the toast index was running.

We have a base backup and WAL archive, so in theory it's possible to reconstruct the sequence of writes which led to the error. But given the huge size of the relation in question (and the even bigger size of the whole database, 10+TB) plus the large volume of writes, it's a complicated task (especially since I'm not really sure what exactly to look for in the waldump output).
--
Maxim Boguk
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Maxim Boguk
Date:
On Thu, Nov 4, 2021 at 9:47 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> On Thu, Nov 4, 2021 at 8:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> > > select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
> > > [amcheck output as above]
> > > ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
> > > matching index tuple within index "pg_toast_2624976286_index"
> >
> > That's an unusually large TOAST table. It's at least ~454.42GiB, based
> > on this error. Is the block number 59561917 near the end of the table?
>
> select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_2624976286'));
>  pg_size_pretty
> ----------------
>  473 GB
>
> now... and yes during the time of error page 59561917 was very close
> to the end of the table.
> There was a high chance (but not 100%) that the corresponding main
> table entry had been inserted during reindex CONCURRENTLY of the toast
> index run.

UPDATE: the problem base table entry which had the erroneous toast data was definitely created during the REINDEX run on the toast index:

2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0 txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index

and the problem entry was created at 2021-11-02 13:04:22.192125 UTC.
So there seems to be some subtle bug in indexing new toast data during REINDEX INDEX CONCURRENTLY of a toast index.

--
Maxim Boguk
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Peter Geoghegan
Date:
On Thu, Nov 4, 2021 at 12:47 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
> now... and yes during the time of error page 59561917 was very close
> to the end of the table.
> There was a high chance (but not 100%) that the corresponding main
> table entry had been inserted during reindex CONCURRENTLY of the toast
> index run.

It might be useful if you located the leaf page that the missing index tuple is supposed to be on. It's possible that there is a recognizable pattern.

If you knew the block number of the relevant leaf page in the index already, then you could easily dump that page to a small file and share it with us here. The usual procedure is described here:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#contrib.2Fpageinspect_page_dump

The tricky part is figuring out which block number is the one of interest. I can't think of any easy way of doing that in a production database. The easiest approach I can think of is to use pg_buffercache. Restart the database (or, more likely, an instance of the database that has the problem but isn't the live production database). Then run a query like this:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_toast_2624976286 WHERE chunk_id = 4040061139;

(The BUFFERS stuff is to verify that you got buffer hits.)

Next, query the blocks that you see in pg_buffercache:

SELECT relblocknumber FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('pg_toast.pg_toast_2624976286_index');

Finally, note down the block numbers that the query returns. There will probably be 3 or 4. Just send me any that are non-0 (block 0 is just the metapage). I only really care about the leaf page, but I can figure that part out myself once I have the pages you accessed.

(It's a pity there isn't a less cumbersome procedure here.)

--
Peter Geoghegan
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> UPDATE: the problem base table entry which had erroneous toast data
> definitely had been created during the REINDEX run of toast idx.

Do you have any details about the transaction / application that inserted the data?

> 2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
> txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
> INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
> and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.
>
> So there seems some subtle bug with indexing new toast data during
> REINDEX INDEX CONCURRENTLY of the toast index.

Any chance you're using prepared transactions?

Regards,

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Maxim Boguk
Date:
On Sun, Nov 7, 2021 at 2:07 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> > UPDATE: the problem base table entry which had erroneous toast data
> > definitely had been created during the REINDEX run of toast idx.
>
> Do you have any details about the transaction / application that inserted the data?

Well... JDBC, with server-side prepared statements off (pgbouncer).

> > 2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
> > txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
> > INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
> > and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.
> >
> > So there seems some subtle bug with indexing new toast data during
> > REINDEX INDEX CONCURRENTLY of the toast index.
>
> Any chance you're using prepared transactions?

Most definitely not; prepared transactions are disabled server-wide:

postgres=# show max_prepared_transactions;
 max_prepared_transactions
---------------------------
 0

--
Maxim Boguk
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Alexey Ermakov
Date:
Hello,

On 2021-11-08 02:29, Maxim Boguk wrote:
> On Sun, Nov 7, 2021 at 2:07 AM Andres Freund <andres@anarazel.de> wrote:
>> On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
>>> UPDATE: the problem base table entry which had erroneous toast data
>>> definitely had been created during the REINDEX run of toast idx.
>> Do you have any details about the transaction / application that inserted the data?
> Well... jdbc with server side prepared statements off (pgbouncer).

I was told that the application inserts rows into the given table with queries like:

INSERT INTO zz (id, body) VALUES (?, ?) ON CONFLICT (id) DO NOTHING

inside a transaction. It's possible for it to attempt to insert a row with the same id several times inside one transaction (in that case the id is usually a new one, i.e. it was not present in the table before that transaction).

Hope that helps in some way.

--
Thanks,
Alexey Ermakov
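As a concrete illustration of the pattern Alexey describes (the table and column names `zz`, `id`, and `body` come from the report; the id value and payloads are invented for the sketch):

```sql
-- Hedged sketch of the reported workload; values are invented.
BEGIN;
-- First attempt inserts the row; the large body value goes through
-- the toast path before the heap tuple is inserted.
INSERT INTO zz (id, body)
VALUES (12345, repeat('x', 100000))
ON CONFLICT (id) DO NOTHING;
-- A later attempt with the same id in the same transaction is a no-op
-- at the heap level, but toast data for the rejected row may already
-- have been written speculatively before the conflict was detected.
INSERT INTO zz (id, body)
VALUES (12345, repeat('y', 100000))
ON CONFLICT (id) DO NOTHING;
COMMIT;
```

This sketch only restates the described workload; the thread goes on to show that the bug does not actually require ON CONFLICT at all.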
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-04 20:07:28 +0200, Maxim Boguk wrote:
> ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
> matching index tuple within index "pg_toast_2624976286_index"
> HINT: Retrying verification using the function
> bt_index_parent_check() might provide a more specific error.

Could you use https://www.postgresql.org/docs/devel/pageinspect.html to get the heap items for this page? Best after verifying that that ctid still shows an amcheck error. Something like:

SELECT * FROM page_header(get_raw_page('pg_toast_2624976286', 59561917));

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid,
       t_infomask2, t_infomask, mask.raw_flags, mask.combined_flags, t_hoff, t_bits
FROM heap_page_items(get_raw_page('pg_toast_2624976286', 59561917)),
     heap_tuple_infomask_flags(t_infomask, t_infomask2) AS mask;

Do you have WAL archiving or such set up? If you still have the WAL from that time, it'd be helpful.

Greetings,

Andres Freund
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-04 22:01:44 +0200, Maxim Boguk wrote:
> 2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
> txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
> INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
> and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.

Another question on this: are you running multiple reindexes concurrently? Other maintenance commands?

Greetings,

Andres Freund
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-08 17:04:40 +0600, Alexey Ermakov wrote:
> I was told that application inserts rows into given table with queries like:
>
> INSERT INTO zz (id, body) VALUES (?, ?) ON CONFLICT (id) DO NOTHING
>
> inside transaction. It's possible to attempt to insert row with same id
> several times inside one transaction (in that case usually that id is new
> one i.e. was not presented in the table before that transaction).
>
> Hope it could help in some way.

It did. I've managed to reproduce an issue like this. I'm still narrowing down the reproducer, but here's what I have so far:

DROP TABLE IF EXISTS wide;
CREATE TABLE wide(id serial primary key, wide text);

pgbench -n -Mprepared -T1000 -f ~/tmp/insert_cci_bug.sql -c 16 -j 16 postgres

In a concurrent psql session:

SELECT indrelid::regclass, indexrelid::regclass,
       bt_index_parent_check(indexrelid::regclass, true)
FROM pg_index
WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_331092_index;

Usually after 1-3 iterations the index is corrupt. Reproduces on both 13 and HEAD.

Greetings,

Andres Freund
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-08 10:36:46 -0800, Andres Freund wrote:
> DROP TABLE IF EXISTS wide; CREATE TABLE wide(id serial primary key, wide text);
> pgbench -n -Mprepared -T1000 -f ~/tmp/insert_cci_bug.sql -c 16 -j 16 postgres
>
> in concurrent psql
> SELECT indrelid::regclass, indexrelid::regclass, bt_index_parent_check(indexrelid::regclass, true)
> FROM pg_index WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
> REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_331092_index;
>
> usually after 1-3 iterations the index is corrupt. Reproduces both in 13 and
> HEAD.

Oh, wow (or ugh). It narrows down quite a bit. A single pgbench session running

INSERT INTO wide(wide) SELECT string_agg(g.i::text || random()::text, '') FROM generate_series(1, 500) g(i);

triggers the issue. Sometimes it takes quite a few iterations of REINDEX INDEX CONCURRENTLY, but...

Greetings,

Andres Freund

PS: to trigger the reindex / validate I use:

cat /tmp/ri.sql
\i /tmp/ri.sql

SELECT 'REINDEX INDEX CONCURRENTLY ' || indexrelid::regclass
FROM pg_index
WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass) \gexec

SELECT indrelid::regclass, indexrelid::regclass, bt_index_parent_check(indexrelid::regclass, true)
FROM pg_index
WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-08 10:53:17 -0800, Andres Freund wrote:
> Oh, wow (or ugh). It narrows down quite a bit. A single pgbench session
> running
> INSERT INTO wide(wide) SELECT string_agg(g.i::text || random()::text, '') FROM generate_series(1, 500) g(i);
>
> triggers the issue. Sometimes takes quite a few iterations of REINDEX INDEX
> CONCURRENTLY, but...

After several wrong turns, I found the cause of the problem: toast_save_datum() releases the lock on the toast relation directly after inserting. That breaks the interlock between WaitForLockersMultiple() in ReindexRelationConcurrently() and the "toast inserter".

The comments in ReindexRelationConcurrently() don't meaningfully explain the need for that interlock, but the equivalent case in DefineIndex() does:

	/*
	 * Phase 2 of concurrent index build (see comments for validate_index()
	 * for an overview of how this works)
	 *
	 * Now we must wait until no running transaction could have the table open
	 * with the old list of indexes. Use ShareLock to consider running
	 * transactions that hold locks that permit writing to the table. Note we
	 * do not need to worry about xacts that open the table for writing after
	 * this point; they will see the new index when they open it.
	 *
	 * Note: the reason we use actual lock acquisition here, rather than just
	 * checking the ProcArray and sleeping, is that deadlock is possible if
	 * one of the transactions in question is blocked trying to acquire an
	 * exclusive lock on our table. The lock code will detect deadlock and
	 * error out properly.
	 */
	WaitForLockers(heaplocktag, ShareLock, true);

The problem this causes is that the toast-inserting backend will not necessarily have seen the new index at this point (since there was no need to process relcache invalidations), but the scan below validate_index() will not necessarily find new tuples inserted after the start of the scan (particularly if they're in a new portion of the table).

I verified that if I change the table_close() in toast_save_datum() to not release the lock, the problem doesn't happen anymore.

This problem doesn't exist for CREATE INDEX CONCURRENTLY, afaict, because we don't allow that on system tables, including toast tables. But we *do* reindex toast indexes concurrently.

One possible way to fix this would be to make ReindexRelationConcurrently() acquire a lock on the underlying table when reindexing a toast table. Another would be to not release the lock in toast_save_datum().

Greetings,

Andres Freund
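Restating the race as a timeline (a simplified, hypothetical interleaving; the session labels are mine, not part of the analysis above):

```sql
-- Session A: application inserting toasted values into table "wide".
-- Session B: REINDEX INDEX CONCURRENTLY on wide's toast index.
--
-- A: INSERT INTO wide(wide) VALUES (<large value>);
--    toast_save_datum() writes the toast chunks, then releases its
--    RowExclusiveLock on the toast relation before A's transaction ends.
-- B: creates the new (not yet valid) toast index, then calls
--    WaitForLockersMultiple(); it finds no lock held by A, so it does
--    not wait for A to process the relcache invalidation.
-- A: inserts another toasted value; having not yet seen the
--    invalidation, it inserts into the old index only.
-- B: validate_index()'s scan can miss tuples inserted after the scan
--    started, so the new index ends up lacking entries for them.
```

This is a comment-only sketch of the interleaving, not runnable code; the exact ordering in any given failure may differ.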
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
> One possible way to fix this would be to make ReindexRelationConcurrently()
> acquire a lock on the underlying table when reindexing a toast table. Another
> to not release the lock in toast_save_datum().

The latter is more future-proof. Does it have material disadvantages?
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
>> One possible way to fix this would be to make ReindexRelationConcurrently()
>> acquire a lock on the underlying table when reindexing a toast table. Another
>> to not release the lock in toast_save_datum().
>
> The latter is more future-proof. Does it have material disadvantages?

I don't immediately see any. But I've long been of the opinion, and have had plenty of discussions around it, that our habit of releasing locks early is far too widely used.

I do however wonder if there's another path to the problem, besides saving toast datums. We also release those locks early in other places, and while e.g. r/o locks won't cause a problem with this specific interlock, they could cause problems around dropping the relation, for example.

Andres
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Michael Paquier
Date:
On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
> On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
>> On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
>>> One possible way to fix this would be to make ReindexRelationConcurrently()
>>> acquire a lock on the underlying table when reindexing a toast table. Another
>>> to not release the lock in toast_save_datum().

Thanks for the test case. That reproduces really quickly.

>> The latter is more future-proof. Does it have material disadvantages?
>
> I don't immediately see any. But I've been long of the opinion, and
> had plenty discussions around it, that our habit of releasing locks
> early is far too widely used.

Yes, I'd agree that not patching the concurrent-reindex path would be safer in the long run. This feels a bit like e629a01 -- in spirit, not in scope.

> I do however wonder if there's other path to the problem, besides
> saving toast datums. We also release those locks early in other
> places, and while e.g. r/o locks won't cause a problem with this
> specific interlock, it could cause problem around dropping the
> relation, for example.

Hmm. Perhaps there could be some latent issue around toast_delete_datum()?

--
Michael
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-09 14:02:19 +0900, Michael Paquier wrote:
> On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
> > On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
> >> On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
> >>> One possible way to fix this would be to make ReindexRelationConcurrently()
> >>> acquire a lock on the underlying table when reindexing a toast table. Another
> >>> to not release the lock in toast_save_datum().
>
> Thanks for the test case. That reproduces really quickly.
>
> >> The latter is more future-proof. Does it have material disadvantages?
> >
> > I don't immediately see any. But I've been long of the opinion, and
> > had plenty discussions around it, that our habit of releasing locks
> > early is far too widely used.
>
> Yes, I'd agree that not patching the reindex concurrent path would be
> safer in the long run. This feels a bit like e629a01, in spirit, not
> in scope.

I wonder if we should do both...

Greetings,

Andres Freund
On Mon, Nov 08, 2021 at 09:08:48PM -0800, Andres Freund wrote:
> On 2021-11-09 14:02:19 +0900, Michael Paquier wrote:
> > On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
> > > On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
> > >> On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
> > >>> One possible way to fix this would be to make ReindexRelationConcurrently()
> > >>> acquire a lock on the underlying table when reindexing a toast table. Another
> > >>> to not release the lock in toast_save_datum().
> > >> The latter is more future-proof. Does it have material disadvantages?
> > >
> > > I don't immediately see any. But I've been long of the opinion, and
> > > had plenty discussions around it, that our habit of releasing locks
> > > early is far too widely used.
> >
> > Yes, I'd agree that not patching the reindex concurrent path would be
> > safer in the long run. This feels a bit like e629a01, in spirit, not
> > in scope.
>
> I wonder if we should do both...

Doing both sounds harmless. Regarding REINDEX CONCURRENTLY on a system catalog, I bet that can still reach bugs even if we do both, considering this:

$ git grep 'table_close.*, RowExclusiveLock' | wc -l
288
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Andres Freund
Date:
Hi,

On 2021-11-08 21:46:25 -0800, Noah Misch wrote:
> Doing both sounds harmless. Regarding REINDEX CONCURRENTLY on a system
> catalog, I bet that can still reach bugs even if we do both, considering this:

Normal catalogs can't be reindexed concurrently. It's just toast tables, which are kind of system-y but also kind of not, that can be. Or at least I think it's just toast indexes?

Greetings,

Andres Freund
On Mon, Nov 08, 2021 at 10:31:19PM -0800, Andres Freund wrote:
> On 2021-11-08 21:46:25 -0800, Noah Misch wrote:
> > Doing both sounds harmless. Regarding REINDEX CONCURRENTLY on a system
> > catalog, I bet that can still reach bugs even if we do both, considering this:
>
> Normal catalogs can't be reindexed concurrently. It's just toast tables, which
> are kind of system-y but also kind of not, that can be.

Oh, okay.
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Michael Paquier
Date:
On Mon, Nov 08, 2021 at 10:42:17PM -0800, Noah Misch wrote:
> On Mon, Nov 08, 2021 at 10:31:19PM -0800, Andres Freund wrote:
>> On 2021-11-08 21:46:25 -0800, Noah Misch wrote:
>>> Doing both sounds harmless. Regarding REINDEX CONCURRENTLY on a system
>>> catalog, I bet that can still reach bugs even if we do both, considering this:
>>
>> Normal catalogs can't be reindexed concurrently. It's just toast tables, which
>> are kind of system-y but also kind of not, that can be.
>
> Oh, okay.

To be clear on this point: users cannot concurrently reindex catalog indexes or toast indexes associated with catalog tables, just toast indexes of normal tables.

I don't know if any of you have been working on a patch, but I was cooking something. It would be worth checking whether an isolation test could be written.

--
Michael
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From:
Michael Paquier
Date:
On Thu, Nov 11, 2021 at 06:09:49PM +0900, Michael Paquier wrote:
> To be clear on this point, users cannot reindex concurrently catalog
> indexes and toast indexes associated to catalog tables, just toast
> indexes of normal tables. I don't know if any of you have been
> working on a patch, but I was cooking something. It would be worth
> checking if an isolation test could be written.

So, I have worked on this one. Attached is a patch that implements the two approaches suggested by Andres, both able to fix the issues discussed:

1) Switch to a session-level lock on the parent relation when doing a concurrent reindex of a toast table or one of its indexes. This requires looking back at pg_class.reltoastrelid to find the correct parent. This stresses me quite a bit, and I am not sure that I like it, to be honest, because we don't do anything like that in the rest of the tree. I also get the feeling that it is an open door to more issues.

2) Don't release the locks taken when saving a new toast value until the end of the transaction.

After more testing, I have been able to extract and write an isolation test that reproduces the failure. It relies on a trick, as the toast relation names are not deterministic and we cannot use REINDEX CONCURRENTLY in a function context. So I have instead used ALTER TABLE/INDEX RENAME inside a DO block, with allow_system_table_mods, to change the toast relation names. There is no need for amcheck with this method either.

2) is enough to fix the problem, and I'd like to think that we had better stick with only this method for simplicity's sake. Comments?

--
Michael
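The rename trick can be sketched as follows (the toast OID and target names here are hypothetical; the real test first discovers the nondeterministic toast relation name, e.g. via pg_class.reltoastrelid, inside the DO block):

```sql
-- Sketch only: give a table's toast relation a deterministic name so an
-- isolation test can later REINDEX it by name. allow_system_table_mods
-- is required to rename toast relations.
SET allow_system_table_mods = on;
ALTER TABLE pg_toast.pg_toast_16385 RENAME TO reindex_toast_tab;
ALTER INDEX pg_toast.pg_toast_16385_index RENAME TO reindex_toast_idx;
RESET allow_system_table_mods;
-- The test can then run, from a separate session (REINDEX CONCURRENTLY
-- cannot run inside a function or transaction block):
-- REINDEX INDEX CONCURRENTLY pg_toast.reindex_toast_idx;
```

Treat this as an illustration of the approach described above, not the actual test as committed.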
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From
Michael Paquier
Date:
On Mon, Nov 15, 2021 at 08:03:04PM +0900, Michael Paquier wrote:
> After more testing, I have been able to extract and write an isolation
> test that reproduces the failure.  It relies on a trick, as the toast
> relation names are not deterministic and we cannot use REINDEX
> CONCURRENTLY in a function context.  So I have used ALTER TABLE/INDEX
> RENAME in a DO block, with allow_system_table_mods, to give the toast
> relations predictable names instead.  There is no need for amcheck
> with this method, either.
>
> 2) is enough to fix the problem, and I'd like to think that we had
> better stick with only this method for simplicity's sake.

I have been working on this one again for the last couple of days, and
I would still go with the simple solution: releasing the row-level
toast locks only at the end of the transaction when saving or deleting
a toast value.  While testing, I noticed that the deletion part is also
important: without it, a REINDEX CONCURRENTLY run in parallel with a
transaction removing a toast value would go through, rather than
waiting for the deleting transaction to commit first.  I have expanded
the tests with everything I could think of, so I'd like to commit the
attached.  The test is fancy with its use of allow_system_table_mods to
make the toast relation names reliable, but it has been really useful.
--
Michael
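[Editor's note: a companion toy interleaving, again plain Python rather than PostgreSQL source, for the deletion case mentioned above: if the transaction deleting a toast value drops its lock on the toast relation before commit, a concurrent REINDEX proceeds instead of waiting for the deletion to commit.  Session names and the lock-table shape are illustrative.]

```python
held_locks = set()   # sessions holding a lock on the toast relation

def delete_toast_value(session, release_early):
    """Delete a toast value; optionally drop the lock before commit (the bug)."""
    held_locks.add(session)
    if release_early:
        held_locks.discard(session)   # buggy: unlock before the delete commits

def reindex_must_wait():
    # The concurrent reindex should block while any transaction that
    # touched the toast relation is still in flight.
    return bool(held_locks)

delete_toast_value("s1", release_early=True)
assert not reindex_must_wait()   # bug: the reindex does not wait

held_locks.clear()
delete_toast_value("s1", release_early=False)
assert reindex_must_wait()       # fix: the reindex blocks until commit
```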
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From
"Bossart, Nathan"
Date:
On 12/5/21, 5:54 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> I have been working on this one again for the last couple of days, and
> I would still go with the simple solution: releasing the row-level
> toast locks only at the end of the transaction when saving or deleting
> a toast value.  While testing, I noticed that the deletion part is
> also important: without it, a REINDEX CONCURRENTLY run in parallel
> with a transaction removing a toast value would go through, rather
> than waiting for the deleting transaction to commit first.  I have
> expanded the tests with everything I could think of, so I'd like to
> commit the attached.  The test is fancy with its use of
> allow_system_table_mods to make the toast relation names reliable, but
> it has been really useful.

I confirmed that the new tests reliably produce corruption and that the
suggested fix resolves it.  I also lean towards the simple solution,
but I do wonder if it creates any interesting side effects.  For
example, could holding the locks longer impact performance?  (Of
course, performance is probably not a great reason to avoid a
sustainable solution.)

-    toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
-    table_close(toastrel, RowExclusiveLock);
+    toast_close_indexes(toastidxs, num_indexes, NoLock);
+    table_close(toastrel, NoLock);

I think it would be good to expand the comments above these changes to
explain why we are keeping the lock.  That might help avoid similar
problems in the future.

Nathan
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From
Michael Paquier
Date:
On Tue, Dec 07, 2021 at 10:04:54PM +0000, Bossart, Nathan wrote:
> I confirmed that the new tests reliably produce corruption and that
> the suggested fix resolves it.  I also lean towards the simple
> solution, but I do wonder if it creates any interesting side effects.

Thanks.

> For example, could holding the locks longer impact performance?

Do you have anything particular in mind?  It seems to me that this
boils down to the same lock being taken on the parent table based on
its RTE.

> -    toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
> -    table_close(toastrel, RowExclusiveLock);
> +    toast_close_indexes(toastidxs, num_indexes, NoLock);
> +    table_close(toastrel, NoLock);
>
> I think it would be good to expand the comments above these changes to
> explain why we are keeping the lock.  That might help avoid similar
> problems in the future.

Yes, I have added a note, and applied the patch after looking at it
again this morning.  The test cannot be used in 12, so I have removed
it from REL_12_STABLE, as allow_system_table_mods is a PGC_POSTMASTER
parameter there.
--
Michael
Attachments
Re: BUG #17268: Possible corruption in toast index after reindex index concurrently
From
"Bossart, Nathan"
Date:
On 12/7/21, 6:48 PM, "Michael Paquier" <michael@paquier.xyz> wrote:
> On Tue, Dec 07, 2021 at 10:04:54PM +0000, Bossart, Nathan wrote:
>> For example, could holding the locks longer impact performance?
>
> Do you have anything particular in mind?  It seems to me that this
> boils down to the same lock being taken on the parent table based on
> its RTE.

I don't have anything concrete in mind.

>> I think it would be good to expand the comments above these changes
>> to explain why we are keeping the lock.  That might help avoid
>> similar problems in the future.
>
> Yes, I have added a note, and applied the patch after looking at it
> again this morning.  The test cannot be used in 12, so I have removed
> it from REL_12_STABLE, as allow_system_table_mods is a PGC_POSTMASTER
> parameter there.

Great!

Nathan