Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
От | Michael Paquier |
---|---|
Тема | Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica |
Дата | |
Msg-id | Yl+y7kvCDR6x4V+E@paquier.xyz обсуждение исходный текст |
Ответ на | Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica (Andrey Borodin <x4mmm@yandex-team.ru>) |
Ответы |
Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
|
Список | pgsql-bugs |
On Tue, Mar 08, 2022 at 06:44:31PM +0300, Andrey Borodin wrote: > I tried to dig into this again. And once again I simply cannot > understand how it is supposed to work... I'd appreciate if someone > explained it to me. > When I do SELECT bt_index_check('idx',true) in fact I'm doing following: > 1. regclassin() lookups 'idx', converts it into oid without taking any lock > 2. bt_index_check() gets this oid and lookups index again. > Do we rely on catalog snapshot here? Or how do we know that this oid > is still of the index named 'idx' on standby? When backend will > understand that this oid is no more of the interest and get an error > somehow? I think it must happen during index_open(). Yeah, I can see that, and I think that it is a second issue, rather independent of the cache lookup errors that we are seeing on a standby. As far as I have tested, it is possible to see this error as well with a sequence of CREATE/DROP INDEX CONCURRENTLY run repeatedly on a primary with the standby doing a scan of pg_index, like that for example in a tight loop: SELECT bt_index_check(indexrelid) FROM pg_index WHERE indisvalid AND indexrelid > 16000; It is a matter of seconds to see bt_index_check() complain that one of the so-said index is not valid, but the scan of pg_index was told to not select such an index, so we are missing something with the relation cache. > BTW is anyone working on this bug? It took me some time to come back to this thread, but I do now that we are done with the commit fest business. Speaking of the cache errors when working directly on the relations rebuilt, I got a patch that seems to work properly, with a mix of standby snapshots and AELs logged. I have been running a set of REINDEX INDEX/TABLE CONCURRENTLY on the primary through more than 100k relfilenodes without the standby complaining, and I should be able to get this one fixed before the next minor release. -- Michael
Вложения
В списке pgsql-bugs по дате отправления: