Re: FSM Corruption (was: Could not read block at end of the relation)

From	Noah Misch
Subject	Re: FSM Corruption (was: Could not read block at end of the relation)
Date	March 4, 2024, 19:03:12
Msg-id	20240304190312.b6.nmisch@google.com
In reply to	Re: FSM Corruption (was: Could not read block at end of the relation) (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses	Re: FSM Corruption (was: Could not read block at end of the relation)
List	pgsql-bugs

On Mon, Mar 04, 2024 at 02:10:39PM +0100, Ronan Dunklau wrote:
> Le lundi 4 mars 2024, 00:47:15 CET Noah Misch a écrit :
> > On Tue, Feb 27, 2024 at 11:34:14AM +0100, Ronan Dunklau wrote:
> > > - happens during heavy system load
> > > - lots of concurrent writes happening on a table
> > > - often (but haven't been able to confirm it is necessary), a vacuum is
> > > running on the table at the same time the error is triggered

> Looking at when the corruption was WAL-logged, this particular case is quite 
> easy to trace. We have a few MULTI-INSERTS+INIT initially loading the table 
> (probably a pg_restore), then, 2GB of WAL later, what looks like a VACUUM 
> running on the table: a succession of FPI_FOR_HINT, FREEZE_PAGE, VISIBLE xlog 
> records for each of the relation main fork, followed by a lonely FPI for the 
> leaf page of its FSM:

You're using data_checksums, right?  Thanks for the wal dump excerpts; I agree
with this summary thereof.

> There are no traces of relation truncation happening in the WAL.

That is notable.

> This case only shows a single invalid entry in the FSM, but I've noticed as 
> many as 62 blocks present in the FSM while they do not exist on disk, all 
> tagged with MaxFSMRequestSize so I suppose something is wrong with the bulk 
> extension mechanism.

Is this happening after an OS crash, a replica promote, or a PITR restore?  If
so, I think I see the problem.  We have an undocumented rule that FSM shall
not contain references to pages past the end of the relation.  To facilitate
that, relation truncation WAL-logs FSM truncate.  However, there's no similar
protection for relation extension, which is not WAL-logged.  We break the rule
whenever we write FSM for block X before some WAL record initializes block X.
data_checksums makes the trouble easier to hit, since it creates FPI_FOR_HINT
records for FSM changes.  A replica promote or PITR ending just after the FSM
FPI_FOR_HINT would yield this broken state.  While v16 RelationAddBlocks()
made this easier to hit, I suspect it's reproducible in all supported
branches.  For example, lazy_scan_new_or_empty() and multiple index AMs break
the rule via RecordPageWithFreeSpace() on a PageIsNew() page.
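
To make that sequence concrete, here is a toy Python model. It is not
PostgreSQL code: the record names, the ToyRelation class and the free-space
value are made up for illustration. It replays a WAL stream in which the FSM
page image precedes the record that initializes the heap block, and it stops
recovery in between, as a PITR or promote might.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRelation:
    nblocks: int = 0                         # main-fork length, in blocks
    fsm: dict = field(default_factory=dict)  # block number -> advertised free bytes


def replay(records, stop_after):
    """Apply the first `stop_after` toy WAL records to an empty relation."""
    rel = ToyRelation()
    for rec_type, blkno, *rest in records[:stop_after]:
        if rec_type == "INIT_BLOCK":
            # A main-fork record makes the block exist after recovery.
            rel.nblocks = max(rel.nblocks, blkno + 1)
        elif rec_type == "FSM_FPI":
            # An FSM page image restores the free-space entry for the block.
            rel.fsm[blkno] = rest[0]
    return rel


def dangling_fsm_entries(rel):
    """Blocks the FSM advertises that lie past the end of the main fork."""
    return sorted(b for b in rel.fsm if b >= rel.nblocks)


# Hypothetical stream: the FSM is written for block 1 before block 1 is initialized.
WAL = [
    ("INIT_BLOCK", 0),
    ("FSM_FPI", 1, 8000),
    ("INIT_BLOCK", 1),
]
```

In this model, dangling_fsm_entries(replay(WAL, stop_after=2)) reports block 1,
while replaying all three records reports nothing.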

I think the fix is one of:

- Revoke the undocumented rule.  Make FSM consumers resilient to the FSM
  returning a now-too-large block number.

- Enforce a new "main-fork WAL before FSM" rule for logged rels.  For example,
  in each PageIsNew() case, either don't update FSM or WAL-log an init (like
  lazy_scan_new_or_empty() does when PageIsEmpty()).
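
As a rough illustration of the first option, the following toy Python sketch
shows a consumer that tolerates stale FSM entries. The function name, the
dict-based FSM and the byte counts are hypothetical and do not reflect the
real FSM API; the only point is that the consumer checks the returned block
against the current relation length and clears the entry instead of failing.

```python
def find_usable_block(fsm, nblocks, needed):
    """Return a block with at least `needed` free bytes, or None.

    `fsm` maps block number -> advertised free bytes; `nblocks` is the
    current main-fork length. Entries pointing past the end of the
    relation are treated as stale: they are zeroed and skipped.
    """
    for blkno in sorted(fsm):
        if fsm[blkno] < needed:
            continue
        if blkno >= nblocks:
            fsm[blkno] = 0  # stale entry: repair it rather than read the block
            continue
        return blkno
    return None  # nothing usable; a caller would extend the relation
```

With fsm = {0: 100, 1: 4000, 5: 8000} and nblocks = 2, a request for 5000
bytes returns None and zeroes the entry for block 5, rather than handing back
a block that does not exist.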
