Re: Checksum errors in pg_stat_database

From: Magnus Hagander
Subject: Re: Checksum errors in pg_stat_database
Date:
Msg-id: CABUevEx1Jv=kFW=6Kth_nEtbqP+heo1Okv8eRYD57MSzMKbE_A@mail.gmail.com
In reply to: Re: Checksum errors in pg_stat_database (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers


On Mon, Dec 12, 2022 at 12:40 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
> It would be less of a concern yes, but I think it still would be a concern.
> If you have a large amount of corruption you could quickly get to millions
> of rows to keep track of which would definitely be a problem in shared
> memory as well, wouldn't it?

Yes.  I have discussed this item with Bertrand off-list and I share
the same concern.  This would lead to a lot of extra workload on a
large seqscan of a corrupted relation when the stats are written
(shutdown delay), while bloating shared memory with potentially
millions of items even if variable lists are handled through a dshash
and DSM.
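A back-of-envelope estimate makes the bloat concern concrete. The per-entry size below is an assumption for illustration (hash key plus counter plus dshash overhead), not a measured PostgreSQL figure:

```python
# Rough estimate of shared-memory cost of tracking every individual
# checksum failure. ENTRY_BYTES is an assumed figure, not a measured
# size of any PostgreSQL structure.
ENTRY_BYTES = 64          # assumed: hash key + counter + dshash overhead
failures = 5_000_000      # e.g. a large seqscan over a badly corrupted table

total_mb = failures * ENTRY_BYTES / (1024 * 1024)
print(f"~{total_mb:.0f} MB of shared memory")  # ~305 MB
```

Even with a modest per-entry size, a heavily corrupted relation pushes the tracking structure into hundreds of megabytes of shared memory.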

> But perhaps we could keep a list of "the last 100 checksum failures" or
> something like that?

Applying a threshold is one solution.  Now, a second thing I have seen
in the past is that some disk partitions were busted but not others,
and the current database-level counters are not enough to make a
difference when it comes to grabbing patterns in this area.  A list of
the last N failures may be able to show some pattern, but that would be
like analyzing things with a lot of noise without a clear conclusion.
Anyway, the workload caused by the chosen threshold had better be
measured before the number is decided (a large set of relation files
with a full range of blocks corrupted, much better if these are in the
OS cache when scanned), which does not change the need for a benchmark.

What about just adding a counter tracking the number of checksum
failures for relfilenodes in a new structure related to them (note
that I did not write PgStat_StatTabEntry)?

If we do that, it is then possible to cross-check the failures with
tablespaces, which would point to disk areas that are more sensitive
to corruption.

If that's the concern, then perhaps the level we should be tracking things on is tablespace? We don't have any stats per tablespace today I believe, but that doesn't mean we couldn't create that.
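To illustrate the shape of that aggregation (the tablespace names and failure stream below are made up; no such per-tablespace stats structure exists in PostgreSQL today), per-tablespace counters roll individual failures up into a structure bounded by the number of tablespaces rather than by the amount of corruption:

```python
from collections import Counter

# Hypothetical stream of checksum failures, each attributed to the
# tablespace containing the damaged relation (names are invented).
failures = [
    ("pg_default", 12345), ("ts_fast_ssd", 67890),
    ("pg_default", 12345), ("pg_default", 99999),
]

# One counter per tablespace instead of one entry per relfilenode:
# memory use is bounded by the number of tablespaces.
per_tablespace = Counter(ts for ts, _relfilenode in failures)
print(per_tablespace)  # Counter({'pg_default': 3, 'ts_fast_ssd': 1})
```

The trade-off is losing the per-relation detail, but the "which disk area is bad" question from upthread is still answerable.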

--
