Re: Checksum errors in pg_stat_database

From	Drouvot, Bertrand
Subject	Re: Checksum errors in pg_stat_database
Date	12 December 2022, 06:58:25
Msg-id	8dfb9df9-2a7b-d3be-df42-7cf5b4ca0f93@gmail.com
In reply to	Re: Checksum errors in pg_stat_database (Michael Paquier <michael@paquier.xyz>)
List	pgsql-hackers


On 12/12/22 12:40 AM, Michael Paquier wrote:
> On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
>> It would be less of a concern yes, but I think it still would be a concern.
>> If you have a large amount of corruption you could quickly get to millions
>> of rows to keep track of which would definitely be a problem in shared
>> memory as well, wouldn't it?
> 
> Yes.  I have discussed this item with Bertrand off-list and I share
> the same concern.  This would lead to a lot of extra workload on a
> large seqscan for a corrupted relation when the stats are written
> (shutdown delay) while bloating shared memory with potentially
> millions of items even if variable lists are handled through a dshash
> and DSM.
> 
>> But perhaps we could keep a list of "the last 100 checksum failures" or
>> something like that?
> 
> Applying a threshold is one solution.  Now, a second thing I have seen
> in the past is that some disk partitions were busted but not others,
> and the current database-level counters are not enough to make a
> difference when it comes to grab patterns in this area.  A list of the
> last N failures may be able to show some pattern, but that would be
> like analyzing things with a lot of noise without a clear conclusion.
> Anyway, the workload caused by the threshold number had better be
> measured before being decided (large set of relation files with a full
> range of blocks corrupted, much better if these are in the OS cache
> when scanned), which does not change the need of a benchmark.
> 
> What about just adding a counter tracking the number of checksum
> failures for relfilenodes 

I agree with your concern about tracking corruption for every single block.
I like the idea of tracking per relfilenode instead. Indeed, it looks like that would be enough useful historical
information to work with.
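
To make the memory trade-off discussed above concrete, here is a minimal sketch comparing the three bookkeeping strategies mentioned in the thread: one entry per corrupted block, one counter per relfilenode, and a bounded "last N failures" list. This is not PostgreSQL code; the event data, the variable names, and the choice of N = 100 are assumptions made purely for illustration.

```python
from collections import Counter, deque

# Hypothetical failure events as (relfilenode, block number) pairs:
# one relation file with a fully corrupted range of blocks, plus one
# stray failure in a different relation file.
failures = [(16384, blk) for blk in range(1_000)] + [(16390, 7)]

per_block = set()            # one entry per corrupted block: grows with the corruption
per_relfilenode = Counter()  # one counter per relation file: bounded by the number of files
last_n = deque(maxlen=100)   # "last 100 failures": bounded, but older events are dropped

for relfilenode, blk in failures:
    per_block.add((relfilenode, blk))
    per_relfilenode[relfilenode] += 1
    last_n.append((relfilenode, blk))

print(len(per_block), len(per_relfilenode), len(last_n))
```

In this sketch the per-block set grows with every corrupted block, while the per-relfilenode counters stay at one entry per affected file and still show which files are failing and how often. The bounded list stays small too, but it only retains the most recent events, so a single badly corrupted file can crowd out everything else.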
 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
