Re: Checksum errors in pg_stat_database
From | Drouvot, Bertrand |
---|---|
Subject | Re: Checksum errors in pg_stat_database |
Date | |
Msg-id | 8dfb9df9-2a7b-d3be-df42-7cf5b4ca0f93@gmail.com |
In response to | Re: Checksum errors in pg_stat_database (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
On 12/12/22 12:40 AM, Michael Paquier wrote:
> On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote:
>> It would be less of a concern yes, but I think it still would be a concern.
>> If you have a large amount of corruption you could quickly get to millions
>> of rows to keep track of which would definitely be a problem in shared
>> memory as well, wouldn't it?
>
> Yes.  I have discussed this item with Bertrand off-list and I share
> the same concern.  This would lead to a lot of extra workload on a
> large seqscan for a corrupted relation when the stats are written
> (shutdown delay) while bloating shared memory with potentially
> millions of items even if variable lists are handled through a dshash
> and DSM.
>
>> But perhaps we could keep a list of "the last 100 checksum failures" or
>> something like that?
>
> Applying a threshold is one solution.  Now, a second thing I have seen
> in the past is that some disk partitions were busted but not others,
> and the current database-level counters are not enough to make a
> difference when it comes to grab patterns in this area.  A list of the
> last N failures may be able to show some pattern, but that would be
> like analyzing things with a lot of noise without a clear conclusion.
> Anyway, the workload caused by the threshold number had better be
> measured before being decided (large set of relation files with a full
> range of blocks corrupted, much better if these are in the OS cache
> when scanned), which does not change the need of a benchmark.
>
> What about just adding a counter tracking the number of checksum
> failures for relfilenodes

I agree with your concern about tracking the corruption for every
single block.

I like the idea of tracking per relfilenode instead. Indeed, it looks
like this would be enough useful historical information to work with.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
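For illustration only, the memory trade-off discussed in this thread can be sketched with a toy model. This is not PostgreSQL code and not anyone's proposed patch: the function names `track_per_block` and `track_per_relfilenode` and the relfilenode values are hypothetical. It only shows that a per-relfilenode counter grows with the number of affected relation files, while per-block tracking grows with every corrupted block.

```python
from collections import Counter

# Toy model only; not PostgreSQL code. All names and values are hypothetical.

def track_per_block(failures):
    """Keep one entry per (relfilenode, block): grows with every corrupted block."""
    return set(failures)

def track_per_relfilenode(failures):
    """Keep one counter per relfilenode: grows only with the number of relation files."""
    counts = Counter()
    for relfilenode, _block in failures:
        counts[relfilenode] += 1
    return counts

# Simulate 3 relation files with 10,000 corrupted blocks each.
failures = [(rfn, blk) for rfn in (16384, 16385, 16386) for blk in range(10_000)]

per_block = track_per_block(failures)
per_rel = track_per_relfilenode(failures)

print(len(per_block))  # number of entries when tracking every block
print(len(per_rel))    # number of entries when tracking per relfilenode
```

The per-relfilenode variant still records how many failures each file saw, which is the kind of pattern (some files or partitions affected, others not) that a single database-level counter cannot show.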