Re: Checksum errors in pg_stat_database
From | Magnus Hagander |
---|---|
Subject | Re: Checksum errors in pg_stat_database |
Date | |
Msg-id | CABUevExGXxStJaM0hLQY_kht_S3HnszgVH1=zk0xcx5ccz7tBQ@mail.gmail.com |
In response to | Re: Checksum errors in pg_stat_database ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses | Re: Checksum errors in pg_stat_database |
List | pgsql-hackers |
On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:
On 4/2/19 7:06 PM, Magnus Hagander wrote:
> On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote:
> > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
> >> One thing which is not
> >> proposed on this patch, and I am fine with it as a first draft, is
> >> that we don't have any information about the broken block number and
> >> the file involved. My gut tells me that we'd want a separate view,
> >> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> >> blck) to be complete. But that's just for future work.
> >
> > That could indeed be nice.
>
> Actually, backpedaling on this one... pg_stat_checksums_details may
> be a bad idea as we could finish with one row per broken block. If
> a corruption is spreading quickly, pgstat would not be able to sustain
> that amount of objects. Having pg_stat_checksums would allow us to
> plug in more data easily based on the last failure state:
> - last relid of failure
> - last fork type of failure
> - last block number of failure.
> Not saying to do that now, but having that in pg_stat_database does
> not seem very natural to me. And on top of that we would have an
> extra row full of NULLs for shared objects in pg_stat_database if we
> adopt the unique view approach... I find that rather ugly.
>
>
> I think that tracking each and every block is of course a non-starter, as you've noticed.
I think that's less of a concern now that the stats collector process is gone and the stats are collected in shared memory. What do you think?
It would be less of a concern, yes, but I think it would still be a concern. With a large amount of corruption, you could quickly get to millions of rows to keep track of, which would definitely be a problem in shared memory as well, wouldn't it?
But perhaps we could keep a list of "the last 100 checksum failures" or something like that?
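To make the "last 100 checksum failures" idea concrete, here is a rough sketch of a bounded list, written in Python purely for illustration. Nothing here is an existing PostgreSQL structure or API: the names `ChecksumFailure` and `RecentChecksumFailures` are hypothetical, and the fields are simply borrowed from the (dboid, rel, fork, blck) tuple mentioned earlier in the thread. The point is only that memory use stays constant no matter how many failures occur, while a running total is still available.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecksumFailure:
    """One failure record; hypothetical fields based on (dboid, rel, fork, blck)."""
    dboid: int
    relid: int
    fork: str
    block: int


class RecentChecksumFailures:
    """Keeps only the most recent `capacity` failures (hypothetical helper)."""

    def __init__(self, capacity=100):
        # deque with maxlen silently drops the oldest entry once full,
        # so the structure never grows beyond `capacity` records.
        self._entries = deque(maxlen=capacity)
        # Total failures ever recorded, including those already evicted.
        self.total = 0

    def record(self, failure):
        self._entries.append(failure)
        self.total += 1

    def latest(self):
        """Return the retained failures, newest first."""
        return list(reversed(self._entries))


if __name__ == "__main__":
    log = RecentChecksumFailures(capacity=100)
    for blk in range(1_000_000):
        log.record(ChecksumFailure(dboid=1, relid=16384, fork="main", block=blk))
    # A million failures were counted, but only the newest 100 are retained.
    print(log.total, len(log.latest()))
```

An actual implementation would presumably be a fixed-size array in shared memory with a wrap-around write position, but the eviction behaviour would be the same as in this sketch.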