Re: new heapcheck contrib module
From | Robert Haas |
---|---|
Subject | Re: new heapcheck contrib module |
Date | |
Msg-id | CA+TgmoYTDcf5MJrSBCSB6iLnGzh4pE7nCBBVBYGP-7D0CwzuHw@mail.gmail.com |
In response to | Re: new heapcheck contrib module (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: new heapcheck contrib module
|
List | pgsql-hackers |
On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Do you recall seeing corruption resulting in segfaults in production?

I have seen that, I believe. I think it's more common to fail with errors about not being able to palloc>1GB, not being able to look up an xid or mxid, etc. but I am pretty sure I've seen multiple cases involving seg faults, too. Unfortunately for my credibility, I can't remember the details right now.

> I personally don't recall seeing that. If it happened, the segfaults
> themselves probably wouldn't be the main concern.

I don't really agree. Hypothetically speaking, suppose you corrupt your only copy of a critical table in such a way that every time you select from it, the system seg faults. A user in this situation might ask questions like:

1. How did my table get corrupted?
2. Why do I only have one copy of it?
3. How do I retrieve the non-corrupted portion of my data from that table and get back up and running?

In the grand scheme of things, #1 and #2 are the most important questions, but when something like this actually happens, #3 tends to be the most urgent question, and it's a lot harder to get the uncorrupted data out if the system keeps crashing. Also, a seg fault tends to lead customers to think that the database has a bug, rather than that the database is corrupted.

Slightly off-topic here, but I think our error reporting in this area is pretty lame. I've learned over the years that when a customer reports that they get a complaint about a too-large memory allocation every time they access a table, they've probably got a corrupted varlena header. However, that's extremely non-obvious to a typical user. We should try to report errors indicative of corruption in a way that gives the user some clue that corruption has happened.
Peter made a stab at improving things there by adding errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of users will never see the error code, only the message, and a lot of corruption still produces errors that weren't changed by that commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
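
To illustrate the point in the message above about oversized allocation requests, here is a minimal sketch in standalone C. It is not PostgreSQL code and does not model the real varlena encoding: it assumes a simplified datum with a 4-byte native-endian length prefix. The names `check_length_prefix`, `check_result`, and `SKETCH_MAX_ALLOC` are hypothetical, and the cap value is only chosen to resemble a limit of just under 1GB. The idea is that a length which cannot possibly be valid gets reported with wording that mentions corruption, instead of surfacing later as a bare allocation failure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cap for this sketch: just under 1GB. */
#define SKETCH_MAX_ALLOC ((uint32_t) 0x3fffffff)

typedef enum
{
    CHECK_OK,
    CHECK_CORRUPT
} check_result;

/*
 * Validate a simplified length-prefixed datum before anything tries to
 * allocate memory for it.
 *
 * buf    points at the 4-byte length prefix (native byte order)
 * avail  is the number of bytes readable at buf, prefix included
 * msg    receives a human-readable message that names corruption as the
 *        likely cause; it is set to "" when the datum looks sane
 */
check_result
check_length_prefix(const unsigned char *buf, size_t avail,
                    char *msg, size_t msglen)
{
    uint32_t len;

    /* Not even a full prefix to read. */
    if (avail < sizeof(len))
    {
        snprintf(msg, msglen,
                 "datum header truncated (%zu bytes available); "
                 "data may be corrupted", avail);
        return CHECK_CORRUPT;
    }

    memcpy(&len, buf, sizeof(len));

    /* The claimed length exceeds the cap or the bytes actually present. */
    if (len > SKETCH_MAX_ALLOC || len > avail - sizeof(len))
    {
        snprintf(msg, msglen,
                 "datum claims length %lu but only %zu bytes follow; "
                 "data may be corrupted",
                 (unsigned long) len, avail - sizeof(len));
        return CHECK_CORRUPT;
    }

    msg[0] = '\0';
    return CHECK_OK;
}
```

A caller would run this check on each datum before allocating a buffer for it. On `CHECK_CORRUPT` it would show the message text to the user, since, as the message notes, many users only ever see the message and not the error code.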