Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
From | Craig Ringer |
---|---|
Subject | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS |
Date | |
Msg-id | CAMsr+YH8JP-UdsGt0dLMcDRx6WQ78BZA7kMgimu8+ZuB_uzyFQ@mail.gmail.com |
In reply to | Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Thomas Munro <thomas.munro@enterprisedb.com>) |
List | pgsql-hackers |
On 29 March 2018 at 20:07, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Thu, Mar 29, 2018 at 6:58 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 28 March 2018 at 11:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>> Craig Ringer <craig@2ndquadrant.com> writes:
>>> > TL;DR: Pg should PANIC on fsync() EIO return.
>>>
>>> Surely you jest.
>>
>> No. I'm quite serious. Worse, we quite possibly have to do it for ENOSPC as
>> well to avoid similar lost-page-write issues.
>
> I found your discussion with kernel hacker Jeff Layton at
> https://lwn.net/Articles/718734/ in which he said: "The stackoverflow
> writeup seems to want a scheme where pages stay dirty after a
> writeback failure so that we can try to fsync them again. Note that
> that has never been the case in Linux after hard writeback failures,
> AFAIK, so programs should definitely not assume that behavior."
>
> The article above that says the same thing a couple of different ways,
> ie that writeback failure leaves you with pages that are neither
> written to disk successfully nor marked dirty.
>
> If I'm reading various articles correctly, the situation was even
> worse before his errseq_t stuff landed. That fixed cases of
> completely unreported writeback failures due to sharing of PG_error
> for both writeback and read errors with certain filesystems, but it
> doesn't address the clean pages problem.
>
> Yeah, I see why you want to PANIC.

In more ways than one ;)
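To make the "clean pages" problem concrete, here is a toy userspace model in C of one cached page and a single file descriptor's error state. It is a simplification, not kernel code: the names `toy_file`, `toy_write`, `toy_writeback` and `toy_fsync` are hypothetical, and the model encodes only the behaviour Jeff Layton describes above (after a hard writeback failure the page is neither on disk nor dirty, and the error is reported once through `fsync()`).

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: one cached page of a file. */
struct toy_page {
    bool dirty;    /* page still needs writeback */
    bool on_disk;  /* data actually reached storage */
};

/* Simplified per-descriptor view: one page plus an error that has not yet
 * been reported to fsync() (loosely standing in for errseq_t tracking). */
struct toy_file {
    struct toy_page page;
    int pending_err;
};

/* Application write(): the page becomes dirty and is not yet on disk. */
static void toy_write(struct toy_file *f)
{
    f->page.dirty = true;
    f->page.on_disk = false;
}

/* Writeback; `device_ok` simulates whether the I/O succeeds. */
static void toy_writeback(struct toy_file *f, bool device_ok)
{
    if (!f->page.dirty)
        return;                 /* nothing to write */
    if (device_ok)
        f->page.on_disk = true;
    else
        f->pending_err = EIO;   /* the failure is recorded ... */
    f->page.dirty = false;      /* ... but the page is marked clean either way */
}

/* fsync(): write back dirty pages, then report and clear any pending error.
 * Returns 0 or a negative errno, kernel style. */
static int toy_fsync(struct toy_file *f, bool device_ok)
{
    toy_writeback(f, device_ok);
    int err = f->pending_err;
    f->pending_err = 0;         /* reported once to this descriptor */
    return err ? -err : 0;
}
```

The interesting part is the retry: after the first `toy_fsync()` returns `-EIO`, a second call finds no dirty page and no pending error, so it returns 0 even though the data never reached storage. Under these semantics, retrying fsync() cannot confirm durability; PANICking and replaying WAL in crash recovery can.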

>> I'm not seeking to defend what the kernel seems to be doing. Rather, saying
>> that we might see similar behaviour on other platforms, crazy or not. I
>> haven't looked past linux yet, though.
>
> I see no reason to think that any other operating system would behave
> that way without strong evidence... This is openly acknowledged to be
> "a mess" and "a surprise" in the Filesystem Summit article. I am not
> really qualified to comment, but from a cursory glance at FreeBSD's
> vfs_bio.c I think it's doing what you'd hope for... see the code near
> the comment "Failed write, redirty."

Ok, that's reassuring, but doesn't help us on the platform the great majority of users deploy on :(
"If on Linux, PANIC"
Hrm.
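A minimal sketch of what "If on Linux, PANIC" might look like around a single fsync() call, written in plain POSIX C rather than against PostgreSQL's internals. The helper names `fsync_error_is_fatal` and `fsync_or_panic` and the macro `FSYNC_FAILURE_PANICS` are hypothetical; `abort()` stands in for PostgreSQL's PANIC, and treating ENOSPC like EIO follows the thread's "quite possibly" rather than any settled decision.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical switch: on Linux a failed writeback may leave the unwritten
 * pages marked clean, so a later fsync() retry cannot be trusted.
 * Other platforms keep the report-and-retry behaviour (an assumption). */
#if defined(__linux__)
#define FSYNC_FAILURE_PANICS 1
#else
#define FSYNC_FAILURE_PANICS 0
#endif

/* Errors discussed in the thread: EIO, and possibly ENOSPC as well. */
static bool fsync_error_is_fatal(int err)
{
    return FSYNC_FAILURE_PANICS && (err == EIO || err == ENOSPC);
}

/* Hypothetical helper: fsync(fd); on a fatal error, abort() the process
 * (standing in for PANIC) so crash recovery replays WAL instead of
 * trusting pages the kernel may already have dropped. */
static int fsync_or_panic(int fd, const char *path)
{
    if (fsync(fd) == 0)
        return 0;

    int err = errno;
    if (fsync_error_is_fatal(err)) {
        fprintf(stderr, "PANIC: could not fsync file \"%s\": %s\n",
                path, strerror(err));
        abort();
    }
    errno = err;   /* restore errno for the caller */
    return -1;     /* non-fatal: the caller decides whether to retry */
}
```

On Linux, EIO or ENOSPC from fsync() kills the process; other errors such as EBADF, and every error on other platforms, are handed back to the caller, preserving the existing report-and-retry path that the thread is questioning.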