Re: silent data loss with ext4 / all current versions

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: silent data loss with ext4 / all current versions
Дата
Msg-id CAB7nPqRDwya5Qj4w0b5mu9Eac4AS1ykUU+WD_7z0zip0HsOhQg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: silent data loss with ext4 / all current versions  (Andres Freund <andres@anarazel.de>)
Ответы Re: silent data loss with ext4 / all current versions  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Tue, Mar 8, 2016 at 2:55 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-03-08 12:26:34 +0900, Michael Paquier wrote:
>> On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund <andres@anarazel.de> wrote:
>> > On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
>> >> I have spent a couple of hours looking at that in details, and the
>> >> patch is neat.
>> >
>> > Cool. Doing some more polishing right now. Will be back with an updated
>> > version soonish.
>> >
>> > Did you do some testing?
>>
>> Not much in details yet, I just ran a check-world with fsync enabled
>> for the recovery tests, plus quick manual tests with a cluster
>> manually set up. I'll do more with your new version now that I know
>> there will be one.
>
> Here's my updated version.
>
> Note that I've split the patch into two. One for the infrastructure, and
> one for the callsites.

Thanks for the updated patches and the split, this makes things easier
to look at. I have been doing some testing as well mainly manually
using with pgbench and nothing looks broken.

+   durable_link_or_rename(tmppath, path, ERROR);
+   durable_rename(path, xlogfpath, ERROR);
You may want to add a (void) cast in front of those calls for correctness.

-       ereport(LOG,
-               (errcode_for_file_access(),
-                errmsg("could not link file \"%s\" to \"%s\"
(initialization of log file): %m",
-                       tmppath, path)));
We lose a portion of the error message here, but with the file name
that's easy to guess where that is happening. I am not complaining
(that's fine to me as-is), just mentioning for the archive's sake.

>> >> +   /* XXX: Add racy file existence check? */
>> >> +   if (rename(oldfile, newfile) < 0)
>> >
>> >> I am not sure we should worry about that, what do you think could
>> >> cause the old file from going missing all of a sudden. Other backend
>> >> processes are not playing with it in the code paths where this routine
>> >> is called. Perhaps adding a comment in the header to let users know
>> >> that would help?
>> >
>> > What I'm thinking of is adding a check whether the *target* file already
>> > exists, and error out in that case. Just like the link() based path
>> > normally does.
>>
>> Ah, OK. Well, why not. I'd rather have an assertion instead of an error though.
>
> I think it should definitely be an error if anything. But I'd rather
> only add it in master...

I guess I know why :) That's also why I was thinking about an assertion.
-- 
Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Using quicksort for every external sort run
Следующее
От: Andres Freund
Дата:
Сообщение: Re: silent data loss with ext4 / all current versions