Re: Checking for missing heap/index files

From Tom Lane
Subject Re: Checking for missing heap/index files
Date
Msg-id 2683416.1666123190@sss.pgh.pa.us
In reply to Re: Checking for missing heap/index files  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Checking for missing heap/index files  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Oct 18, 2022 at 2:37 PM Stephen Frost <sfrost@snowman.net> wrote:
>> I don't see it as likely to be acceptable, but arranging to not add or
>> remove files while the scan is happening would presumably eliminate the
>> risk entirely.  We've not seen this issue recur in the expire command
>> since the change to first completely scan the directory and then go and
>> remove the files from it.  Perhaps just not removing files during the
>> scan would be sufficient which might be more reasonable to do.

> Just deciding to cache the results of readdir() in memory is much
> cheaper insurance. I think I'd probably be willing to foist that
> overhead onto everyone, all the time. As I mentioned before, it could
> still hose someone who is right on the brink of a memory disaster, but
> that's a much narrower blast radius than putting locking around all
> operations that create or remove a file in the same directory as a
> relation file. But it's also not a complete fix, which sucks.

Yeah, that.  I'm not sure if we need to do anything about this, but
if we do, I don't think that's it.  Agreed that the memory-consumption
objection is pretty weak; the one that seems compelling is that by
itself, this does nothing to fix the problem beyond narrowing the
window some.
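
For illustration only, here is a minimal Python sketch of what "caching the readdir results" amounts to: read every entry name into memory in one quick pass, then do the slower comparison against that snapshot. This is not PostgreSQL code; `snapshot_directory`, `find_missing`, and the file names are hypothetical. As noted above, this only shortens the time the directory is being read. A file removed during that short read could still, on some filesystems, cause another entry to be skipped.

```python
import os
import tempfile


def snapshot_directory(path):
    """Read all entry names up front so the directory stays open only briefly."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def find_missing(path, expected_names):
    """Compare the expected file names against the in-memory snapshot."""
    present = set(snapshot_directory(path))
    return sorted(name for name in expected_names if name not in present)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical relation file names, created only for this demo.
        for name in ("16384", "16384_fsm", "16385"):
            open(os.path.join(d, name), "w").close()
        # "16386" was never created, so it is the only name reported.
        print(find_missing(d, ["16384", "16385", "16386"]))
```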

Isn't it already the case (or could be made so) that relation file
removal happens only in the checkpointer?  I wonder if we could
get to a situation where we can interlock file removal just by
commanding the checkpointer to not do it for a while.  Then combining
that with caching readdir results (to narrow the window in which we
have to stop the checkpointer) might yield a solution that has some
credibility.  This scheme doesn't attempt to prevent file creation
concurrently with a readdir, but you'd have to make some really
adverse assumptions to believe that file creation would cause a
pre-existing entry to get missed (as opposed to getting scanned
twice).  So it might be an acceptable answer.
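
A rough Python sketch of that combination, under the assumption that all removals really are funneled through one place: removals are queued, the single remover skips its work while any scan is in progress, and the scan holds it off only for as long as it takes to read the directory into memory. The class `DeferredRemover` and its methods are hypothetical stand-ins, not the checkpointer's actual interface, and file creation is deliberately left uninterlocked, as in the scheme described.

```python
import os
import tempfile
import threading


class DeferredRemover:
    """Stand-in for a single process that performs all file removals."""

    def __init__(self):
        self._lock = threading.Lock()
        self._paused = 0      # number of directory scans in progress
        self._pending = []    # paths queued for removal

    def request_unlink(self, path):
        """Queue a removal; nothing is deleted until process_pending()."""
        with self._lock:
            self._pending.append(path)

    def process_pending(self):
        """Remove queued files unless a scan has asked us to hold off."""
        with self._lock:
            if self._paused:
                return 0
            batch, self._pending = self._pending, []
            # Unlink while holding the lock so a scan cannot start mid-batch.
            for path in batch:
                os.unlink(path)
        return len(batch)

    def scan(self, path):
        """Hold removals off only while the directory is being read."""
        with self._lock:
            self._paused += 1
        try:
            with os.scandir(path) as entries:
                return sorted(entry.name for entry in entries)
        finally:
            with self._lock:
                self._paused -= 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("16384", "16385"):
            open(os.path.join(d, name), "w").close()
        remover = DeferredRemover()
        remover.request_unlink(os.path.join(d, "16385"))
        print(remover.scan(d))            # queued file is still listed
        print(remover.process_pending())  # removal happens after the scan
        print(remover.scan(d))
```

Because the scan caches the names before releasing the hold, the time during which removals are blocked is limited to the directory read itself, not the later processing of the results.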

            regards, tom lane


