Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData()
От | Tom Lane |
---|---|
Тема | Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData() |
Дата | |
Msg-id | 21575.1511831661@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: BUG #14929: Unchecked AllocateDir() return value in restoreTwoPhaseData() (Michael Paquier <michael.paquier@gmail.com>) |
Список | pgsql-bugs |
Michael Paquier <michael.paquier@gmail.com> writes: > On Tue, Nov 28, 2017 at 12:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. twophase.c does this: >> cldir = AllocateDir(TWOPHASE_DIR); >> LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); >> while ((clde = ReadDir(cldir, TWOPHASE_DIR)) != NULL) >> which is flat out wrong because LWLockAcquire might well clobber errno. > I have not checked if it actually updates errno or not, but relying on > the fact that it may do it sucks. I think it might only change errno in some code paths, for example LWLockAcquire -> wait needed -> PGSemaphoreLock -> sem_wait gets interrupted -> EINTR. Which is the worst of all worlds, because the report would usually be right and only occasionally mislead you. >> There might be some other problems I missed in a quick scan. > I have spotted more problems. Ugh. > I think that this requires its own new thread with a more extended > analysis on -hackers to attract attention, Agreed. One thing I was thinking about is that there are places such as DeleteAllExportedSnapshotFiles and RelationCacheInitFileRemove that want the error message to come out only at LOG level, not ERROR, because they're running in the postmaster. But not only are they issuing a mostly-duplicative ereport call, but they're really not protecting themselves fully, because you'd probably want failures inside ReadDir to be reported at LOG level as well. So what this leads me to think is that we should export ReadDirExtended (currently that's only accessible inside fd.c) and then the use-case would be solvable with dir = AllocateDir(path);while ((dirent = ReadDirExtended(dir, path, LOG)) != NULL) process dirent;FreeDir(dir); This line of thinking also says that FreeDir needs to be tweaked to do nothing if dir == NULL, assuming that somebody should have logged the directory-open failure already. The other thing that really could use discussion, as you mentioned upthread, is how useful are custom error messages for AllocateDir failures. I'm not real sure for instance thatcould not open write-ahead log directory "pg_wal": ... is a useful improvement over a generic messagecould not open directory "pg_wal": ... Experts will know what pg_wal is anyway, and non-experts may not gain much from being told it's a write-ahead log directory. I'm prepared to listen to arguments that these custom messages (or at least some of them) are worth keeping, but I think someone ought to make that case, rather than letting them stand just because they're there now. regards, tom lane
В списке pgsql-bugs по дате отправления: