Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
От | Noah Misch |
---|---|
Тема | Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid |
Дата | |
Msg-id | 20180812064815.GB2301738@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: Weaker shmem interlock w/o postmaster.pid (Noah Misch <noah@leadboat.com>) |
Ответы |
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid
(Dmitry Dolgov <9erthalion6@gmail.com>)
Re: [HACKERS] Weaker shmem interlock w/o postmaster.pid (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Список | pgsql-hackers |
On Wed, Sep 11, 2013 at 10:28:00PM -0400, Noah Misch wrote: > On Wed, Sep 11, 2013 at 11:32:01AM -0400, Stephen Frost wrote: > > * Noah Misch (noah@leadboat.com) wrote: > > > Concretely, that means > > > not removing postmaster.pid on immediate shutdown in 9.3 and earlier. That's > > > consistent with the rough nature of an immediate shutdown, anyway. With 9.3 having a few months left, that's less interesting, but ... > > I don't like leaving the postmaster.pid file around, even on an > > immediate shutdown. I don't have any great suggestions regarding what > > to do, given what we try to do wrt 'immediate', so perhaps it's > > acceptable for future releases. > > Fair enough. ... we could still worry about folks doing "kill -9 `head -n1 postmaster.pid` && rm postmaster.pid". A possible defense is to record a copy of the shm identifiers in a second file ("sysv_shmem_key"?) within the data directory. Since users would have no expectations about a new file's lifecycle, we could remove it when and only when we verify a segment doesn't exist. However, a DBA determined to remove postmaster.pid would likely remove both files. > > > I'm thinking to preserve postmaster.pid at immediate shutdown in all released > > > versions, but I'm less sure about back-patching a change to make > > > PGSharedMemoryCreate() pickier. On the one hand, allowing startup to proceed > > > with backends still active in the same data directory is a corruption hazard. > > > > The corruption risk, imv anyway, is sufficient to backpatch the change > > and overrides the concerns around very fast shutdown/restarts. > > Making PGSharedMemoryCreate() pickier in all branches will greatly diminish > the marginal value of preserving postmaster.pid, so I'm fine with dropping the > postmaster.pid side of the proposal. Here's an implementation. The first, small patch replaces an obsolete errhint() about manually removing shared memory blocks. In its place, recommend killing processes. Today's text, added in commit 4d14fe0, is too rarely pertinent to justify a HINT. (You might use ipcrm if a PostgreSQL process is stuck in D state and has SIGKILL pending, so it will never become runnable again.) Removal of the ipcclean script ten years ago (39627b1) and its non-replacement corroborate that manual shm removal is now a niche goal. In the main patch, one of the tests covers an unsafe case that sysv_shmem.c still doesn't detect. I could pursue a fix via the aforementioned sysv_shmem_key file, modulo the possibility of a DBA removing it. I could also, when postmaster.pid is missing, make sysv_shmem.c check the first N (N=100?) keys applicable to the selected port. My gut feeling is that neither thing is worthwhile, but I'm interested to hear other opinions. This removes the creatorPID test, which commit c715fde added before commit 4d14fe0 added the shm_nattch test. The pair of tests is not materially stronger than the shm_nattch test by itself. For reasons explained in the comment at "enum IpcMemoryState", this patch has us cease recycling segments pertaining to data directories other than our own. This isn't ideal for the buildfarm, where a postmaster crash bug would permanently leak a shm segment. I think the benefits for production use cases eclipse this drawback. Did I miss a reason to like the old behavior? Single-user mode has been protected even less than normal execution, because it selected a shm key as though using port zero. Thus, starting single-user mode after an incomplete shutdown would never detect the pre-shutdown shm. The patch makes single-user mode work like normal execution. Until commit de98a7e, single-user mode used malloc(), not a shm segment. That commit also restricted single-user mode to not recycle old segments. I didn't find a rationale for that restriction, but a contributing factor may have been the fact that every single-user process uses the same port-0 shm key space. Also, initdb starts various single-user backends, and reaping stale segments would be an odd side effect for initdb. With single-user mode using a real port number and PostgreSQL no longer recycling segments of other data directories, I removed that restriction. By the way, as of this writing, my regular development system has 41 stale segments, all in the single-user key space (0x00000001 to 0x0000002d with gaps). It's clear how they persisted once leaked, but I don't know how I leaked them in the first place. I ran 9327 iterations of the test file, and it did not leak a shm segment in that time. I tested on Red Hat and on Windows Server 2016; I won't be shocked if the test (not the code under test) breaks on other Windows configurations. The patch adds PostgresNode features to enable that test coverage. The comment referring to a CreateDataDirLockFile() bug refers to this one: https://postgr.es/m/flat/20120803145635.GE9683%40tornado.leadboat.com Thanks, nm
Вложения
В списке pgsql-hackers по дате отправления: