F_SETLK is looking worse and worse...
From | Tom Lane |
---|---|
Subject | F_SETLK is looking worse and worse... |
Date | |
Msg-id | 25154.975456988@sss.pgh.pa.us |
Responses |
Re: F_SETLK is looking worse and worse...
Re: F_SETLK is looking worse and worse... |
List | pgsql-hackers |
While testing interlocking of multiple postmasters, I discovered that the HAVE_FCNTL_SETLK interlock code we have in StreamServerPort() does not work at all on HPUX 10.20. This platform has F_SETLK according to configure, but:

1. The lock is never applied to a socket, because the open() on the newly-created socket (at line 303 of pqcomm.c) fails with EOPNOTSUPP, Operation not supported.

2. If a postmaster finds a socket file in its way, it is unable to remove it despite the lack of any lock, because the open() at line 230 fails with EADDRINUSE, Address already in use.

I have no idea whether the fcntl(F_SETLK) call would succeed if control did get to it, but these results don't leave me very hopeful.

Between this and the already-known result that F_SETLK doesn't work on sockets in shipping Linux kernels, I'm pretty unimpressed with the usefulness of this interlock method.

We talked before about flushing the F_SETLK technique and using good old interlock files containing PIDs, same method that we use for interlocking the data directory. That is, if the socket file name is /tmp/.s.PGSQL.5432, we'd create a plain file /tmp/.s.PGSQL.5432.lock containing the owning process's PID. The code would insist on getting this interlock file first, and if successful would just unconditionally remove any existing socket file before doing the bind().

I can only think of one scenario where this is worse than what we have now: if someone is running a /tmp-directory-sweeper that is bright enough not to remove socket files, it would still zap the interlock file, thus potentially allowing a second postmaster to take over the socket file. This doesn't seem like a mainstream problem though.

BTW, it also seems like a good idea to reorder the postmaster's startup operations so that the data-directory lockfile is checked before trying to acquire the port lockfile, instead of after. That way, in the common scenario where you're trying to start a second postmaster in the same directory + same port, it'd fail cleanly even if /tmp/.s.PGSQL.5432.lock had disappeared.

Comments?

regards, tom lane
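
For illustration, the PID interlock-file scheme proposed above might look roughly like the following. This is only a minimal sketch under stated assumptions, not the code in pqcomm.c: the function names acquire_pid_lockfile() and claim_socket_path() are hypothetical, the retry count is arbitrary, and an unreadable or garbled lock file is conservatively treated as "lock held" rather than stale.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create lockpath exclusively and write our PID
 * into it.  Returns 0 if we now own the lock, -1 if a live process
 * appears to own it or an unexpected error occurs.
 */
int
acquire_pid_lockfile(const char *lockpath)
{
    int tries;

    for (tries = 0; tries < 10; tries++)
    {
        char    buf[32];
        ssize_t n;
        long    pid;
        int     fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);

        if (fd >= 0)
        {
            /* We created the file, so record our PID in it. */
            int len = snprintf(buf, sizeof(buf), "%ld\n", (long) getpid());
            int ok = (write(fd, buf, len) == len);

            if (close(fd) != 0)
                ok = 0;
            if (!ok)
            {
                unlink(lockpath);
                return -1;
            }
            return 0;
        }
        if (errno != EEXIST)
            return -1;

        /* The lock file already exists: read the PID stored in it. */
        fd = open(lockpath, O_RDONLY);
        if (fd < 0)
        {
            if (errno == ENOENT)
                continue;       /* vanished in the meantime, retry */
            return -1;
        }
        n = read(fd, buf, sizeof(buf) - 1);
        close(fd);
        if (n <= 0)
            return -1;          /* empty or unreadable: assume held */
        buf[n] = '\0';
        pid = atol(buf);
        if (pid <= 0)
            return -1;          /* garbled contents: assume held */

        /* kill(pid, 0) sends no signal; it only probes for existence. */
        if (kill((pid_t) pid, 0) == 0 || errno != ESRCH)
            return -1;          /* owner is still alive */

        /* Owner is gone: remove the stale lock file and try again. */
        if (unlink(lockpath) != 0 && errno != ENOENT)
            return -1;
    }
    return -1;
}

/*
 * Hypothetical helper: take "<sockpath>.lock" first and, only if that
 * succeeds, unconditionally remove any leftover socket file so that a
 * subsequent bind() can create it afresh.
 */
int
claim_socket_path(const char *sockpath)
{
    char lockpath[1024];
    int  len = snprintf(lockpath, sizeof(lockpath), "%s.lock", sockpath);

    if (len < 0 || len >= (int) sizeof(lockpath))
        return -1;
    if (acquire_pid_lockfile(lockpath) != 0)
        return -1;
    if (unlink(sockpath) != 0 && errno != ENOENT)
        return -1;
    return 0;
}
```

A caller would invoke claim_socket_path("/tmp/.s.PGSQL.5432") before bind() and would remove the .lock file again at shutdown. Note that there is still a small window between detecting a stale lock file and removing it in which two starting processes could race; the sketch does not try to close it.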