shmctl portability problem
From | Tom Lane |
---|---|
Subject | shmctl portability problem |
Date | |
Msg-id | 1896.1010108873@sss.pgh.pa.us |
Responses | Re: shmctl portability problem |
List | pgsql-hackers |
After a system crash on a RH 7.2 box (2.4.7-10 kernel), I found that Postgres would not restart, complaining that it "found a pre-existing shared memory block (ID so-and-so) still in use."

This is coming from code that attempts to defend against the scenario where the postmaster crashed but one or more backends are still alive. If we start a new postmaster and create a new shmem segment, the consequences will be absolutely disastrous, because the old and new backends will be modifying the same data files with no coordination. So we look to see if the old shmem segment (whose ID is recorded in the data directory lockfile) is still present and if so whether there are any processes attached to it. See SharedMemoryIsInUse() in src/backend/storage/ipc/ipc.c.

The problem is that SharedMemoryIsInUse() expects shmctl to return errno == EINVAL if the presented shmem segment ID is invalid. What Linux 2.4.7 is actually returning is EIDRM (identifier removed).

The easy "fix" of taking EIDRM to be an allowable return code scares me. At least on HPUX, the documented implication of this return code is that the shmem segment is marked for deletion but is not yet gone because there are still processes attached to it. That would be exactly the scenario after a postmaster crash and manual "ipcrm" if there were any old backends still alive. So, it seems to me that accepting EIDRM would defeat the entire point of this test, at least on some platforms.

Comments? Is 2.4.7 simply broken and returning the wrong errno? If not, what should we do?

regards, tom lane
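For readers following along, the check described above can be sketched roughly as below. This is a minimal illustration, not the actual SharedMemoryIsInUse() source: the function name segment_may_be_in_use() is hypothetical, and the choice to treat every errno other than EINVAL (including EIDRM) as "possibly in use" is an assumption that mirrors the cautious behavior described in the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper (not the PostgreSQL source): report whether the
 * segment identified by shmid might still have processes attached.
 */
bool
segment_may_be_in_use(int shmid)
{
    struct shmid_ds buf;

    if (shmctl(shmid, IPC_STAT, &buf) < 0)
    {
        /* EINVAL: the ID does not name any existing segment. */
        if (errno == EINVAL)
            return false;

        /*
         * Assumption: any other failure, including EIDRM, is treated
         * conservatively as "may be in use", since on some platforms
         * EIDRM can mean the segment is marked for deletion while
         * processes are still attached.
         */
        return true;
    }

    /* The segment exists; it is in use if anything is attached to it. */
    return buf.shm_nattch > 0;
}
```

With this shape, a kernel that reports EIDRM for a stale ID makes the caller refuse to start, which is the symptom reported above; adding EIDRM to the "not in use" branch would remove the symptom but also the protection on platforms where EIDRM has the stronger meaning.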