Re: unable to fail over to warm standby server
From | Heikki Linnakangas |
---|---|
Subject | Re: unable to fail over to warm standby server |
Date | |
Msg-id | 4B615DA2.3040306@enterprisedb.com |
In reply to | unable to fail over to warm standby server (Mason Hale <mason@onespot.com>) |
Responses | Re: unable to fail over to warm standby server |
List | pgsql-bugs |
Mason Hale wrote:
> ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not
> permittedtrigger file found
>
> ERROR: could not remove "/tmp/pgsql.trigger.5432": Operation not permitted
>
> This file was not looked until after the attempt to recover was
> aborted. Clearly the permissions on /tmp/pgsql.trigger.5432 were a
> problem, but we don't see how that would explain the error messages,
> which seem to indicate that data on the standby server was corrupted.

Yes, that permission problem seems to be the root cause of the troubles.
If pg_standby fails to remove the trigger file, it exit()s with whatever
return code the unlink() call returned:

> /*
>  * If trigger file found, we *must* delete it. Here's why: When
>  * recovery completes, we will be asked again for the same file from
>  * the archive using pg_standby so must remove trigger file so we can
>  * reload file again and come up correctly.
>  */
> rc = unlink(triggerPath);
> if (rc != 0)
> {
>     fprintf(stderr, "\n ERROR: could not remove \"%s\": %s", triggerPath, strerror(errno));
>     fflush(stderr);
>     exit(rc);
> }

unlink() returns -1 on error, so pg_standby calls exit(-1). -1 is out of
the range of normal return codes (0-255), so only its low 8 bits survive
and the exit code becomes 255. That apparently shows up as the mysterious
65280 code you saw in the logs, since 65280 is 255 << 8, the usual
encoding of exit code 255 in a wait status. The server treats that as a
fatal error, and dies.

That seems like a bug in pg_standby, but I'm not sure what it should do
if the unlink() fails. It could exit with some other exit code, so that
the server wouldn't die, but the lingering trigger file could cause
problems, as the comment explains. If it should indeed cause FATAL, it
should do so in a more robust way than the exit(rc) call above.

BTW, this changed in PostgreSQL 8.4; pg_standby no longer tries to
delete the trigger file (so that problematic block of code is gone), but
there's a new recovery_end_command option in recovery.conf instead,
where you're supposed to put 'rm <triggerfile>'. I think in that
configuration, the standby would've started up, even though removal of
the trigger file would've still failed.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com