(re)start in our init scripts seems broken
From | Tomas Vondra |
---|---|
Subject | (re)start in our init scripts seems broken |
Date | |
Msg-id | f1b181e5-8b33-807d-a276-f7cedb09e32e@2ndquadrant.com |
Replies |
Re: (re)start in our init scripts seems broken
|
List | pgsql-hackers |
Hi,

A few days ago I ran into a problem with the init script packaged in our community RPM packages. A restart was initiated, and this was the result:

    # /etc/init.d/postgresql-9.3 restart
    Stopping postgresql-9.3 service:                           [FAILED]
    Starting postgresql-9.3 service:                           [  OK  ]

The database, however, was still in shutdown mode, performing a checkpoint. Sadly, the init script uses the default timeout, so the stop gives up after just 60 seconds. That part seems fine, as the init script reports the failure correctly.

The start action, however, then seemingly succeeds, because it does this:

    echo -n "$PSQL_START"
    $SU -l postgres -c "$PGENGINE/postmaster -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
    sleep 2
    pid=`head -n 1 "$PGDATA/postmaster.pid" 2>/dev/null`
    if [ "x$pid" != x ]
    then
        success "$PSQL_START"
        touch "$lockfile"
        echo $pid > "$pidfile"
        echo
    else
        failure "$PSQL_START"
        echo
        script_result=1
    fi

It simply attempts to start the postmaster directly (instead of using pg_ctl), does not check the return code, and instead proceeds to check the postmaster.pid file and the existence of the process.

This fails to do the trick, because the old server is still running (in shutdown), so the new postmaster does not overwrite the file. And of course the PID in the file still matches a running process, namely the old one.

Is there a reason why it's coded like this? I think we should use pg_ctl instead, or (at the very least) check the postmaster return code. Also, perhaps we should add an explicit timeout, higher than 60 seconds.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
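To make the suggestion concrete, here is a minimal sketch of what a pg_ctl-based start could look like. It is not the packaged script: the function name `pg_start`, the `PG_CTL` and `PGTIMEOUT` variables, the default paths, and the 300-second value are all assumptions for illustration, and the sketch assumes it already runs as the postgres user (the real script goes through `$SU -l postgres`). The only pg_ctl options used are `-D`, `-w`, `-t` and `-l`.

```shell
#!/bin/sh
# Hypothetical defaults; the real init script defines PGENGINE, PGDATA and PGLOG itself.
PGENGINE=${PGENGINE:-/usr/pgsql-9.3/bin}
PGDATA=${PGDATA:-/var/lib/pgsql/9.3/data}
PGLOG=${PGLOG:-/var/lib/pgsql/9.3/pgstartup.log}
PGTIMEOUT=${PGTIMEOUT:-300}          # assumed value, above pg_ctl's 60 s default
PG_CTL=${PG_CTL:-$PGENGINE/pg_ctl}   # overridable, e.g. for testing

pg_start() {
    # -w waits for startup to finish, -t bounds that wait, -l sets the server log.
    # The decision is based on pg_ctl's exit status, not on postmaster.pid.
    if "$PG_CTL" start -D "$PGDATA" -w -t "$PGTIMEOUT" -l "$PGLOG"; then
        echo "start: OK"
        return 0
    fi
    echo "start: FAILED" >&2
    return 1
}
```

In the init script this would be called as `pg_start || script_result=1`. The point of the design is that a stale postmaster.pid left by a server that is still shutting down should no longer be mistaken for a successful start, because success depends on what pg_ctl reports rather than on the file's contents.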