Discussion: pg_stat_archiver issue with aborted archiver
Hello,

I just noticed that if the archiver aborts (for instance if the archive_command exited with a return code > 127), pg_stat_archiver won't report those failed attempts. This happens with both the 9.4 and 9.5 branches.

Please find attached a patch that fixes this issue, based on current head.

Regards.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> I just noticed that if the archiver aborts (for instance if the
> archive_command exited with a return code > 127), pg_stat_archiver won't
> report those failed attempts. This happens with both the 9.4 and 9.5 branches.
>
> Please find attached a patch that fixes this issue, based on current head.

The current code seems right to me. When the archive command dies because of a signal (exit code > 128), the server should fail immediately with FATAL and should not do any extra processing. It will also try again to archive the same segment file after restart. When trying again, if the command still fails but this time not because of a signal, the failure will be reported to pg_stat_archiver.
--
Michael
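The rule described above (a signal, or an exit code above 128, is fatal; any other non-zero status is an ordinary failure) can be sketched in C as follows. This is a minimal, hedged illustration, not the code from pgarch.c: the `ArchiveOutcome` enum and the functions `classify_archive_status()` and `run_archive_command()` are hypothetical. It assumes a POSIX `system()` wait status and a shell that reports a command killed by signal N as exit code 128 + N, which is why an exit code above 128 is treated like a signal.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical outcome of one archive_command run (not a PostgreSQL type). */
typedef enum
{
    ARCHIVE_OK,           /* exit status 0: segment archived */
    ARCHIVE_FAILED_LOG,   /* ordinary failure: log it, count it, retry later */
    ARCHIVE_FAILED_FATAL  /* killed by a signal: the archiver bails out */
} ArchiveOutcome;

/*
 * Classify a wait status as returned by system().  A shell reports a
 * child killed by signal N as exit code 128 + N, so an exit code above
 * 128 is handled like the shell itself having been killed by a signal.
 */
static ArchiveOutcome
classify_archive_status(int rc)
{
    if (rc == 0)
        return ARCHIVE_OK;
    if (rc == -1)                 /* system() could not run the shell */
        return ARCHIVE_FAILED_LOG;
    if (WIFSIGNALED(rc) || (WIFEXITED(rc) && WEXITSTATUS(rc) > 128))
        return ARCHIVE_FAILED_FATAL;
    return ARCHIVE_FAILED_LOG;
}

/* Run a shell command the way a simplified, hypothetical archiver might. */
static ArchiveOutcome
run_archive_command(const char *cmd)
{
    assert(cmd != NULL);
    fflush(NULL);                 /* flush our own output before the child writes */
    return classify_archive_status(system(cmd));
}
```

With a typical `/bin/sh`, `run_archive_command("exit 1")` yields `ARCHIVE_FAILED_LOG`, while `"exit 129"` and a shell killed with `"kill -9 $$"` both yield `ARCHIVE_FAILED_FATAL`. Note that exit code 128 itself is not treated as fatal by this condition.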
On 08/06/2015 05:56, Michael Paquier wrote:
> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> I just noticed that if the archiver aborts (for instance if the
>> archive_command exited with a return code > 127),
>> pg_stat_archiver won't report those failed attempts. This happens
>> with both the 9.4 and 9.5 branches.
>>
>> Please find attached a patch that fixes this issue, based on
>> current head.
>
> The current code seems right to me. When the archive command dies
> because of a signal (exit code > 128), the server should fail
> immediately with FATAL and should not do any extra processing.

OK. It may be worth documenting it, though.

> It will also try again to archive the same segment file after
> restart. When trying again, if the command still fails but this
> time not because of a signal, the failure will be reported to
> pg_stat_archiver.

Yes, my comment was only about the failure not being reported in some special cases.

Thanks for your response.
--
Julien Rouhaud
http://dalibo.com - http://dalibo.org
On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> On 08/06/2015 05:56, Michael Paquier wrote:
>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>> I just noticed that if the archiver aborts (for instance if the
>>> archive_command exited with a return code > 127),
>>> pg_stat_archiver won't report those failed attempts. This happens
>>> with both the 9.4 and 9.5 branches.
>>>
>>> Please find attached a patch that fixes this issue, based on
>>> current head.
>>
>> The current code seems right to me. When the archive command dies
>> because of a signal (exit code > 128), the server should fail
>> immediately with FATAL and should not do any extra processing.

In that case, ISTM that the archiver process dies with FATAL, but the server does not, no? The archiver is then restarted by the postmaster. If my understanding is right, it seems worth applying something like Julien's patch.

Regards,
--
Fujii Masao
On Tue, Jun 9, 2015 at 4:23 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jun 8, 2015 at 5:17 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 08/06/2015 05:56, Michael Paquier wrote:
>>> On Sun, Jun 7, 2015 at 1:11 AM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com> wrote:
>>>> I just noticed that if the archiver aborts (for instance if the
>>>> archive_command exited with a return code > 127),
>>>> pg_stat_archiver won't report those failed attempts. This happens
>>>> with both the 9.4 and 9.5 branches.
>>>>
>>>> Please find attached a patch that fixes this issue, based on
>>>> current head.
>>>
>>> The current code seems right to me. When the archive command dies
>>> because of a signal (exit code > 128), the server should fail
>>> immediately with FATAL and should not do any extra processing.
>
> In that case, ISTM that the archiver process dies with FATAL, but
> the server does not, no? The archiver is then restarted by the
> postmaster. If my understanding is right, it seems worth applying
> something like Julien's patch.

Er, sure. Please read that as "the archiver process"...

My point is that 3ad0728 introduced the behavior we have now in pgarch.c: we should bail out of the archiver process immediately, without interacting with pgstat, since the archiver comes back to archiving this file at restart, and we should only call pgstat_send_archiver when pgarch_archiveXlog() has returned a status.
--
Michael