Re: Timeout failure in 019_replslot_limit.pl
From | Amit Kapila |
---|---|
Subject | Re: Timeout failure in 019_replslot_limit.pl |
Date | |
Msg-id | CAA4eK1JHQEAfsxYqZDrToNiW8KAZ-bDKo-VtXQeR+nyMGF19vg@mail.gmail.com |
In reply to | Re: Timeout failure in 019_replslot_limit.pl (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Timeout failure in 019_replslot_limit.pl
|
List | pgsql-hackers |
On Mon, Sep 27, 2021 at 11:32 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Sep 25, 2021 at 05:12:42PM +0530, Amit Kapila wrote:
> > Now, in the failed run, it appears that due to some reason WAL sender
> > has not released the slot. Is it possible to see if the WAL sender is
> > still alive when a checkpoint is stuck at ConditionVariableSleep? And
> > if it is active, what is its call stack?
>
> I got again a failure today, so I have used this occasion to check that
> when the checkpoint gets stuck the WAL sender process getting SIGCONT
> is still around, waiting for a write to happen:
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>   frame #0: 0x00007fff20320c4a libsystem_kernel.dylib`kevent + 10
>   frame #1: 0x000000010fe50a43 postgres`WaitEventSetWaitBlock(set=0x00007f884d80a690, cur_timeout=-1, occurred_events=0x00007ffee0395fd0, nevents=1) at latch.c:1601:7
>   frame #2: 0x000000010fe4ffd0 postgres`WaitEventSetWait(set=0x00007f884d80a690, timeout=-1, occurred_events=0x00007ffee0395fd0, nevents=1, wait_event_info=100663297) at latch.c:1396:8
>   frame #3: 0x000000010fc586c4 postgres`secure_write(port=0x00007f883eb04080, ptr=0x00007f885006a040, len=122694) at be-secure.c:298:3
..
..
>   frame #15: 0x000000010fe91eb8 postgres`PostgresMain(dbname="", username="mpaquier") at postgres.c:4493:12
>
> It logs its FATAL "terminating connection due to administrator
> command" coming from ProcessInterrupts(), and then it sits idle on
> ClientWrite.
>

So, it seems on your machine it has passed the following condition in
secure_write:

    if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))

If so, this indicates write failure which seems odd to me and probably
something machine-specific or maybe some different settings in your
build or machine. BTW, if SSL or GSS is enabled that might have caused
it in some way. I think the best way is to debug the secure_write
during this occurrence.

--
With Regards,
Amit Kapila.
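For anyone wanting to see when a condition of this shape becomes true outside the server, here is a minimal standalone sketch. It is not PostgreSQL code: the helper names (write_would_block, fill_until_would_block, make_stalled_socket) are hypothetical, and a local socketpair stands in for the client connection. It only illustrates the general POSIX behaviour that a nonblocking send() fails with EWOULDBLOCK or EAGAIN once the kernel send buffer is full, for instance because the peer is not reading.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from PostgreSQL): true when a failed write
 * reported "would block" rather than a hard error. The real check quoted
 * above additionally tests !port->noblock.
 */
static bool
write_would_block(ssize_t n, int err)
{
    return n < 0 && (err == EWOULDBLOCK || err == EAGAIN);
}

/*
 * Create a connected pair of local stream sockets and make sv[0]
 * nonblocking. Nobody ever reads from sv[1], so it plays the role of a
 * stalled peer. Returns 0 on success, -1 on failure.
 */
static int
make_stalled_socket(int sv[2])
{
    int flags;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
        return -1;
    return 0;
}

/*
 * Keep sending until the kernel refuses more data. Returns the number of
 * bytes accepted before the "would block" failure, or -1 if the send
 * failed for any other reason.
 */
static long
fill_until_would_block(int fd)
{
    char chunk[4096];
    long total = 0;

    assert(fd >= 0);
    memset(chunk, 'x', sizeof(chunk));

    for (;;)
    {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);

        if (n >= 0)
        {
            total += n;         /* the kernel buffered (part of) the chunk */
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        return write_would_block(n, errno) ? total : -1;
    }
}
```

Calling make_stalled_socket() and then fill_until_would_block(sv[0]) returns a positive byte count once the buffer is full. A blocking caller in that state would normally wait for the socket to become writable, which matches the WaitEventSetWait frame under secure_write in the stack above.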