Re: Bug: walsender and high CPU usage
From | Heikki Linnakangas |
---|---|
Subject | Re: Bug: walsender and high CPU usage |
Date | |
Msg-id | 4F5DD6EB.5050309@enterprisedb.com |
In response to | Bug: walsender and high CPU usage (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Bug: walsender and high CPU usage |
List | pgsql-hackers |
On 09.03.2012 13:40, Fujii Masao wrote:
> I found the bug which causes walsender to enter into busy loop
> when replication connection is terminated. Walsender consumes
> lots of CPU resource (%sys), and this situation lasts until it has
> detected the termination of replication connection and exited.
>
> The cause of this bug is that the walsender loop doesn't call
> ResetLatch at all in the above case. Since the latch remains set,
> the walsender loop cannot sleep on the latch, i.e., WaitLatch
> always returns immediately.
>
> We can fix this bug by adding ResetLatch into the top of the
> walsender loop. Patch attached.
>
> This bug exists in 9.1 but not in 9.2dev. In 9.2dev, this bug has
> already been fixed by the commit
> (cff75130b5f63e45423c2ed90d6f2e84c21ef840). This commit
> refactors and refines the walsender loop logic in addition to
> adding ResetLatch. So I'm tempted to backport this commit
> (except the deletion of wal_sender_delay) to 9.1 rather than
> applying the attached patch. OTOH, attached patch is quite simple,
> and its impact on 9.1 would be very small, so it's easy to backport that.
> Thought?

This patch makes the code that follows bogus:

>     /*
>      * If we don't have any pending data in the output buffer, try to send
>      * some more.
>      */
>     if (!pq_is_send_pending())
>     {
>         XLogSend(output_message, &caughtup);
>
>         /*
>          * Even if we wrote all the WAL that was available when we started
>          * sending, more might have arrived while we were sending this
>          * batch. We had the latch set while sending, so we have not
>          * received any signals from that time. Let's arm the latch again,
>          * and after that check that we're still up-to-date.
>          */
>         if (caughtup && !pq_is_send_pending())
>         {
>             ResetLatch(&MyWalSnd->latch);
>
>             XLogSend(output_message, &caughtup);
>         }
>     }

The comment is no longer valid, and the calls to ResetLatch and XLogSend
are no longer necessary, once you have the ResetLatch() call at the top
of the loop.
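
To illustrate why the reset belongs at the top of the loop, here is a minimal toy sketch. It is not the walsender code and does not use the real latch API: `ToyLatch`, the `toy_*` helpers and `count_busy_iterations` are hypothetical names invented for this example, and the "wait" is reduced to a check of whether it would return immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a latch: just a flag. Hypothetical, not the PostgreSQL API. */
typedef struct
{
    bool is_set;
} ToyLatch;

static void toy_set_latch(ToyLatch *latch)   { latch->is_set = true; }
static void toy_reset_latch(ToyLatch *latch) { latch->is_set = false; }

/* A wait on a latch that is already set returns at once instead of sleeping. */
static bool toy_wait_returns_immediately(const ToyLatch *latch)
{
    return latch->is_set;
}

/*
 * Simulate `iterations` passes of a loop that has no work to do.
 * The latch is set once before the loop, as if a wakeup had arrived.
 * Returns how many passes would have spun without sleeping.
 */
int count_busy_iterations(bool reset_at_top, int iterations)
{
    ToyLatch latch = { false };
    int      busy = 0;

    assert(iterations >= 0);
    toy_set_latch(&latch);

    for (int i = 0; i < iterations; i++)
    {
        /* Clear the latch first, so a later set is never lost. */
        if (reset_at_top)
            toy_reset_latch(&latch);

        /* ... a real loop would check for work here; this sketch has none ... */

        if (toy_wait_returns_immediately(&latch))
            busy++;             /* no sleep happened: one spin of the busy loop */
        /* else: a real loop would sleep here until set or timeout */
    }
    return busy;
}
```

In this sketch, `count_busy_iterations(false, 5)` counts all five passes as busy, because nothing ever clears the flag, while `count_busy_iterations(true, 5)` counts none. With the reset done unconditionally at the top, a second reset further down the loop body adds nothing.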

I also think we should backport commit
cff75130b5f63e45423c2ed90d6f2e84c21ef840 (except for the removal of
wal_sender_delay).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com