Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown
From | Fujii Masao |
---|---|
Subject | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown |
Date | |
Msg-id | CAHGQGwGvfRW+hYLLOSM2CP-mg7qQFmu+GCdfiu9_1AKWdpxMdw@mail.gmail.com |
In response to | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses | Re: BUG #9118: WAL Sender does not disconnect replication clients during shutdown |
List | pgsql-bugs |
Sorry for the delay...

On Thu, Feb 6, 2014 at 5:05 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 02/06/2014 05:08 AM, jhedden@apple.com wrote:
>>
>> The following bug has been logged on the website:
>>
>> Bug reference: 9118
>> Logged by: Joel Hedden
>> Email address: jhedden@apple.com
>> PostgreSQL version: 9.3.2
>> Operating system: Mac OS X 10.9.1
>> Description:
>>
>> I connect a pg_receivexlog instance and have "hot_standby" archiving
>> enabled, with "archive_command" defined correctly. When the WAL Sender
>> process receives a SIGUSR2 from the postmaster (or me), it fails to shut
>> down and pg_receivexlog remains connected. Upon inspection, it looks like
>> the test for "sentPtr == MyWalSnd->flush" is always false at
>> walsender.c:1058 (sentPtr is still non-zero) where the wal sender should
>> be shutting down. Replication and archiving seem to be working otherwise.
>> Killing pg_receivexlog allows for the WAL Sender to terminate.
>
> Hmm. Before exiting, walsender waits until the client has flushed all the
> WAL to disk. However, pg_receivexlog never sends a "flush" pointer back to
> the server, so the server waits forever.
>
> The first question is, why does pg_receivexlog not send its "flush" pointer
> back to the server? It *does* fsync the files to disk. However, currently it
> only fsyncs when closing a full segment, but when shutting down, the last
> segment would not be full, so to fix this issue it should be taught to fsync
> also partial segments.

Yes. And, pg_receivexlog returns InvalidXLogRecPtr as the flush
location, so "sentPtr == MyWalSnd->flush" will never be true when
using pg_receivexlog... The quick-fix seems not to wait for that
condition to be true whenever the flush location is invalid.

> Fujii-san, how can walreceiver detect the closure of the connection, before
> reading all the buffered WAL from the TCP connection? What kind of log
> messages do you get when it happens?

I got the following messages.

[MASTER] LOG: database system is shut down
[STANDBY] FATAL: could not send data to WAL stream: server closed the
connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

> I tried to reproduce that with commit
> bee4a4d361c054c531c3a27024f9ff3efef3635b reverted, but couldn't. Although
> this was with master and standby running on same laptop, and this is
> essentially a race condition, so it's possible that I just didn't get the
> timing right to make it happen.

You would need to enable WAL archiving. Whenever I was able to reproduce
the problem, I enabled WAL archiving.

Regards,

--
Fujii Masao
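
For readers following the thread, the quick fix described above (do not wait for "sentPtr == MyWalSnd->flush" when the client reports an invalid flush location) can be illustrated with a minimal, standalone sketch in C. This is not the actual walsender.c code or any committed patch: the function name walsender_may_exit and its parameters are hypothetical, and XLogRecPtr and InvalidXLogRecPtr are redefined here as simplified stand-ins, assuming a 64-bit pointer type with zero as the invalid value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/*
 * Hypothetical helper: may a walsender that was asked to shut down exit now?
 *
 * sentPtr     - end of the WAL already sent to the client
 * clientFlush - flush location last reported by the client
 */
static bool
walsender_may_exit(XLogRecPtr sentPtr, XLogRecPtr clientFlush)
{
    /*
     * A client such as pg_receivexlog reports an invalid flush location,
     * so waiting for it to match sentPtr would block forever. Skip the
     * wait in that case.
     */
    if (clientFlush == InvalidXLogRecPtr)
        return true;

    /* Otherwise wait until the client has flushed everything we sent. */
    return sentPtr == clientFlush;
}
```

With this check, walsender_may_exit(sentPtr, InvalidXLogRecPtr) returns true right away, while a client that does report flush positions is still waited for until it has caught up with sentPtr. The real shutdown test may involve further conditions that are not shown here.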