Re: Inconsistent DB data in Streaming Replication
From | Amit Kapila |
---|---|
Subject | Re: Inconsistent DB data in Streaming Replication |
Date | |
Msg-id | 004d01ce3b55$8ab72380$a0256a80$@kapila@huawei.com |
In reply to | Re: Inconsistent DB data in Streaming Replication (Florian Pflug <fgp@phlo.org>) |
Responses | Re: Inconsistent DB data in Streaming Replication |
List | pgsql-hackers |
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote:
> On Apr14, 2013, at 17:56 , Fujii Masao <masao.fujii@gmail.com> wrote:
> > At fast shutdown, after walsender sends the checkpoint record and
> > closes the replication connection, walreceiver can detect the close
> > of connection before receiving all WAL records. This means that,
> > even if walsender sends all WAL records, walreceiver cannot always
> > receive all of them.
>
> That sounds like a bug in walreceiver to me.
>
> The following code in walreceiver's main loop looks suspicious:
>
>   /*
>    * Process the received data, and any subsequent data we
>    * can read without blocking.
>    */
>   for (;;)
>   {
>       if (len > 0)
>       {
>           /* Something was received from master, so reset timeout */
>           ...
>           XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
>       }
>       else if (len == 0)
>           break;
>       else if (len < 0)
>       {
>           ereport(LOG,
>                   (errmsg("replication terminated by primary server"),
>                    errdetail("End of WAL reached on timeline %u at %X/%X",
>                              startpointTLI,
>                              (uint32) (LogstreamResult.Write >> 32),
>                              (uint32) LogstreamResult.Write)));
>           ...
>       }
>       len = walrcv_receive(0, &buf);
>   }
>
>   /* Let the master know that we received some data. */
>   XLogWalRcvSendReply(false, false);
>
>   /*
>    * If we've written some records, flush them to disk and
>    * let the startup process and primary server know about
>    * them.
>    */
>   XLogWalRcvFlush(false);
>
> The loop at the top looks fine - it specifically avoids throwing
> an error on EOF. But the code then proceeds to XLogWalRcvSendReply()
> which doesn't seem to have the same smarts - it simply does
>
>   if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
>       PQflush(streamConn))
>       ereport(ERROR,
>               (errmsg("could not send data to WAL stream: %s",
>                       PQerrorMessage(streamConn))));
>
> Unless I'm missing something, that certainly seems to explain
> how a standby can lag behind even after a controlled shutdown of
> the master.
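To make the ordering concern in the quoted code concrete, here is a minimal C sketch. It is not PostgreSQL code: setjmp/longjmp stands in for ereport(ERROR), and the names written_upto, flushed_upto, process_msg, send_reply, flush_wal and receive_batch are hypothetical. It only shows that, with the order "process, reply, flush", an error raised in the reply step skips the flush that follows it on that path.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for walreceiver state: how far WAL has been
 * written and how far it has been flushed (made durable). */
static long written_upto = 0;
static long flushed_upto = 0;

/* Models the point that ereport(ERROR) jumps to. */
static jmp_buf error_context;

/* Models XLogWalRcvProcessMsg(): append received bytes to the WAL file. */
static void process_msg(long nbytes)
{
    written_upto += nbytes;
}

/* Models XLogWalRcvSendReply(): if the primary already closed the
 * connection, sending fails and an ERROR is raised (here: longjmp). */
static void send_reply(bool connection_closed)
{
    if (connection_closed)
    {
        fprintf(stderr, "could not send data to WAL stream\n");
        longjmp(error_context, 1);
    }
}

/* Models XLogWalRcvFlush(): make everything written so far durable. */
static void flush_wal(void)
{
    flushed_upto = written_upto;
}

/* One pass in the order of the quoted loop: process data, reply, flush.
 * Returns true if the pass completed, false if the reply raised an error. */
bool receive_batch(long nbytes, bool connection_closed)
{
    if (setjmp(error_context) != 0)
        return false;   /* error path: flush_wal() below was never reached */

    process_msg(nbytes);
    send_reply(connection_closed);
    flush_wal();
    return true;
}
```

For example, receive_batch(100, false) leaves both positions at 100, while a following receive_batch(50, true) returns false with written_upto at 150 but flushed_upto still at 100.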
Do you mean that, because an error has occurred, walreceiver would not be able to flush the received WAL, which could result in loss of WAL?
I think that even if an error occurs, it will call flush in WalRcvDie() before terminating the WAL receiver.

With Regards,
Amit Kapila.
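A minimal C sketch of the point in the reply, assuming (as stated there) that walreceiver's exit path runs WalRcvDie(), which flushes the received WAL. The callback registry, walrcv_die(), raise_error() and run_receiver() are hypothetical stand-ins, and the final longjmp only models process exit so the result can be inspected; none of this is PostgreSQL's actual code.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_EXIT_CALLBACKS 8

typedef void (*exit_callback)(void);

/* Hypothetical registry of cleanup functions run when the process dies. */
static exit_callback exit_callbacks[MAX_EXIT_CALLBACKS];
static int n_exit_callbacks = 0;
static jmp_buf exit_context;

static long written_upto = 0;
static long flushed_upto = 0;

/* Remember a cleanup function (loosely analogous to on_shmem_exit()). */
static void register_exit_callback(exit_callback cb)
{
    if (n_exit_callbacks < MAX_EXIT_CALLBACKS)
        exit_callbacks[n_exit_callbacks++] = cb;
}

/* Run the registered callbacks, most recently registered first. */
static void run_exit_callbacks(void)
{
    while (n_exit_callbacks > 0)
        exit_callbacks[--n_exit_callbacks]();
}

static void flush_wal(void)
{
    flushed_upto = written_upto;
}

/* Stand-in for WalRcvDie(): flush whatever was written before exiting. */
static void walrcv_die(void)
{
    flush_wal();
}

/* Error that terminates the receiver: run exit callbacks, then "exit". */
static void raise_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    run_exit_callbacks();
    longjmp(exit_context, 1);   /* models process exit */
}

/* Returns true on a normal pass, false if the receiver "died". */
bool run_receiver(long nbytes, bool connection_closed)
{
    written_upto = flushed_upto = 0;
    n_exit_callbacks = 0;
    register_exit_callback(walrcv_die);

    if (setjmp(exit_context) != 0)
        return false;

    written_upto += nbytes;                 /* process received data */
    if (connection_closed)
        raise_error("could not send data to WAL stream");
    flush_wal();                            /* normal-path flush */
    return true;
}
```

For example, run_receiver(150, true) returns false, yet flushed_upto ends up at 150 because the exit callback ran before the simulated exit.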