Re: Inconsistent DB data in Streaming Replication
From | Florian Pflug |
---|---|
Subject | Re: Inconsistent DB data in Streaming Replication |
Date | |
Msg-id | CBA631C1-F678-4335-ACCD-1E30846F732C@phlo.org |
In reply to | Re: Inconsistent DB data in Streaming Replication (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Inconsistent DB data in Streaming Replication |
List | pgsql-hackers |
On Apr14, 2013, at 17:56 , Fujii Masao <masao.fujii@gmail.com> wrote:
> At fast shutdown, after walsender sends the checkpoint record and
> closes the replication connection, walreceiver can detect the close
> of connection before receiving all WAL records. This means that,
> even if walsender sends all WAL records, walreceiver cannot always
> receive all of them.

That sounds like a bug in walreceiver to me. The following code in
walreceiver's main loop looks suspicious:

    /*
     * Process the received data, and any subsequent data we
     * can read without blocking.
     */
    for (;;)
    {
        if (len > 0)
        {
            /* Something was received from master, so reset timeout */
            ...
            XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);
        }
        else if (len == 0)
            break;
        else if (len < 0)
        {
            ereport(LOG,
                    (errmsg("replication terminated by primary server"),
                     errdetail("End of WAL reached on timeline %u at %X/%X",
                               startpointTLI,
                               (uint32) (LogstreamResult.Write >> 32),
                               (uint32) LogstreamResult.Write)));
            ...
        }
        len = walrcv_receive(0, &buf);
    }

    /* Let the master know that we received some data. */
    XLogWalRcvSendReply(false, false);

    /*
     * If we've written some records, flush them to disk and
     * let the startup process and primary server know about
     * them.
     */
    XLogWalRcvFlush(false);

The loop at the top looks fine - it specifically avoids throwing
an error on EOF. But the code then proceeds to XLogWalRcvSendReply()
which doesn't seem to have the same smarts - it simply does

    if (PQputCopyData(streamConn, buffer, nbytes) <= 0 ||
        PQflush(streamConn))
        ereport(ERROR,
                (errmsg("could not send data to WAL stream: %s",
                        PQerrorMessage(streamConn))));

Unless I'm missing something, that certainly seems to explain
how a standby can lag behind even after a controlled shutdown of
the master.

best regards,
Florian Pflug
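
For illustration only, the ordering problem described in this message can be reduced to a toy model. This is not PostgreSQL code: `ToyReceiver`, `toy_send_reply`, `toy_flush` and `toy_iteration` are hypothetical names, and the sketch assumes the peer closes the connection right after its last record, so that the reply can no longer be sent. It only shows that a fatal error raised by the reply step, which runs before the flush step, can leave received records unflushed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT PostgreSQL code: every name here is hypothetical. */
typedef struct
{
    int  written;    /* records received and written so far */
    int  flushed;    /* records flushed to disk so far */
    bool conn_open;  /* false once the peer has closed the connection */
} ToyReceiver;

/* Returns false when the reply cannot be sent because the peer is gone. */
static bool
toy_send_reply(const ToyReceiver *r)
{
    return r->conn_open;
}

/* Marks everything written so far as flushed. */
static void
toy_flush(ToyReceiver *r)
{
    r->flushed = r->written;
    assert(r->flushed <= r->written);
}

/*
 * One pass of a simplified main loop: consume `pending` records, notice
 * EOF without raising an error, then send a reply and flush.
 *
 * If fatal_send_error is true, a failed reply aborts the pass before the
 * flush runs (a stand-in for an ERROR-level report). If it is false, the
 * failed reply is tolerated and the flush still happens.
 */
static bool
toy_iteration(ToyReceiver *r, int pending, bool fatal_send_error)
{
    r->written += pending;
    r->conn_open = false;   /* assumption: peer closed after its last record */

    if (!toy_send_reply(r) && fatal_send_error)
        return false;       /* flush is skipped */

    toy_flush(r);
    return true;
}
```

With `fatal_send_error` set, a pass that receives three records ends with `written == 3` and `flushed == 0`. With it cleared, the same pass ends with `flushed == 3`. Whether tolerating the send failure or reordering the two steps would be the right fix in walreceiver is a separate question that this sketch does not answer.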