Re: Bug in walreceiver
From | Heikki Linnakangas |
---|---|
Subject | Re: Bug in walreceiver |
Date | |
Msg-id | 4D2EBEE3.2010709@enterprisedb.com |
In reply to | Bug in walreceiver (Fujii Masao <masao.fujii@gmail.com>) |
Responses | Re: Bug in walreceiver |
List | pgsql-hackers |
On 13.01.2011 10:28, Fujii Masao wrote:
> When the master shuts down or crashes, there seems to be
> the case where walreceiver exits without flushing WAL which
> has already been written. This might lead startup process to
> replay un-flushed WAL and break a Write-Ahead-Logging rule.

Hmm, that can happen at a crash even with no replication involved. If you "kill -9 postmaster", and some WAL had been written but not fsync'd, on crash recovery we will happily recover the unsynced WAL. We could prevent that by fsyncing all WAL before applying it; presumably fsyncing a file that has already been flushed is quick. But is it worth the trouble?

> walreceiver.c
>> /* Wait a while for data to arrive */
>> if (walrcv_receive(NAPTIME_PER_CYCLE, &type, &buf, &len))
>> {
>>     /* Accept the received data, and process it */
>>     XLogWalRcvProcessMsg(type, buf, len);
>>
>>     /* Receive any more data we can without sleeping */
>>     while (walrcv_receive(0, &type, &buf, &len))
>>         XLogWalRcvProcessMsg(type, buf, len);
>>
>>     /*
>>      * If we've written some records, flush them to disk and let the
>>      * startup process know about them.
>>      */
>>     XLogWalRcvFlush();
>> }
>
> The problematic case happens when the latter walrcv_receive
> emits ERROR. In this case, the WAL received by the former
> walrcv_receive is not guaranteed to have been flushed yet.
>
> The attached patch ensures that all WAL received is flushed to
> disk before walreceiver exits. This patch should be backported
> to 9.0, I think.

Yeah, we probably should do that, even though it doesn't completely close the window in which unsynced WAL can be replayed.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
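The alternative mentioned above, fsyncing WAL before applying it, can be illustrated with a minimal C sketch, assuming a POSIX system where WAL lives in ordinary files. The helper `open_wal_for_replay` is hypothetical and not a PostgreSQL function; it only shows the ordering being discussed: force the file's contents to stable storage first, then hand the descriptor to replay. If the data was already flushed, the fsync should have little left to do, which is why the cost is expected to be small.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not a PostgreSQL API): open a WAL file for
 * replay only after forcing whatever was written to it onto stable
 * storage. Returns a descriptor positioned at offset 0, or -1. */
int open_wal_for_replay(const char *path)
{
    assert(path != NULL);

    /* Opened read-write because some platforms refuse fsync() on a
     * read-only descriptor; replay itself only needs to read. */
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Cheap if the data is already durable; otherwise this closes the
     * gap where replay could apply WAL that a crash might still lose. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```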
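The fix proposed in the quoted message, flushing on the way out, can be modeled with the following simplified C sketch. Everything in it is hypothetical: `wal_open`, `wal_write`, `wal_flush`, `walrcv_die` and the two counters only loosely mirror the walreceiver's bookkeeping and are not PostgreSQL's actual code, and the attached patch itself is not reproduced. It assumes a POSIX system and tracks how far WAL has been written versus fsync'd, so the exit path can flush whatever the main loop left pending when an error cut it short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the walreceiver's progress pointers. */
int wal_fd = -1;
uint64_t written_upto = 0;   /* bytes passed to write() */
uint64_t flushed_upto = 0;   /* bytes made durable by fsync() */

int wal_open(const char *path)
{
    wal_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    written_upto = flushed_upto = 0;
    return wal_fd < 0 ? -1 : 0;
}

/* Written data may sit in the OS cache until wal_flush() runs.
 * Short writes are treated as errors for brevity. */
int wal_write(const char *buf, size_t len)
{
    ssize_t n = write(wal_fd, buf, len);
    if (n < 0 || (size_t) n != len)
        return -1;
    written_upto += len;
    return 0;
}

/* Make everything written so far durable; no-op if nothing is pending. */
int wal_flush(void)
{
    assert(flushed_upto <= written_upto);
    if (flushed_upto == written_upto)
        return 0;
    if (fsync(wal_fd) != 0)
        return -1;
    flushed_upto = written_upto;
    return 0;
}

/* Exit path, run whether the loop ended normally or on an error:
 * flush before closing so no written-but-unflushed WAL is left behind. */
void walrcv_die(void)
{
    if (wal_fd < 0)
        return;
    (void) wal_flush();   /* sketch only: a real server must treat failure as fatal */
    close(wal_fd);
    wal_fd = -1;
}
```

In the sequence described in the report, data from the first `walrcv_receive` reaches `wal_write`, the second call raises ERROR before the loop's own flush, and the exit path then performs the flush that was skipped.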