Re: archive status ".ready" files may be created too early
From | Alvaro Herrera |
---|---|
Subject | Re: archive status ".ready" files may be created too early |
Date | |
Msg-id | 202107302222.sp3jgxhd5cbh@alvherre.pgsql |
In response to | Re: archive status ".ready" files may be created too early ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses | Re: archive status ".ready" files may be created too early |
List | pgsql-hackers |
On 2021-Jul-30, Bossart, Nathan wrote:

> On 7/30/21, 11:34 AM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
> > Hmm ... I'm not sure we're prepared to backpatch this kind of change.
> > It seems a bit too disruptive to how replay works. I think we
> > should be focusing solely on patch 0001 to surgically fix the precise
> > bug you see. Does patch 0002 exist because you think that a system with
> > only 0001 will not correctly deal with a crash at the right time?
>
> Yes, that was what I was worried about. However, I just performed a
> variety of tests with just 0001 applied, and I am beginning to suspect
> my concerns were unfounded. With wal_buffers set very high,
> synchronous_commit set to off, and a long sleep at the end of
> XLogWrite(), I can reliably cause the archive status files to lag far
> behind the current open WAL segment. However, even if I crash at this
> time, the .ready files are created when the server restarts (albeit
> out of order). This appears to be due to the call to
> XLogArchiveCheckDone() in RemoveOldXlogFiles(). Therefore, we can
> likely abandon 0002.

That's great to hear. I'll give 0001 a look again.

> > Now, the reason I'm looking at this patch series is that we're seeing a
> > related problem with walsender/walreceiver, which apparently are capable
> > of creating a file in the replica that ends up not existing in the
> > primary after a crash, for a reason closely related to what you
> > describe for WAL archival. I'm not sure what is going on just yet, so
> > I'm not going to try and explain because I'm likely to get it wrong.
>
> I've suspected that this is due to the use of the flushed location for
> the send pointer, which AFAICT needn't align with a WAL record
> boundary.

Yeah, I had gotten as far as the GetFlushRecPtr but haven't tracked down
what happens with a contrecord.

--
Álvaro Herrera PostgreSQL Developer - https://www.EnterpriseDB.com/
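
To make the concern about the flushed location concrete, here is a minimal toy model, not PostgreSQL code. It assumes a 16-byte segment and a hypothetical list of records given as byte ranges; every name in it (SEGMENT_SIZE, RECORDS, and the three functions) is invented for illustration. It only shows that a flush pointer can sit exactly on a segment boundary while the record crossing that boundary (the one whose tail would be a contrecord in the next segment) is still incomplete.

```python
# Toy model only: sizes, records, and function names are hypothetical.
SEGMENT_SIZE = 16  # toy value in bytes; real WAL segments are far larger

# Hypothetical records as (start, end) byte offsets, end exclusive.
# The record (12, 20) crosses the boundary between segment 0 and segment 1.
RECORDS = [(0, 6), (6, 12), (12, 20), (20, 28)]


def last_complete_record_end(records, flush_ptr):
    """Return the end offset of the last record lying entirely below flush_ptr."""
    end = 0
    for _start, stop in records:
        if stop <= flush_ptr:
            end = stop
    return end


def segment_ends_mid_record(records, seg_no):
    """Return True if some record straddles the end boundary of segment seg_no."""
    boundary = (seg_no + 1) * SEGMENT_SIZE
    return any(start < boundary < stop for start, stop in records)


def safe_to_mark_ready(records, seg_no, flush_ptr):
    """Treat a segment as safe only once it is fully flushed and any record
    crossing its end boundary has been flushed in full as well."""
    boundary = (seg_no + 1) * SEGMENT_SIZE
    if flush_ptr < boundary:
        return False
    for start, stop in records:
        if start < boundary < stop:
            return stop <= flush_ptr
    return True
```

In this model, a flush pointer of 16 covers all of segment 0, yet the last complete record ends at 12, so treating segment 0 as finished at that point would expose half a record. Once the pointer reaches 20, the crossing record is complete and the segment counts as safe.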