Re: Cygwin cleanup
От | Justin Pryzby |
---|---|
Тема | Re: Cygwin cleanup |
Дата | |
Msg-id | 20230112043949.GQ9837@telsasoft.com обсуждение исходный текст |
Ответ на | Re: Cygwin cleanup (Thomas Munro <thomas.munro@gmail.com>) |
Ответы |
Re: Cygwin cleanup
Re: Cygwin cleanup |
Список | pgsql-hackers |
On Sat, Jan 07, 2023 at 12:39:11AM +1300, Thomas Munro wrote: > On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > +data_sync_retry = on > > Sharing with the list some clues that Justin and I figured out about > what that part is doing. Without it, you get failures like: > > PANIC: could not open file "pg_logical/snapshots/0-14FE6B0.snap": > No such file or directory > > That's been seen before: > > https://www.postgresql.org/message-id/flat/17827.1549866683%40sss.pgh.pa.us > > That thread concluded that the operating system must have a non-atomic > rename(), ie a kernel bug. I don't know why Cygwin would display that > behaviour and our native Windows build not; maybe timing, or maybe our > own open() and rename() wrappers for Windows do something important > differently than Cygwin's open() and rename(). > > On reflection, that seems a bit too flimsy to have in-tree without > more investigation, which I won't have time for myself, so I'm going > to withdraw this entry. Not so fast :) Here's my latest copy of the patch. Most recently, rather than setting data_sync_retry=no, I changed to call fsync_fname_ext() rather than fsync_fname(), which uses PANIC (except when data_sync_retry is disabled). That seems to work, showing that the problem is limited to SnapBuildSerialize(), and not a problem with all fsync()... https://cirrus-ci.com/task/5990885733695488 Thomas raised a good question, which was how the tests were passing when SnapBuildSerialize() was raising an error, which is what it would've been doing when I used data_sync_retry=no. So .. why is wal_sync_method being used to control fsync for things other than WAL? See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which point (since 9b178555f, c. 2004) wal_sync_method was already being used for SLOG. Now, it's also being used for logical decoding (since b89e1510 and 858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that data_sync_retry wasn't added until 9ccdd7f66 (c. 2018) It looks like logical decoding may be the "most wrong" place that wal_sync_method is being used, so maybe my change is reasonable to consider, and not just a workaround. I'm going to re-open the CF entry to let this run for a while to see how it works out. -- Justin
Вложения
В списке pgsql-hackers по дате отправления: