Обсуждение: client-side fsync() error handling
Continuing the discussion from [0], there are still a number of fsync() calls in client programs that are unchecked or where errors are treated non-fatally. Digging around through the call stack, I think changing fsync_fname() to just call exit(1) on errors instead of proceeding would address most of that. This affects (at least) initdb, pg_basebackup, pg_checksums, pg_dump, pg_dumpall, and pg_rewind. Thoughts? [0]: https://www.postgresql.org/message-id/flat/9b49fe44-8f3e-eca9-5914-29e9e99030bf%402ndquadrant.com -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On Tue, Feb 11, 2020 at 09:22:54AM +0100, Peter Eisentraut wrote: > Digging around through the call stack, I think changing fsync_fname() to > just call exit(1) on errors instead of proceeding would address most of > that. > > Thoughts? Doing things as you do in your patch sounds fine to me for this part. Now, don't we need to care about durable_rename() and make the panic-like failure an optional choice? From what I can see, this routine is used now in the backend for pg_basebackup to rename temporary history files or partial WAL segments. -- Michael
Вложения
On 2020-02-12 06:28, Michael Paquier wrote: > Now, don't we need to care about durable_rename() and make the > panic-like failure an optional choice? From what I can see, this > routine is used now in the backend for pg_basebackup to rename > temporary history files or partial WAL segments. durable_rename() calls fsync_fname(), so it would be covered by this change. The other file access calls in there can be handled by normal error handling, I think. Is there any specific scenario you have in mind? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 13, 2020 at 10:02:31AM +0100, Peter Eisentraut wrote: > On 2020-02-12 06:28, Michael Paquier wrote: >> Now, don't we need to care about durable_rename() and make the >> panic-like failure an optional choice? From what I can see, this >> routine is used now in the backend for pg_basebackup to rename >> temporary history files or partial WAL segments. > > durable_rename() calls fsync_fname(), so it would be covered by this change. > The other file access calls in there can be handled by normal error > handling, I think. Is there any specific scenario you have in mind? The old file flush is handled by your patch, but not the new one if it exists, and it seems to me that we should handle failures consistently to reason easier about it, actually as the top of the function says :) Another point that we could consider is if fsync_fname() should have an option to not trigger an immediate exit when facing a failure. The backend has that option thanks to fsync_fname_ext() with its elevel argument. Your choice to default to a failure is fine for most cases because that's what we want. However, I am questioning if this change would be surprising for some client applications or not, and if we should have the option to choose one behavior or the other. -- Michael
Вложения
On 2020-02-13 12:52, Michael Paquier wrote: >> durable_rename() calls fsync_fname(), so it would be covered by this change. >> The other file access calls in there can be handled by normal error >> handling, I think. Is there any specific scenario you have in mind? > > The old file flush is handled by your patch, but not the new one if > it exists, and it seems to me that we should handle failures > consistently to reason easier about it, actually as the top of the > function says :) OK, added in new patch. > Another point that we could consider is if fsync_fname() should have > an option to not trigger an immediate exit when facing a failure. The > backend has that option thanks to fsync_fname_ext() with its elevel > argument. Your choice to default to a failure is fine for most cases > because that's what we want. However, I am questioning if this change > would be surprising for some client applications or not, and if we > should have the option to choose one behavior or the other. The option in the backend is between panicking and retrying. The old behavior was to always retry but we have learned that that usually doesn't work. The frontends do neither right now, or at least the error handling is very inconsistent and inscrutable. It would be possible in theory to add a retry option, but that would be a very different patch, and given what we have learned about fsync(), it probably wouldn't be widely useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
On Thu, Feb 20, 2020 at 10:10:11AM +0100, Peter Eisentraut wrote: > OK, added in new patch. Thanks, that looks good. > The frontends do neither right now, or at least the error handling is very > inconsistent and inscrutable. It would be possible in theory to add a retry > option, but that would be a very different patch, and given what we have > learned about fsync(), it probably wouldn't be widely useful. Perhaps. Let's have this discussion later if there are drawbacks about changing things the way your patch does. If we don't do that, we'll never know about it either and this patch makes things safer. -- Michael
Вложения
On 2020-02-21 05:18, Michael Paquier wrote: > On Thu, Feb 20, 2020 at 10:10:11AM +0100, Peter Eisentraut wrote: >> OK, added in new patch. > > Thanks, that looks good. committed >> The frontends do neither right now, or at least the error handling is very >> inconsistent and inscrutable. It would be possible in theory to add a retry >> option, but that would be a very different patch, and given what we have >> learned about fsync(), it probably wouldn't be widely useful. > > Perhaps. Let's have this discussion later if there are drawbacks > about changing things the way your patch does. If we don't do that, > we'll never know about it either and this patch makes things safer. yup -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Feb 24, 2020 at 05:03:07PM +0100, Peter Eisentraut wrote: > committed Thanks! -- Michael