Re: BUG #14999: pg_rewind corrupts control file global/pg_control
От | Tom Lane |
---|---|
Тема | Re: BUG #14999: pg_rewind corrupts control file global/pg_control |
Дата | |
Msg-id | 30255.1522711675@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: BUG #14999: pg_rewind corrupts control file global/pg_control (Dmitry Dolgov <9erthalion6@gmail.com>) |
Ответы |
Re: BUG #14999: pg_rewind corrupts control file global/pg_control
|
Список | pgsql-bugs |
Dmitry Dolgov <9erthalion6@gmail.com> writes: > Ok, I agree. Then yes, this patch can be probably marked as ready. I started to look over this patch, and soon decided that the commentary in pg_rewind is impossibly bad :-(. You need to study libpq_executeFileMap for a long time before you are even sure which side of the transfer it's executing on; the fact that it's doing a copy *TO* the remote server is incredibly misleading, and the comments are not anywhere near good enough to de-confuse somebody coming upon the code for the first time. Not to mention that the function's name conveys nothing whatever. Maybe it's not the job of this patch to fix that, but I can't help thinking that poor design and documentation are a big part of why this bug exists in the first place. But enough of that rant. Now that I've more or less wrapped my head around what's happening, I think this is the core of the problem: libpq_executeFileMap truncates every non-data file in the local data directory before it's fetched anything at all from the remote server. This seems to me to be utterly cavalier and brittle, because ANY FAILURE AT ALL in the fetch sequence leaves you with a data directory that's been trashed in whole or in part --- and that sequence could span a good long time, if there's a lot to copy. The originally complained-of problem (that pg_control gets nuked) is just the tip of that iceberg, and I'm afraid that the proposed patch is only removing a single potential failure mode. I think that a saner design would involve not truncating any particular target file until we first get some data to write to it. Christian's original patch shown at https://www.postgresql.org/message-id/CAB7nPqTyq3cjphJmoCPc0n8r90O4j1BpZC2BMaj8dKZ7g0-mtw%40mail.gmail.com seems to be headed in that direction, in that it forces ordering of the data fetch (which is likely a good idea anyway to ensure locality of reference) and then performs the truncation when we get the first chunk of data for a file. Michael's proposal to have separate error handling for data and config files is not bad in itself, and it does improve the behavior for one foreseeable trouble case. But I think we need the other thing as well, or some variant of it, before we can regard pg_rewind as anything except a chainsaw with the safety catch missing. regards, tom lane
В списке pgsql-bugs по дате отправления: