Re: BUG #14999: pg_rewind corrupts control file global/pg_control
От | Michael Paquier |
---|---|
Тема | Re: BUG #14999: pg_rewind corrupts control file global/pg_control |
Дата | |
Msg-id | 20180115072537.GC1724@paquier.xyz обсуждение исходный текст |
Ответ на | Re: BUG #14999: pg_rewind corrupts control file global/pg_control (TipTop Labs <office@tiptop-labs.com>) |
Ответы |
Re: BUG #14999: pg_rewind corrupts control file global/pg_control
|
Список | pgsql-bugs |
On Fri, Jan 05, 2018 at 04:36:22AM +0100, TipTop Labs wrote: >> On Jan 5, 2018, at 2:26 AM, Michael Paquier <michael.paquier@gmail.com> wrote: >> On Fri, Jan 5, 2018 at 5:06 AM, PG Bug reporting form >> <noreply@postgresql.org> wrote: >>> I have encountered a bug in PostgreSQL 10.1: when the target directory for >>> pg_rewind contains a read-only file (e.g. server.key), pg_rewind exits with >>> "could not open target file" (legitimate) and corrupts the control file >>> global/pg_control to size 0 (bug). From now on, pg_rewind always exits with >>> "unexpected control file size 0, expected 8192" and a restore from >>> pg_basebackup is needed. >> >> Likely that's reproducible down to 9.5 where pg_rewind has been >> introduced. I agree that we should do better with failure handling >> here. Corrupting the control file is not cool. > > I can already confirm that this also occurs with PostgreSQL 9.6. As far as I can see, this happens because of the way the 'remote' mode of pg_rewind considers a set of files to truncate or not. Attached is a patch which does things in a more correct way. The key point here is to make the difference between a relation file and a configuration file when building the filemap for copying a file in full. When a configuration file is not readable, then trying to open it should not be a failure. When using a relation file, there should be failures. There are still two things I am not completely happy about: - pg_control is considered as a configuration file with this patch, however we should fail it pg_control is not readable. In practice I guess that this should not happen, and the patch produces a warning message. I think that we should consider add a special handling for things like pg_control, filenode.map. etc. so as you get a hard failure if those are not readable, so they should enter in the category of FILE_ACTION_COPY_DATA. This needs a bit more thoughts in process_source_file(). - open_target_file() resets manually errno. This is necessary as this gets called continuously when processing the same file in remote mode. I tried as well to make open_target_file and close_target_file one-time operations for each file, but the result was even more ugly, so I went up with this solution. I am adding that to the next CF to not forget about it. This approach is back-patchable down to 9.5 in my opinion. I have added as well a TAP test in the patch which is able to reproduce the problem. Thoughts? -- Michael
Вложения
В списке pgsql-bugs по дате отправления: