Thread: FYI: 2022-10 thorntail failures from coreutils FICLONE

FYI: 2022-10 thorntail failures from coreutils FICLONE

From
Noah Misch
Date:
thorntail failed some recovery tests in 2022-10:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-11-02%2004%3A25%3A43
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-31%2013%3A32%3A42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-29%2017%3A48%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-24%2013%3A48%3A16
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-24%2010%3A08%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-21%2000%3A58%3A14
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-16%2000%3A08%3A17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-15%2020%3A48%3A18
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-14%2020%3A13%3A35
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2022-10-14%2006%3A58%3A15

thorntail has long seen fsync failures, due to a driver bug[1].  On
2022-09-28, its OS updated coreutils from 8.32-4.1 to 9.1-1.  That brought in
"cp" use of the FICLONE ioctl.  FICLONE internally syncs its source file,
reporting EIO if that fails.  A bug[2] in "cp" allowed it to silently make a
defective copy instead of reporting that EIO.  Since the recovery suite
archive_command uses "cp", these test failures emerged.  The kernel may
change[3] to make such userspace bugs harder to add.

For thorntail, my workaround was to replace "cp" with a wrapper doing 'exec
/usr/bin/cp --reflink=never "$@"'.  I might eventually propose the ability to
disable FICLONE calls in PostgreSQL code.  So far, those calls (in pg_upgrade)
have not caused thorntail failures.

[1] https://postgr.es/m/flat/20210508001418.GA3076445@rfd.leadboat.com
[2] https://github.com/coreutils/coreutils/commit/f6c93f334ef5dbc5c68c299785565ec7b9ba5180
[3] https://lore.kernel.org/linux-xfs/20221108172436.GA3613139@rfd.leadboat.com



Re: FYI: 2022-10 thorntail failures from coreutils FICLONE

From
Tom Lane
Date:
Noah Misch <noah@leadboat.com> writes:
> thorntail failed some recovery tests in 2022-10:

Speaking of which ... thorntail hasn't reported in for nearly
three weeks.  Is it stuck?

            regards, tom lane



Re: FYI: 2022-10 thorntail failures from coreutils FICLONE

From
Noah Misch
Date:
On Mon, Jan 09, 2023 at 10:49:26PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > thorntail failed some recovery tests in 2022-10:
> 
> Speaking of which ... thorntail hasn't reported in for nearly
> three weeks.  Is it stuck?

Its machine has been unresponsive to ssh for those weeks.