Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK

From: Alexander Lakhin
Subject: Non-systematic handling of EINTR/EAGAIN/EWOULDBLOCK
Msg-id: f9bebfe6-cee4-ed87-d4e6-29b5ca4be08d@gmail.com
List: pgsql-hackers
Hello hackers,

Looking at a recent failure on the buildfarm:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2024-04-30%2020%3A48%3A34

# poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
[23:01:41] t/020_archive_status.pl ..............
Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

with the following error in the log:
2024-04-30 22:57:27.931 CEST [83115:1] LOG:  archive command failed with exit code 1
2024-04-30 22:57:27.931 CEST [83115:2] DETAIL:  The failed archive command was: cp "pg_wal/000000010000000000000001_does_not_exist" "000000010000000000000001_does_not_exist"
...
2024-04-30 22:57:28.070 CEST [47962:2] [unknown] LOG:  connection authorized: user=pgbf database=postgres application_name=020_archive_status.pl
2024-04-30 22:57:28.072 CEST [47962:3] 020_archive_status.pl LOG: statement: SELECT archived_count FROM pg_stat_archiver
2024-04-30 22:57:28.073 CEST [83115:3] LOG:  could not send to statistics collector: Resource temporarily unavailable

and the corresponding code (on REL_13_STABLE):
static void
pgstat_send(void *msg, int len)
{
     int         rc;

     if (pgStatSock == PGINVALID_SOCKET)
         return;

     ((PgStat_MsgHdr *) msg)->m_size = len;

     /* We'll retry after EINTR, but ignore all other failures */
     do
     {
         rc = send(pgStatSock, msg, len, 0);
     } while (rc < 0 && errno == EINTR);

#ifdef USE_ASSERT_CHECKING
     /* In debug builds, log send failures ... */
     if (rc < 0)
         elog(LOG, "could not send to statistics collector: %m");
#endif
}

I wonder whether this retry should also be performed after EAGAIN (Resource
temporarily unavailable) and EWOULDBLOCK.
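
To make the question concrete, here is a minimal sketch of a loop that retries at once after EINTR and, a bounded number of times, after EAGAIN/EWOULDBLOCK. It is an illustration only, not a proposed patch: send_with_retry(), the send_fn typedef, the max_waits/wait_ms parameters and the flaky_send() stand-in are all hypothetical names, and waiting with poll() is just one possible policy.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical: type of the underlying send primitive, so it can be swapped. */
typedef ssize_t (*send_fn) (int, const void *, size_t, int);

/*
 * Retry immediately after EINTR; after EAGAIN/EWOULDBLOCK wait (at most
 * max_waits times, wait_ms each) for the socket to become writable.
 */
static ssize_t
send_with_retry(send_fn do_send, int sock, const void *msg, size_t len,
                int max_waits, int wait_ms)
{
    int         waits = 0;

    for (;;)
    {
        ssize_t     rc = do_send(sock, msg, len, 0);

        if (rc >= 0)
            return rc;
        if (errno == EINTR)
            continue;           /* interrupted by a signal */
        if ((errno == EAGAIN || errno == EWOULDBLOCK) && waits < max_waits)
        {
            struct pollfd pfd = {.fd = sock, .events = POLLOUT};

            waits++;
            (void) poll(&pfd, 1, wait_ms);  /* bounded wait, then retry */
            continue;
        }
        return -1;              /* give up; errno is from the last attempt */
    }
}

/* Stand-in for send(), used only to exercise the loop above. */
static int  flaky_failures_left = 0;

static ssize_t
flaky_send(int sock, const void *msg, size_t len, int flags)
{
    (void) sock; (void) msg; (void) flags;
    if (flaky_failures_left > 0)
    {
        flaky_failures_left--;
        errno = EAGAIN;
        return -1;
    }
    return (ssize_t) len;
}
```

In real code the first argument would simply be send; the function pointer is there only so that the behaviour can be checked without a socket. Whether a bounded wait is acceptable in pgstat_send() at all is a separate question.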

With a simple send() wrapper (PFA) activated with LD_PRELOAD, I could
reproduce this failure easily when running
`make -s check -C src/test/recovery/ PROVE_TESTS="t/020*"` on
REL_13_STABLE:
t/020_archive_status.pl .. 1/16 # poll_query_until timed out executing this query:
# SELECT archived_count FROM pg_stat_archiver
# expecting this output:
# 1
# last actual query output:
# 0
# with stderr:
# Looks like your test exited with 29 just after 4.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 12/16 subtests

I also reproduced another failure (that lacks useful diagnostics, unfortunately):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=morepork&dt=2022-11-10%2015%3A30%3A16
...
t/020_archive_status.pl .. 8/16 # poll_query_until timed out executing this query:
# SELECT last_archived_wal FROM pg_stat_archiver
# expecting this output:
# 000000010000000000000002
# last actual query output:
# 000000010000000000000001
# with stderr:
# Looks like your test exited with 29 just after 13.
t/020_archive_status.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/16 subtests
...

The "n == 64" condition in the cranky send() is needed to target exactly
these failures. Without this restriction, the test (and also `make check`)
just hangs because of:
             if (errno == EINTR)
                 continue;       /* Ok if we were interrupted */

             /*
              * Ok if no data writable without blocking, and the socket is in
              * non-blocking mode.
              */
             if (errno == EAGAIN ||
                 errno == EWOULDBLOCK)
             {
                 return 0;
             }
in internal_flush_buffer().
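
For readers without the attachment, a rough sketch of what such an LD_PRELOAD wrapper could look like follows. This is an assumption-laden reconstruction, not the attached file: it assumes the failure is injected unconditionally for 64-byte messages and that the real send() is looked up with dlsym(RTLD_NEXT, ...) on a glibc-like system; the send_fn typedef and the file name cranky_send.c are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper type for the pointer to the real send(). */
typedef ssize_t (*send_fn) (int, const void *, size_t, int);

ssize_t
send(int s, const void *buf, size_t n, int flags)
{
    static send_fn real_send = NULL;

    /* Look up the next definition of send() (normally libc's) once. */
    if (real_send == NULL)
        real_send = (send_fn) dlsym(RTLD_NEXT, "send");
    if (real_send == NULL)
    {
        errno = ENOSYS;
        return -1;
    }

    /* Assumption: fail only for 64-byte messages, always with EAGAIN. */
    if (n == 64)
    {
        errno = EAGAIN;
        return -1;
    }
    return real_send(s, buf, n, flags);
}
```

Something like `cc -shared -fPIC -o cranky_send.so cranky_send.c -ldl` followed by `LD_PRELOAD=./cranky_send.so make -s check ...` should be enough to activate it; the size filter keeps the injected error away from the paths that return 0 on EAGAIN.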

On the other hand, even with:
int
send(int s, const void *buf, size_t n, int flags)
{
     if (rand() % 10000 == 0)
     {
         errno = EINTR;
         return -1;
     }
     return real_send(s, buf, n, flags);
}

`make check` fails with many miscellaneous errors...

Best regards,
Alexander