Re: Recent 027_streaming_regress.pl hangs

From: Andres Freund
Subject: Re: Recent 027_streaming_regress.pl hangs
Msg-id: 20240321025024.ohozgkijorpp3ejx@awork3.anarazel.de
In reply to: Re: Recent 027_streaming_regress.pl hangs  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

Hi,

On 2024-03-20 17:41:47 -0700, Andres Freund wrote:
> There's a lot of other animals on the same machine, however it's rarely fully
> loaded (with either CPU or IO).
>
> I don't think the test just being slow is the issue here, e.g. in the last
> failing iteration
>
> [...]
>
> I suspect we have some more fundamental instability at our hands, there have
> been failures like this going back a while, and on various machines.

I'm somewhat confused by the timestamps in the log:

[22:07:50.263](223.929s) ok 2 - regression tests pass
...
[22:14:02.051](371.788s) # poll_query_until timed out executing this query:

I read this as 371.788s having passed between the messages, which of course is
much higher than PostgreSQL::Test::Utils::timeout_default=180.

Ah.

The way that poll_query_until() implements timeouts seems decidedly
suboptimal. If a psql invocation, including query processing, takes any
appreciable amount of time, poll_query_until() waits much longer than it
should, because it very naively determines a number of waits ahead of time:

    my $max_attempts = 10 * $PostgreSQL::Test::Utils::timeout_default;
    my $attempts = 0;

    while ($attempts < $max_attempts)
    {
...

        # Wait 0.1 second before retrying.
        usleep(100_000);

        $attempts++;
    }

Ick.

What's worse is that if the query takes too long, the timeout afaict never
takes effect.
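
To put numbers on that: with timeout_default=180 the loop allows 10 * 180 =
1800 attempts, and each attempt costs 0.1s of sleep plus however long psql
takes. If the whole 371.788s was spent in that loop, that is roughly 0.2s per
attempt, i.e. about 0.1s per psql invocation on top of the sleep. A minimal
sketch of a wall-clock deadline instead of a precomputed attempt count could
look like the following. poll_until_deadline and its parameters are
hypothetical names for illustration, not anything in PostgreSQL::Test::Cluster:

```perl
use strict;
use warnings;
use Time::HiRes qw(time usleep);

# Hypothetical helper (not part of the PostgreSQL test modules): call $check
# repeatedly until it returns true or $timeout seconds of wall-clock time
# have elapsed. Returns 1 on success, 0 on timeout.
sub poll_until_deadline
{
	my ($check, $timeout, $interval_us) = @_;
	$interval_us //= 100_000;    # 0.1 second between attempts

	my $deadline = time() + $timeout;

	while (1)
	{
		return 1 if $check->();

		# Time spent inside $check counts against the deadline, so slow
		# checks reduce the number of attempts instead of extending the wait.
		return 0 if time() >= $deadline;

		usleep($interval_us);
	}
}

# Usage: a check that takes about 0.2s per call and never succeeds,
# polled with a 1 second timeout.
my $start = time();
my $ok = poll_until_deadline(sub { usleep(200_000); return 0; }, 1);
my $elapsed = time() - $start;
printf "ok=%d elapsed=%.1fs\n", $ok, $elapsed;
```

With a fixed count of 10 attempts the same check would take around 3s; with
the deadline it gives up shortly after 1s. Note that this still does not
interrupt a single check that itself hangs, so it addresses only the first of
the two problems above.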

Greetings,

Andres Freund


