Re: Recent 027_streaming_regress.pl hangs

From: Alexander Lakhin
Subject: Re: Recent 027_streaming_regress.pl hangs
Msg-id: 7e914807-eeaa-5c44-3543-bb2888030122@gmail.com
In reply to: Re: Recent 027_streaming_regress.pl hangs  (Andres Freund <andres@anarazel.de>)
Responses: Re: Recent 027_streaming_regress.pl hangs  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

26.03.2024 10:59, Andres Freund wrote:
> Late, will try to look more in the next few days.
>

AFAICS, the last 027_streaming_regress.pl failures on calliphoridae,
culicidae, and tamandua occurred before 2024-03-27:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2024-03-26%2004%3A07%3A30
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-03-22%2013%3A26%3A21
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-03-24%2007%3A44%3A27

So it looks like that issue is resolved, but there is another, apparently
performance-related issue: deadlock-parallel test failures.

A recent one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=canebrake&dt=2024-04-02%2022%3A20%3A22
test deadlock-parallel            ... FAILED   345099 ms

+isolationtester: canceling step d2a1 after 300 seconds
  step d2a1: <... completed>
-  sum
------
-10000
-(1 row)
-
...

The server log shows:
2024-04-02 23:56:45.353 UTC [3583878][client backend][5/530:0] LOG: statement: SET force_parallel_mode = on;
...
                   SELECT lock_share(3,x) FROM bigt LIMIT 1;
2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')
2024-04-02 23:56:45.364 UTC [3583876][client backend][3/2732:0] DETAIL:  parameters: $1 = '3583878'
...
2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')
2024-04-02 23:57:28.967 UTC [3583876][client backend][3/5097:0] DETAIL:  parameters: $1 = '3583877'
2024-04-02 23:57:29.016 UTC [3583877][client backend][4/530:0] LOG: statement: COMMIT;
2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')
2024-04-02 23:57:29.039 UTC [3583876][client backend][3/5098:0] DETAIL:  parameters: $1 = '3583879'
...
2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] LOG:  execute isolationtester_waiting: SELECT 
pg_catalog.pg_isolation_test_session_is_blocked($1, '{3583877,3583878,3583879,3583880}')
2024-04-03 00:02:29.096 UTC [3583876][client backend][3/9472:0] DETAIL:  parameters: $1 = '3583878'
2024-04-03 00:02:29.172 UTC [3905345][not initialized][:0] LOG: connection received: host=[local]
2024-04-03 00:02:29.240 UTC [3583878][client backend][5/530:0] ERROR:  canceling statement due to user request

The duration of the last step is 00:02:29.096 - 23:57:29.039 ~ 300 seconds,
which is the default max_step_wait for REL_15_STABLE and older (for
REL_16_STABLE and newer the default was increased to 360 seconds by c99c67fc4).
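
Just to spell out that arithmetic (the step crossed midnight, hence the date
change), a quick Python sketch using the timestamps quoted above:

from datetime import datetime

# Timestamps quoted from the server log excerpt above
start = datetime.fromisoformat("2024-04-02 23:57:29.039")
end = datetime.fromisoformat("2024-04-03 00:02:29.096")
print((end - start).total_seconds())  # ~300.06 seconds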

The average deadlock-parallel duration for REL_15_STABLE on canebrake is
around 128 seconds (over the 140 runs I analyzed), but we can also find
outliers such as:
test deadlock-parallel            ... ok       377895 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake&dt=2024-03-27%2001%3A06%3A24&stg=isolation-check
test deadlock-parallel            ... ok       302549 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake&dt=2023-11-06%2012%3A47%3A01&stg=isolation-check
test deadlock-parallel            ... ok       255045 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=canebrake&dt=2023-11-09%2010%3A02%3A59&stg=isolation-check
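
(In case it is useful: a rough sketch of how such durations can be pulled out
of the buildfarm stage logs programmatically. The URL layout and the result
line format are assumptions derived from the links above, not an official
buildfarm API, so treat it only as an illustration.)

import re
import urllib.parse
import urllib.request

BASE = "https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl"
# A few (animal, snapshot) pairs taken from the links quoted above
runs = [
    ("canebrake", "2024-03-27 01:06:24"),
    ("canebrake", "2023-11-06 12:47:01"),
    ("canebrake", "2023-11-09 10:02:59"),
]
line_re = re.compile(r"deadlock-parallel\s+\.\.\.\s+(ok|FAILED)\s+(\d+) ms")

durations = []
for animal, snapshot in runs:
    url = f"{BASE}?nm={animal}&dt={urllib.parse.quote(snapshot)}&stg=isolation-check"
    log = urllib.request.urlopen(url).read().decode(errors="replace")
    m = line_re.search(log)
    if m:
        durations.append(int(m.group(2)))
        print(animal, snapshot, m.group(1), m.group(2), "ms")
if durations:
    print("average:", sum(durations) / len(durations) / 1000, "s")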

A similar situation on phycodurus:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=phycodurus&dt=2024-02-11%2021:05:41
test deadlock-parallel            ... FAILED   389381 ms

The average deadlock-parallel duration for REL_13_STABLE on phycodurus is
around 78 seconds (over 138 recent runs), but there were also outliers such as:
test deadlock-parallel            ... ok       441736 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus&dt=2024-03-04%2015%3A19%3A04&stg=isolation-check
test deadlock-parallel            ... ok       187844 ms
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=phycodurus&dt=2023-11-03%2016%3A13%3A46&stg=isolation-check

And also on pogona, REL_14_STABLE:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2024-02-20%2003%3A50%3A49
test deadlock-parallel            ... FAILED   425482 ms

(I could reach a similar duration on a slowed-down VM, with JIT enabled as
on these animals.)

So, maybe these machines require a larger PGISOLATIONTIMEOUT, or there is
still some OS/environment issue there?
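
For the former, a minimal sketch of a local run with the per-step limit raised
(isolationtester reads PGISOLATIONTIMEOUT, in seconds, from the environment;
I assume a buildfarm animal could export the same variable through its
build_env configuration):

import os
import subprocess

# Hypothetical local run with a 600-second limit instead of the default;
# assumes an already-configured source tree.
env = dict(os.environ, PGISOLATIONTIMEOUT="600")
subprocess.run(["make", "-C", "src/test/isolation", "check"], env=env, check=True)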

Best regards,
Alexander


