Обсуждение: pgsql: Add TAP tests for timeouts

Поиск
Список
Период
Сортировка

pgsql: Add TAP tests for timeouts

От
Alexander Korotkov
Дата:
Add TAP tests for timeouts

This commit adds new tests to verify that transaction_timeout,
idle_session_timeout, and idle_in_transaction_session_timeout work as expected.
We introduce new injection points in before throwing a timeout FATAL error
and check these injection points are reached.

Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
Author: Andrey Borodin
Reviewed-by: Alexander Korotkov

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/eeefd4280f6e5167d70efabb89586b7d38922d95

Modified Files
--------------
src/backend/tcop/postgres.c                  |  10 +++
src/test/modules/test_misc/Makefile          |   4 +
src/test/modules/test_misc/meson.build       |   4 +
src/test/modules/test_misc/t/005_timeouts.pl | 129 +++++++++++++++++++++++++++
4 files changed, 147 insertions(+)


Re: pgsql: Add TAP tests for timeouts

От
Kyotaro Horiguchi
Дата:
Hello.

At Thu, 14 Mar 2024 11:25:42 +0000, Alexander Korotkov <akorotkov@postgresql.org> wrote in 
> Add TAP tests for timeouts
> 
> This commit adds new tests to verify that transaction_timeout,
> idle_session_timeout, and idle_in_transaction_session_timeout work as expected.
> We introduce new injection points in before throwing a timeout FATAL error
> and check these injection points are reached.
> 
> Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com
> Author: Andrey Borodin
> Reviewed-by: Alexander Korotkov

In 005_timeouts.pl, I found the following comment.

> # If we send \q with $psql_session->quit it can get to pump already closed.
> # So \q is in initial script, here we only finish IPC::Run.
> $psql_session->{run}->finish;

I'm not sure if "it can get to pump already closed." makes sense. I
guess that it means "the command can get to be pumped (or "can be
sent") to the session already closed" or something similar?

> # 2. Test of the sidle in transaction timeout

s/sidle/idle/ ?

> # Wait until the backend is in the timeout injection point.

I'm not sure, but it seems that "is in" meant "passes" or something like?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pgsql: Add TAP tests for timeouts

От
Michael Paquier
Дата:
On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:
> In 005_timeouts.pl, I found the following comment.

Note also that the test is not stable, one of my machines with
injection points enabled has complained twice in its last three runs:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2024-03-14%2015%3A05%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2024-03-15%2003%3A21%3A15
--
Michael

Вложения

Re: pgsql: Add TAP tests for timeouts

От
"Andrey M. Borodin"
Дата:
Kyotaro, thanks for your corrections. I agree that wordings should be improved. But let’s deal with failures first.

> On 15 Mar 2024, at 10:28, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Mar 15, 2024 at 10:42:35AM +0900, Kyotaro Horiguchi wrote:
>> In 005_timeouts.pl, I found the following comment.
>
> Note also that the test is not stable, one of my machines with
> injection points enabled has complained twice in its last three runs:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2024-03-14%2015%3A05%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2024-03-15%2003%3A21%3A15

The failure is

t/005_timeouts.pl             (Wstat: 65280 Tests: 0 Failed: 0)
  Non-zero exit status: 255
  Parse errors: No plan found in TAP output
Files=5, Tests=68, 185 wallclock secs ( 0.03 usr  0.00 sys +  0.86 cusr  0.80 csys =  1.69 CPU)
Result: FAIL

The failure seems to be Perl-related.
As far as I can see I’ve done everything akin to 041_checkpoint_at_promote.pl. On batta this test pass, but hachi seems
tobe unhappy with this test. 
And hachi sometimes pass this test too [0].

I’ll look more on this. Do I understand right that we have only 2 buildfarm members with injection points?


Best regards, Andrey Borodin.

[0]
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=hachi&dt=2024-03-14%2022%3A02%3A41&stg=module-test_misc-check


Re: pgsql: Add TAP tests for timeouts

От
"Andrey M. Borodin"
Дата:

> On 15 Mar 2024, at 11:20, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> The failure seems to be Perl-related.

It’s not Perl-related. This is how wait_for_log() times out.

    timed out waiting for match: terminating connection due to transaction timeout at t/005_timeouts.pl line 58.

But it did not even wait for a 100ms…

    2024-03-15 03:32:26.492 UTC [1405044:4] 005_timeouts.pl LOG: statement: SELECT
injection_points_wakeup('transaction-timeout’);
    ///// here we start to wait.
    2024-03-15 03:32:26.492 UTC [1405044:5] 005_timeouts.pl LOG: disconnection: session time: 0:00:00.002 user=admin
database=postgreshost=[local] 
    2024-03-15 03:35:26.623 UTC [1405009:4] LOG: received immediate shutdown request


Best regards, Andrey Borodin.


Re: pgsql: Add TAP tests for timeouts

От
Michael Paquier
Дата:
On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:
> And hachi sometimes pass this test too [0].
>
> I’ll look more on this. Do I understand right that we have only 2 buildfarm members with injection points?

I have added --enable-injection-points to three of them: hachi,
gokiburi and batta.  The switch is also enabled in the CI by defaultm
where you may be able to capture a bit more information than the
buildfarm.
--
Michael

Вложения

Re: pgsql: Add TAP tests for timeouts

От
Alexander Korotkov
Дата:
On Fri, Mar 15, 2024 at 9:44 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Mar 15, 2024 at 11:20:49AM +0500, Andrey M. Borodin wrote:
> > And hachi sometimes pass this test too [0].
> >
> > I’ll look more on this. Do I understand right that we have only 2 buildfarm members with injection points?
>
> I have added --enable-injection-points to three of them: hachi,
> gokiburi and batta.  The switch is also enabled in the CI by defaultm
> where you may be able to capture a bit more information than the
> buildfarm.

I managed to reproduce the issue locally by setting the following extra config via TEMP_CONFIG:

log_line_prefix = '%m [%p:%l] %q%a '
log_connections = 'true'
log_disconnections = 'true'
log_checkpoints = 'true'
log_statement = 'all'
wal_compression = 'on'
debug_parallel_query = regress
wal_compression = lz4
default_toast_compression = lz4

The backtrace is as follows.  We loop in the interrupt checking, given that the injection point handler checks for interrupts internally.  I propose to unset the timeout flag before the injection point (see the attached patch).

* thread #1, name = 'postgres', stop reason = signal SIGSTOP
  * frame #0: 0x00007f3756f289e7 libc.so.6`epoll_wait(epfd=9, events=0x0000556a302cabc8, maxevents=1, timeout=-1) at epoll_wait.c:30:10
    frame #1: 0x0000556a2fc41aab postgres`WaitEventSetWait at latch.c:1570:7
    frame #2: 0x0000556a2fc41e95 postgres`WaitLatch(latch=<unavailable>, wakeEvents=<unavailable>, timeout=<unavailable>, wait_event_info=117440513) at latch.c:538:6
    frame #3: 0x0000556a2fc4f611 postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1, wait_event_info=117440513) at condition_variable.c:163:10
    frame #4: 0x00007f3757b10628 injection_points.so`injection_wait at injection_points.c:153:3
    frame #5: 0x0000556a2fdd4f54 postgres`InjectionPointRun at injection_point.c:313:2
    frame #6: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at postgres.c:3430:4
    frame #7: 0x0000556a2fc4f70b postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1, wait_event_info=117440513) at condition_variable.c:196:3
    frame #8: 0x00007f3757b10628 injection_points.so`injection_wait at injection_points.c:153:3
    frame #9: 0x0000556a2fdd4f54 postgres`InjectionPointRun at injection_point.c:313:2
    frame #10: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at postgres.c:3430:4
    frame #11: 0x0000556a2fc4f70b postgres`ConditionVariableTimedSleep(cv=0x00007f3757b0e224, timeout=-1, wait_event_info=117440513) at condition_variable.c:196:3
    frame #12: 0x00007f3757b10628 injection_points.so`injection_wait at injection_points.c:153:3
    frame #13: 0x0000556a2fdd4f54 postgres`InjectionPointRun at injection_point.c:313:2
    frame #14: 0x0000556a2fc6a8b3 postgres`ProcessInterrupts.part.0 at postgres.c:3430:4
    frame #15: 0x0000556a2fabe63d postgres`ExecGather at nodeGather.c:315:3
    frame #16: 0x0000556a2fabe574 postgres`ExecGather at nodeGather.c:269:10
    frame #17: 0x0000556a2fabe549 postgres`ExecGather(pstate=0x0000556a303d8138) at nodeGather.c:222:9
    frame #18: 0x0000556a2faa21da postgres`standard_ExecutorRun at executor.h:274:9
    frame #19: 0x0000556a2fc6f3cf postgres`PortalRunSelect at pquery.c:924:4
    frame #20: 0x0000556a2fc70b7e postgres`PortalRun at pquery.c:768:18
    frame #21: 0x0000556a2fc6c996 postgres`exec_simple_query at postgres.c:1274:10
    frame #22: 0x0000556a2fc6e378 postgres`PostgresMain at postgres.c:4682:7
    frame #23: 0x0000556a2fbcefa0 postgres`ServerLoop.isra.0 at postmaster.c:4452:2
    frame #24: 0x0000556a2fbcff8b postgres`PostmasterMain at postmaster.c:1478:11
    frame #25: 0x0000556a2f8cb5a7 postgres`main(argc=4, argv=0x0000556a302ca340) at main.c:197:3
    frame #26: 0x00007f3756e28150 libc.so.6`__libc_start_call_main(main=(postgres`main at main.c:59:1), argc=4, argv=0x00007ffffc17e678) at libc_start_call_main.h:58:16
    frame #27: 0x00007f3756e28209 libc.so.6`__libc_start_main_impl(main=(postgres`main at main.c:59:1), argc=4, argv=0x00007ffffc17e678, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffffc17e668) at libc-start.c:360:3
    frame #28: 0x0000556a2f8cbb75 postgres`_start + 37

------
Regards,
Alexander Korotkov
Вложения

Re: pgsql: Add TAP tests for timeouts

От
"Andrey M. Borodin"
Дата:

> On 15 Mar 2024, at 15:44, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
>  We loop in the interrupt checking, given that the injection point handler checks for interrupts internally.  I
proposeto unset the timeout flag before the injection point (see the attached patch). 

Oh, cool.
As far as I understand, this is only necessary for the test with injection point.
So, maybe unset it only when injection points are enabled? Something like this is already used in GIN.

#ifdef USE_INJECTION_POINTS
....
#endif


Best regards, Andrey Borodin.


Re: pgsql: Add TAP tests for timeouts

От
Alexander Korotkov
Дата:
On Fri, Mar 15, 2024 at 1:27 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
>
> > On 15 Mar 2024, at 15:44, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> >  We loop in the interrupt checking, given that the injection point handler checks for interrupts internally.  I
proposeto unset the timeout flag before the injection point (see the attached patch). 
>
> Oh, cool.
> As far as I understand, this is only necessary for the test with injection point.
> So, maybe unset it only when injection points are enabled? Something like this is already used in GIN.
>
> #ifdef USE_INJECTION_POINTS
> ....
> #endif


Maybe, but do you see any negative side effects of the unconditionally
unset of flags?  If not, I would prefer to keep the code simple.

------
Regards,
Alexander Korotkov



Re: pgsql: Add TAP tests for timeouts

От
"Andrey M. Borodin"
Дата:

> On 15 Mar 2024, at 16:30, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Maybe, but do you see any negative side effects of the unconditionally
> unset of flags?

Nope, just expressed possible option.

>  If not, I would prefer to keep the code simple.

IMO that's fine. Let's incorporate wording improvement from Kyotaro to the same patch? PFA draft for this.

Maybe it worth to wait a little to see if some other failures will pop up?


Best regards, Andrey Borodin.

Вложения

Re: pgsql: Add TAP tests for timeouts

От
Alexander Korotkov
Дата:
On Fri, Mar 15, 2024 at 2:25 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
> > On 15 Mar 2024, at 16:30, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > Maybe, but do you see any negative side effects of the unconditionally
> > unset of flags?
>
> Nope, just expressed possible option.
>
> >  If not, I would prefer to keep the code simple.
>
> IMO that's fine. Let's incorporate wording improvement from Kyotaro to the same patch? PFA draft for this.

Thank you, I've pushed this, but split into two commits for bugfix and
for improved wordings.

> Maybe it worth to wait a little to see if some other failures will pop up?

I prefer to push now because we're quite confident this fix will
improve the situation.  It should increase the chances of catching
other bugs if any, because buildfarm which was failing on this bug
will go ahead and have a chance to spot more.

------
Regards,
Alexander Korotkov