Re: Weird test mixup

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Weird test mixup
Дата
Msg-id ZjgsaYs44ici4QMW@paquier.xyz
обсуждение исходный текст
Ответ на Re: Weird test mixup  (Noah Misch <noah@leadboat.com>)
Ответы Re: Weird test mixup  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
On Thu, May 02, 2024 at 12:35:55PM -0700, Noah Misch wrote:
> I should have given a simpler example:
>
> s1: local-attach to POINT
> s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
> s1: exit
> s2: wake up and run POINT as though it had been non-local

Hmm.  Even if you were to emulate that in a controlled manner, you
would need a second injection point that does a wait in s2, which is
something that would happen before injection_callback() and before
scanning the local entry.  This relies on the fact that we're holding
CPU in s2 between the backend shmem hash table lookup and the callback
being called.

>> Fun.  One thing I would ask is why it makes sense to be able to detach
>> a local point from a different session than the one who defined it as
>> local.  Shouldn't the operation of s3 be restricted rather than
>> authorized as a safety measure, instead?
>
> (That's orthogonal to the race condition.)  When s1 would wait at the
> injection point multiple times in one SQL statement, I like issuing the detach
> from s3 so s1 waits at just the first encounter with the injection point.
> This mimics setting a gdb breakpoint and deleting that breakpoint before
> "continue".  The alternative, waking s1 repeatedly until it finishes the SQL
> statement, is less convenient.  (I also patched _detach() to wake the waiter,
> and I plan to propose that.)

Okay.

>> Indeed.  That's a brain fade.  This one could be fixed by collecting
>> the point names when cleaning up the conditions and detach after
>> releasing the spinlock.  This opens a race condition between the
>> moment when the spinlock is released and the detach, where another
>> backend could come in and detach a point before the shmem_exit
>> callback has the time to do its cleanup, even if detach() is
>> restricted for local points.  So we could do the callback cleanup in
>
> That race condition seems fine.  The test can be expected to control the
> timing of backend exits vs. detach calls.  Unlike the InjectionPointRun()
> race, it wouldn't affect backends unrelated to the test.

Sure.  The fact that there are two spinlocks in the backend code and
the module opens concurrency issues.  Making that more robust if there
is a case for it is OK by me, but I'd rather avoid making the
backend-side more complicated than need be.

>> three steps in the shmem exit callback:
>> - Collect the names of the points to detach, while holding the
>> spinlock.
>> - Do the Detach.
>> - Take again the spinlock, clean up the conditions.
>>
>> Please see the attached.
>
> The injection_points_cleanup() parts look good.  Thanks.

Thanks for the check.

>> @@ -403,6 +430,10 @@ injection_points_detach(PG_FUNCTION_ARGS)
>>  {
>>      char       *name = text_to_cstring(PG_GETARG_TEXT_PP(0));
>>
>> +    if (!injection_point_allowed(name))
>> +        elog(ERROR, "cannot detach injection point \"%s\" not allowed to run",
>> +             name);
>> +
>
> As above, I disagree with the injection_points_detach() part.

Okay, noted.  Fine by me to expand this stuff as you feel, the code
has been written to be extended depending on what people want to
support.  There should be tests in the tree that rely on any
new behavior, though.

I've applied the patch to fix the spinlock logic in the exit callback
for now.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michail Nikolaev
Дата:
Сообщение: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: 039_end_of_wal: error in "xl_tot_len zero" test