Re: BitmapHeapScan streaming read user and prelim refactoring

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: BitmapHeapScan streaming read user and prelim refactoring
Дата
Msg-id cd50927a-8b19-4f0f-884d-609212c71c91@iki.fi
обсуждение исходный текст
Ответ на Re: BitmapHeapScan streaming read user and prelim refactoring  (Dilip Kumar <dilipbalaut@gmail.com>)
Ответы Re: BitmapHeapScan streaming read user and prelim refactoring  (Dilip Kumar <dilipbalaut@gmail.com>)
Re: BitmapHeapScan streaming read user and prelim refactoring  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 14/03/2024 06:54, Dilip Kumar wrote:
> On Wed, Mar 13, 2024 at 9:25 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Wed, Mar 13, 2024 at 11:39 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>>> Andres already commented on the snapshot stuff on an earlier patch
>>>> version, and that's much nicer with this version. However, I don't
>>>> understand why a parallel bitmap heap scan needs to do anything at all
>>>> with the snapshot, even before these patches. The parallel worker
>>>> infrastructure already passes the active snapshot from the leader to the
>>>> parallel worker. Why does bitmap heap scan code need to do that too?
>>>
>>> Yeah thinking on this now it seems you are right that the parallel
>>> infrastructure is already passing the active snapshot so why do we
>>> need it again.  Then I checked other low scan nodes like indexscan and
>>> seqscan and it seems we are doing the same things there as well.
>>> Check for SerializeSnapshot() in table_parallelscan_initialize() and
>>> index_parallelscan_initialize() which are being called from
>>> ExecSeqScanInitializeDSM() and ExecIndexScanInitializeDSM()
>>> respectively.
>>
>> I remember thinking about this when I was writing very early parallel
>> query code. It seemed to me that there must be some reason why the
>> EState has a snapshot, as opposed to just using the active snapshot,
>> and so I took care to propagate that snapshot, which is used for the
>> leader's scans, to the worker scans also. Now, if the EState doesn't
>> need to contain a snapshot, then all of that mechanism is unnecessary,
>> but I don't see how it can be right for the leader to do
>> table_beginscan() using estate->es_snapshot and the worker to use the
>> active snapshot.
> 
> Yeah, that's a very valid point. So I think now Heikki/Melanie might
> have got an answer to their question, about the thought process behind
> serializing the snapshot for each scan node.  And the same thing is
> followed for BitmapHeapNode as well.

I see. Thanks, understanding the thought process helps.

So when a parallel table or index scan runs in the executor as part of a 
query, we could just use the active snapshot. But there are some other 
callers of parallel table scans that don't use the executor, namely 
parallel index builds. For those it makes sense to pass the snapshot for 
the scan independent of the active snapshot.

A parallel bitmap heap scan isn't really a parallel scan as far as the 
table AM is concerned, though. It's more like an independent bitmap heap 
scan in each worker process, nodeBitmapHeapscan.c does all the 
coordination of which blocks to scan. So I think that 
table_parallelscan_initialize() was the wrong role model, and we should 
still remove the snapshot serialization code from nodeBitmapHeapscan.c.


Digging deeper into the question of whether es_snapshot == 
GetActiveSnapshot() is a valid assumption:

<deep dive>

es_snapshot is copied from the QueryDesc in standard_ExecutorStart(). 
Looking at the callers of ExecutorStart(), they all get the QueryDesc by 
calling CreateQueryDesc() with GetActiveSnapshot(). And I don't see any 
callers changing the active snapshot between the ExecutorStart() and 
ExecutorRun() calls either. In pquery.c, we explicitly 
PushActiveSnapshot(queryDesc->snapshot) before calling ExecutorRun(). So 
no live bug here AFAICS, es_snapshot == GetActiveSnapshot() holds.

_SPI_execute_plan() has code to deal with the possibility that the 
active snapshot is not set. That seems fishy; do we really support SPI 
without any snapshot? I'm inclined to turn that into an error. I ran the 
regression tests with an "Assert(ActiveSnapshotSet())" there, and 
everything worked.

If es_snapshot was different from the active snapshot, things would get 
weird, even without parallel query. The scans would use es_snapshot for 
the visibility checks, but any functions you execute in quals would use 
the active snapshot.

We could double down on that assumption, and remove es_snapshot 
altogether and use GetActiveSnapshot() instead. And perhaps add 
"PushActiveSnapshot(queryDesc->snapshot)" to ExecutorRun().

</deep dive>

In summary, this es_snapshot stuff is a bit confusing and could use some 
cleanup. But for now, I'd like to just add some assertions and a 
comments about this, and remove the snapshot serialization from bitmap 
heap scan node, to make it consistent with other non-parallel scan nodes 
(it's not really a parallel scan as far as the table AM is concerned). 
See attached patch, which is the same as previous patch with some extra 
assertions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: POC, WIP: OR-clause support for indexes