Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query
From | Stephen Frost |
---|---|
Subject | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
Date | |
Msg-id | 20180815111006.GB3326@tamriel.snowman.net |
In response to | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: BUG #15324: Non-deterministic behaviour from parallelised sub-query |
List | pgsql-bugs |
Greetings,

* Amit Kapila (amit.kapila16@gmail.com) wrote:
> On Tue, Aug 14, 2018 at 9:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Marko Tiikkaja <marko@joh.to> writes:
> >> Marking the function parallel safe doesn't seem wrong to me.  The
> >> non-parallel-safe part is that the input gets fed to it in different order
> >> in different workers.  And I don't really think that to be the function's
> >> fault.
> >
> > So that basically opens the question of whether *any* window function
> > calculation can safely be pushed down to parallel workers.
>
> I think we can consider it as a parallel-restricted operation.  For
> the purpose of testing, I have marked row_number as
> parallel-restricted in pg_proc and I get the below plan:
>
> postgres=# Explain select count(*) from qwr where (a, b) in (select a,
> row_number() over() from qwr);
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Aggregate  (cost=46522.12..46522.13 rows=1 width=8)
>    ->  Hash Semi Join  (cost=24352.08..46362.12 rows=64001 width=0)
>          Hash Cond: ((qwr.a = qwr_1.a) AND (qwr.b = (row_number() OVER (?))))
>          ->  Gather  (cost=0.00..18926.01 rows=128002 width=8)
>                Workers Planned: 2
>                ->  Parallel Seq Scan on qwr  (cost=0.00..18926.01 rows=64001 width=8)
>          ->  Hash  (cost=21806.06..21806.06 rows=128002 width=12)
>                ->  WindowAgg  (cost=0.00..20526.04 rows=128002 width=12)
>                      ->  Gather  (cost=0.00..18926.01 rows=128002 width=4)
>                            Workers Planned: 2
>                            ->  Parallel Seq Scan on qwr qwr_1  (cost=0.00..18926.01 rows=64001 width=4)
> (11 rows)
>
> This seems okay, though the results of the above parallel-execution
> are not same as serial-execution.  I think the reason for it is that
> we don't get rows in predictable order from workers.

You wouldn't get them in a predictable order even without parallelization
due to the lack of an ordering, so this hardly seems like an issue.

> > Somewhat like the LIMIT/OFFSET case, it seems to me that we could only
> > expect to do this safely if the row ordering induced by the WINDOW clause
> > can be proven to be fully deterministic.  The planner has no such smarts
> > at the moment AFAIR.  In principle you could do it if there were
> > partitioning/ordering by a primary key, but I'm not excited about the
> > prospects of that being true often enough in practice to justify making
> > the check.
>
> Yeah, I am also not sure if it is worth adding the additional checks.
> So, for now, we can treat any window function calculation as
> parallel-restricted and if later anybody has a reason strong enough to
> relax the restriction for some particular case, we will consider it.

Seems likely that we'll want this at some point, but certainly seems
like new work and not a small bit of it.

Thanks!

Stephen
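The ordering argument in this thread can be illustrated with a small simulation. This is only a sketch in Python, not PostgreSQL code: the ten-row table, the two-worker interleaving, and the helper names `row_number` and `semi_join_count` are all assumptions made up for illustration. It mimics the query `select count(*) from qwr where (a, b) in (select a, row_number() over() from qwr)` and shows that the count depends on the order in which rows reach the window function, unless that order is fixed by sorting on a unique key.

```python
def row_number(a_values):
    """Number rows in arrival order, like row_number() OVER () with no ORDER BY."""
    return [(a, n) for n, a in enumerate(a_values, start=1)]


def semi_join_count(table, numbered):
    """count(*) of table rows whose (a, b) pair appears in the numbered set."""
    pairs = set(numbered)
    return sum(1 for row in table if row in pairs)


# Hypothetical stand-in for table qwr with columns (a, b), where b = a.
table = [(i, i) for i in range(1, 11)]
a_values = [a for a, _ in table]

# One possible serial scan order: rows arrive in storage order.
serial_count = semi_join_count(table, row_number(a_values))

# One possible parallel arrival order: two workers each scan half the table
# and the gathering step happens to interleave their output.
first_half, second_half = a_values[:5], a_values[5:]
interleaved = [a for pair in zip(first_half, second_half) for a in pair]
parallel_count = semi_join_count(table, row_number(interleaved))

# Sorting on a unique key before numbering (the analogue of
# row_number() OVER (ORDER BY a) with unique a) removes the dependence
# on arrival order.
ordered_count = semi_join_count(table, row_number(sorted(interleaved)))

print(serial_count, parallel_count, ordered_count)
```

In this sketch the serial and sorted cases count all ten rows, while the interleaved case matches only two. As noted in the reply above, the serial order is itself not guaranteed without an ordering, so the sorted case is the only one of the three with a well-defined result.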