Re: Parallel query execution introduces performance regressions
От | Andres Freund |
---|---|
Тема | Re: Parallel query execution introduces performance regressions |
Дата | |
Msg-id | 20190401190022.ccugidpkncnckqli@alap3.anarazel.de обсуждение исходный текст |
Ответ на | Re: Parallel query execution introduces performance regressions (Peter Geoghegan <pg@bowt.ie>) |
Ответы |
Re: Parallel query execution introduces performance regressions
Re: Parallel query execution introduces performance regressions Re: Parallel query execution introduces performance regressions |
Список | pgsql-bugs |
Hi, On 2019-04-01 11:52:54 -0700, Peter Geoghegan wrote: > On Mon, Apr 1, 2019 at 11:30 AM Jinho Jung <jinhojun@usc.edu> wrote: > > Surprisingly, we found that even on a larger TPC-C database (scale factor of 50, roughly 4GB of size), parallel scanis still slower than the non-parallel execution plan in the old version. > > That's not a large database, and it's certainly not a large TPC-C > database. If you attempt to stay under the spec's maximum > tpmC/throughput per warehouse, which is 12.86 tpmC per warehouse, then > you'll need several thousand warehouses on modern hardware. We're > talking several hundred gigabytes. Otherwise, as far as the spec is > concerned you're testing an unrealistic workload. There will be > individual customers that make many more purchases than is humanly > possible. You're modelling an app involving hypothetical warehouse > employees that must enter data into their terminals at a rate that is > not humanly possible. I don't think that's really the problem here. It's that there's a fundamental misestimation in the query: > [OLD version] > Nested Loop Semi Join (cost=0.00..90020417940.08 rows=30005835 width=8) > (actual time=0.034..24981.895 rows=90017507 loops=1) > Join Filter: (ref_0.ol_d_id <= ref_1.i_im_id) > -> Seq Scan on order_line ref_0 (cost=0.00..2011503.04 rows=90017504 > width=12) (actual time=0.022..7145.811 rows=90017507 loops=1) > -> Materialize (cost=0.00..2771.00 rows=100000 width=4) (actual > time=0.000..0.000 rows=1 loops=90017507) > -> Seq Scan on item ref_1 (cost=0.00..2271.00 rows=100000 width=4) > (actual time=0.006..0.006 rows=1 loops=1) note the estimated rows=100000 vs the actual rows=1 in the seqscan / materialize. That's what makes the planner think this is much more expensive than it is, which in turn triggers the use of a parallel scan. Greetings, Andres Freund
В списке pgsql-bugs по дате отправления: