Re: DBT-3 with SF=20 got failed

From: Tomas Vondra
Subject: Re: DBT-3 with SF=20 got failed
Date:
Msg-id: 55D53673.7000106@2ndquadrant.com
In response to: Re: DBT-3 with SF=20 got failed  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
List: pgsql-hackers
Hi,

On 08/19/2015 01:55 PM, Kohei KaiGai wrote:
>   Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028 width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
>     Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
>     Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
>     Rows Removed by Join Filter: 127853313
>     ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=73252.300..79017.420 rows=72001237 loops=1)
>           Sort Key: ws1.ws_order_number
>           Sort Method: quicksort  Memory: 7083296kB
>           ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237 loops=1)
>     ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=65095.655..128885.811 rows=904010978 loops=1)
>           Sort Key: ws2.ws_order_number
>           Sort Method: quicksort  Memory: 7083296kB
>           ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237 loops=1)
>   Planning time: 0.232 ms
>   Execution time: 530176.521 ms
> (14 rows)
>
>
> So, even if we allows nodeHash.c to allocate hash buckets larger than
> 1GB, its initial size may be determined carefully.
> Probably, 1GB is a good starting point even if expanded later.

I'm not sure I understand what the problem is here. Could you elaborate?

The initial size of the hash table is determined from the estimate, and 
if we overestimate, we will create more buckets (i.e. consume more 
memory) and/or start batching (which might be unnecessary).

But I don't really see any "more careful" way to do this without 
penalizing the cases where the estimate is actually correct - e.g. by 
starting with much smaller buckets (and then resizing the hash table, 
which is not free), or by starting without batching and betting that we 
won't actually need it. A "start small and grow" scheme would look 
roughly like the sketch below.
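This is a hypothetical sketch, not existing code (TARGET_TUPLES_PER_BUCKET and
rehash_all() are placeholders); the catch is that every resize has to rehash
all tuples inserted so far, which is exactly the cost that well-estimated
queries would pay for nothing:

#include <stdint.h>

#define TARGET_TUPLES_PER_BUCKET 1   /* assumed load-factor target */

/*
 * Hypothetical incremental growth: double the bucket array once the
 * average bucket gets too full.  rehash_all() stands in for the step
 * that moves every already-inserted tuple into the new buckets.
 */
static void
maybe_grow(uint32_t *nbuckets, uint64_t ntuples,
           void (*rehash_all)(uint32_t new_nbuckets))
{
    if (ntuples > (uint64_t) *nbuckets * TARGET_TUPLES_PER_BUCKET)
    {
        *nbuckets *= 2;           /* grow the bucket array */
        rehash_all(*nbuckets);    /* re-bucket everything -- not free */
    }
}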

I think it'll be very difficult to get those working without causing 
real trouble for the cases where we actually do have good estimates (and 
those are the vast majority of queries).

But both of those are features, and we're dealing with a bug fix here.


kind regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


