Re: hash agg is slower on wide tables?

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: hash agg is slower on wide tables?
Дата
Msg-id CAFj8pRBUaEmSH9nk+7P_KUNYbMDR1wb7EMg90AGXLkCF5_PmtA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: hash agg is slower on wide tables?  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers


2015-02-22 13:22 GMT+01:00 Andres Freund <andres@2ndquadrant.com>:
On 2015-02-22 10:33:16 +0000, Andrew Gierth wrote:
> This is, if I'm understanding the planner logic right, physical-tlist
> optimization; it's faster for a table scan to simply return the whole
> row (copying nothing, just pointing to the on-disk tuple) and let
> hashagg pick out the columns it needs, rather than for the scan to run a
> projection step just to select specific columns.
>
> If there's a Sort step, this isn't done because Sort neither evaluates
> its input nor projects new tuples on its output, it simply accepts the
> tuples it receives and returns them with the same structure. So now it's
> important to have the node providing input to the Sort projecting out
> only the minimum required set of columns.
>
> Why it's slower on the wider table... that's less obvious.

It's likely to just be tuple deforming. I've not tried it but I'd bet
you'll see slot_deform* very high in the profile. For the narrow table
only two attributes need to be extracted, for the wider one everything
up to a11 will get extracted.

I've wondered before if we shouldn't use the caching via
slot->tts_values so freely - if you only use a couple values from a wide
tuple the current implementation really sucks if those few aren't at the
beginning of the tuple.

the number of columns has strong effect, but it is not only one. I tested first two columns, and bigger tables is aggregated slowly - about 30%

postgres=# explain analyze select count(*), a1, a2 from t1 group by 3,2 order by 3,2;
                                                           QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=84073.451..84073.452 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=84073.430..84073.433 rows=24 loops=1) -- 23700
         Group Key: a2, a1
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=67.325..60152.052 rows=70097280 loops=1)
 Planning time: 0.107 ms
 Execution time: 84073.534 ms
(8 rows)


postgres=# explain analyze select count(*), a1, a2 from t2 group by 3,2 order by 3,2;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=21963.230..21963.231 rows=24 loops=1)
   Sort Key: a2, a1
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=21963.209..21963.213 rows=24 loops=1) -- 16000
         Group Key: a2, a1
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.063..5647.404 rows=70097280 loops=1)
 Planning time: 0.069 ms
 Execution time: 21963.340 ms
(8 rows)

Profile when data are in first two columns

   7.87%  postgres                         [.] slot_deform_tuple                    
   7.48%  postgres                         [.] slot_getattr            
   7.10%  postgres                         [.] hash_search_with_hash_value
   3.74%  postgres                         [.] execTuplesMatch  
   3.68%  postgres                         [.] ExecAgg                   

Profile when data are in first and 11 column

  20.35%  postgres                         [.] slot_deform_tuple
   6.55%  postgres                         [.] hash_search_with_hash_value
   5.86%  postgres                         [.] slot_getattr
   4.15%  postgres                         [.] ExecAgg

So your hypothesis is valid

Regards

Pavel

 

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Abbreviated keys for Numeric
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg