Re: Large Scale Aggregation (HashAgg Enhancement)
From: Greg Stark
Subject: Re: Large Scale Aggregation (HashAgg Enhancement)
Date:
Msg-id: 87y81fwiq5.fsf@stark.xeocode.com
In reply to: Re: Large Scale Aggregation (HashAgg Enhancement) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Large Scale Aggregation (HashAgg Enhancement)
List: pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > Why would we continue to dynamically build the hash table after the
> > start of the outer scan?
>
> The number of tuples written to a temp file might exceed what we want to
> hold in memory; we won't detect this until the batch is read back in,
> and in that case we have to split the batch at that time. For hashagg
> this point would apply to the aggregate states not the input tuples, but
> it's still a live problem (especially if the aggregate states aren't
> fixed-size values ... consider a "concat" aggregate for instance).

For a hash aggregate, would it be possible to rescan the original table
instead of spilling to temporary files? When you run out of working memory
you simply throw out half the hash table and ignore subsequent tuples that
fall in those hash buckets, then rescan the table for the discarded bucket
regions. This avoids doing any disk writes at the expense of possibly some
additional reads. In terms of I/O I think it would be much faster in most
cases.

The downsides are:

a) Volatile aggregates, or aggregates with side effects, would be confused
by being executed twice. I'm not sure volatile aggregate functions make
much sense anyway, though.

b) I'm unclear whether rescanning the table could find tuples in a
different state than the previous scan did. If so, the idea doesn't work
at all, but I don't think that's possible, is it?

c) This is the main problem: it may lose in terms of I/O when the
cardinality is low (i.e. it's overflowing despite low cardinality because
the table is really, really big). But most cases spill because the
cardinality is high, so while the table may be big, the spill files are
nearly as big anyway, and having to write and then read them means double
the I/O.

The upside of not having to write out temporary files is big. I find that
queries which need temporary sort files take a *huge* performance penalty,
often an order of magnitude. Part of that could probably be mitigated by
putting the temp files on a separate spindle, but I think it will always
hurt, especially if multiple operations are spilling to disk
simultaneously.
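To make the idea concrete, here's a toy sketch in plain C of the kind of
loop I have in mind: a COUNT(*)-per-key aggregate where a tiny fixed-size
group array stands in for work_mem, and an in-memory array of keys stands
in for the outer scan (which we simply re-read instead of writing batches
out). None of this is real executor code; every name in it is invented,
and it ignores degenerate cases like more colliding keys than slots.

#include <stdio.h>
#include <stdint.h>

#define MAXGROUPS 4   /* stand-in for work_mem: max in-memory groups */

typedef struct { int64_t key; int64_t count; } Group;

static uint32_t hashkey(int64_t key)
{
    return (uint32_t) (key * 2654435761u);   /* toy multiplicative hash */
}

/* Aggregate COUNT(*) per key for tuples whose hash falls in [lo, hi). */
static void hashagg_pass(const int64_t *tuples, int ntuples,
                         uint32_t lo, uint32_t hi)
{
    Group    groups[MAXGROUPS];
    int      ngroups = 0;
    uint32_t keep_hi = hi;          /* shrinks if "memory" fills up */

    for (int t = 0; t < ntuples; t++)
    {
        uint32_t h = hashkey(tuples[t]);
        int      g;

        if (h < lo || h >= keep_hi)
            continue;               /* not in the range we keep this pass */

        for (g = 0; g < ngroups; g++)       /* linear lookup: it's a toy */
            if (groups[g].key == tuples[t])
                break;
        if (g < ngroups)
        {
            groups[g].count++;      /* advance an existing group */
            continue;
        }

        if (ngroups == MAXGROUPS)
        {
            /*
             * Out of memory: halve the hash range we keep, throwing away
             * the groups above the cut, until either this tuple is
             * excluded or a slot frees up.  Discarding partial states is
             * fine because the rescan below recomputes that range from
             * scratch.
             */
            while (ngroups == MAXGROUPS && h < keep_hi)
            {
                int kept = 0;

                keep_hi = lo + (keep_hi - lo) / 2;
                for (g = 0; g < ngroups; g++)
                    if (hashkey(groups[g].key) < keep_hi)
                        groups[kept++] = groups[g];
                ngroups = kept;
            }
            if (h >= keep_hi)
                continue;           /* this tuple waits for the rescan */
        }

        groups[ngroups].key = tuples[t];    /* start a new group */
        groups[ngroups].count = 1;
        ngroups++;
    }

    for (int g = 0; g < ngroups; g++)
        printf("key=%lld count=%lld\n",
               (long long) groups[g].key, (long long) groups[g].count);

    if (keep_hi < hi)               /* rescan the discarded range: reads, */
        hashagg_pass(tuples, ntuples, keep_hi, hi);     /* never writes   */
}

int main(void)
{
    int64_t tuples[] = {1, 2, 3, 4, 5, 1, 2, 6, 7, 8, 5, 9, 1};

    hashagg_pass(tuples, (int) (sizeof(tuples) / sizeof(tuples[0])),
                 0, UINT32_MAX);
    return 0;
}

The point is the shape of the I/O: when memory runs out, the only thing
that ever happens is another sequential read of the input, never a write.

--
greg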