Re: bad estimation together with large work_mem generates terrible slow hash joins

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: bad estimation together with large work_mem generates terrible slow hash joins
Дата
Msg-id 54121ACD.2020105@fuzzy.cz
обсуждение исходный текст
Ответ на Re: bad estimation together with large work_mem generates terrible slow hash joins  ("Tomas Vondra" <tv@fuzzy.cz>)
Ответы Re: bad estimation together with large work_mem generates terrible slow hash joins  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 11.9.2014 16:33, Tomas Vondra wrote:
> On 11 Září 2014, 15:31, Robert Haas wrote:
>> On Wed, Sep 10, 2014 at 5:09 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> OK. So here's v13 of the patch, reflecting this change.
>>
>> [...] It does three things:
>>
>> (1) It changes NTUP_PER_BUCKET to 1.  Although this increases memory
>> utilization, it also improves hash table performance, and the
>> additional memory we use is less than what we saved as a result of
>> commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a.
>>
>> (2) It changes ExecChooseHashTableSize() to include the effect of the
>> memory consumed by the bucket array.  If we care about properly
>> respecting work_mem, this is clearly right for any NTUP_PER_BUCKET
>> setting, but more important for a small setting like 1 than for the
>> current value of 10.  I'm not sure the logic in this function is as
>> clean and simple as it can be just yet.
>> (3) It allows the number of batches to increase on the fly while the
>> hash join is in process.  This case arises when we initially estimate
>> that we only need a small hash table, and then it turns out that there
>> are more tuples than we expect.  Without this code, the hash table's
>> load factor gets too high and things start to suck.
>>
>> I recommend separating this patch in two patches, the first covering
>> items (1) and (2) and the second covering item (3), which really seems
>> like a separate improvement.
>
> That probably makes sense.

Attached is the patch split as suggested:

(a) hashjoin-nbuckets-v14a-size.patch

    * NTUP_PER_BUCKET=1
    * counting buckets towards work_mem
    * changes in ExecChooseHashTableSize (due to the other changes)

(b) hashjoin-nbuckets-v14a-resize.patch

    * rest of the patch, that is ...
    * tracking optimal number of buckets
    * dynamic resize of the hash table
    * adding info the the EXPLAIN ANALYZE output

regards
Tomas

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: B-Tree support function number 3 (strxfrm() optimization)
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Commitfest status