Re: When to use PARTITION BY HASH?

Поиск
Список
Период
Сортировка
От Michaeldba@sqlexec.com
Тема Re: When to use PARTITION BY HASH?
Дата
Msg-id E230C667-0C0C-4EEF-9131-7D8899453AB6@sqlexec.com
обсуждение исходный текст
Ответ на Re: When to use PARTITION BY HASH?  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-performance
Wow! That is good to know!

Sent from my iPad

> On Jun 7, 2020, at 5:23 PM, David Rowley <dgrowleyml@gmail.com> wrote:
>
>> On Sun, 7 Jun 2020 at 23:41, MichaelDBA <MichaelDBA@sqlexec.com> wrote:
>> The article referenced below assumes a worst case scenario for bulk-loading with hash partitioned tables.  It
assumesthat the values being inserted are in strict ascending or descending order with no gaps (like a sequence number
incrementingby 1), thereby ensuring every partition is hit in order before repeating the process.  If the values being
insertedare not strictly sequential with no gaps, then the performance is much better.  Obviously, what part of the
tablesand indexes are in memory has a lot to do with it as well. 
>
> In PostgreSQL 12, COPY was modified to support bulk-inserts for
> partitioned tables. This did speed up many scenarios.  Internally, how
> this works is that we maintain a series of multi insert buffers, one
> per partition. We generally only flush those buffers to the table when
> the buffer for the partition fills.  However, there is a sort of
> sanity limit [1] on the number of multi insert buffers we maintain at
> once and currently, that is 32.  Technically we could increase that
> limit, but there would still need to be a limit.  Unfortunately, for
> this particular case, since we're most likely touching between 199-799
> other partitions before hitting the first one again, that will mean
> that we really don't get any multi-inserts, which is likely the reason
> why the performance is worse for hash partitioning.
>
> With PG12 and for this particular case, you're likely to see COPY
> performance drop quite drastically when going from 32 to 33
> partitions.  The code was more designed for hitting partitions more
> randomly rather than in this sort-of round-robin way that we're likely
> to get from hash partitioning on a serial column.
>
> David
>
> [1] https://github.com/postgres/postgres/blob/master/src/backend/commands/copy.c#L2569




В списке pgsql-performance по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: When to use PARTITION BY HASH?
Следующее
От: Krzysztof Olszewski
Дата:
Сообщение: Re: Postgresql server gets stuck at low load