Re: Vacuum: allow usage of more than 1GB of work mem

From: Pavan Deolasee
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CABOikdNtZJFMr2Cx+oJ=A2gUOPFpEpi++y25+2u0=Nr8r7utig@mail.gmail.com
In reply to: Re: Vacuum: allow usage of more than 1GB of work mem  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem
List: pgsql-hackers


On Wed, Sep 14, 2016 at 12:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached PoC patch changes the representation of dead tuple locations
> to a hashmap containing a tuple bitmap.
> Each hashmap entry consists of the block number and the TID bitmap of
> the corresponding block, with the block number as the hash key.
> The current implementation of this patch is not smart yet because each
> hashmap entry allocates the tuple bitmap with a fixed
> size (LAZY_ALLOC_TUPLES), so each hash entry can store up to
> LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
> In cases where one block holds only several tens of tuples, most of
> the bits would be wasted.
>
> After improved this patch as you suggested, I will measure performance benefit.



Now, if I'm reading it correctly, this patch allocates a 132-byte
structure for every page with at least one dead tuple.  In the worst
case where there's just one dead tuple per page, that's a 20x
regression in memory usage.  Actually, it's more like 40x, because
AllocSetAlloc rounds small allocation sizes up to the next-higher
power of two, which really stings for a 132-byte allocation, and then
adds a 16-byte header to each chunk.  But even 20x is clearly not
good.  There are going to be lots of real-world cases where this uses
way more memory to track the same number of dead tuples, and I'm
guessing that benchmarking is going to reveal that it's not faster,
either.


Sawada-san offered to reimplement the patch based on what I proposed upthread. In the new scheme of things, we will allocate a fixed-size bitmap of length 2D bits per page, where D is the average page density of live + dead tuples. (The rationale behind multiplying D by a factor of 2 is to cover the worst-case scenario where every tuple also has an LP_REDIRECT line pointer.) The value of D in most real-world, large tables should not go much beyond, say, 100, assuming an 80-byte-wide tuple and 8K blocksize. That translates to about 25 bytes/page. So all TIDs with offset less than 2D can be represented by a single bit. We augment this with an overflow area to track tuples which fall outside this limit. I believe this area will be small, say 10% of the total allocation.
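To make the proposal concrete, here is a minimal sketch of the per-page bitmap, not the actual patch. The names (PageBitmapEntry, record_dead_tuple, etc.) and the fixed D = 100 are illustrative assumptions; in the real implementation D would be derived from the table's observed tuple density, and the overflow array of plain TIDs is only hinted at:

```c
/* Sketch of the proposed per-page dead-tuple bitmap.  Hypothetical
 * names and sizes; D = 100 is assumed, giving 2D = 200 bits/page. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define D               100             /* assumed avg live+dead tuples/page */
#define BITS_PER_PAGE   (2 * D)         /* 2D bits covers redirect LPs too */
#define BITMAP_BYTES    ((BITS_PER_PAGE + 7) / 8)   /* 25 bytes when D = 100 */

typedef struct PageBitmapEntry
{
    uint32_t    blkno;                  /* heap block number (hash key) */
    uint8_t     bitmap[BITMAP_BYTES];   /* bit i set => offset i+1 is dead */
} PageBitmapEntry;

/* Record a dead tuple at the given 1-based offset; returns 0 when the
 * offset exceeds the 2D limit, in which case the caller would push the
 * full TID to the (not shown) overflow array. */
static int
record_dead_tuple(PageBitmapEntry *entry, uint16_t offnum)
{
    if (offnum < 1 || offnum > BITS_PER_PAGE)
        return 0;
    entry->bitmap[(offnum - 1) / 8] |= (uint8_t) (1 << ((offnum - 1) % 8));
    return 1;
}

static int
is_dead(const PageBitmapEntry *entry, uint16_t offnum)
{
    if (offnum < 1 || offnum > BITS_PER_PAGE)
        return 0;                       /* would require the overflow lookup */
    return (entry->bitmap[(offnum - 1) / 8] >> ((offnum - 1) % 8)) & 1;
}
```

During the index-cleanup phase, the membership test replaces the binary search over the sorted TID array with a hash lookup by block number plus one bit test.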

This representation is at least as good as the current representation if there are at least 4-5% dead tuples. I don't think very large tables will be vacuumed with a scale factor less than that. And assuming 10% dead tuples, this representation will actually be much more optimal.
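The 4-5% break-even follows from simple arithmetic: the current representation spends 6 bytes (one ItemPointerData) per dead tuple, while the bitmap costs a fixed 25 bytes/page at D = 100, so the bitmap wins once a page has roughly 5 or more dead tuples. A back-of-envelope sketch, ignoring per-entry overheads (hash entry header, block number field) for brevity:

```c
/* Assumed constants: sizeof(ItemPointerData) == 6 on common platforms,
 * and a 25-byte bitmap per page (2 * 100 bits) as proposed above. */

/* Memory used by the current sorted TID array. */
static long
tid_array_bytes(long dead_per_page, long npages)
{
    return 6 * dead_per_page * npages;
}

/* Memory used by the proposed per-page bitmaps. */
static long
bitmap_bytes(long npages)
{
    return 25 * npages;
}
```

With 5 dead tuples/page the TID array needs 30 bytes/page versus the bitmap's 25; with 10 dead tuples/page (10% of D = 100) it needs 60, so the bitmap uses well under half the memory.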

The idea can fail when (a) there are very few dead tuples in the table, say less than 5%, or (b) a large number of tuples fall outside the 2D limit. While I don't expect either of these to hold for real-world, very large tables, (a) we can anticipate when the vacuum starts and use the current representation, and (b) we can detect at run time and do a one-time switch between representations. You may argue that managing two representations is clumsy, and I agree, but the code is completely isolated and probably not more than a few hundred lines.
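The fallback for case (a) could look something like the following. This is purely hypothetical decision logic, not from any patch; the 5% threshold and the names are illustrative, and in practice the estimate would come from the table's statistics when vacuum starts:

```c
/* Hypothetical selection between the two dead-tuple representations,
 * made once before the heap scan begins (case (a) in the mail). */
typedef enum DeadTupleStore
{
    STORE_TID_ARRAY,        /* current sorted array of ItemPointerData */
    STORE_PAGE_BITMAP       /* proposed per-page bitmap + overflow area */
} DeadTupleStore;

static DeadTupleStore
choose_representation(double est_dead_fraction)
{
    /* Below ~5% dead tuples the fixed 25-byte/page bitmap wastes memory,
     * so fall back to the plain TID array.  Threshold is an assumption. */
    return (est_dead_fraction < 0.05) ? STORE_TID_ARRAY : STORE_PAGE_BITMAP;
}
```

Case (b), the run-time switch, would simply convert the accumulated bitmaps back into a TID array when the overflow area grows past its budget, which is why keeping the two code paths isolated matters.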

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
