Re: Vacuum: allow usage of more than 1GB of work mem

From: Pavan Deolasee
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CABOikdNtZJFMr2Cx+oJ=A2gUOPFpEpi++y25+2u0=Nr8r7utig@mail.gmail.com
In reply to: Re: Vacuum: allow usage of more than 1GB of work mem  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem
List: pgsql-hackers


On Wed, Sep 14, 2016 at 12:21 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Sep 9, 2016 at 3:04 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Attached PoC patch changes the representation of dead tuple locations
> to a hashmap containing a tuple bitmap.
> Each hashmap entry consists of the block number and the TID bitmap of
> the corresponding block, with the block number as the hash key.
> The current implementation of this patch is not smart yet because each
> hashmap entry allocates the tuple bitmap with a fixed
> size (LAZY_ALLOC_TUPLES), so each hash entry can store up to
> LAZY_ALLOC_TUPLES (291 if the block size is 8kB) tuples.
> In cases where one block holds only several tens of tuples, most of
> the bits would be wasted.
>
> After improved this patch as you suggested, I will measure performance benefit.



Now, if I'm reading it correctly, this patch allocates a 132-byte
structure for every page with at least one dead tuple.  In the worst
case where there's just one dead tuple per page, that's a 20x
regression in memory usage.  Actually, it's more like 40x, because
AllocSetAlloc rounds small allocation sizes up to the next-higher
power of two, which really stings for a 132-byte allocation, and then
adds a 16-byte header to each chunk.  But even 20x is clearly not
good.  There are going to be lots of real-world cases where this uses
way more memory to track the same number of dead tuples, and I'm
guessing that benchmarking is going to reveal that it's not faster,
either.


Sawada-san offered to reimplement the patch based on what I proposed upthread. In the new scheme of things, we will allocate a fixed-size bitmap of length 2D bits per page, where D is the average page density of live + dead tuples. (The rationale behind multiplying D by a factor of 2 is to cover the worst-case scenario where every tuple also has an LP_REDIRECT line pointer.) The value of D in most real-world, large tables should not go much beyond, say, 100, assuming an 80-byte-wide tuple and 8K blocksize. That translates to about 25 bytes/page. So all TIDs with offset less than 2D can be represented by a single bit. We augment this with an overflow area to track tuples which fall outside this limit. I believe this area will be small, say 10% of the total allocation.
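To make the proposal concrete, here is a minimal sketch of the per-page bitmap, not the actual patch. The names (PageBitmapEntry, record_dead_tuple, etc.) and the fixed D = 100 are illustrative assumptions; in the real implementation D would be derived from the table's observed tuple density, and the overflow array of plain TIDs is only hinted at:

```c
/* Sketch of the proposed per-page dead-tuple bitmap.  Hypothetical
 * names and sizes; D = 100 is assumed, giving 2D = 200 bits/page. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define D               100             /* assumed avg live+dead tuples/page */
#define BITS_PER_PAGE   (2 * D)         /* 2D bits covers redirect LPs too */
#define BITMAP_BYTES    ((BITS_PER_PAGE + 7) / 8)   /* 25 bytes when D = 100 */

typedef struct PageBitmapEntry
{
    uint32_t    blkno;                  /* heap block number (hash key) */
    uint8_t     bitmap[BITMAP_BYTES];   /* bit i set => offset i+1 is dead */
} PageBitmapEntry;

/* Record a dead tuple at the given 1-based offset; returns 0 when the
 * offset exceeds the 2D limit, in which case the caller would push the
 * full TID to the (not shown) overflow array. */
static int
record_dead_tuple(PageBitmapEntry *entry, uint16_t offnum)
{
    if (offnum < 1 || offnum > BITS_PER_PAGE)
        return 0;
    entry->bitmap[(offnum - 1) / 8] |= (uint8_t) (1 << ((offnum - 1) % 8));
    return 1;
}

static int
is_dead(const PageBitmapEntry *entry, uint16_t offnum)
{
    if (offnum < 1 || offnum > BITS_PER_PAGE)
        return 0;                       /* would require the overflow lookup */
    return (entry->bitmap[(offnum - 1) / 8] >> ((offnum - 1) % 8)) & 1;
}
```

During the index-cleanup phase, the membership test replaces the binary search over the sorted TID array with a hash lookup by block number plus one bit test.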

This representation is at least as good as the current representation if there are at least 4-5% dead tuples. I don't think very large tables will be vacuumed with a scale factor less than that. And assuming 10% dead tuples, this representation will actually be much more optimal.
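The 4-5% break-even follows from simple arithmetic: the current representation spends 6 bytes (one ItemPointerData) per dead tuple, while the bitmap costs a fixed 25 bytes/page at D = 100, so the bitmap wins once a page has roughly 5 or more dead tuples. A back-of-envelope sketch, ignoring per-entry overheads (hash entry header, block number field) for brevity:

```c
/* Assumed constants: sizeof(ItemPointerData) == 6 on common platforms,
 * and a 25-byte bitmap per page (2 * 100 bits) as proposed above. */

/* Memory used by the current sorted TID array. */
static long
tid_array_bytes(long dead_per_page, long npages)
{
    return 6 * dead_per_page * npages;
}

/* Memory used by the proposed per-page bitmaps. */
static long
bitmap_bytes(long npages)
{
    return 25 * npages;
}
```

With 5 dead tuples/page the TID array needs 30 bytes/page versus the bitmap's 25; with 10 dead tuples/page (10% of D = 100) it needs 60, so the bitmap uses well under half the memory.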

The idea can fail when (a) there are very few dead tuples in the table, say less than 5%, or (b) a large number of tuples fall outside the 2D limit. While I don't expect either of these to hold for real-world, very large tables, (a) we can anticipate when the vacuum starts and use the current representation, and (b) we can detect at run time and do a one-time switch between representations. You may argue that managing two representations is clumsy, and I agree, but the code is completely isolated and probably not more than a few hundred lines.
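The fallback for case (a) could look something like the following. This is purely hypothetical decision logic, not from any patch; the 5% threshold and the names are illustrative, and in practice the estimate would come from the table's statistics when vacuum starts:

```c
/* Hypothetical selection between the two dead-tuple representations,
 * made once before the heap scan begins (case (a) in the mail). */
typedef enum DeadTupleStore
{
    STORE_TID_ARRAY,        /* current sorted array of ItemPointerData */
    STORE_PAGE_BITMAP       /* proposed per-page bitmap + overflow area */
} DeadTupleStore;

static DeadTupleStore
choose_representation(double est_dead_fraction)
{
    /* Below ~5% dead tuples the fixed 25-byte/page bitmap wastes memory,
     * so fall back to the plain TID array.  Threshold is an assumption. */
    return (est_dead_fraction < 0.05) ? STORE_TID_ARRAY : STORE_PAGE_BITMAP;
}
```

Case (b), the run-time switch, would simply convert the accumulated bitmaps back into a TID array when the overflow area grows past its budget, which is why keeping the two code paths isolated matters.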

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
