Re: [PoC] Improve dead tuple storage for lazy vacuum

From	John Naylor
Subject	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date	17 March 2023, 07:02:53
Msg-id	CAFBsxsGiiyY+wykVLBbN9hFUMiNHqEr_Kqg9Mpc=uv4sg8eagQ@mail.gmail.com
In reply to	Re: [PoC] Improve dead tuple storage for lazy vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: [PoC] Improve dead tuple storage for lazy vacuum
List	pgsql-hackers

On Wed, Mar 15, 2023 at 9:32 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Mar 14, 2023 at 8:27 PM John Naylor
> <john.naylor@enterprisedb.com> wrote:
> >
> > I wrote:
> >
> > > > > Since the block-level measurement is likely overestimating quite a bit, I propose to simply reverse the order of the actions here, effectively reporting progress for the *last page* and not the current one: First update progress with the current memory usage, then add tids for this page. If this allocated a new block, only a small bit of that will be written to. If this block pushes it over the limit, we will detect that up at the top of the loop. It's kind of like our earlier attempts at a "fudge factor", but simpler and less brittle. And, as far as OS pages we have actually written to, I think it'll effectively respect the memory limit, at least in the local mem case. And the numbers will make sense.

> > I still like my idea at the top of the page -- at least for vacuum and m_w_m. It's still not completely clear if it's right but I've got nothing better. It also ignores the work_mem issue, but I've given up anticipating all future cases at the moment.

> IIUC you suggested measuring memory usage by tracking how much memory
> chunks are allocated within a block. If your idea at the top of the
> page follows this method, it still doesn't deal with the point Andres
> mentioned.

Right, but that idea was orthogonal to how we measure memory use, and in fact mentions blocks specifically. The re-ordering was just to make sure that progress reporting didn't show current-use > max-use.
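As an illustration only, here is a minimal sketch of that re-ordering: check the limit at the top of the loop, report usage as of the last page, and only then add the TIDs for the current page. `ToyTidStore` and `scan` are hypothetical names, and the store simply counts memory in whole blocks; none of this is PostgreSQL code.

```python
class ToyTidStore:
    """Hypothetical stand-in for a TID store that accounts memory per block."""

    def __init__(self, block_bytes):
        self.block_bytes = block_bytes
        self.blocks = 0   # blocks allocated so far
        self.used = 0     # bytes actually written to

    def add_tids(self, n_bytes):
        self.used += n_bytes
        # Allocate whole blocks until the written bytes fit.
        while self.used > self.blocks * self.block_bytes:
            self.blocks += 1

    def memory_usage(self):
        # Block-level measurement: overestimates the bytes written.
        return self.blocks * self.block_bytes

    def reset(self):
        self.blocks = 0
        self.used = 0


def scan(page_costs, store, max_bytes):
    """Return (reported_usage_per_page, number_of_flushes)."""
    reported, flushes = [], 0
    for cost in page_costs:
        # 1. Limit check at the top of the loop.
        if store.memory_usage() > max_bytes:
            flushes += 1      # stands in for an index-vacuum pass
            store.reset()
        # 2. Report progress as of the *last* page.
        reported.append(store.memory_usage())
        # 3. Only now add this page's TIDs (may allocate a new block).
        store.add_tids(cost)
    return reported, flushes


reported, flushes = scan([60] * 10, ToyTidStore(block_bytes=100), max_bytes=250)
print(reported, flushes)
# [0, 100, 200, 200, 0, 100, 200, 200, 0, 100] 2
```

Because the report happens after the check and before the add, the reported value in this sketch can never exceed `max_bytes`, even though the block-level usage briefly does (300 bytes allocated with only 240 written) before the next check catches it.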

However, the big question remains DSA, since a new segment can be as large as the entire previous set of allocations. It seems it just wasn't designed for things where memory growth is unpredictable.

I'm starting to wonder if we need to give DSA a bit more info at the start. Imagine a "soft" limit given to the DSA area when it is initialized. If the total segment usage exceeds this, it stops doubling and instead new segments get smaller. Modifying an example we used for the fudge-factor idea some time ago:

m_w_m = 1GB, so calculate the soft limit to be 512MB and pass it to the DSA area.

2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> hit soft limit, so "stairstep down" the new segment sizes:

766 + 2*(128) + 64 = 1086MB -> stop
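The arithmetic above can be reproduced with a small simulation. This is only a sketch under stated assumptions: two segments are created at each size (matching the factor of 2 in the example), the first segment is 1MB, sizes halve once the running total crosses the soft limit, and allocation stops once the total exceeds the hard limit. `simulate_segments` and its parameters are hypothetical names, not part of the DSA API.

```python
def simulate_segments(hard_limit_mb, soft_limit_mb=None, first_mb=1, per_size=2):
    """Toy model of segment growth; returns (segment_sizes_mb, total_mb).

    Assumptions (not actual DSA behavior): sizes double after `per_size`
    segments; once the total exceeds `soft_limit_mb`, sizes step down
    instead. With soft_limit_mb=None the sizes keep doubling.
    """
    sizes, total = [], 0
    size, at_this_size = first_mb, 0
    shrinking = False
    while total <= hard_limit_mb:
        sizes.append(size)
        total += size
        at_this_size += 1
        if not shrinking and soft_limit_mb is not None and total > soft_limit_mb:
            # Soft limit crossed: start the "stairstep down" immediately.
            shrinking = True
            size = max(first_mb, size // 2)
            at_this_size = 0
        elif at_this_size == per_size:
            # Move to the next size: halve when shrinking, else double.
            size = max(first_mb, size // 2) if shrinking else size * 2
            at_this_size = 0
    return sizes, total


sizes, total = simulate_segments(hard_limit_mb=1024, soft_limit_mb=512)
print(sizes[-4:], total)
# [256, 128, 128, 64] 1086
```

In this toy model, calling `simulate_segments(1024)` with no soft limit ends at 1534MB, because the segment following the 1022MB mark would be 512MB.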

That's just an undeveloped idea, however, so likely v17 development, even assuming it's not a bad idea (could be).

And sadly, unless we find some other, simpler answer soon for tracking and limiting shared memory, the tid store is looking like v17 material.

--
John Naylor
EDB: http://www.enterprisedb.com
