Re: [PoC] Improve dead tuple storage for lazy vacuum

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [PoC] Improve dead tuple storage for lazy vacuum
Дата
Msg-id CAD21AoDD5VXYMyWdrSjtN01HdYq3Ja-yzTP=rD9315-5bdLzpA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Ответы Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Список pgsql-hackers
On Wed, Oct 5, 2022 at 6:40 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
>
> On Wed, Oct 5, 2022 at 1:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Sep 28, 2022 at 12:49 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Sep 23, 2022 at 12:11 AM John Naylor
> > > <john.naylor@enterprisedb.com> wrote:
> > > Yeah, node31 and node256 are bloated.  We probably could use slab for
> > > node256 independently. It's worth trying a benchmark to see how it
> > > affects the performance and the tree size.
>
> This wasn't the focus of your current email, but while experimenting with v6 I had another thought about local
allocation:If we use the default slab block size of 8192 bytes, then only 3 chunks of size 2088 can fit, right? If so,
sinceaset and DSA also waste at least a few hundred bytes, we could store a useless 256-byte slot array within node256.
Thatway, node128 and node256 share the same start of pointers/values array, so there would be one less branch for
gettingthat address. In v6, rt_node_get_values and rt_node_get_children are not inlined (asde: gcc uses a jump table
for5 kinds but not for 4), but possibly should be, and the smaller the better. 

It would be good for performance but I'm a bit concerned that it's
highly optimized to the design of aset and DSA. Since size 2088 will
be currently classed as 2616 in DSA, DSA wastes 528 bytes. However, if
we introduce a new class of 2304 (=2048 + 256) bytes we cannot store a
useless 256-byte and the assumption will be broken.

>
> > Regarding DSA support, IIUC we need to use dsa_pointer in inner nodes
> > to point to its child nodes, instead of C pointers (ig, backend-local
> > address). I'm thinking of a straightforward approach as the first
> > step; inner nodes have a union of rt_node* and dsa_pointer and we
> > choose either one based on whether the radix tree is shared or not. We
> > allocate and free the shared memory for individual nodes by
> > dsa_allocate() and dsa_free(), respectively. Therefore we need to get
> > a C pointer from dsa_pointer by using dsa_get_address() while
> > descending the tree. I'm a bit concerned that calling
> > dsa_get_address() for every descent could be performance overhead but
> > I'm going to measure it anyway.
>
> Are dsa pointers aligned the same as pointers to locally allocated memory? Meaning, is the offset portion always a
multipleof 4 (or 8)? 

I think so.

> It seems that way from a glance, but I can't say for sure. If the lower 2 bits of a DSA pointer are never set, we can
tagthem the same way as a regular pointer. That same technique could help hide the latency of converting the pointer,
bythe same way it would hide the latency of loading parts of a node into CPU registers. 
>
> One concern is, handling both local and dsa cases in the same code requires more (predictable) branches and reduces
codedensity. That might be a reason in favor of templating to handle each case in its own translation unit. 

Right. We also need to support locking for shared radix tree, which
would require more branches.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: shadow variables - pg15 edition
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication