Re: [PoC] Improve dead tuple storage for lazy vacuum

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [PoC] Improve dead tuple storage for lazy vacuum
Дата
Msg-id CAD21AoAWhHSDeNdVEmBpOcVLMav0mz4AJfMN2wURc+h85q7PUQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Ответы Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <john.naylor@enterprisedb.com>)
Список pgsql-hackers
Hi,

On Tue, May 23, 2023 at 7:17 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
>
> I wrote:
> > the current insert/delete paths are quite complex. Using bitmap heap scan as a motivating use case, I hope to
refocuscomplexity to where it's most needed, and aggressively simplify where possible. 
>
> Sometime in the not-too-distant future, I will start a new thread focusing on bitmap heap scan, but for now, I just
wantto share some progress on making the radix tree usable not only for that, but hopefully a wider range of
applications,while making the code simpler and the binary smaller. The attached patches are incomplete (e.g. no
iteration)and quite a bit messy, so tar'd and gzip'd for the curious (should apply on top of v32 0001-03 + 0007-09 ). 
>

Thank you for making progress on this. I agree with these directions
overall. I have some comments and questions:

> - With the latter in mind, searching the child within a node now returns the address of the slot. This allows the
sameinterface whether the slot contains a child pointer or a value. 

Probably we can apply similar changes to the iteration as well.

> * Other nodes will follow in due time, but only after I figure out how to do it nicely (ideas welcome!) -- currently
node32'stwo size classes work fine for growing, but the code should be simplified before extending to other cases.) 

Within the size class, we just alloc a new node of lower size class
and do memcpy(). I guess it will be almost same as what we do for
growing. It might be a good idea to support node shrinking within the
size class for node32 (and node125 if we support). I don't think
shrinking class-3 to class-1 makes sense.

>
> Limited support for "combined pointer-value slots". At compile-time, choose either that or "single-value leaves"
basedon the size of the value type template parameter. Values that are pointer-sized or less can fit in the last-level
childslots of nominal "inner nodes" without duplicated leaf-node code. Node256 now must act like the previous 'node256
leaf',since zero is a valid value. Aside from that, this was a small change. 

Yes, but it also means that we use pointer-sized value anyway even if
the value size is less than that, which wastes the memory, no?

>
> What I've shared here could work (in principal, since it uses uint64 values) for tidstore, possibly faster (untested)
becauseof better code density, but as mentioned I want to shoot for higher. For tidbitmap.c, I want to extend this idea
andbranch at run-time on a per-value basis, so that a page-table entry that fits in a pointer can go there, and if not,
it'llbe a full leaf. (This technique enables more flexibility in lossifying pages as well.) Run-time info will require
e.g.an additional bit per slot. Since the node header is now 3 bytes, we can spare one more byte in the node3 case. In
addition,we can and should also bump it back up to node4, still keeping the metadata within 8 bytes (no struct
padding).

Sounds good.

> I've started in this patchset to refer to the node kinds as "4/16/48/256", regardless of their actual fanout. This is
forreadability (by matching the language in the paper) and maintainability (should *not* ever change again). The size
classes(including multiple classes per kind) could be determined by macros and #ifdef's. For example, in non-SIMD
architectures,it's likely slow to search an array of 32 key chunks, so in that case the compiler should choose size
classessimilar to these four nominal kinds. 

If we want to use the node kinds used in the paper, I think we should
change the number in RT_NODE_KIND_X too. Otherwise, it would be
confusing when reading the code without referring to the paper.
Particularly, this part is very confusing:

        case RT_NODE_KIND_3:
            RT_ADD_CHILD_4(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_32:
            RT_ADD_CHILD_16(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_125:
            RT_ADD_CHILD_48(tree, ref, node, chunk, child);
            break;
        case RT_NODE_KIND_256:
            RT_ADD_CHILD_256(tree, ref, node, chunk, child);
            break;

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: shveta malik
Дата:
Сообщение: Re: Support logical replication of DDLs
Следующее
От: Ranier Vilela
Дата:
Сообщение: Re: Avoid unncessary always true test (src/backend/storage/buffer/bufmgr.c)