Re: [PoC] Improve dead tuple storage for lazy vacuum

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: [PoC] Improve dead tuple storage for lazy vacuum
Дата
Msg-id CAD21AoAK=_E7QhkYVrb9hFiZt3JXbBU=pnyUduR2OYNxr3TBrg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
Ответы Re: [PoC] Improve dead tuple storage for lazy vacuum  (John Naylor <johncnaylorls@gmail.com>)
Список pgsql-hackers
On Wed, Dec 6, 2023 at 3:39 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Wed, Dec 6, 2023 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Dec 4, 2023 at 5:21 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> > > > Given variable-length value support, RT_GET() would have to do
> > > > repalloc() if the existing value size is not big enough for the new
> > > > value, but it cannot as the radix tree doesn't know the size of each
> > > > stored value.
> > >
> > > I think we have two choices:
> > >
> > > - the value stores the "length". The caller would need to specify a
> > > function to compute size from the "length" member. Note this assumes
> > > there is an array. I think both aspects are not great.
> > > - the value stores the "size". Callers that store an array (as
> > > PageTableEntry's do) would compute length when they need to. This
> > > sounds easier.
> >
> > As for the second idea, do we always need to require the value to have
> > the "size" (e.g. int32) in the first field of its struct? If so, the
> > caller will be able to use only 4 bytes in embedded value cases (or
> > won't be able to use at all if the pointer size is 4 bytes).
>
> We could have an RT_SIZE_TYPE for varlen value types. That's easy.
> There is another way, though: (This is a digression into embedded
> values, but it does illuminate some issues even aside from that)
>
> My thinking a while ago was that an embedded value had no explicit
> length/size, but could be "expanded" into a conventional value for the
> caller. For bitmaps, the smallest full value would have length 1 and
> whatever size (For tid store maybe 16 bytes). This would happen
> automatically via a template function.
>
> Now I think that could be too complicated (especially for page table
> entries, which have more bookkeeping than vacuum needs) and slow.
> Imagine this as an embedded value:
>
> typedef struct BlocktableEntry
> {
>   uint16 size;
>
>   /* later: uint8 flags; for bitmap scan */
>
>   /* 64 bit: 3 elements , 32-bit: 1 element */
>   OffsetNumber offsets[( sizeof(Pointer) - sizeof(int16) ) /
> sizeof(OffsetNumber)];
>
>   /* end of embeddable value */
>
>   bitmapword words[FLEXIBLE_ARRAY_MEMBER];
> } BlocktableEntry;
>
> Here we can use a slot to store up to 3 offsets, no matter how big
> they are. That's great because a bitmap could be mostly wasted space.

Interesting idea.

> But now the caller can't know up front how many bytes it needs until
> it retrieves the value and sees what's already there. If there are
> already three values, the caller needs to tell the tree "alloc this
> much, update this slot you just gave me with the alloc (maybe DSA)
> pointer, and return the local pointer". Then copy the 3 offsets into
> set bits, and set whatever else it needs to.  With normal values, same
> thing, but with realloc.
>
> This is a bit complex, but I see an advantage The tree doesn't need to
> care so much about the size, so the value doesn't need to contain the
> size. For our case, we can use length (number of bitmapwords) without
> the disadvantages I mentioned above, with length zero (or maybe -1)
> meaning "no bitmapword array, the offsets are all in this small
> array".

It's still unclear to me why the value doesn't need to contain the size.

If I understand you correctly, in RT_GET(), the tree allocs a new
memory and updates the slot where the value is embedded with the
pointer to the allocated memory, and returns the pointer to the
caller. Since the returned value, newly allocated memory, is still
empty, the callner needs to copy the contents of the old value to the
new value and do whatever else it needs to.

If the value is already a single-leave value and RT_GET() is called
with a larger size, the slot is always replaced with the newly
allocated area and the caller needs to copy the contents? If the tree
does realloc the value with a new size, how does the tree know the new
value is larger than the existing value? It seems like the caller
needs to provide a function to calculate the size of the value based
on the length.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Test 002_pg_upgrade fails with olddump on Windows
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [PoC] Improve dead tuple storage for lazy vacuum