Re: [PoC] Improve dead tuple storage for lazy vacuum

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: [PoC] Improve dead tuple storage for lazy vacuum
Дата
Msg-id CANWCAZY46dP5=igG1f3REPq32BqQsKCV19YS-UZy3D309F=qkQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: [PoC] Improve dead tuple storage for lazy vacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
On Mon, Dec 11, 2023 at 1:18 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> I've attached the updated patch set. From the previous patch set, I've
> merged patches 0007 to 0010. The other changes such as adding RT_GET()
> still are unmerged for now, for discussion. Probably we can make them
> as follow-up patches as we discussed. 0011 to 0015 patches are new
> changes for v44 patch set, which removes RT_SEARCH() and RT_SET() and
> support variable-length values.

This looks like the right direction, and I'm pleased it's not much
additional code on top of my last patch.

v44-0014:

+#ifdef RT_VARLEN_VALUE
+ /* XXX: need to choose block sizes? */
+ tree->leaf_ctx = AllocSetContextCreate(ctx,
+    "radix tree leaves",
+    ALLOCSET_DEFAULT_SIZES);
+#else
+ tree->leaf_ctx = SlabContextCreate(ctx,
+    "radix tree leaves",
+    RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)),
+    sizeof(RT_VALUE_TYPE));
+#endif /* RT_VARLEN_VALUE */

Choosing block size: Similar to what we've discussed previously around
DSA segments, we might model this on CreateWorkExprContext() in
src/backend/executor/execUtils.c. Maybe tid store can pass maint_w_m /
autovac_w_m (later work_mem for bitmap scan). RT_CREATE could set the
max block size to 1/16 of that, or less.

Also, it occurred to me that compile-time embeddable values don't need
a leaf context. I'm not sure how many places assume that there is
always a leaf context. If not many, it may be worth not creating one
here, just to be tidy.

+ size_t copysize;

- memcpy(leaf.local, value_p, sizeof(RT_VALUE_TYPE));
+ copysize = sizeof(RT_VALUE_TYPE);
+#endif
+
+ memcpy(leaf.local, value_p, copysize);

I'm not sure this indirection adds clarity. I guess the intent was to
keep from saying "memcpy" twice, but now the code has to say "copysize
= foo" twice.

For varlen case, we need to watch out for slowness because of memcpy.
Let's put that off for later testing, though. We may someday want to
avoid a memcpy call for the varlen case, so let's keep it flexible
here.

v44-0015:

+#define SizeOfBlocktableEntry (offsetof(

Unused.

+ char buf[MaxBlocktableEntrySize] = {0};

Zeroing this buffer is probably going to be expensive. Also see this
pre-existing comment:
/* WIP: slow, since it writes to memory for every bit */
page->words[wordnum] |= ((bitmapword) 1 << bitnum);

For this function (which will be vacuum-only, so we can assume
ordering), in the loop we can:
* declare the local bitmapword variable to be zero
* set the bits on it
* write it out to the right location when done.

Let's fix both of these at once.

+ if (TidStoreIsShared(ts))
+ shared_rt_set(ts->tree.shared, blkno, (void *) page, page_len);
+ else
+ local_rt_set(ts->tree.local, blkno, (void *) page, page_len);

Is there a reason for "void *"? The declared parameter is
"RT_VALUE_TYPE *value_p" in 0014.
Also, since this function is for vacuum (and other uses will need a
new function), let's assert the returned bool is false.

Does iteration still work? If so, it's not too early to re-wire this
up with vacuum and see how it behaves.

Lastly, my compiler has a warning that CI doesn't have:

In file included from ../src/test/modules/test_radixtree/test_radixtree.c:121:
../src/include/lib/radixtree.h: In function ‘rt_find.isra’:
../src/include/lib/radixtree.h:2142:24: warning: ‘slot’ may be used
uninitialized [-Wmaybe-uninitialized]
 2142 |                 return (RT_VALUE_TYPE*) slot;
      |                        ^~~~~~~~~~~~~~~~~~~~~
../src/include/lib/radixtree.h:2112:23: note: ‘slot’ was declared here
 2112 |         RT_PTR_ALLOC *slot;
      |                       ^~~~



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: [Proposal] Add foreign-server health checks infrastructure
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Improve eviction algorithm in ReorderBuffer