Re: Improve eviction algorithm in ReorderBuffer

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Improve eviction algorithm in ReorderBuffer
Дата
Msg-id CAD21AoA-yb-TV8xP_G0Jd+RKXky2mmQvqwCK-209vVJyvCGJ0Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improve eviction algorithm in ReorderBuffer  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Wed, Mar 13, 2024 at 11:23 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 12:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 10:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Tue, Mar 12, 2024 at 4:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Fri, Mar 8, 2024 at 12:58 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > ...
> > > > > > > 5.
> > > > > > > + *
> > > > > > > + * If 'indexed' is true, we create a hash table to track of each node's
> > > > > > > + * index in the heap, enabling to perform some operations such as removing
> > > > > > > + * the node from the heap.
> > > > > > >   */
> > > > > > >  binaryheap *
> > > > > > > -binaryheap_allocate(int capacity, binaryheap_comparator compare, void *arg)
> > > > > > > +binaryheap_allocate(int capacity, binaryheap_comparator compare,
> > > > > > > + bool indexed, void *arg)
> > > > > > >
> > > > > > > BEFORE
> > > > > > > ... enabling to perform some operations such as removing the node from the heap.
> > > > > > >
> > > > > > > SUGGESTION
> > > > > > > ... to help make operations such as removing nodes more efficient.
> > > > > > >
> > > > > >
> > > > > > But these operations literally require the indexed binary heap as we
> > > > > > have an assertion:
> > > > > >
> > > > > > void
> > > > > > binaryheap_remove_node_ptr(binaryheap *heap, bh_node_type d)
> > > > > > {
> > > > > >     bh_nodeidx_entry *ent;
> > > > > >
> > > > > >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> > > > > >     Assert(heap->bh_indexed);
> > > > > >
> > > > >
> > > > > I didn’t quite understand -- the operations mentioned are "operations
> > > > > such as removing the node", but binaryheap_remove_node() also removes
> > > > > a node from the heap. So I still felt the comment wording of the patch
> > > > > is not quite correct.
> > > >
> > > > Now I understand your point. That's a valid point.
> > > >
> > > > >
> > > > > Now, if the removal of a node from an indexed heap can *only* be done
> > > > > using binaryheap_remove_node_ptr() then:
> > > > > - the other removal functions (binaryheap_remove_*) probably need some
> > > > > comments to make sure nobody is tempted to call them directly for an
> > > > > indexed heap.
> > > > > - maybe some refactoring and assertions are needed to ensure those
> > > > > *cannot* be called directly for an indexed heap.
> > > > >
> > > >
> > > > If the 'index' is true, the caller can not only use the existing
> > > > functions but also newly added functions such as
> > > > binaryheap_remove_node_ptr() and binaryheap_update_up() etc. How about
> > > > something like below?
> > > >
> > >
> > > You said: "can not only use the existing functions but also..."
> > >
> > > Hmm. Is that right? IIUC those existing "remove" functions should NOT
> > > be called directly if the heap was "indexed" because they'll delete
> > > the node from the heap OK, but any corresponding index for that
> > > deleted node will be left lying around -- i.e. everything gets out of
> > > sync. This was the reason for my original concern.
> > >
> >
> > All existing binaryheap functions should be available even if the
> > binaryheap is 'indexed'. For instance, with the patch,
> > binaryheap_remote_node() is:
> >
> > void
> > binaryheap_remove_node(binaryheap *heap, int n)
> > {
> >     int         cmp;
> >
> >     Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> >     Assert(n >= 0 && n < heap->bh_size);
> >
> >     /* compare last node to the one that is being removed */
> >     cmp = heap->bh_compare(heap->bh_nodes[--heap->bh_size],
> >                            heap->bh_nodes[n],
> >                            heap->bh_arg);
> >
> >     /* remove the last node, placing it in the vacated entry */
> >     replace_node(heap, n, heap->bh_nodes[heap->bh_size]);
> >
> >     /* sift as needed to preserve the heap property */
> >     if (cmp > 0)
> >         sift_up(heap, n);
> >     else if (cmp < 0)
> >         sift_down(heap, n);
> > }
> >
> > The replace_node(), sift_up() and sift_down() update node's index as
> > well if the binaryheap is indexed. When deleting the node from the
> > binaryheap, it will also delete its index from the hash table.
> >
>
> I see now. Thanks for the information.
>
> ~~~
>
> Some more review comments for v8-0002
>
> ======
>
> 1.
> +/*
> + * Remove the node's index from the hash table if the heap is indexed.
> + */
> +static bool
> +delete_nodeidx(binaryheap *heap, bh_node_type node)
> +{
> + if (!binaryheap_indexed(heap))
> + return false;
> +
> + return bh_nodeidx_delete(heap->bh_nodeidx, node);
> +}
>
> I wasn't sure if having this function was a good idea. Yes, it makes
> code more readable, but I felt the heap code ought to be as efficient
> as possible so maybe it is better for the index check to be done at
> the caller, instead of incurring any overhead of function calls that
> might do nothing.
>
> SUGGESTION
> if (binaryheap_indexed(heap))
>   found = bh_nodeidx_delete(heap->bh_nodeidx, node);

I think we can have the function inlined, instead of doing the same
things in multiple places. I've changed it in the v9 patch.

> ~~~
>
> 2.
> +/*
> + * binaryheap_update_up
> + *
> + * Sift the given node up after the node's key is updated. The caller must
> + * ensure that the given node is in the heap. O(log n) worst case.
> + *
> + * This function can be used only if the heap is indexed.
> + */
> +void
> +binaryheap_update_up(binaryheap *heap, bh_node_type d)
> +{
> + bh_nodeidx_entry *ent;
> +
> + Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> + Assert(binaryheap_indexed(heap));
> +
> + ent = bh_nodeidx_lookup(heap->bh_nodeidx, d);
> + Assert(ent);
> + Assert(ent->index >= 0 && ent->index < heap->bh_size);
> +
> + sift_up(heap, ent->index);
> +}
> +
> +/*
> + * binaryheap_update_down
> + *
> + * Sift the given node down after the node's key is updated. The caller must
> + * ensure that the given node is in the heap. O(log n) worst case.
> + *
> + * This function can be used only if the heap is indexed.
> + */
> +void
> +binaryheap_update_down(binaryheap *heap, bh_node_type d)
> +{
> + bh_nodeidx_entry *ent;
> +
> + Assert(!binaryheap_empty(heap) && heap->bh_has_heap_property);
> + Assert(binaryheap_indexed(heap));
> +
> + ent = bh_nodeidx_lookup(heap->bh_nodeidx, d);
> + Assert(ent);
> + Assert(ent->index >= 0 && ent->index < heap->bh_size);
> +
> + sift_down(heap, ent->index);
> +}
>
> Since those functions are almost identical, wouldn't it be better to
> combine them, passing the sift direction?
>
> SUGGESTION
> binaryheap_resift(binaryheap *heap, bh_node_type d, bool sift_dir_up)
> {
>   ...
>
>   if (sift_dir_up)
>     sift_up(heap, ent->index);
>   else
>     sift_down(heap, ent->index);
> }

I'm not really sure binaryheap_resift() is a better API than
binaryheap_update_up() and _down(). Having different APIs for
different behavior makes sense to me. On the other hand, I see your
point that these two functions have duplicated codes, so I created a
common function for them to remove the duplication.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Improve eviction algorithm in ReorderBuffer
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Recent 027_streaming_regress.pl hangs