Re: slab allocator performance issues
From | John Naylor |
---|---|
Subject | Re: slab allocator performance issues |
Date | |
Msg-id | CAFBsxsGBgHto-XHV2OZLCJ5v6-PswKrJHr0maP8Kf2+dftxYDQ@mail.gmail.com |
In reply to | Re: slab allocator performance issues (David Rowley <dgrowleyml@gmail.com>) |
Responses | Re: slab allocator performance issues |
List | pgsql-hackers |
On Mon, Dec 5, 2022 at 3:02 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 11 Nov 2022 at 22:20, John Naylor <john.naylor@enterprisedb.com> wrote:
> > #define SLAB_FREELIST_COUNT ((1<<3) + 1)
> > index = (freecount & (SLAB_FREELIST_COUNT - 2)) + (freecount != 0);
>
> Doesn't this create a sort of round-robin use of the free list? What
> we want is a sort of "histogram" bucket set of free lists so we can
> group together blocks that have a close-enough free number of chunks.
> Unless I'm mistaken, I think what you have doesn't do that.
The intent must have slipped my mind along the way.
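
For illustration, here is a minimal sketch of why the mask-based mapping wraps around instead of bucketing. The function name masked_index is hypothetical; only the macro and the expression come from the quoted code.

```c
#include <assert.h>

#define SLAB_FREELIST_COUNT ((1 << 3) + 1)

/*
 * Mask-based mapping from the quoted proposal (hypothetical wrapper name).
 * The mask keeps only the low three bits of freecount, so the result
 * cycles through 1..8 with a period of 8; 0 stays at index 0.
 */
static int
masked_index(int freecount)
{
    assert(freecount >= 0);
    return (freecount & (SLAB_FREELIST_COUNT - 2)) + (freecount != 0);
}
```

With this mapping, blocks with 8 and 16 free chunks share a list, while blocks with 7 and 8 free chunks land at opposite ends of the array, which is the round-robin effect described above.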
> I wondered if simply:
>
> index = -(-freecount >> slab->freelist_shift);
>
> would be faster than Andres' version. I tried it out and on my AMD
> machine, it's about the same speed. Same on a Raspberry Pi 4.
>
> Going by [2], the instructions are very different with each method, so
> other machines with different latencies on those instructions might
> show something different. I attached what I used to test if anyone
> else wants a go.
I get about a 0.1% difference on my machine. Both versions compile (with gcc) to three low-latency instructions. Each instruction needs the result of the previous one before it can execute, which I think is what the XXX comment "isn't great" was referring to. The new coding is more mysterious (does it do the right thing on all platforms?), so I guess the original is still the way to go unless we get a better idea.
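
To make the comparison concrete, here is a hedged sketch of two ways to compute a rounded-up shift. ceil_shift_add is a conventional add-then-shift form used only as a stand-in, since the original version is not quoted in this message. ceil_shift_neg follows the expression quoted above, with the shift passed as a plain parameter instead of slab->freelist_shift. Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Stand-in for the conventional approach (assumption, not quoted above):
 * add (2^shift - 1) and then shift right, i.e. ceil(freecount / 2^shift).
 */
static int
ceil_shift_add(int freecount, int shift)
{
    assert(freecount >= 0 && shift >= 0 && shift < 16);
    return (freecount + (1 << shift) - 1) >> shift;
}

/*
 * Negate, shift, negate again.  Right-shifting a negative signed integer
 * is implementation-defined in C; this relies on the arithmetic
 * (sign-extending) shift that gcc provides, under which the inner shift
 * rounds toward negative infinity and the outer negation turns that into
 * rounding up.
 */
static int
ceil_shift_neg(int freecount, int shift)
{
    assert(freecount >= 0 && shift >= 0 && shift < 16);
    return -(-freecount >> shift);
}
```

Under the arithmetic-shift assumption both functions map 0 to 0 and every non-zero count to an index of at least 1, so neighbouring free counts share a bucket. The platform question raised above comes down to whether the compiler sign-extends on the inner shift.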
--
John Naylor
EDB: http://www.enterprisedb.com