Re: Popcount optimization using AVX512
From | Alvaro Herrera |
---|---|
Subject | Re: Popcount optimization using AVX512 |
Date | |
Msg-id | 202404011106.y4fci35kzdqt@alvherre.pgsql |
In response to | Re: Popcount optimization using AVX512 (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Popcount optimization using AVX512
|
List | pgsql-hackers |
On 2024-Mar-31, Nathan Bossart wrote:

> +uint64
> +pg_popcount_avx512(const char *buf, int bytes)
> +{
> +    uint64 popcnt;
> +    __m512i accum = _mm512_setzero_si512();
> +
> +    for (; bytes >= sizeof(__m512i); bytes -= sizeof(__m512i))
> +    {
> +        const __m512i val = _mm512_loadu_si512((const __m512i *) buf);
> +        const __m512i cnt = _mm512_popcnt_epi64(val);
> +
> +        accum = _mm512_add_epi64(accum, cnt);
> +        buf += sizeof(__m512i);
> +    }
> +
> +    popcnt = _mm512_reduce_add_epi64(accum);
> +    return popcnt + pg_popcount_fast(buf, bytes);
> +}

Hmm, doesn't this arrangement cause an extra function call to
pg_popcount_fast to be used here? Given the level of micro-optimization
being used by this code, I would have thought that you'd have tried to
avoid that. (At least, maybe avoid the call if bytes is 0, no?)

-- 
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)
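To make the "avoid the call if bytes is 0" suggestion concrete, here is a minimal portable sketch. It is not the patch itself: popcount_blocks and popcount_tail are hypothetical stand-ins for pg_popcount_avx512 and pg_popcount_fast, the AVX-512 intrinsics are replaced by 64-byte blocks of plain 64-bit popcounts so it builds on any machine, and it assumes the GCC/Clang builtins __builtin_popcount and __builtin_popcountll. Whether the extra branch actually pays off would need benchmarking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pg_popcount_fast(): handles the leftover bytes. */
static uint64_t
popcount_tail(const char *buf, int bytes)
{
    uint64_t popcnt = 0;

    while (bytes-- > 0)
        popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++);
    return popcnt;
}

/*
 * Hypothetical stand-in for pg_popcount_avx512(): consumes 64-byte blocks
 * (the size of one __m512i) and calls the tail routine only when bytes
 * remain after the block loop.
 */
uint64_t
popcount_blocks(const char *buf, int bytes)
{
    uint64_t popcnt = 0;

    assert(bytes >= 0);

    for (; bytes >= 64; bytes -= 64)
    {
        uint64_t words[8];

        /* memcpy avoids unaligned access, like the unaligned load in the patch */
        memcpy(words, buf, sizeof(words));
        for (int i = 0; i < 8; i++)
            popcnt += (uint64_t) __builtin_popcountll(words[i]);
        buf += 64;
    }

    /* The guard suggested above: skip the tail call when nothing is left. */
    if (bytes == 0)
        return popcnt;

    return popcnt + popcount_tail(buf, bytes);
}
```

For inputs whose length is an exact multiple of 64 bytes, the tail routine is never entered; for any other length the behavior matches the unguarded version.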