Re: Question about the Implementation of vector32_is_highbit_set on ARM
From | John Naylor |
---|---|
Subject | Re: Question about the Implementation of vector32_is_highbit_set on ARM |
Date | |
Msg-id | CANWCAZZj1Vn8Ee0JoZj-4ZvE48YrKYnFh-P9OcsUkeBrj62p6g@mail.gmail.com |
In response to | Question about the Implementation of vector32_is_highbit_set on ARM (Xiang Gao <Xiang.Gao@arm.com>) |
Responses | RE: Question about the Implementation of vector32_is_highbit_set on ARM |
List | pgsql-hackers |

On Wed, Nov 8, 2023 at 2:44 PM Xiang Gao <Xiang.Gao@arm.com> wrote:

> * function. We could instead adopt the behavior of Arm's vmaxvq_u32(), i.e.
> * check each 32-bit element, but that would require an additional mask
> * operation on x86.
> */
> But I still don't understand why the vmaxvq_u32 intrinsic is not used on the arm platform.

The current use case expects all 1's or all 0's in a 32-bit lane. If anyone tried using it for arbitrary values, vmaxvq_u32 could give a different answer than on x86 using _mm_movemask_epi8, so I think that's the origin of that comment. But it's still a maintenance hazard as is, since x86 wouldn't work for arbitrary values.

It seems the path forward is to rename this function to vector32_is_any_lane_set(), as in the attached (untested on Arm). That would allow each implementation to use the most efficient path, whether it's by 8- or 32-bit lanes. If we someday needed to look at only the high bits, we would need a new function that performed the necessary masking on x86.

It's possible this method could shave cycles on Arm in some 8-bit lane cases where we don't actually care about the high bit specifically, since the movemask equivalent is slow on that platform, but I haven't looked yet.
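
To illustrate why the two approaches agree only for all-1's or all-0's lanes, here is a minimal portable sketch in scalar C. It is not the attached patch and not PostgreSQL's code: the `vec32x4` type and both function names are hypothetical, and each function only models what a check built on the named intrinsic (`_mm_movemask_epi8(v) != 0` on x86, `vmaxvq_u32(v) != 0` on Arm) would presumably compute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical scalar stand-in for a 128-bit vector of four 32-bit lanes. */
typedef struct
{
    uint32_t lane[LANES];
} vec32x4;

/*
 * Models an x86-style check, `_mm_movemask_epi8(v) != 0`:
 * true if ANY BYTE of the vector has its high bit set.
 */
bool
any_byte_highbit_set(vec32x4 v)
{
    for (int i = 0; i < LANES; i++)
    {
        /* 0x80808080 selects the high bit of each of the lane's 4 bytes */
        if (v.lane[i] & UINT32_C(0x80808080))
            return true;
    }
    return false;
}

/*
 * Models an Arm-style check, `vmaxvq_u32(v) != 0`:
 * true if ANY 32-BIT LANE is nonzero.
 */
bool
any_lane_nonzero(vec32x4 v)
{
    uint32_t max = 0;

    for (int i = 0; i < LANES; i++)
    {
        if (v.lane[i] > max)
            max = v.lane[i];
    }
    return max != 0;
}

/* Checks the agreement and disagreement cases described in the message. */
void
demo_checks(void)
{
    vec32x4 zeros = {{0, 0, 0, 0}};
    vec32x4 mask = {{0, UINT32_MAX, 0, 0}};   /* lane is all 1's */
    vec32x4 arbitrary = {{0, 1, 0, 0}};       /* lane is neither */

    /* Comparison-result lanes (all 0's or all 1's): both models agree. */
    assert(!any_byte_highbit_set(zeros) && !any_lane_nonzero(zeros));
    assert(any_byte_highbit_set(mask) && any_lane_nonzero(mask));

    /* Arbitrary lane value: the two models disagree. */
    assert(!any_byte_highbit_set(arbitrary));
    assert(any_lane_nonzero(arbitrary));
}
```

Calling `demo_checks()` passes silently. The lane value `1` is where the models diverge: no byte has its high bit set, yet the lane is nonzero. That is the hazard with a name that promises a high-bit test, and a name like vector32_is_any_lane_set() would restrict callers to mask-style inputs where either model is valid.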