Re: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От Nathan Bossart
Тема Re: Popcount optimization using AVX512
Дата
Msg-id 20240328215136.GA918358@nathanxps13
обсуждение исходный текст
Ответ на Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
On Thu, Mar 28, 2024 at 04:38:54PM -0500, Nathan Bossart wrote:
> Here is a v14 of the patch that I think is beginning to approach something
> committable.  Besides general review and testing, there are two things that
> I'd like to bring up:
> 
> * The latest patch set from Paul Amonson appeared to support MSVC in the
>   meson build, but not the autoconf one.  I don't have much expertise here,
>   so the v14 patch doesn't have any autoconf/meson support for MSVC, which
>   I thought might be okay for now.  IIUC we assume that 64-bit/MSVC builds
>   can always compile the x86_64 popcount code, but I don't know whether
>   that's safe for AVX512.
> 
> * I think we need to verify there isn't a huge performance regression for
>   smaller arrays.  IIUC those will still require an AVX512 instruction or
>   two as well as a function call, which might add some noticeable overhead.

I forgot to mention that I also want to understand whether we can actually
assume availability of XGETBV when CPUID says we support AVX512:

> +        /*
> +         * We also need to check that the OS has enabled support for the ZMM
> +         * registers.
> +         */
> +#ifdef _MSC_VER
> +        return (_xgetbv(0) & 0xe0) != 0;
> +#else
> +        uint64        xcr = 0;
> +        uint32        high;
> +        uint32        low;
> +
> +__asm__ __volatile__(" xgetbv\n":"=a"(low), "=d"(high):"c"(xcr));
> +        return (low & 0xe0) != 0;
> +#endif

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: pg_upgrade --copy-file-range
Следующее
От: "Amonson, Paul D"
Дата:
Сообщение: RE: Popcount optimization using AVX512