RE: Popcount optimization using AVX512

From: Shankaran, Akash
Subject: RE: Popcount optimization using AVX512
Date:
Msg-id PH0PR11MB50007F79C92E3B0C7C1E6D6FF20E2@PH0PR11MB5000.namprd11.prod.outlook.com
In reply to: Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Responses: Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
List: pgsql-hackers

> It was brought to my attention [0] that we probably should be checking for the OSXSAVE bit instead of the XSAVE bit
> when determining whether there's support for the XGETBV instruction.  IIUC that should indicate that both the OS and the
> processor have XGETBV support (not just the processor).
> I've attached a one-line patch to fix this.

> [0] https://github.com/pgvector/pgvector/pull/519#issuecomment-2062804463

Good find. I confirmed this after speaking with an Intel expert, and from section 14.3 of the Intel AVX-512 manual [0],
which recommends checking bit 27. From the manual:

"Prior to using Intel AVX, the application must identify that the operating system supports the XGETBV instruction,
the YMM register state, in addition to processor's support for YMM state management using XSAVE/XRSTOR and
AVX instructions. The following simplified sequence accomplishes both and is strongly recommended.
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use¹).
2) Issue XGETBV and verify that XCR0[2:1] = '11b' (XMM state and YMM state are enabled by OS).
3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2.)"
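The manual's sequence maps onto a few bit tests. Below is a minimal C sketch of it, assuming GCC or Clang on x86 for the live probe (`<cpuid.h>`'s `__get_cpuid` plus an inline-asm `XGETBV`). The helper names and `avx_usable()` are hypothetical and are not PostgreSQL's actual functions. The bit tests are split out so they can be checked against known register values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_XSAVE   (UINT32_C(1) << 26) /* CPU implements XSAVE/XGETBV */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27) /* OS set CR4.OSXSAVE (step 1) */
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28) /* CPU implements AVX (step 3) */
#define XCR0_XMM_YMM       UINT64_C(0x6)       /* XCR0[2:1] = '11b' (step 2) */

/* Step 1. Testing CPUID1_ECX_XSAVE here instead would only prove the CPU has
 * XGETBV; the instruction still raises #UD unless the OS has enabled it. */
static bool step1_xgetbv_enabled(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_OSXSAVE) != 0;
}

/* Step 2: the OS saves and restores both XMM and YMM state. */
static bool step2_ymm_state_enabled(uint64_t xcr0)
{
    return (xcr0 & XCR0_XMM_YMM) == XCR0_XMM_YMM;
}

/* Step 3: may run before or after steps 1 and 2. */
static bool step3_avx_supported(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_AVX) != 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

/* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
static inline uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Whole sequence on the running CPU. && short-circuits, so XGETBV is
 * only executed after step 1 has passed. */
bool avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return step1_xgetbv_enabled(ecx) && step2_ymm_state_enabled(read_xcr0()) &&
           step3_avx_supported(ecx);
}
#endif
```

Keeping the tests in one `&&` chain is what enforces the ordering from step 1. A CPU can report XSAVE (bit 26) while the OS has not enabled XGETBV, so gating on bit 26 alone could let the XGETBV fault.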

It also seems that step 1 and step 2 need to be done prior to the CPUID OSXSAVE check in the popcount code.
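For the AVX-512 popcount path, the same pattern extends to the ZMM state and the leaf-7 feature flags. This is a sketch that assumes three things. First, the code needs AVX512F (CPUID.(EAX=7,ECX=0):EBX bit 16) and AVX512_VPOPCNTDQ (ECX bit 14). Second, the OS must enable XCR0 bits 1, 2, 5, 6 and 7 (mask 0xE6). Third, GCC or Clang on x86 is used for the probe. `avx512_popcount_usable()` and the helpers are hypothetical names. The only hard ordering constraint shown is that XGETBV is not issued until the OSXSAVE bit has been confirmed. The leaf-7 check is placed last here, although per the quoted note a feature-flag check could also come first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE          (UINT32_C(1) << 27)
#define CPUID7_EBX_AVX512F          (UINT32_C(1) << 16)
#define CPUID7_ECX_AVX512_VPOPCNTDQ (UINT32_C(1) << 14)
/* XCR0 bits 1 (XMM), 2 (YMM), 5 (opmask), 6 (ZMM_Hi256), 7 (Hi16_ZMM). */
#define XCR0_AVX512_STATE           UINT64_C(0xE6)

/* The OS saves and restores all register state AVX-512 code touches. */
static bool os_saves_zmm_state(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_STATE) == XCR0_AVX512_STATE;
}

/* The processor implements AVX512F and the VPOPCNTD/VPOPCNTQ instructions. */
static bool cpu_has_vpopcntdq(uint32_t cpuid7_ebx, uint32_t cpuid7_ecx)
{
    return (cpuid7_ebx & CPUID7_EBX_AVX512F) != 0 &&
           (cpuid7_ecx & CPUID7_ECX_AVX512_VPOPCNTDQ) != 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>

bool avx512_popcount_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t xcr0_lo, xcr0_hi;

    /* Steps 1 and 2 first: XGETBV is only issued once OSXSAVE is confirmed. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & CPUID1_ECX_OSXSAVE))
        return false;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    if (!os_saves_zmm_state(((uint64_t)xcr0_hi << 32) | xcr0_lo))
        return false;

    /* Then the feature flags from CPUID leaf 7, subleaf 0
     * (__get_cpuid_count returns 0 if leaf 7 is not supported). */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return cpu_has_vpopcntdq(ebx, ecx);
}
#endif
```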

[0]: https://cdrdv2.intel.com/v1/dl/getContent/671200

- Akash Shankaran