Re: Optimize Arm64 crc32c implementation in Postgresql
From | Heikki Linnakangas |
---|---|
Subject | Re: Optimize Arm64 crc32c implementation in Postgresql |
Date | |
Msg-id | e3a105f2-4fa3-802a-5db3-f0e062f61076@iki.fi |
In reply to | Re: Optimize Arm64 crc32c implementation in Postgresql (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Optimize Arm64 crc32c implementation in Postgresql |
List | pgsql-hackers |
On 03/04/18 19:43, Andres Freund wrote:
> Architecture manual time? They're available freely IIRC and should
> answer this.

Yeah. The best reference I could find was "ARM Cortex-A Series Programmer’s Guide for ARMv8-A" (http://infocenter.arm.com/help/topic/com.arm.doc.den0024a/ch08s01.html). In the "Porting to A64" section, it says:

> Data and code must be aligned to appropriate boundaries. The
> alignment of accesses can affect performance on ARM cores and can
> represent a portability problem when moving code from an earlier
> architecture to ARMv8-A. It is worth being aware of alignment issues
> for performance reasons, or when porting code that makes assumptions
> about pointers or 32-bit and 64-bit integer variables.

I was a bit surprised by the "must be aligned to appropriate boundaries" statement. Googling around, I found that the strict alignment requirement was removed in ARMv7, and since then, unaligned access works similarly to Intel. I think there are some special instructions, like atomic ops, that require alignment, though. Perhaps that's what that sentence refers to.

On 03/04/18 20:47, Tom Lane wrote:
> I'm pretty sure that some ARM platforms emulate unaligned access through
> kernel trap handlers, which would certainly make this a lot slower than
> handling the unaligned bytes manually. Maybe that doesn't apply to any
> ARM CPU that has this instruction ... but as you said, it'd be better
> to consider the presence of the instruction as orthogonal to other
> CPU features.

I did some quick testing, and found that unaligned access is about 2x slower than aligned. I don't think it's being trapped by the kernel, I think that would be even slower, but clearly there is an effect there. So I added code to process the first 1-7 bytes separately, so that the main loop runs on 8-byte aligned addresses.

Pushed, thanks everyone!

- Heikki
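For readers following along, here is a minimal sketch of the "process the first 1-7 bytes separately" structure described above. It is not the committed PostgreSQL code. The real ARMv8 path uses the CRC intrinsics from `<arm_acle.h>` (e.g. `__crc32cb` for one byte, `__crc32cd` for eight bytes); here those are replaced by bitwise software stand-ins so the sketch compiles and runs on any host. The function names `crc32c_byte`, `crc32c_word` and `crc32c_aligned` are hypothetical, and the word step assumes a little-endian host (as on typical AArch64 systems), so that the low byte of the loaded word is the first byte in memory.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

/* Hypothetical software stand-in for __crc32cb(): fold in one byte. */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    return crc;
}

/*
 * Hypothetical software stand-in for __crc32cd(): fold in a 64-bit word,
 * low byte first.  Assumes a little-endian host, so the low byte of the
 * loaded word is the first byte in memory.
 */
static uint32_t crc32c_word(uint32_t crc, uint64_t w)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_byte(crc, (uint8_t) (w >> (8 * i)));
    return crc;
}

/* CRC-32C update whose main loop only reads 8-byte aligned addresses. */
static uint32_t crc32c_aligned(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    const unsigned char *pend = p + len;

    /* Prologue: consume 0-7 leading bytes until p is 8-byte aligned. */
    while (p < pend && ((uintptr_t) p & 7) != 0)
        crc = crc32c_byte(crc, *p++);

    /* Main loop: eight bytes at a time from aligned addresses. */
    while (pend - p >= 8)
    {
        uint64_t w;

        /* Alignment-safe load; compilers typically emit one 8-byte load. */
        memcpy(&w, p, sizeof(w));
        crc = crc32c_word(crc, w);
        p += 8;
    }

    /* Epilogue: remaining 0-7 trailing bytes. */
    while (p < pend)
        crc = crc32c_byte(crc, *p++);

    return crc;
}
```

Callers start from `~0u` and invert the result, so `~crc32c_aligned(~0u, "123456789", 9)` should give the standard CRC-32C check value `0xE3069283`. Because CRC is a byte-serial computation, splitting the input into a bytewise prologue, a word loop and a bytewise epilogue does not change the result. Only the memory access pattern of the hot loop changes.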