On Fri, Feb 24, 2017 at 08:31:09PM +0200, Ants Aasma wrote:
> >> We looked at that when picking the algorithm. At that point it seemed
> >> that CRC CPU instructions were not universal enough to rely on them.
> >> The algorithm we ended up on was designed to be fast on SIMD hardware.
> >> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
> >> with default compiles there is a lot of performance left on table. A
> >> low hanging fruit would be to do CPU detection like the CRC case and
> >> enable a SSE4.1 optimized variant when those instructions are
> >> available. IIRC it was actually a lot faster than the naive hardware
> >> CRC that is used for WAL and about on par with interleaved CRC.
> >
> > Uh, I thought we already did compile-time testing for SSE4.1 and used them
> > if present. Why do you say "with default compiles there is a lot of
> > performance left on table?"
>
> Compile time checks don't help because the compiled binary could be
> run on a different host that does not have SSE4.1 (as extremely
> unlikely as it is at this point of time). A runtime check is done for

Right.

> WAL checksums that use a special CRC32 instruction. Block checksums
> predate that and use a different algorithm that was picked because it
> could be accelerated with vectorized execution on non-Intel
> architectures. We just never got around to adding runtime checks for
> the architecture to enable this speedup.

Oh, so that's why we hope to eventually change the page checksum
algorithm to use the special CRC32 instruction and set a new checksum
version. Got it. I assume there is currently no compile-time way to
enable this speedup.
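
To make the runtime-check idea concrete, here is a rough sketch of the
kind of dispatch being discussed. It is not the PostgreSQL checksum code:
the function names, N_LANES, and the mixing step are all made up for
illustration, and it assumes GCC or Clang on x86 (elsewhere it simply
falls back to the generic path). The same loop is built twice, once with
SSE4.1 permitted, and a function pointer is chosen on first use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_LANES   8            /* hypothetical lane count, not PostgreSQL's */
#define MIX_PRIME 16777619u    /* FNV prime, used here only for illustration */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_SSE41_DISPATCH 1
#endif

/*
 * Shared loop body: N_LANES independent accumulators folded together at
 * the end.  Independent lanes are what make the loop easy to vectorize.
 */
static inline uint32_t
checksum_body(const uint32_t *words, size_t nwords)
{
    uint32_t sums[N_LANES] = {0};
    uint32_t result = 0;

    assert(nwords % N_LANES == 0);
    for (size_t i = 0; i < nwords; i += N_LANES)
        for (size_t j = 0; j < N_LANES; j++)
            sums[j] = (sums[j] ^ words[i + j]) * MIX_PRIME;
    for (size_t j = 0; j < N_LANES; j++)
        result ^= sums[j];
    return result;
}

/* Built with the default target flags; safe on any CPU. */
static uint32_t
checksum_generic(const uint32_t *words, size_t nwords)
{
    return checksum_body(words, nwords);
}

#ifdef HAVE_SSE41_DISPATCH
/*
 * Same loop, but the compiler is allowed to use SSE4.1 instructions in
 * this one function if it inlines and vectorizes the body.
 */
__attribute__((target("sse4.1")))
static uint32_t
checksum_sse41(const uint32_t *words, size_t nwords)
{
    return checksum_body(words, nwords);
}
#endif

typedef uint32_t (*checksum_fn) (const uint32_t *, size_t);

/* Ask the running CPU, not the build host, what it supports. */
static checksum_fn
choose_checksum(void)
{
#ifdef HAVE_SSE41_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        return checksum_sse41;
#endif
    return checksum_generic;
}

static checksum_fn checksum_impl = NULL;

/* Entry point: resolves the implementation on first call (not thread-safe). */
uint32_t
page_checksum(const uint32_t *words, size_t nwords)
{
    if (checksum_impl == NULL)
        checksum_impl = choose_checksum();
    return checksum_impl(words, nwords);
}
```

Both variants return the same value for the same input, so the choice
only affects speed. Building the whole binary with -msse4.1 would be the
compile-time alternative, but such a binary can fail with an illegal
instruction on a CPU without SSE4.1, which is the problem described above.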

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +