Discussion: Memory buffer alignment

Memory buffer alignment

From
Manfred Spraul
Date:
Hi,

When analyzing the kernel profile from osdl dbt benchmarks, I noticed 
that around 50% of the kernel time is spent in __copy_user_intel.
http://khack.osdl.org/stp/280060/profile/

This function is one of the two that do the actual memory copy
between kernel space and user space.
Unfortunately it's the slower one: Intel cpus have a microcode fastpath 
for memcopies that are 8-byte aligned. This fastpath is around 50% 
faster than the manual copy that is used for "misaligned" (i.e. only 
4-byte aligned) pointers. I don't know enough about other cpus, but I'd 
expect that most cpus prefer well-aligned buffers.
How are the user space buffers allocated?
So far I found buffile.c, but "struct BufFile.buffer" is at offset 32, 
i.e. aligned, although by chance. What is the alignment of the output of 
palloc? Is buffile.c the main code that reads/writes data to disk?

--   Manfred
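
To make the alignment question concrete, here is a minimal C sketch for testing whether an address meets a given alignment. The helper `is_aligned` and the struct `example_file` are hypothetical illustrations, not PostgreSQL code; the struct only mimics a buffer member that happens to land at offset 32 behind some header fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ptr is a multiple of `alignment` (which must be a power of two). */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t) ptr & (alignment - 1)) == 0;
}

/*
 * Hypothetical stand-in for a struct with a large buffer member.
 * Eight 4-byte header fields put `buffer` at offset 32, so it is
 * 8-byte aligned only because of what precedes it.
 */
struct example_file
{
    uint32_t header[8];
    char     buffer[8192];
};
```

Calling `is_aligned(f->buffer, 8)` on an instance `f` would then report whether the buffer qualifies for the 8-byte case described above. Note that this also depends on the alignment of the allocation that holds the struct, which is why the alignment of the allocator's output matters.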



Re: Memory buffer alignment

From
Tom Lane
Date:
Manfred Spraul <manfred@colorfullife.com> writes:
> Unfortunately it's the slower one: Intel cpus have a microcode fastpath 
> for memcopies that are 8-byte aligned. This fastpath is around 50% 
> faster than the manual copy that is used for "misaligned" (i.e. only 
> 4-byte aligned) pointers.

Maybe it'd be worth setting MAXIMUM_ALIGNOF to 8 on such CPUs?  Or at
least hacking ShmemAlloc and friends to use 8-byte alignment.  I assume
the major issue here is that the shared buffers don't get 8-byte-aligned
within the shared memory segment.

Are there any machines where it'd be worth forcing an even larger
alignment for the buffers?
        regards, tom lane
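
For illustration, forcing 8-byte alignment in an allocator that carves chunks out of one large segment usually comes down to rounding the current offset up before each allocation. The sketch below is a hypothetical bump allocator, not the actual ShmemAlloc code; `align_up`, `struct segment`, and `segment_alloc` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment` (a power of two). */
size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical bookkeeping for a fixed-size segment (offsets only). */
struct segment
{
    size_t used;   /* bytes handed out so far */
    size_t size;   /* total bytes in the segment */
};

/*
 * Reserve nbytes at an offset that is a multiple of `alignment`.
 * Returns the starting offset, or (size_t) -1 if the segment is full.
 */
size_t segment_alloc(struct segment *seg, size_t nbytes, size_t alignment)
{
    size_t start = align_up(seg->used, alignment);

    if (start > seg->size || nbytes > seg->size - start)
        return (size_t) -1;
    seg->used = start + nbytes;
    return start;
}
```

An aligned offset yields an aligned pointer only if the segment's base address is itself aligned at least as strictly, which this sketch assumes rather than checks.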


Re: Memory buffer alignment

From
Bruce Momjian
Date:
I found this very interesting, and I realize we now have shared buffers
aligned on 8-byte boundaries in CVS.

However, I know if I allocate an 8k block, it will usually be aligned on
an 8k boundary, right?  I know the i386 uses 4k memory pages, and it
certainly seems like it would be a good idea to have the 8k buffers
aligned on 4k offsets.

Can someone run some tests to find out if there is any value to doing 4k
offsets for shared buffer pages?  I am also interested to see if any
speed improvement can be seen with MAXIMUM_ALIGNOF set to 8.
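
As a starting point for such a test, here is a minimal sketch that requests page-aligned 8k blocks explicitly instead of relying on whatever alignment the allocator happens to return (plain malloc only guarantees alignment suitable for ordinary object types, not an 8k or 4k boundary). It assumes a POSIX system that provides posix_memalign; `alloc_page_aligned_blocks`, `BLOCK_SIZE`, and `PAGE_ALIGN` are hypothetical names, and the 4096-byte page size is the i386 figure mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign under strict modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192   /* 8k buffer size discussed in the thread */
#define PAGE_ALIGN 4096   /* assumed page size (i386) */

/*
 * Allocate nblocks contiguous buffers of BLOCK_SIZE bytes, the first one
 * starting on a PAGE_ALIGN boundary. Because BLOCK_SIZE is a multiple of
 * PAGE_ALIGN, every following block starts on a page boundary as well.
 * Returns NULL on failure; release the memory with free().
 */
void *alloc_page_aligned_blocks(size_t nblocks)
{
    void *p = NULL;

    assert(BLOCK_SIZE % PAGE_ALIGN == 0);
    if (nblocks == 0 || nblocks > SIZE_MAX / BLOCK_SIZE)
        return NULL;
    if (posix_memalign(&p, PAGE_ALIGN, nblocks * BLOCK_SIZE) != 0)
        return NULL;
    return p;
}
```

A timing comparison could then read or write the same file through buffers obtained this way and through buffers deliberately offset by 4 bytes. The sketch only sets up the aligned case and makes no claim about the result.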

---------------------------------------------------------------------------

Tom Lane wrote:
> Manfred Spraul <manfred@colorfullife.com> writes:
> > Unfortunately it's the slower one: Intel cpus have a microcode fastpath 
> > for memcopies that are 8-byte aligned. This fastpath is around 50% 
> > faster than the manual copy that is used for "misaligned" (i.e. only 
> > 4-byte aligned) pointers.
> 
> Maybe it'd be worth setting MAXIMUM_ALIGNOF to 8 on such CPUs?  Or at
> least hacking ShmemAlloc and friends to use 8-byte alignment.  I assume
> the major issue here is that the shared buffers don't get 8-byte-aligned
> within the shared memory segment.
> 
> Are there any machines where it'd be worth forcing an even larger
> alignment for the buffers?
> 
>             regards, tom lane
> 

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Memory buffer alignment

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> it certainly seems like it would be a good idea to have the 8k buffers
> aligned on 4k offsets.

Why?  What mechanism do you expect would find that more efficient?
        regards, tom lane


Re: Memory buffer alignment

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > it certainly seems like it would be a good idea to have the 8k buffers
> > aligned on 4k offsets.
> 
> Why?  What mechanism do you expect would find that more efficient?

There was the idea that some OSes can swap pages from kernel space into
user space.  I am not sure anyone does that, but it would be
interesting to see.  Also, a single shared buffer access would be a
single virtual memory lookup, rather than two lookups.  Not sure, but it
would be interesting to see.
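
The page-crossing part of this argument can be checked with simple arithmetic. The sketch below counts how many pages a byte range touches; `pages_touched` is a hypothetical helper written for this example. With 4k pages, an 8k buffer that starts on a page boundary touches two pages, while one that starts anywhere else touches three. Whether that difference is measurable is a separate question that this sketch does not answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of pages of `page_size` bytes (a power of two) touched by the
 * byte range [start, start + len). An empty range touches no pages.
 */
size_t pages_touched(uintptr_t start, size_t len, size_t page_size)
{
    uintptr_t first;
    uintptr_t last;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (len == 0)
        return 0;
    first = start / page_size;              /* page holding the first byte */
    last = (start + len - 1) / page_size;   /* page holding the last byte */
    return (size_t) (last - first + 1);
}
```

For example, `pages_touched(0, 8192, 4096)` covers a page-aligned 8k buffer and `pages_touched(8, 8192, 4096)` covers one that is only 8-byte aligned.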
