Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)
From | Heikki Linnakangas |
---|---|
Subject | Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) |
Date | |
Msg-id | 52370415.6060108@vmware.com |
In response to | Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: patch: add MAP_HUGETLB to mmap() where supported (WIP)
Re: patch: add MAP_HUGETLB to mmap() where supported (WIP) |
List | pgsql-hackers |
On 16.09.2013 13:15, Andres Freund wrote:
> On 2013-09-16 11:15:28 +0300, Heikki Linnakangas wrote:
>> On 14.09.2013 02:41, Richard Poole wrote:
>>> The attached patch adds the MAP_HUGETLB flag to mmap() for shared memory
>>> on systems that support it. It's based on Christian Kruse's patch from
>>> last year, incorporating suggestions from Andres Freund.
>>
>> I don't understand the logic in figuring out the pagesize, and the smallest
>> supported hugepage size. First of all, even without the patch, why do we
>> round up the size passed to mmap() to the _SC_PAGE_SIZE? Surely the kernel
>> will round up the request all by itself. The mmap() man page doesn't say
>> anything about length having to be a multiple of pages size.
>
> I think it does:
>        EINVAL We don't like addr, length, or offset (e.g., they are too
>               large, or not aligned on a page boundary).

That doesn't mean that they *all* have to be aligned on a page boundary.
It's understandable that 'addr' and 'offset' have to be, but it doesn't
make much sense for 'length'.

> and
>        A file is mapped in multiples of the page size. For a file that is not a multiple
>        of the page size, the remaining memory is zeroed when mapped, and writes to that
>        region are not written out to the file. The effect of changing the size of the
>        underlying file of a mapping on the pages that correspond to added or removed
>        regions of the file is unspecified.
>
> And no, according to my past experience, the kernel does *not* do any
> such rounding up. It will just fail.

I wrote a little test program to play with different values (attached).
I tried this on my laptop with a 3.10 kernel (uname -r: 3.10-2-amd64), and
on a VM with a fresh CentOS 6.4 install with a 2.6.32 kernel
(2.6.32-358.18.1.el6.x86_64), and they both work the same:

$ ./mmaptest 100 # mmap 100 bytes

in a different terminal:

$ cat /proc/meminfo | grep HugePages_Rsvd
HugePages_Rsvd: 1

So even a tiny allocation, much smaller than any page size, succeeds,
and it reserves a huge page. I tried the same with larger values; the
kernel always uses huge pages, and rounds up the allocation to a
multiple of the huge page size.

So, let's just get rid of the /sys scanning code.

Robert, do you remember why you put the "pagesize =
sysconf(_SC_PAGE_SIZE);" call in the new mmap() shared memory allocator?

- Heikki
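The test program attached to the message is not reproduced on this page. As a rough illustration only, a helper along the following lines could be used to repeat the experiment. It is a sketch, not the original mmaptest: the name map_unrounded is hypothetical, and it assumes Linux with glibc, where MAP_ANONYMOUS and MAP_HUGETLB are available. The length is passed to mmap() exactly as given, with no rounding to _SC_PAGE_SIZE or to a huge page size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS / MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0           /* assumption: no huge page flag here, plain mmap() only */
#endif

/*
 * Hypothetical helper: map `size` bytes of anonymous shared memory without
 * rounding `size` up to any page size first.
 *
 * Returns the mapping, or NULL if mmap() fails (errno is left as set by
 * mmap()). With use_hugetlb set, failure is expected when no huge pages
 * are configured on the system.
 */
static void *
map_unrounded(size_t size, int use_hugetlb)
{
    int   flags = MAP_SHARED | MAP_ANONYMOUS;
    void *p;

    assert(size > 0);           /* a zero-length mmap() is always an error */

    if (use_hugetlb)
        flags |= MAP_HUGETLB;

    p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

Calling map_unrounded(100, 1) from a small main() and then checking HugePages_Rsvd in /proc/meminfo from another terminal, while the process is still running, would correspond to the experiment described above. The helper does not unmap the region, because the process has to stay alive with the mapping in place for the reservation to be visible.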
Attachments