Thread: Experimental dynamic memory allocation of postgresql shared memory


Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
Hi all,

I have some very experimental code that enables dynamic memory allocation
of shared memory for postgresql backend processes. The source code in
the repository is not complete yet, and it is not immediately useful by
itself. However, it might serve as the basis for higher-level features,
such as expandable hash tables or other data structures that share data
between backends. Ultimately it might be used for an in-memory data
store accessible via the FDW interface. Although such higher-level
features are not available yet, the code might still be interesting to
curious eyes.

https://github.com/ademakov/sharena

The first stage of this project was funded by Postgres Pro. Many
thanks to this wonderful team.

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Tom Lane
Aleksey Demakov <ademakov@gmail.com> writes:
> I have some very experimental code to enable dynamic memory allocation
> of shared memory for postgresql backend processes.

Um ... what's this do that the existing DSM stuff doesn't do?
        regards, tom lane



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Robert Haas
On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aleksey Demakov <ademakov@gmail.com> writes:
>> I have some very experimental code to enable dynamic memory allocation
>> of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?

It seems to be a full-fledged allocator, rather than just a way of
getting a slab of bytes from the operating system.  Think malloc()
rather than sbrk().

But I'm a bit confused about where it gets the bytes it wants to
manage.  There's no call to dsm_create() or ShmemAlloc() anywhere in
the code, at least not that I could find quickly.  The only way to get
shar_base set to a non-NULL value seems to be to call SharAttach(),
and if there's no SharCreate() where would we get that non-NULL value?

EnterpriseDB is working on a memory allocator which will manage chunks
of dynamic shared memory and provide an allocate/free interface to
allow small allocations to be carved out of large DSM segments:

https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap

I expect that to be useful for parallel query and anything else where
processes need to share variable-size data.  However, that's different
from this because ours can grow to arbitrary size and shrink again by
allocating and freeing DSM segments.  We also do everything with
relative pointers since DSM segments can be mapped at different
addresses in different processes, whereas this would only work with
memory carved out of the main shared memory segment (or some new DSM
facility that guaranteed identical placement in every address space).

I expect we'll probably post our implementation of this shortly after
9.7 development opens.  I've been a bit reluctant to put it out there
until we have a tangible application of the allocator working, for
fear people will say "that's not good for anything!".  I'm confident
it's good for lots of things, but other people have been known not to
share my confidence.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
On Fri, Jun 17, 2016 at 9:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aleksey Demakov <ademakov@gmail.com> writes:
>> I have some very experimental code to enable dynamic memory allocation
>> of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?
>

It operates over a single large shared memory segment. Within this
segment it lets backends allocate and free small chunks of memory, from
16 bytes to 16 kilobytes. Chunks are carved out of fixed-size 32k
blocks. Each block is used to allocate chunks of a single size class.
When a block is full, another block for the given size class is taken
from the shared segment.

The goal is to support high levels of concurrency for allocate and free
calls. Therefore the allocator is mostly non-blocking. Currently it
uses Heller's lazy-list algorithm to maintain the block lists of a given
size class, so it takes spinlocks once in a while, when a new block is
added or removed. If this proves to cause scalability problems,
Heller's list might be replaced with Maged Michael's lock-free list to
make the whole allocator completely lock-free.

Additionally, it provides an epoch-based memory reclamation facility
that solves the ABA problem for lock-free algorithms. I am going to
implement some lock-free algorithms (extendable hash tables and
probably skip lists) on top of this facility.

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
On Fri, Jun 17, 2016 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But I'm a bit confused about where it gets the bytes it wants to
> manage.  There's no call to dsm_create() or ShmemAlloc() anywhere in
> the code, at least not that I could find quickly.  The only way to get
> shar_base set to a non-NULL value seems to be to call SharAttach(),
> and if there's no SharCreate() where would we get that non-NULL value?
>

You are right, I just have to tidy up the initialisation code before
publishing it.

> I expect that to be useful for parallel query and anything else where
> processes need to share variable-size data.  However, that's different
> from this because ours can grown to arbitrary size and shrink again by
> allocating and freeing with DSM segments.  We also do everything with
> relative pointers since DSM segments can be mapped at different
> addresses in different processes, whereas this would only work with
> memory carved out of the main shared memory segment (or some new DSM
> facility that guaranteed identical placement in every address space).
>

I believe it would be perfectly okay to allocate a huge amount of
address space with mmap on startup.  If the pages are not touched, the
OS VM subsystem will not commit them.

>  I've been a bit reluctant to put it out there
> until we have a tangible application of the allocator working, for
> fear people will say "that's not good for anything!".  I'm confident
> it's good for lots of things, but other people have been known not to
> share my confidence.
>

This is what I've been told by the Postgres Pro folks too. But I felt
that this thing deserves to be shown to the community sooner rather
than later.

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Robert Haas
On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>> I expect that to be useful for parallel query and anything else where
>> processes need to share variable-size data.  However, that's different
>> from this because ours can grown to arbitrary size and shrink again by
>> allocating and freeing with DSM segments.  We also do everything with
>> relative pointers since DSM segments can be mapped at different
>> addresses in different processes, whereas this would only work with
>> memory carved out of the main shared memory segment (or some new DSM
>> facility that guaranteed identical placement in every address space).
>>
>
> I believe it would be perfectly okay to allocate huge amount of address
> space with mmap on startup.  If the pages are not touched, the OS VM
> subsystem will not commit them.

In my opinion, that's not going to fly.  If I thought otherwise, I
would not have developed the DSM facility in the first place.

First, the behavior in this area is highly dependent on choice of
operating system and configuration parameters.  We've had plenty of
experience with requiring non-default configuration parameters to run
PostgreSQL, and it's all bad.  I don't really want to have to tell
users that they must run with a particular value of
vm.overcommit_memory in order to run the server.  Nor do I want to
tell users of other operating systems that their ability to run
PostgreSQL is dependent on the behavior their OS has in this area.  I
had a MacBook Pro up until a year or two ago where a sufficiently large
shared memory request would cause a kernel panic.  That bug will
probably be fixed at some point if it hasn't been already, but
probably by returning an error rather than making it work.

Second, there's no way to give memory back once you've touched it.  If
you decide to do a hash join on a 250GB inner table using a shared
hash table, you're going to have 250GB in swap-backed pages floating
around when you're done.  If the user has swap configured (and more
and more people don't), the operating system will eventually page
those out, but until that happens those pages are reducing the amount
of page cache that's available, and after it happens they're using up
swap.  In either case, the space consumed is consumed to no purpose.
You don't care about that hash table any more once the query
completes; there's just no way to tell the operating system that.  If
your workload follows an entirely predictable pattern and you always
have about the same amount of usage of this facility then you can just
reuse the same pages and everything is fine.  But if your usage
fluctuates I believe it will be a big problem.  With DSM, we can and
do explicitly free the memory back to the OS as soon as we don't need
it any more - and that's a big benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
>>> from this because ours can grown to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently
> shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case, IMHO, is well worth it.

You are right that this might highly depend on the OS. But you are
only partially right that it's impossible to give the memory back once
you have touched it. It is possible in many cases with additional
measures, that is, with additional control over memory mapping.
Surprisingly, in this case Windows has the most straightforward
solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT flags.
On various Unix flavours it is possible to play with the mmap
MAP_NORESERVE flag and the madvise syscall. Finally, it is possible to
repeatedly mmap and munmap portions of a contiguous address space,
providing a given addr argument for both of them. The last option is,
of course, susceptible to hijacking of that portion of the address
space by an inadvertent caller of mmap with a NULL addr argument. But
probably this could be avoided by imposing disciplined use of mmap in
postgresql core and extensions.

Thus providing a single contiguous shared address space is doable.
The other question is how much it would buy. In terms of development
time of an allocator it is a clear win. In terms of easily passing
direct memory pointers between backends it is a clear win again.

In terms of resulting performance, I don't know. Relative addressing
would take a few cycles on every step. Say you have a shared hash
table: you cannot keep pointers in it, so you have to store offsets
against the base address, and every reference involves additional
arithmetic. When these things add up, the net effect might become
noticeable.

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Andres Freund
On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
> Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option might, of
> course, is susceptible to hijacking this portion of the address by an
> inadvertent caller of mmap with NULL addr argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.

I don't think that's particularly realistic. malloc() uses mmap(NULL)
internally.  And you can't portably mmap non-file-backed memory from
different processes; you need something like tmpfs-backed / POSIX
shared memory for it.  On Linux you can do stuff like
madvise(MADV_FREE), which kinda helps.



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
Sorry for the unclear language. Late Friday evening at my place is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <ademakov@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>>> I expect that to be useful for parallel query and anything else where
>>>> processes need to share variable-size data.  However, that's different
>>>> from this because ours can grown to arbitrary size and shrink again by
>>>> allocating and freeing with DSM segments.  We also do everything with
>>>> relative pointers since DSM segments can be mapped at different
>>>> addresses in different processes, whereas this would only work with
>>>> memory carved out of the main shared memory segment (or some new DSM
>>>> facility that guaranteed identical placement in every address space).
>>>>
>>>
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.
>>
>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>>
>> First, the behavior in this area is highly dependent on choice of
>> operating system and configuration parameters.  We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad.  I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server.  Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area.  I
>> had a MacBook Pro up until a year or two ago where a sufficiently
>> shared memory request would cause a kernel panic.  That bug will
>> probably be fixed at some point if it hasn't been already, but
>> probably by returning an error rather than making it work.
>>
>> Second, there's no way to give memory back once you've touched it.  If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done.  If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap.  In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that.  If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine.  But if your usage
>> fluctuates I believe it will be a big problem.  With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
>>
>
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth of it.
>
> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you touched it. It is possible in many cases with additional measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option might, of
> course, is susceptible to hijacking this portion of the address by an
> inadvertent caller of mmap with NULL addr argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.
>
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easy passing direct
> memory pointers between backends this a clear win again.
>
> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetics.
> When these things add up, the net effect might become noticeable.
>
> Regards,
> Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
On Sat, Jun 18, 2016 at 12:31 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
>> Finally, it's possible to repeatedly mmap
>> and munmap on portions of a contiguous address space providing
>> a given addr argument for both of them. The last option might, of
>> course, is susceptible to hijacking this portion of the address by an
>> inadvertent caller of mmap with NULL addr argument. But probably
>> this could be avoided by imposing a disciplined use of mmap in
>> postgresql core and extensions.
>
> I don't think that's particularly realistic. malloc() uses mmap(NULL)
> internally.  And you can't portably mmap non-file backed memory from
> different processes; you need something like tmpfs backed / posix shared
> memory / for it.  On linux you can do stuff like madvise(MADV_FREE),
> which kinda helps.

Oops. Agreed.



Re: Experimental dynamic memory allocation of postgresql shared memory

From: "David G. Johnston"
On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
>>> from this because ours can grown to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently
> shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case, IMHO, is well worth of it.


Given PostgreSQL's goals regarding multi-platform operation, it would seem that at minimum there needs to be an implementation available that indeed has these properties.  Improving our current base implementation within these guidelines would be nice, since everyone would benefit from the work, and the net amount of code is going to be reasonable since the old stuff will likely be removed while the new stuff is being added.

While platform-dependent default configuration parameters are undesirable, enabling better but less widely usable algorithms seems to be one use for compile-time options.  Is this arena amenable to such swapping out of behavior at compile time?

David J.

Re: Experimental dynamic memory allocation of postgresql shared memory

From: Robert Haas
On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
> Essentially this is pessimizing for the lowest common denominator
> among OSes.

I totally agree.  That's how we make the server portable.

> Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth of it.

I think that would be great if you could make it work, but it has to
support Linux, Windows (all supported versions), MacOS X, all the
various BSD flavors for which we have buildfarm animals, and other
platforms that we currently run on like HP-UX.   If you come up with a
solution that works for this on all of those platforms, I will shake
your hand.  But I think that's probably impossible, or at least
really, really hard.

> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you touched it. It is possible in many cases with additional measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option might, of
> course, is susceptible to hijacking this portion of the address by an
> inadvertent caller of mmap with NULL addr argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.

I have never understood how mmap() with a non-NULL argument could be
anything but a giant foot-gun.  If the operating system positions a
shared library or your process stack or anything else in the chosen
address range, you are dead.  I do agree that there are a bunch of
other tools that could be used on various platforms, but the need to
have a cross-platform solution for anything that goes into core makes
this very hard.

> Thus providing a single contiguous shared address space is doable.

Not convinced.

> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easy passing direct
> memory pointers between backends this a clear win again.

I agree it would be a huge win if it could be done.

> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetics.
> When these things add up, the net effect might become noticeable.

I'm sure it's going to be somewhat slower, but I think that's just a
tax that we have to pay for using processes rather than threads.  I
think it's still going to be fast enough to do plenty of cool stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Tom Lane
Aleksey Demakov <ademakov@gmail.com> writes:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.

>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.

> Essentially this is pessimizing for the lowest common denominator
> among OSes.

You're right, but that doesn't mean that the community is going to take
much interest in an unportable replacement for code that already exists.
Especially not an unportable replacement that also needs sweeping
assumptions like "disciplined use of mmap in postgresql core and
extensions".  You don't have to look further than the availability of
mmap to plperlu programmers to realize that that won't fly.  (Even if
we threw all the untrusted PLs overboard, I believe plain old stdio
is willing to use mmap in many versions of libc.)
        regards, tom lane



Re: Experimental dynamic memory allocation of postgresql shared memory

From: Aleksey Demakov
On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aleksey Demakov <ademakov@gmail.com> writes:
>> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> In my opinion, that's not going to fly.  If I thought otherwise, I
>>> would not have developed the DSM facility in the first place.
>
>> Essentially this is pessimizing for the lowest common denominator
>> among OSes.
>
> You're right, but that doesn't mean that the community is going to take
> much interest in an unportable replacement for code that already exists.

Excuse me, what code already exists? As far as I understand, we are
comparing the approach taken in my code against Robert's code, which
is not yet available to the community.

Discussing DSM is beside the point.

My code might be smoothly hooked into the existing system from an
extension module with just a couple of calls:

RequestAddinShmemSpace() and ShmemInitStruct().

After that the extension might use my concurrent memory allocator
and safe memory reclamation to implement highly optimized concurrent
data structures of its choice, e.g. the concurrent data structures
that I am going to add to the package in the future.
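For illustration, the hookup just described might look roughly like this in an extension. RequestAddinShmemSpace(), ShmemInitStruct() and shmem_startup_hook are real PostgreSQL APIs; the SharCreate()/SharAttach() signatures and the arena size are assumptions about the sharena code, marked as such in the comments:

```c
#include "postgres.h"
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

#define ARENA_SIZE (64 * 1024 * 1024)   /* arbitrary example size */

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
sharena_shmem_startup(void)
{
    bool    found;
    void   *space;

    if (prev_shmem_startup_hook)
        prev_shmem_startup_hook();

    /* Carve the arena out of the main shared memory segment. */
    space = ShmemInitStruct("sharena arena", ARENA_SIZE, &found);
    if (!found)
        SharCreate(space, ARENA_SIZE);  /* hypothetical initializer */
    else
        SharAttach(space);              /* hypothetical, per the thread */
}

void
_PG_init(void)
{
    if (!process_shared_preload_libraries_in_progress)
        return;

    /* Reserve room in the main segment before it is created. */
    RequestAddinShmemSpace(ARENA_SIZE);

    prev_shmem_startup_hook = shmem_startup_hook;
    shmem_startup_hook = sharena_shmem_startup;
}
```

Such a module would be loaded via shared_preload_libraries, since add-in shared memory must be requested before the main segment is created.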

All in all, currently this is not a replacement for anything. It is an
experimental add-on and food for thought for interested people.

Integrating my code right into the core to replace anything there is a
very remote possibility. I understand that if it ever happens it would
take very serious work and multiple iterations.

> Especially not an unportable replacement that also needs sweeping
> assumptions like "disciplined use of mmap in postgresql core and
> extensions".  You don't have to look further than the availability of
> mmap to plperlu programmers to realize that that won't fly.  (Even if
> we threw all the untrusted PLs overboard, I believe plain old stdio
> is willing to use mmap in many versions of libc.)
>

Sorry. I made a sloppy statement about mmap/munmap use. As
correctly pointed out by Andres Freund, it is problematic. So the
whole line about "disciplined use of mmap in postgresql core and
extensions" goes away. Forget it.

But the other techniques that I mentioned do not take such a
special discipline.

The corrected statement is that a single contiguous shared space
is practically doable on many platforms with some effort, and this
approach would make the implementation of many shared data
structures more efficient.
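To illustrate the efficiency point: when a segment may be mapped at a
different address in each backend (as DSM segments can be), intra-segment
links must be stored as offsets from the segment base and decoded on every
dereference; with a single contiguous space at a fixed address, plain
pointers work directly. A small self-contained sketch of the offset-based
scheme (the names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* If a shared segment can be mapped at a different address in every
 * process, links inside it must be offsets from the segment base,
 * not raw pointers. */
typedef struct Node
{
    uint64_t next_off;      /* offset of the next node, not a pointer */
    int      value;
} Node;

static uint64_t
to_offset(void *base, void *ptr)
{
    return (uint64_t) ((char *) ptr - (char *) base);
}

static Node *
from_offset(void *base, uint64_t off)
{
    return (Node *) ((char *) base + off);
}

int
follow_link(void)
{
    Node  pool[4];                      /* stands in for a shared segment */
    char *base = (char *) pool;
    Node *a = &pool[0];
    Node *b = &pool[1];

    b->value = 42;
    a->next_off = to_offset(base, b);   /* store an offset, not a pointer */

    /* Another process, using its own mapping address as base, would
     * decode the link the same way. */
    return from_offset(base, a->next_off)->value;
}
```

With a guaranteed common mapping address, `a->next_off` could simply be
`Node *next`, saving the arithmetic on every pointer chase.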

Furthermore, I'd guess there is not much point in enabling parallel
query execution on a MacBook. Or at least one wouldn't expect
superb results from it anyway.

I'd make a wild claim that the users who would benefit most from
parallel queries or my concurrency work are, most of the time, the
same users who run platforms that can support a single address space.

Thus if there is a solution that benefits, say, 95% of the target users,
why refrain from it for the sake of the other 5%? Shouldn't support
for those 5% be treated as a lower-priority fallback, with the main
effort put into optimizing for the 95%?

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From
Tom Lane
Date:
Aleksey Demakov <ademakov@gmail.com> writes:
> On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're right, but that doesn't mean that the community is going to take
>> much interest in an unportable replacement for code that already exists.

> Excuse me, what code already exists? As far as I understand, we
> compare the approach taken in my code against Robert's code that
> is not yet available to the community.

DSM already exists, and for many purposes its lack of a
within-a-shmem-segment dynamic allocator is irrelevant; the same purpose
is served (with more speed, more reliability, and less code) by releasing
the whole DSM segment when no longer needed.  The DSM segment effectively
acts like a memory context, saving code from having to account precisely
for every single allocation it makes.

I grant that having a dynamic allocator added to DSM will support even
more use-cases.  What I'm not convinced of is that we need a dynamic
allocator within the fixed-size shmem segment.  Robert already listed some
reasons why that's rather dubious, but I'll add one more: any leak becomes
a really serious bug, because the only way to recover the space is to
restart the whole database instance.
        regards, tom lane



Re: Experimental dynamic memory allocation of postgresql shared memory

From
Aleksey Demakov
Date:
On Sat, Jun 18, 2016 at 3:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> DSM already exists, and for many purposes its lack of a
> within-a-shmem-segment dynamic allocator is irrelevant; the same purpose
> is served (with more speed, more reliability, and less code) by releasing
> the whole DSM segment when no longer needed.  The DSM segment effectively
> acts like a memory context, saving code from having to account precisely
> for every single allocation it makes.
>
> I grant that having a dynamic allocator added to DSM will support even
> more use-cases.  What I'm not convinced of is that we need a dynamic
> allocator within the fixed-size shmem segment.  Robert already listed some
> reasons why that's rather dubious, but I'll add one more: any leak becomes
> a really serious bug, because the only way to recover the space is to
> restart the whole database instance.
>

Okay, if you say that DSM segments work best for accumulating
transient data that can be freed all at once when it becomes
unnecessary, then I agree with that.

My code is for long-lived data that can be allocated and freed
chunk by chunk, as when an extension wants to store more data, in a
more complicated fashion, than fits into an ordinary dynahash with
the HASH_SHARED_MEM flag.
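For context, a shared dynahash is created roughly like this (a sketch
against PostgreSQL's hsearch API; the MyEntry type and the sizes are
made up for illustration). Its limitation is visible in the signature:
the maximum size is fixed when the table is created, since the space
comes out of the fixed main shmem segment reserved at startup via
RequestAddinShmemSpace(hash_estimate_size(...)):

```c
#include "postgres.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

typedef struct MyEntry
{
    Oid   key;          /* hash key: must be the first field */
    int64 counter;
} MyEntry;

static HTAB *my_hash;

static void
my_hash_init(void)
{
    HASHCTL info;

    memset(&info, 0, sizeof(info));
    info.keysize = sizeof(Oid);
    info.entrysize = sizeof(MyEntry);

    /* ShmemInitHash sets HASH_SHARED_MEM internally; the table can
     * never grow past max_size. */
    my_hash = ShmemInitHash("my shared hash",
                            1024,       /* init_size */
                            1024,       /* max_size, fixed forever */
                            &info,
                            HASH_ELEM | HASH_BLOBS);
}
```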

Regards,
Aleksey



Re: Experimental dynamic memory allocation of postgresql shared memory

From
Craig Ringer
Date:
On 18 June 2016 at 02:42, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>> Essentially this is pessimizing for the lowest common denominator
>> among OSes.
>
> I totally agree.  That's how we make the server portable.
>
>> Having a contiguous address space makes things so
>> much simpler that considering this case, IMHO, is well worth of it.
>
> I think that would be great if you could make it work, but it has to
> support Linux, Windows (all supported versions), MacOS X, all the
> various BSD flavors for which we have buildfarm animals, and other
> platforms that we currently run on like HP-UX.  If you come up with a
> solution that works for this on all of those platforms, I will shake
> your hand.  But I think that's probably impossible, or at least
> really, really hard.

Indeed. In particular, ASLR on Windows, or anywhere we use EXEC_BACKEND, will cause difficulties attaching to those segments.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Experimental dynamic memory allocation of postgresql shared memory

From
Michael Paquier
Date:
On Mon, Jun 20, 2016 at 12:40 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 18 June 2016 at 02:42, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov <ademakov@gmail.com>
>> wrote:
>> > Essentially this is pessimizing for the lowest common denominator
>> > among OSes.
>>
>> I totally agree.  That's how we make the server portable.
>>
>> > Having a contiguous address space makes things so
>> > much simpler that considering this case, IMHO, is well worth of it.
>>
>> I think that would be great if you could make it work, but it has to
>> support Linux, Windows (all supported versions), MacOS X, all the
>> various BSD flavors for which we have buildfarm animals, and other
>> platforms that we currently run on like HP-UX.   If you come up with a
>> solution that works for this on all of those platforms, I will shake
>> your hand.  But I think that's probably impossible, or at least
>> really, really hard.
>
>
> Indeed. In particular, ASLR on Windows or anywhere we EXEC_BACKEND will
> cause difficulties attaching to those segments.

That ASLR is something we currently disable in the build, because
Win8/2k12 and newer versions behave differently from past OSes in
their address mapping, making the problem even harder if we wanted
to have both working.
-- 
Michael