Re: Experimental dynamic memory allocation of postgresql shared memory

From: David G. Johnston
Subject: Re: Experimental dynamic memory allocation of postgresql shared memory
Date:
Msg-id: CAKFQuwY7-uCGHkQ0a9LkFB8S1n9ovba7yjsXdagZ0oXVJH7RNA@mail.gmail.com
In reply to: Re: Experimental dynamic memory allocation of postgresql shared memory  (Aleksey Demakov <ademakov@gmail.com>)
List: pgsql-hackers
On Fri, Jun 17, 2016 at 2:23 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <ademakov@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
>>> from this because ours can grow to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently
> large shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case, IMHO, is well worth it.


Given PostgreSQL's goals regarding multi-platform operation, it would seem that at minimum there needs to be an implementation available that indeed has these properties.  Improving our current base implementation within these guidelines would be nice, since everyone would benefit from the work, and the net amount of code should stay reasonable because the old code will likely be removed as the new code is added.

While platform-dependent default configuration parameters are undesirable, enabling better but less widely usable algorithms seems to be one use for compile-time options.  Is this arena amenable to such swapping out of behavior at compile time?
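
To make the question concrete, here is a minimal sketch of what swapping the pointer representation at compile time could look like. It is not taken from PostgreSQL or from the proposed patch: the flag SHM_FIXED_ADDRESS and the names shm_ref, shm_ref_make, shm_ref_get, seg_store and seg_load are all hypothetical. The default build stores offsets relative to the segment base, which stay valid when the segment is mapped at different addresses; the optional build stores raw addresses and is only correct if every process maps the memory at the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * SHM_FIXED_ADDRESS is a hypothetical build flag, not a PostgreSQL option.
 * Define it only when every process maps the region at the same address.
 */
#ifdef SHM_FIXED_ADDRESS
typedef void *shm_ref;          /* raw address, valid only with identical placement */

static inline shm_ref shm_ref_make(void *base, void *p)
{
    (void) base;
    return p;
}

static inline void *shm_ref_get(void *base, shm_ref r)
{
    (void) base;
    return r;
}
#else
typedef size_t shm_ref;         /* byte offset from the start of the segment */

static inline shm_ref shm_ref_make(void *base, void *p)
{
    return (size_t) ((char *) p - (char *) base);
}

static inline void *shm_ref_get(void *base, shm_ref r)
{
    return (char *) base + r;
}
#endif

/*
 * Toy segment layout (an assumption for this sketch): a shm_ref at offset 0
 * that refers to an int stored elsewhere in the same segment.
 */
static void seg_store(void *base, size_t offset, int value)
{
    void   *slot = (char *) base + offset;
    shm_ref ref = shm_ref_make(base, slot);

    assert(offset >= sizeof(shm_ref));  /* payload must not overlap the header */
    memcpy(slot, &value, sizeof value);
    memcpy(base, &ref, sizeof ref);
}

static int seg_load(void *base)
{
    shm_ref ref;
    int     value;

    memcpy(&ref, base, sizeof ref);
    memcpy(&value, shm_ref_get(base, ref), sizeof value);
    return value;
}
```

In the default build, copying the segment bytes to another buffer (a stand-in for a second mapping at a different address) and calling seg_load on the copy still finds the payload inside the copy. With SHM_FIXED_ADDRESS defined, the stored reference would keep pointing into the original buffer, which is exactly the placement guarantee that build would depend on.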

David J.
