Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
From | Claudio Freire |
---|---|
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date | |
Msg-id | CAGTBQpY4e3na-pnHAnpRapHDgi+EJAbkOgjmrVY_Nkgw5o+ZHQ@mail.gmail.com |
In response to | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |

On Wed, Jan 15, 2014 at 3:41 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> But, still, the implementation is very similar to what postgres needs:
>> sharing a physical page for two distinct logical pages, efficiently,
>> with efficient copy-on-write.
>
> Agreed, except that KSM seems like it'd be slow/lazy about it and I'm
> guessing there's a reason the pagecache isn't included normally..

KSM does an active de-duplication. That's slow. This would be
leveraging KSM structures in the kernel (page sharing) but without all
the de-duplication logic.

>
>> So it'd be just a matter of removing that limitation regarding page
>> cache and shared pages.
>
> Any idea why that limitation is there?

No, but I'm guessing it's because nobody bothered to implement the
required copy-on-write in the page cache, which would be a PITA to
write - think of all the complexities with privilege checks and
everything - even though the benefits for many kinds of applications
would be important.

>> If you asked me, I'd implement it as copy-on-write on the page cache
>> (not the user page). That ought to be low-overhead.
>
> Not entirely sure I'm following this- if it's a shared page, it doesn't
> matter who starts writing to it, as soon as that happens, it need to get
> copied. Perhaps you mean that the application should keep the
> "original" and that the page-cache should get the "copy" (or, really,
> perhaps just forget about the page existing at that point- we won't want
> it again...).
>
> Would that be a way to go, perhaps? This does go back to the "make it
> act like mmap, but not *be* mmap", but the idea would be:
> open(..., O_ZEROCOPY_READ)
> read() - Goes to PG's shared buffers, pagecache and PG share the page
> page fault (PG writes to it) - pagecache forgets about the page
> write() / fsync() - operate as normal

Yep.
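
For readers who want to see the copy-on-write behaviour being discussed, here is a minimal sketch using a mechanism that exists today: a private file mapping. It is not the proposed interface. `O_ZEROCOPY_READ` is only an idea floated in this thread, and the direction differs: here the process gets the copy on the first write and the page cache keeps the original, whereas the proposal would let the application keep the page and have the page cache forget it. The helper name `cow_demo` and the temporary file are illustrative assumptions.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def cow_demo():
    """Write through a private mapping and report what each side sees.

    Returns (bytes seen before the write, bytes seen through the mapping
    after the write, bytes read back from the file after the write).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"A" * PAGE)
        os.fsync(fd)
        # ACCESS_COPY requests a private, copy-on-write mapping
        # (MAP_PRIVATE on Linux). Writes affect memory, never the file.
        with mmap.mmap(fd, PAGE, access=mmap.ACCESS_COPY) as m:
            before = m[:4]
            m[:4] = b"BBBB"  # first write: the process gets its own copy
            private_view = m[:4]
        # Reading through the file descriptor shows the file is unchanged.
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 4)
        return before, private_view, on_disk
    finally:
        os.close(fd)
        os.unlink(path)


before, private_view, on_disk = cow_demo()
print(before, private_view, on_disk)
```

The write is visible only through the mapping; the file contents stay as they were, which is the "shared until someone writes" behaviour the thread wants, minus the mmap interface itself.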