On 2/2/17 1:50 PM, Andres Freund wrote:
>>> FWIW, I think working on replacing bgwriter (e.g. by working on the
>>> patch I sent with a POC replacement) wholesale is a better approach than
>>> spending time increasing limits.
>> Do you have a link to that? I'm not seeing anything in the archives.
> Not at hand, but I can just give you the patches. These are very likely
> to conflict, but it shouldn't be too hard to resolve...
>
> What it basically does is move as much of the clock-sweep to bgwriter,
> which keeps a certain number of clean pages available. There's a
> lock-free ringbuffer that backends can just pick pages off.
>
> The approach with the ringbuffer has the advantage that with relatively
> small changes we can scale to having multiple bgwriters (needs some
> changes in the ringbuf implementation, but not that many).
Interesting. That probably kills a couple of birds with one stone:
- This should be a lot cheaper for backends than the clock sweep.
- The current bgwriter can easily get stuck a full scan ahead of the clock
hand if shared_buffers is very large, because it forcibly scans all of
shared buffers every 2 minutes.
- The ringbuffers in shared buffers can be problematic. One possible way
of solving that is to get rid of those ringbuffers entirely and rely on
different initial values for usage_count instead, but that's not
desirable if it just means more clock-sweep work for backends.
FWIW, I started looking into this because of a customer system where
shared_buffers is currently ~4x larger than the entire cluster, yet a
non-trivial number of buffers are still being written out by backends.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)