Re: Clock sweep not caching enough B-Tree leaf pages?
| From | Andres Freund |
|---|---|
| Subject | Re: Clock sweep not caching enough B-Tree leaf pages? |
| Date | |
| Msg-id | 20140416075307.GC3906@awork2.anarazel.de |
| In reply to | Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>) |
| Replies | Re: Clock sweep not caching enough B-Tree leaf pages? |
| List | pgsql-hackers |
Hi,

It's good to see focus on this - some improvements around s_b are sorely needed.

On 2014-04-14 10:11:53 -0700, Peter Geoghegan wrote:
> 1) Throttles incrementation of usage_count temporally. It becomes
> impossible to increment usage_count for any given buffer more
> frequently than every 3 seconds, while decrementing usage_count is
> totally unaffected.

I think this is unfortunately completely out of the question. For one, a gettimeofday() for every buffer pin will become a significant performance problem. Even the computation of the xact/stmt start/stop timestamps shows up pretty heavily in profiles today - and those are far less frequent than buffer pins. And that's on x86 Linux, where gettimeofday() is implemented as something more lightweight than a full syscall.

The other significant problem I see with this is that it's not adaptive to the actual throughput of buffers in s_b. In many cases there are hundreds of clock cycles through shared buffers in 3 seconds. By only increasing the usagecount that often, you've destroyed the little semblance of a working LRU there is right now.

It also wouldn't work well for situations with a fast-changing workload >> s_b. If you have frequent queries that take a second or so and access some data repeatedly (index nodes or whatnot), only increasing the usagecount once will mean they'll continually fall back to disk access.

> 2) Has usage_count saturate at 10 (i.e. BM_MAX_USAGE_COUNT = 10), not
> 5 as before. ... . This step on its own would be assumed extremely
> counter-productive by those in the know, but I believe that other
> measures ameliorate the downsides. I could be wrong about how true
> that is in other cases, but then the case helped here isn't what you'd
> call a narrow benchmark.

I don't see which mechanisms you have suggested that counter this? I think having a more granular usagecount is a good idea, but I don't think it can realistically be implemented with the current method of choosing victim buffers.
The number of cacheline misses around that is already a major scalability limit; we surely can't make this even worse. I think it'd be possible to get back to this if we had a better bgwriter implementation.

Greetings,

Andres Freund