Re: Synchronized scans versus relcache reinitialization
From | Tom Lane |
---|---|
Subject | Re: Synchronized scans versus relcache reinitialization |
Date | |
Msg-id | 21415.1338432596@sss.pgh.pa.us |
In reply to | Re: Synchronized scans versus relcache reinitialization (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
Jeff Davis <pgsql@j-davis.com> writes:
> On Sat, 2012-05-26 at 15:14 -0400, Tom Lane wrote:
>> 3. Having now spent a good deal of time poking at this, I think that the
>> syncscan logic is in need of more tuning, and I am wondering whether we
>> should even have it turned on by default. It appears to be totally
>> useless for fully-cached-in-RAM scenarios, even if most of the relation
>> is out in kernel buffers rather than in shared buffers. The best case
>> I saw was less than 2X speedup compared to N-times-the-single-client
>> case, and that wasn't very reproducible, and it didn't happen at all
>> unless I hacked BAS_BULKREAD mode to use a ring buffer size many times
>> larger than the current 256K setting (otherwise the timing requirements
>> are too tight for multiple backends to stay in sync --- a seqscan can
>> blow through that much data in a fraction of a millisecond these days,
>> if it's reading from kernel buffers). The current tuning may be all
>> right for cases where you're actually reading from spinning rust, but
>> that seems to be a decreasing fraction of real-world use cases.

> Do you mean that the best case you saw ever was 2X, or the best case
> when the table is mostly in kernel buffers was 2X?

I was only examining a fully-cached-in-RAM case.

> I clearly saw better than 2X when the table was on disk, so if you
> aren't, we should investigate.

I don't doubt that syncscan can provide better than 2X speedup if you
have more than 2 concurrent readers for a syncscan traversing data
that's too big to fit in RAM. What I'm questioning is whether such
cases represent a sufficiently large fraction of our userbase to justify
having syncscan on by default. I would be happier about having it on
if it seemed to be useful for fully-cached scenarios, but it doesn't.

> One thing we could do is drive the threshold from effective_cache_size
> rather than shared_buffers, which was discussed during 8.3 development.

If we were going to do that, I think that we'd need to consider having
different thresholds for using bulkread access strategy and using
syncscan, because not using bulkread is going to blow out the
shared_buffers cache. We originally avoided that on the grounds of not
wanting to have to optimize more than 2 behaviors, but maybe it's time
to investigate more.

regards, tom lane
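
The idea of separate thresholds can be sketched roughly as below. This is an illustrative Python model, not PostgreSQL code: the function name `choose_scan_behavior`, its parameters, and the rule that syncscan applies once the relation is larger than effective_cache_size are all assumptions made for the example. The one-quarter-of-shared_buffers cutoff for the bulkread strategy is likewise used here only as a plausible placeholder value.

```python
def choose_scan_behavior(rel_blocks, shared_buffers_blocks, effective_cache_blocks):
    """Return (use_bulkread, use_syncscan) for a sequential scan.

    Hypothetical model with two independent thresholds, all sizes in blocks:
      - bulkread ring buffer: driven by shared_buffers, so that a large
        scan does not blow out the shared buffer cache;
      - syncscan: driven by effective_cache_size, so that it only applies
        when the relation is unlikely to be fully cached in RAM.
    """
    # Assumed cutoff: relation larger than a quarter of shared_buffers.
    bulkread_threshold = shared_buffers_blocks // 4
    # Assumed cutoff: relation larger than the estimated total cache.
    syncscan_threshold = effective_cache_blocks

    use_bulkread = rel_blocks > bulkread_threshold
    use_syncscan = rel_blocks > syncscan_threshold
    return use_bulkread, use_syncscan


if __name__ == "__main__":
    # Mid-sized table: bigger than shared_buffers / 4, but fits in the OS cache.
    print(choose_scan_behavior(500, 1000, 10000))
    # Table larger than the estimated cache: both behaviors apply.
    print(choose_scan_behavior(20000, 1000, 10000))
```

With two thresholds there are three outcomes to tune (neither, bulkread only, both) rather than two, which is the extra optimization burden mentioned above.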