Page-at-a-time Locking Considerations
From | Simon Riggs |
---|---|
Subject | Page-at-a-time Locking Considerations |
Date | |
Msg-id | 1202141084.4252.480.camel@ebony.site |
Replies |
Re: Page-at-a-time Locking Considerations
Re: Page-at-a-time Locking Considerations |
List | pgsql-hackers |
In heapgetpage() we hold the buffer locked while we look for visible tuples. That works well in most cases since the visibility check is fast if we have status bits set. If we don't have visibility bits set we have to do things like scan the snapshot and confirm things via clog lookups. All of that takes time and can lead to long buffer lock times, possibly across multiple I/Os in the very worst cases.

This doesn't just happen for old transactions. Accessing very recent TransactionIds is prone to rare but long waits when we ExtendClog().

Such problems are numerically rare, but the buffers with long lock times are also the ones that have concurrent or at least recent write operations on them. So all SeqScans have the potential to induce long wait times for write transactions, even if they are scans on 1 block tables. Tables with heavy write activity on them from multiple backends have their work spread across multiple blocks, so a SeqScan will hit this issue repeatedly as it encounters each current insertion point in a table and so greatly increases the chances of it occurring.

It seems possible to just memcpy() the whole block away and then drop the lock quickly. That gives a consistent lock time in all cases and allows us to do the visibility checks in our own time.

It might seem that we would end up copying irrelevant data, which is true. But the greatest cost is memory access time. If hardware memory pre-fetch cuts in we will find that the memory is retrieved en masse anyway; if it doesn't we will have to wait for each cache line. So the best case is actually an en masse retrieval of cache lines, in the common case where blocks are fairly full (vague cutoff is determined by exact mechanism of hardware/compiler induced memory prefetch).

The copied block would be used only for visibility checks. The main buffer would retain its pin and we would pass references to the block through the executor as normal.
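To make the copy-aside idea concrete, here is a minimal standalone sketch. It is not PostgreSQL code: every name in it (ToyBuffer, ToyPage, ToyTuple, toy_getpage_copy_aside and the helpers) is hypothetical, the lock is a plain C11 atomic_flag standing in for the buffer content lock, and the "clog lookup" is a made-up rule that treats even xids as committed. It only shows the shape of the change: the lock is held for the memcpy() alone, and the slow visibility path runs against the private copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ   8192   /* assumed block size for this toy */
#define TOY_MAX_TUPS 64

/* Hypothetical tuple header: only what the toy visibility check needs. */
typedef struct ToyTuple {
    uint32_t xmin;            /* inserting transaction id */
    bool     hint_committed;  /* stand-in for a status bit */
} ToyTuple;

/* Hypothetical page, padded to a full block so the copy cost is realistic. */
typedef union ToyPage {
    struct {
        int      ntuples;
        ToyTuple tuples[TOY_MAX_TUPS];
    } d;
    unsigned char raw[TOY_BLCKSZ];
} ToyPage;

/* Hypothetical shared buffer: a toy lock plus the page it protects. */
typedef struct ToyBuffer {
    atomic_flag lock;
    ToyPage     page;
} ToyBuffer;

void toy_buffer_init(ToyBuffer *buf)
{
    atomic_flag_clear(&buf->lock);
    memset(&buf->page, 0, sizeof(buf->page));
}

static void toy_lock(ToyBuffer *buf)
{
    /* Spin until we are the one who set the flag. */
    while (atomic_flag_test_and_set_explicit(&buf->lock, memory_order_acquire))
        ;
}

static void toy_unlock(ToyBuffer *buf)
{
    atomic_flag_clear_explicit(&buf->lock, memory_order_release);
}

/* Stand-in for the slow path (snapshot scan, clog lookup).
 * Assumption for the toy only: even xids count as committed. */
static bool toy_xid_committed(uint32_t xid)
{
    return xid % 2 == 0;
}

static bool toy_tuple_visible(const ToyTuple *tup)
{
    if (tup->hint_committed)
        return true;                      /* fast path: status bit set */
    return toy_xid_committed(tup->xmin);  /* slow path */
}

/* Copy the block aside under the lock, release the lock, then run the
 * visibility checks on the private copy. Returns the number of visible
 * tuples and stores their offsets in visible[]. */
int toy_getpage_copy_aside(ToyBuffer *buf, int *visible, int max_visible)
{
    ToyPage copy;
    int     nvis = 0;

    toy_lock(buf);
    memcpy(&copy, &buf->page, sizeof(copy));  /* the only work under the lock */
    toy_unlock(buf);

    assert(copy.d.ntuples >= 0 && copy.d.ntuples <= TOY_MAX_TUPS);

    for (int i = 0; i < copy.d.ntuples && nvis < max_visible; i++)
    {
        if (toy_tuple_visible(&copy.d.tuples[i]))
            visible[nvis++] = i;
    }
    return nvis;
}
```

A caller would initialise a ToyBuffer with toy_buffer_init(), fill in a few tuples, and pass an int array of TOY_MAX_TUPS entries to toy_getpage_copy_aside(). The point of the layout is that the time spent holding the lock depends only on the block size, not on how many tuples need the slow path.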
So this would be a change completely isolated to heapgetpage().

Was the copy-aside method considered when we introduced page at a time mode? Any reasons to think it would be dangerous or infeasible? If not, I'll give it a bash and get some test results.

-- 
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com