Re: refactoring relation extension and BufferAlloc(), faster COPY
| From | Jim Nasby |
|---|---|
| Subject | Re: refactoring relation extension and BufferAlloc(), faster COPY |
| Date | |
| Msg-id | a1efd2f9-290e-b860-b490-a8bf6530b288@amazon.com |
| In reply to | refactoring relation extension and BufferAlloc(), faster COPY (Andres Freund <andres@anarazel.de>) |
| Responses | Re: refactoring relation extension and BufferAlloc(), faster COPY |
| List | pgsql-hackers |
On 10/28/22 9:54 PM, Andres Freund wrote:
> b) I found that it is quite beneficial to bulk-extend the relation with
> smgrextend() even without concurrency. The reason for that is primarily
> the aforementioned dirty buffers that our current extension method causes.
>
> One bit that stumped me for quite a while was knowing how much to extend
> the relation by. RelationGetBufferForTuple() drives the decision whether /
> how much to bulk extend purely on the contention on the extension lock,
> which obviously does not work for non-concurrent workloads.
>
> After quite a while I figured out that we actually have good information
> on how much to extend by, at least for COPY / heap_multi_insert().
> heap_multi_insert() can compute how much space is needed to store all
> tuples, and pass that on to RelationGetBufferForTuple().
>
> For that to be accurate we need to recompute that number whenever we use
> an already partially filled page. That's not great, but doesn't appear to
> be a measurable overhead.

Some food for thought: I think it's also completely fine to extend any relation over a certain size by multiple blocks, regardless of concurrency. E.g. 10 extra blocks on an 80 MB relation is 0.1%. I don't have a good feel for what algorithm would make sense here; maybe something along the lines of

    extend = max(relpages / 2048, 128);
    if (extend < 8)
        extend = 1;

(presumably extending by just a couple of extra pages doesn't help much without concurrency).