Re: refactoring relation extension and BufferAlloc(), faster COPY
От | Heikki Linnakangas |
---|---|
Тема | Re: refactoring relation extension and BufferAlloc(), faster COPY |
Дата | |
Msg-id | 56577df0-f36d-20ee-301a-5d022980d619@iki.fi обсуждение исходный текст |
Ответ на | Re: refactoring relation extension and BufferAlloc(), faster COPY (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: refactoring relation extension and BufferAlloc(), faster COPY
|
Список | pgsql-hackers |
On 27/02/2023 23:45, Andres Freund wrote: > On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote: >> We also do this in freespace.c and visibilitymap.c: >> >> /* Extend as needed. */ >> while (fsm_nblocks_now < fsm_nblocks) >> { >> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now); >> >> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now, >> pg.data, false); >> fsm_nblocks_now++; >> } >> >> We could use the new smgrzeroextend function here. But it would be better to >> go through the buffer cache, because after this, the last block, at >> 'fsm_nblocks', will be read with ReadBuffer() and modified. > > I doubt it's a particularly crucial thing to optimize. Yeah, it won't make any practical difference to performance. I'm more thinking if we can make this more consistent with other places where we extend a relation. > But, uh, isn't this code racy? Because this doesn't go through shared buffers, > there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know > that writing pages isn't atomic vs readers. So another connection could > connection could see the new relation size, but a read might return a > partially written state of the page. Which then would cause checksum > failures. And even worse, I think it could lead to loosing a write, if the > concurrent connection writes out a page. fsm_readbuf and vm_readbuf check the relation size first, with smgrnblocks(), before trying to read the page. So to have a problem, the smgrnblocks() would have to already return the new size, but the smgrread() would not return the new contents. I don't think that's possible, but not sure. >> We could use BulkExtendSharedRelationBuffered() to extend the relation and >> keep the last page locked, but the BulkExtendSharedRelationBuffered() >> signature doesn't allow that. It can return the first N pages locked, but >> there's no way to return the *last* page locked. > > We can't rely on bulk extending a, potentially, large number of pages in one > go anyway (since we might not be allowed to pin that many pages). So I don't > think requiring locking the last page is a really viable API. > > I think for this case I'd just just use the ExtendRelationTo() API we were > discussing nearby. Compared to the cost of reducing syscalls / filesystem > overhead to extend the relation, the cost of the buffer mapping lookup does't > seem significant. That's different in e.g. the hio.c case, because there we > need a buffer with free space, and concurrent activity could otherwise fill up > the buffer before we can lock it again. Works for me. - Heikki
В списке pgsql-hackers по дате отправления: