Re: refactoring relation extension and BufferAlloc(), faster COPY

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: refactoring relation extension and BufferAlloc(), faster COPY
Дата
Msg-id 20230301172503.jsc4jpulsmkc7tmq@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: refactoring relation extension and BufferAlloc(), faster COPY  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

On 2023-03-01 09:02:00 -0800, Andres Freund wrote:
> On 2023-03-01 11:12:35 +0200, Heikki Linnakangas wrote:
> > On 27/02/2023 23:45, Andres Freund wrote:
> > > But, uh, isn't this code racy? Because this doesn't go through shared buffers,
> > > there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
> > > that writing pages isn't atomic vs readers. So another connection could
> > > connection could see the new relation size, but a read might return a
> > > partially written state of the page. Which then would cause checksum
> > > failures. And even worse, I think it could lead to loosing a write, if the
> > > concurrent connection writes out a page.
> > 
> > fsm_readbuf and vm_readbuf check the relation size first, with
> > smgrnblocks(), before trying to read the page. So to have a problem, the
> > smgrnblocks() would have to already return the new size, but the smgrread()
> > would not return the new contents. I don't think that's possible, but not
> > sure.
> 
> I hacked Thomas' program to test torn reads to ftruncate the file on the write
> side.
> 
> It frequently observes a file size that's not the write size (e.g. reading 4k
> when writing an 8k block).
> 
> After extending the test to more than one reader, I indeed also see torn
> reads. So far all the tears have been at a 4k block boundary. However so far
> it always has been *prior* page contents, not 0s.

On tmpfs the failure rate is much higher, and we also end up reading 0s,
despite never writing them.

I've attached my version of the test program.

ext4: lots of 4k reads with 8k writes, some torn reads at 4k boundaries
xfs: no issues
tmpfs: loads of 4k reads with 8k writes, lots torn reads reading 0s, some torn reads at 4k boundaries


Greetings,

Andres Freund

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: add PROCESS_MAIN to VACUUM