Re: Vectored I/O in bulk_write.c

From Thomas Munro
Subject Re: Vectored I/O in bulk_write.c
Msg-id CA+hUKGKsP+xeRm0TGmVrQ4nPK6CJ=CDGPXH3e5FaffQXZNpNTg@mail.gmail.com
In response to Re: Vectored I/O in bulk_write.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Vectored I/O in bulk_write.c  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Wed, Mar 13, 2024 at 9:57 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Let's bite the bullet and merge the smgrwrite and smgrextend functions
> at the smgr level too. I propose the following signature:
>
> #define SWF_SKIP_FSYNC          0x01
> #define SWF_EXTEND              0x02
> #define SWF_ZERO                0x04
>
> void smgrwritev(SMgrRelation reln, ForkNumber forknum,
>                 BlockNumber blocknum,
>                 const void **buffer, BlockNumber nblocks,
>                 int flags);
>
> This would replace smgrwrite, smgrextend, and smgrzeroextend. The

That sounds pretty good to me.
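
To make the mapping concrete, here is a minimal sketch of how the three existing entry points might correspond to flag combinations. The helper legacy_equivalent() is hypothetical and exists only for illustration, and the pairing of SWF_EXTEND | SWF_ZERO with smgrzeroextend is my assumption rather than something spelled out in the proposal:

```c
#include <assert.h>
#include <string.h>

/* Flag values copied from the proposed signature. */
#define SWF_SKIP_FSYNC          0x01
#define SWF_EXTEND              0x02
#define SWF_ZERO                0x04

/*
 * Hypothetical helper (not part of any patch): name the existing smgr
 * entry point that a given flag combination would presumably replace.
 */
const char *
legacy_equivalent(int flags)
{
    /* Assumed to be orthogonal, like the existing skipFsync argument. */
    flags &= ~SWF_SKIP_FSYNC;

    if (flags == 0)
        return "smgrwrite";
    if (flags == SWF_EXTEND)
        return "smgrextend";
    if (flags == (SWF_EXTEND | SWF_ZERO))
        return "smgrzeroextend";    /* assumption: zero-fill implies extend */

    return "unsupported";           /* e.g. SWF_ZERO without SWF_EXTEND */
}
```

Whether SWF_ZERO on its own should be an error or mean "overwrite with zeroes" is left open here.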

> > Here also is a first attempt at improving the memory allocation and
> > memory layout.
> > ...
> > +typedef union BufferSlot
> > +{
> > +     PGIOAlignedBlock buffer;
> > +     dlist_node      freelist_node;
> > +}                    BufferSlot;
> > +
>
> If you allocated the buffers in one large contiguous chunk, you could
> often do one large write() instead of a gathered writev() of multiple
> blocks. That should be even better, although I don't know how much of a
> difference it makes. The above layout wastes a fair amount of memory too,
> because 'buffer' is I/O aligned.

The patch I posted has an array of buffers with the properties you
describe, so you get a pwrite() (no 'v') sometimes, and a pwritev()
with a small iovcnt when it wraps around:

pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)

Hmm, I expected pwrite() alternating with pwritev(iovcnt=2), the
latter for when it wraps around the buffer array, so I'm not sure why it's
3.  I guess the btree code isn't writing them strictly monotonically or
something...
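
For reference, the behaviour I expected can be sketched roughly as follows. This is not the patch's code: ring_build_iov(), ring_write(), BLOCK_SIZE and NSLOTS are names and values made up for illustration, and it assumes the blocks to be written occupy consecutive slots of a contiguous array, wrapping at most once:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* for pwritev() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed block size */
#define NSLOTS     16           /* assumed number of slots in the array */

/*
 * Describe 'nblocks' consecutive slots, starting at slot 'start', of a
 * contiguous array of 'nslots' blocks.  Returns the iovcnt: 1 if the run
 * is contiguous in memory, 2 if it wraps around the end of the array.
 */
int
ring_build_iov(char *ring, int nslots, int start, int nblocks,
               struct iovec iov[2])
{
    int         first = nslots - start; /* slots available before the end */

    assert(start >= 0 && start < nslots);
    assert(nblocks > 0 && nblocks <= nslots);

    if (first > nblocks)
        first = nblocks;

    iov[0].iov_base = ring + (size_t) start * BLOCK_SIZE;
    iov[0].iov_len = (size_t) first * BLOCK_SIZE;
    if (first == nblocks)
        return 1;

    /* The remainder continues from the start of the array. */
    iov[1].iov_base = ring;
    iov[1].iov_len = (size_t) (nblocks - first) * BLOCK_SIZE;
    return 2;
}

/* Use plain pwrite() when possible, pwritev() only when the run wraps. */
ssize_t
ring_write(int fd, char *ring, int nslots, int start, int nblocks,
           off_t offset)
{
    struct iovec iov[2];
    int         iovcnt = ring_build_iov(ring, nslots, start, nblocks, iov);

    if (iovcnt == 1)
        return pwrite(fd, iov[0].iov_base, iov[0].iov_len, offset);
    return pwritev(fd, iov, iovcnt, offset);
}
```

With that model iovcnt can never exceed 2, which is why the 3 in the trace above surprised me.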

I don't believe it wastes any memory on padding (except a few bytes
wasted by palloc_aligned() before BulkWriteState):

(gdb) p &bulkstate->buffer_slots[0]
$4 = (BufferSlot *) 0x15c731cb4000
(gdb) p &bulkstate->buffer_slots[1]
$5 = (BufferSlot *) 0x15c731cb6000
(gdb) p sizeof(bulkstate->buffer_slots[0])
$6 = 8192
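
The same thing can be checked outside the server with stand-in types. In this sketch AlignedBlock and ListNode are simplified substitutes for PGIOAlignedBlock and dlist_node, the 4096-byte I/O alignment is an assumed value, and PaddedSlot is a hypothetical struct layout included only for contrast, since that is the layout that would waste memory on padding:

```c
#include <assert.h>
#include <stddef.h>

#define BLCKSZ   8192           /* block size, matching the gdb output */
#define IO_ALIGN 4096           /* assumed I/O alignment */

/* Stand-in for PGIOAlignedBlock: one block with I/O alignment. */
typedef union AlignedBlock
{
    _Alignas(IO_ALIGN) char data[BLCKSZ];
} AlignedBlock;

/* Stand-in for dlist_node: two pointers. */
typedef struct ListNode
{
    struct ListNode *prev;
    struct ListNode *next;
} ListNode;

/* Same shape as the patch: the list node overlays the buffer. */
typedef union BufferSlot
{
    AlignedBlock buffer;
    ListNode    freelist_node;
} BufferSlot;

/* Hypothetical alternative: node and buffer side by side. */
typedef struct PaddedSlot
{
    ListNode    freelist_node;
    AlignedBlock buffer;        /* pushed out to the next IO_ALIGN boundary */
} PaddedSlot;

/* The union costs nothing beyond the block itself. */
_Static_assert(sizeof(BufferSlot) == BLCKSZ, "union slot is one block");
/* The struct pays a full alignment unit of padding per slot. */
_Static_assert(sizeof(PaddedSlot) == BLCKSZ + IO_ALIGN, "struct slot pads");

/* Distance in bytes between adjacent array elements. */
size_t
slot_stride(void)
{
    static BufferSlot slots[2];

    return (size_t) ((char *) &slots[1] - (char *) &slots[0]);
}
```

Because a free slot's buffer contents are dead anyway, overlaying the freelist node on it is free, and adjacent slots end up exactly 0x2000 bytes apart as in the gdb session.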


