Re: pg_preadv() and pg_pwritev()
От | Thomas Munro |
---|---|
Тема | Re: pg_preadv() and pg_pwritev() |
Дата | |
Msg-id | CA+hUKG+TCoareSYcAu05yaMh9eUck6Hd8FQZt5VEvotNs3YOMQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pg_preadv() and pg_pwritev() (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: pg_preadv() and pg_pwritev()
|
Список | pgsql-hackers |
On Mon, Dec 21, 2020 at 11:40 AM Andres Freund <andres@anarazel.de> wrote: > On 2020-12-20 16:26:42 +1300, Thomas Munro wrote: > > > 1. port.h cannot assume that <limits.h> has already been included; > > > nor do I want to fix that by including <limits.h> there. Do we > > > really need to define a fallback value of IOV_MAX? If so, > > > maybe the answer is to put the replacement struct iovec and > > > IOV_MAX in some new header. > > > > Ok, I moved all this stuff into port/pg_uio.h. > > Can we come up with a better name than 'uio'? I find that a not exactly > meaningful name. Ok, let's try port/pg_iovec.h. > Or perhaps we could just leave the functions in port/port.h, but extract > the value of the define at configure time? That way only pread.c / > pwrite.c would need it. That won't work for the struct definition, so client code would need to remember to do: #ifdef HAVE_SYS_UIO_H #include <sys/uio.h> #endif ... which is a little tedious, or port.h would need to do that and incur an overhead in every translation unit, which Tom objected to. That's why I liked the separate header idea. > > > 3. The patch as given won't prove anything except that the code > > > compiles. Is it worth fixing at least one code path to make > > > use of pg_preadv and pg_pwritev, so we can make sure this code > > > is tested before there's layers of other new code on top? > > > > OK, here's a patch to zero-fill fresh WAL segments with pwritev(). > > I think that's a good idea. However, I see two small issues: 1) If we > write larger amounts at once, we need to handle partial writes. That's a > large enough amount of IO that it's much more likely to hit a memory > shortage or such in the kernel - we had to do that a while a go for WAL > writes (which can also be larger), if memory serves. > > Perhaps we should have pg_pwritev/readv unconditionally go through > pwrite.c/pread.c and add support for "continuing" partial writes/reads > in one central place? Ok, here's a new version with the following changes: 1. Define PG_IOV_MAX, a reasonable size for local variables, not larger than IOV_MAX. 2 Use 32 rather than 1024, based on off-list complaint about 1024 potentially swamping the IO system unfairly. 3. Add a wrapper pg_pwritev_retry() to retry automatically on short writes. (I didn't write pg_preadv_retry(), because I don't currently need it for anything so I didn't want to have to think about EOF vs short-reads-for-implementation-reasons.) 4. I considered whether pg_pwrite() should have built-in retry instead of a separate wrapper, but I thought of an argument against hiding the "raw" version: the AIO patch set already understands short reads/writes and knows how to retry at a higher level, as that's needed for native AIO too, so I think it makes sense to be able to keep things the same and not solve the same problem twice. A counter argument would be that you could get the retry underway sooner with a tight loop, but I'm not expecting this to be common. > > I'm drawing a blank on trivial candidate uses for preadv(), without > > infrastructure from later patches. > > Can't immediately think of something either. One idea I had for the future is for xlogreader.c to read the WAL into a large multi-page circular buffer, which could wrap around using a pair of iovecs, but that requires lots more work .
Вложения
В списке pgsql-hackers по дате отправления: