Re: patch to allow disable of WAL recycling
From | Jerry Jelinek |
---|---|
Subject | Re: patch to allow disable of WAL recycling |
Date | |
Msg-id | CACPQ5Fo29F0VG7GZURW+2wEpRj5cOvh7nxTCcybwDoB0W41Aqw@mail.gmail.com |
In reply to | Re: patch to allow disable of WAL recycling (Thomas Munro <thomas.munro@enterprisedb.com>) |
List | pgsql-hackers |
Thomas,
We're using a ZFS recordsize of 8k to match the PG block size of 8k, so what you're describing is not the issue here.
Thanks,
Jerry
On Thu, Jul 5, 2018 at 3:44 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Fri, Jul 6, 2018 at 3:37 AM, Jerry Jelinek <jerry.jelinek@joyent.com> wrote:
>> If the problem is specifically the file system caching behavior, then we
>> could also consider using the dreaded posix_fadvise().
>
> I'm not sure that solves the problem for non-cached files, which is where
> we've observed the performance impact of recycling. What should be a
> write-intensive workload turns into a read-modify-write workload because
> we're now reading an old WAL file that is many hours, or even days, old and
> has thus fallen out of the filesystem's cache. The disk reads still have
> to happen.
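
For reference, posix_fadvise() as mentioned above takes a file descriptor, a byte range, and an advice flag. Below is a minimal sketch of the caching-hint idea, assuming a POSIX_FADV_WILLNEED prefetch of an about-to-be-recycled segment; the helper name and file path are hypothetical, and as the quoted text notes this would only move the disk reads earlier rather than eliminate them.

```c
/*
 * Minimal sketch, not from the patch: one way posix_fadvise() could be
 * applied to an old WAL segment before it is recycled.  POSIX_FADV_WILLNEED
 * asks the kernel to start reading the file into cache; it does not remove
 * the reads, it only front-loads them.  The path is hypothetical.
 */
#include <fcntl.h>
#include <unistd.h>

int
prefetch_old_segment(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /* Hint that the whole file (offset 0, len 0 => to EOF) will be needed soon. */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);

    close(fd);
    return 0;
}
```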
What ZFS record size are you using? PostgreSQL's XLOG_BLCKSZ is usually 8192 bytes. When XLogWrite() calls write() with some multiple of XLOG_BLCKSZ, on a traditional filesystem the kernel will say 'oh, that's overwriting whole pages exactly, so I have no need to read it from disk' (for example, in FreeBSD's ffs_vnops.c ffs_write() see the comment "We must peform a read-before-write if the transfer size does not cover the entire buffer").

I assume ZFS has a similar optimisation, but it uses much larger records than the traditional 4096-byte pages, defaulting to 128KB. Is that the reason for this?
--
Thomas Munro
http://www.enterprisedb.com
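
To make the whole-block-overwrite point concrete, here is a minimal standalone sketch (plain POSIX C, not PostgreSQL source) of the kind of write XLogWrite() issues: a buffer sized and aligned to 8192 bytes, written at a block-aligned offset. On a filesystem whose block or record size is at most 8 KB the kernel can overwrite the blocks without reading them first; with a 128 KB ZFS record the same write covers only part of a record, forcing the read-modify-write described above. The file name and constants are illustrative only.

```c
/*
 * Illustrative only: an 8 KB, block-aligned write, similar in shape to
 * what XLogWrite() does.  If the filesystem's block/record size is
 * <= 8 KB, this overwrites whole blocks and needs no read-before-write;
 * if the record size is larger (e.g. ZFS's 128 KB default), the old
 * record must be read, modified, and rewritten.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define WAL_BLOCK 8192          /* stand-in for XLOG_BLCKSZ */

int
main(void)
{
    char *buf;
    int   fd;

    /* Block-sized, block-aligned buffer. */
    if (posix_memalign((void **) &buf, WAL_BLOCK, WAL_BLOCK) != 0)
        return 1;
    memset(buf, 0, WAL_BLOCK);

    fd = open("walfile.test", O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return 1;

    /* Write one whole block at a block-aligned offset. */
    if (pwrite(fd, buf, WAL_BLOCK, 0) != WAL_BLOCK)
        return 1;

    close(fd);
    free(buf);
    return 0;
}
```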