Re: Large files for relations

From: MARK CALLAGHAN
Subject: Re: Large files for relations
Msg-id: CAFbpF8OaxX+ZhKb=XTnLxGgJZxC8iTxEF_YeNEjwWWZNG1tAEQ@mail.gmail.com
In reply to: Re: Large files for relations (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers


On Fri, May 12, 2023 at 4:02 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, May 13, 2023 at 4:41 AM MARK CALLAGHAN <mdcallag@gmail.com> wrote:
> > Repeating what was mentioned on Twitter, because I had some experience with the topic. With fewer files per table there will be more contention on the per-inode mutex (which might now be the per-inode rwsem). I haven't read filesystem source in a long time. Back in the day, and perhaps today, it was locked for the duration of a write to storage (locked within the kernel) and was briefly locked while setting up a read.
> >
> > The workaround for writes was one of:
> > 1) enable disk write cache or use battery-backed HW RAID to make writes faster (yes disks, I encountered this prior to 2010)
> > 2) use XFS and O_DIRECT in which case the per-inode mutex (rwsem) wasn't locked for the duration of a write
> >
> > I have a vague memory that filesystems have improved in this regard.
>
> (I am interpreting your "use XFS" to mean "use XFS instead of ext4".)

Yes, although when the decision was made it was probably ext-3 -> XFS. We suffered from "fsync of a file == fsync of the filesystem"
because MySQL binlogs use buffered IO and are appended to on write. Switching from ext-? to XFS was an easy perf win,
so I don't have much experience with ext-? over the past decade.
 
> Right, 80s file systems like UFS (and I suspect ext and ext2, which

The late 80s is when I last hacked on Unix filesystem code, excluding browsing XFS and ext source. Unix was easy back then -- one big kernel lock covered everything.
 
> some time sooner).  Currently our code believes that it is not safe to
> call fdatasync() for files whose size might have changed.  There is no

Long ago we added code for InnoDB to avoid fsync/fdatasync in some cases when O_DIRECT was used. While that was great for performance,
we also forgot to make sure the syncs were still done when files were extended. Eventually we fixed that.
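
To make that rule concrete, here is a minimal sketch in Python. It is not the InnoDB code: the `SyncTracker` class, its method names, and the `uses_o_direct` flag are all made up for illustration. It assumes that an O_DIRECT overwrite of already-allocated space needs no sync (which in turn assumes the drive write cache is disabled or non-volatile), while any write that extends the file still needs one because the new size is metadata.

```python
import os
import tempfile


class SyncTracker:
    """Hypothetical helper: remembers the file size known to be durable."""

    def __init__(self, fd, uses_o_direct):
        self.fd = fd
        # Illustration only: the flag says how the caller opened the file.
        # This sketch does not open with O_DIRECT itself, which would impose
        # buffer and offset alignment requirements.
        self.uses_o_direct = uses_o_direct
        # Assumption: whatever is in the file at open time is already durable.
        self.synced_size = os.fstat(fd).st_size

    def needs_sync(self, offset, length):
        # Buffered writes always need a sync to reach storage.
        if not self.uses_o_direct:
            return True
        # An O_DIRECT write past the durable size changes the file size,
        # and that metadata change still has to be flushed.
        return offset + length > self.synced_size

    def write(self, data, offset):
        must_sync = self.needs_sync(offset, len(data))
        os.pwrite(self.fd, data, offset)
        if must_sync:
            os.fsync(self.fd)
            self.synced_size = max(self.synced_size, offset + len(data))
        return must_sync


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        tracker = SyncTracker(f.fileno(), uses_o_direct=True)
        print(tracker.write(b"x" * 4096, 0))     # extends the file
        print(tracker.write(b"y" * 4096, 0))     # overwrites in place
        print(tracker.write(b"z" * 4096, 4096))  # extends the file again
```

The bug described above corresponds to dropping the size check in `needs_sync` and returning False for every O_DIRECT write.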
 
Thanks for all of the details.

--
Mark Callaghan
mdcallag@gmail.com
