Re: Initdb-time block size specification
From | Andres Freund |
---|---|
Subject | Re: Initdb-time block size specification |
Date | |
Msg-id | 20230630225909.ecthnlfvlnk3ij2k@awork3.anarazel.de |
In response to | Re: Initdb-time block size specification (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Initdb-time block size specification |
List | pgsql-hackers |
On 2023-06-30 18:37:39 -0400, Bruce Momjian wrote:
> On Sat, Jul 1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote:
> > On 6/30/23 23:53, Bruce Momjian wrote:
> > > For a 4kB write, to say it is not partially written would be to require
> > > the operating system to guarantee that the 4kB write is not split into
> > > smaller writes which might each be atomic because smaller atomic writes
> > > would not help us.
> >
> > Right, that's the dance we do to protect against torn pages. But Andres
> > suggested that if you have modern storage and configure it correctly,
> > writing with 4kB pages would be atomic. So we wouldn't need to do this
> > FPI stuff, eliminating pretty significant source of write amplification.
>
> I agree the hardware is atomic for 4k writes, but do we know the OS
> always issues 4k writes?

When using a sector size of 4K you *can't* make smaller writes via normal
paths. The addressing unit is in sectors. The details obviously differ between
storage protocol, but you pretty much always just specify a start sector and a
number of sectors to be operated on.

Obviously the kernel could read 4k, modify 512 bytes in-memory, and then write
4k back, but that shouldn't be a danger here. There might also be debug
interfaces to allow reading/writing in different increments, but that'd not be
something happening during normal operation.
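
To illustrate the addressing point, here is a rough sketch of a toy, in-memory
device whose only I/O operations take a start sector and a whole number of
sectors. Everything in it (ToyBlockDevice, update_bytes, the 4096-byte
SECTOR_SIZE) is hypothetical and made up for illustration. It is not a real
kernel or storage-protocol API, and it models addressing only, not atomicity.

```python
SECTOR_SIZE = 4096  # assumed logical sector size in bytes


class ToyBlockDevice:
    """Hypothetical in-memory model of a sector-addressed device."""

    def __init__(self, num_sectors):
        self.num_sectors = num_sectors
        self._data = bytearray(num_sectors * SECTOR_SIZE)

    def _check_range(self, start_sector, count):
        if start_sector < 0 or count <= 0 or start_sector + count > self.num_sectors:
            raise ValueError("sector range out of bounds")

    def read_sectors(self, start_sector, count):
        # Requests are expressed as (start sector, number of sectors).
        self._check_range(start_sector, count)
        off = start_sector * SECTOR_SIZE
        return bytes(self._data[off:off + count * SECTOR_SIZE])

    def write_sectors(self, start_sector, payload):
        # There is no way to express a write smaller than one sector.
        if len(payload) == 0 or len(payload) % SECTOR_SIZE != 0:
            raise ValueError("payload must be a whole number of sectors")
        self._check_range(start_sector, len(payload) // SECTOR_SIZE)
        off = start_sector * SECTOR_SIZE
        self._data[off:off + len(payload)] = payload


def update_bytes(dev, byte_offset, new_bytes):
    """Read-modify-write for an update that fits inside a single sector."""
    sector, within = divmod(byte_offset, SECTOR_SIZE)
    if within + len(new_bytes) > SECTOR_SIZE:
        raise ValueError("update must not cross a sector boundary")
    buf = bytearray(dev.read_sectors(sector, 1))      # read the full 4k
    buf[within:within + len(new_bytes)] = new_bytes   # modify in memory
    dev.write_sectors(sector, bytes(buf))             # write the full 4k back
```

In this model a 512-byte change still reaches the device as one full-sector
write, which corresponds to the read-modify-write case described above.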