Re: Initdb-time block size specification

From: Tomas Vondra
Subject: Re: Initdb-time block size specification
Msg-id: c356e5b7-ab37-845f-04cb-1f5649a9c673@enterprisedb.com
In reply to: Re: Initdb-time block size specification (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

On 7/1/23 00:59, Andres Freund wrote:
> On 2023-06-30 18:37:39 -0400, Bruce Momjian wrote:
>> On Sat, Jul  1, 2023 at 12:21:03AM +0200, Tomas Vondra wrote:
>>> On 6/30/23 23:53, Bruce Momjian wrote:
>>>> For a 4kB write, to say it is not partially written would be to require
>>>> the operating system to guarantee that the 4kB write is not split into
>>>> smaller writes which might each be atomic because smaller atomic writes
>>>> would not help us.
>>>
>>> Right, that's the dance we do to protect against torn pages. But Andres
>>> suggested that if you have modern storage and configure it correctly,
>>> writing with 4kB pages would be atomic. So we wouldn't need to do this
>>> FPI stuff, eliminating a pretty significant source of write amplification.
>>
>> I agree the hardware is atomic for 4k writes, but do we know the OS
>> always issues 4k writes?
> 
> When using a sector size of 4K you *can't* make smaller writes via normal
> paths. The addressing unit is in sectors. The details obviously differ between
> storage protocol, but you pretty much always just specify a start sector and a
> number of sectors to be operated on.
> 
> Obviously the kernel could read 4k, modify 512 bytes in-memory, and then write
> 4k back, but that shouldn't be a danger here.  There might also be debug
> interfaces to allow reading/writing in different increments, but that'd not be
> something happening during normal operation.

I think it's important to point out that a drive has both a physical and
a logical sector size. The "physical" size is what the drive uses
internally, while the "logical" size is the unit the OS uses to address
the drive and issue I/O.
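
To check which case a given drive falls into on Linux, both sizes can be read from sysfs. A minimal sketch, assuming the standard /sys/block/<dev>/queue/ layout; the helper name sector_sizes and the sysfs_root parameter are made up for illustration:

```python
from pathlib import Path


def sector_sizes(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the logical and physical sector sizes (in bytes) of a block device.

    `device` is the kernel name without /dev/, e.g. "sda" or "nvme0n1".
    `sysfs_root` is a hypothetical parameter so the function can be
    pointed at a fake directory tree for testing.
    """
    queue = Path(sysfs_root) / device / "queue"
    return {
        # Addressing unit the OS uses when talking to the drive.
        "logical": int((queue / "logical_block_size").read_text().strip()),
        # What the drive reports as its internal unit.
        "physical": int((queue / "physical_block_size").read_text().strip()),
    }
```

Calling sector_sizes("nvme0n1") on a drive with 4k physical sectors but a 512B logical format would be expected to return 512 for "logical" and 4096 for "physical". The values are only what the drive reports, so this does not by itself prove anything about write atomicity.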

Some drives have 4k physical sectors but only 512B logical sectors.
AFAIK most "old" SATA SSDs do it that way, for compatibility reasons.

Newer drives may have 4k physical sectors but typically support both 512B
and 4k logical sectors - my NVMe SSDs do this, for example.

My understanding is that for drives with 4k physical+logical sectors,
the OS would only issue "full" 4k writes.
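
The arithmetic behind that can be sketched as follows. This is only an illustration of sector addressing (a start sector plus a sector count), not anything the kernel or PostgreSQL actually runs, and the function name sectors_for_write is made up:

```python
from typing import Tuple


def sectors_for_write(offset: int, length: int, logical_size: int) -> Tuple[int, int]:
    """Map a byte range to (start_sector, sector_count) in logical sectors.

    A write can only be expressed in whole logical sectors, so a range
    that is not sector-aligned is rejected here. (In practice the kernel
    would have to read-modify-write to handle such a range.)
    """
    if offset % logical_size or length % logical_size:
        raise ValueError("offset and length must be multiples of the logical sector size")
    return offset // logical_size, length // logical_size


# The same aligned 4kB page write at byte offset 8192:
# sectors_for_write(8192, 4096, 4096) -> (2, 1): one 4k sector
# sectors_for_write(8192, 4096, 512)  -> (16, 8): eight 512B sectors
```

With 4k logical sectors the page is a single addressable unit, so there is no way to express a smaller write. With 512B logical sectors the same page is eight units, and whether those eight are written atomically depends on what the drive guarantees, which this sketch says nothing about.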


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


