Re: Large block sizes support in Linux

From Bruce Momjian
Subject Re: Large block sizes support in Linux
Msg-id Zf5BZVA4UhbSlLa4@momjian.us
In response to Re: Large block sizes support in Linux  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Large block sizes support in Linux  (Pankaj Raghav <kernel@pankajraghav.com>)
List pgsql-hackers

On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote:
> Right, but things change over time - current storage devices support
> much larger sectors (LBA format), usually 4K. And if you do I/O with
> this size, it's usually atomic.
> 
> AFAIK if you built Postgres with 4K pages, on a device with 4K LBA
> format, that would not need full-page writes - we always do I/O in 4k
> pages, and block layer does I/O (during writeback from page cache) with
> minimum guaranteed size = logical block size. 4K are great for OLTP
> systems in general, it'd be even better if we didn't need to worry about
> torn pages (but the tricky part is to be confident it's safe to disable
> them on a particular system).

Yes, even if the file system is 8k, and the storage is 8k, we only know
that torn pages are impossible if the file system never overwrites
existing 8k pages, but instead writes new ones and then makes them
active.  I think ZFS does that to handle snapshots.
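
To make the write-new-then-activate idea concrete, here is a minimal Python sketch. It is a toy in-memory model under stated assumptions, not a description of how ZFS or any real file system is implemented; `CowStore`, `write_page`, `SimulatedCrash`, and the `crash_after` parameter are hypothetical names introduced only for illustration.

```python
PAGE_SIZE = 8192     # page size used in the discussion above
SECTOR_SIZE = 512    # assumed unit in which a write can be interrupted


class SimulatedCrash(Exception):
    """Raised to model power loss in the middle of a page write."""


class CowStore:
    """Toy copy-on-write page store: a live page is never overwritten."""

    def __init__(self):
        self.blocks = {}      # physical block id -> bytearray
        self.page_map = {}    # logical page number -> physical block id
        self.next_block = 0

    def write_page(self, page_no, data, crash_after=None):
        assert len(data) == PAGE_SIZE
        # Always allocate a fresh block; the old version stays untouched.
        block_id = self.next_block
        self.next_block += 1
        block = bytearray(PAGE_SIZE)
        self.blocks[block_id] = block
        for i in range(PAGE_SIZE // SECTOR_SIZE):
            if crash_after is not None and i == crash_after:
                raise SimulatedCrash()   # new block is only partly written
            start = i * SECTOR_SIZE
            block[start:start + SECTOR_SIZE] = data[start:start + SECTOR_SIZE]
        # A single pointer switch makes the new version the active one.
        self.page_map[page_no] = block_id

    def read_page(self, page_no):
        return bytes(self.blocks[self.page_map[page_no]])
```

If a write is interrupted in this model, the page map still points at the old, complete block, so a reader sees either the old page or the new page and never a mix. The sketch assumes the pointer switch itself is atomic, which is exactly the property that has to be established for a real system.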

> The other thing is - is there a reliable way to say when the guarantees
> actually apply? I mean, how would the administrator *know* it's safe to
> set full_page_writes=off, or even better how could we verify this when
> the database starts (and complain if it's not safe to disable FPW)?

Yes, this is quite hard to know.  Our docs have:

    https://www.postgresql.org/docs/current/wal-reliability.html
    
    Another risk of data loss is posed by the disk platter write operations
    themselves. Disk platters are divided into sectors, commonly 512 bytes
    each. Every physical read or write operation processes a whole sector.
    When a write request arrives at the drive, it might be for some multiple
    of 512 bytes (PostgreSQL typically writes 8192 bytes, or 16 sectors, at
    a time), and the process of writing could fail due to power loss at any
    time, meaning some of the 512-byte sectors were written while others
    were not. To guard against such failures, PostgreSQL periodically writes
    full page images to permanent WAL storage before modifying the actual
    page on disk. By doing this, during crash recovery PostgreSQL can
-->    restore partially-written pages from WAL. If you have file-system
-->    software that prevents partial page writes (e.g., ZFS), you can turn off
-->    this page imaging by turning off the full_page_writes parameter.
-->    Battery-Backed Unit (BBU) disk controllers do not prevent partial page
-->    writes unless they guarantee that data is written to the BBU as full
-->    (8kB) pages.
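
The failure mode in that excerpt can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PostgreSQL's actual recovery code: it assumes 512-byte sectors, an in-place overwrite that stops after some number of sectors, and a full page image already safely in WAL. `torn_write` and `simulate` are hypothetical helper names.

```python
PAGE_SIZE = 8192                              # PostgreSQL's typical page size
SECTOR_SIZE = 512                             # sector size from the excerpt
SECTORS_PER_PAGE = PAGE_SIZE // SECTOR_SIZE   # 16 sectors per page


def torn_write(on_disk, new_page, sectors_written):
    """Overwrite a page in place, stopping after `sectors_written` sectors."""
    result = bytearray(on_disk)
    cut = sectors_written * SECTOR_SIZE
    result[:cut] = new_page[:cut]
    return bytes(result)


def simulate(sectors_written):
    """Return (page_was_torn, recovery_produced_new_page)."""
    old_page = b"A" * PAGE_SIZE
    new_page = b"B" * PAGE_SIZE
    # Assumption: the full page image reached WAL before the data-file write.
    wal_full_page_image = new_page
    on_disk = torn_write(old_page, new_page, sectors_written)
    # A torn page is neither the old version nor the new one.
    torn = on_disk not in (old_page, new_page)
    # In this model, recovery replaces the whole page with the WAL image,
    # regardless of what ended up on disk.
    recovered = wal_full_page_image
    return torn, recovered == new_page
```

For example, `simulate(5)` models power loss after 5 of the 16 sectors: the on-disk page is a mix of old and new data, yet the page restored from the WAL image is complete. Turning off full_page_writes removes that image, which is why it is only safe when the storage stack itself rules out the mixed state.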

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.


