Re: Enable data checksums by default

From: Jim Nasby
Subject: Re: Enable data checksums by default
Msg-id: CAMFBP2pUCZz6YZH7k8bJ2pyh1XzPUR8ntCTjMM1O0-b7x85viw@mail.gmail.com
In reply to: Re: Enable data checksums by default (Ants Aasma <ants.aasma@cybertec.at>)
List: pgsql-hackers


On Fri, Aug 1, 2025 at 6:37 AM Ants Aasma <ants.aasma@cybertec.at> wrote:
> Even if we made the checksum algorithm itself faster, the main issue
> is actually memory bandwidth. Intel server CPUs have about half the
> bandwidth of AMD ones. A checksum has to pull in the whole page in a
> few hundred cycles. Without checksums only a part of the page might be
> accessed and the accesses are spread over a longer time, making them
> easier to hide by out-of-order execution.

> But all the above still ends up being a few hundred nanoseconds per
> buffer read. Basically this ends up only mattering measurably for
> in-RAM but out of shared buffers workloads. And the easy workaround is
> to increase shared buffers. As you said, the main issue is the other
> overheads that checksums pull in.

I want to point out that at some point there might well be demand for checksumming pages while they live in shared_buffers. Modern storage systems assume that the durable media is going to have errors and already have robust ways to detect them. But they also assume that ECC memory is bulletproof (it's not), and that's the biggest benefit of Postgres checksums: they detect corruption of data while it sits in the filesystem cache[1]. You obviously lose that if you size shared_buffers to consume most of the available memory.

Obviously, trying to address that is way beyond the scope of what's being discussed here. I'm honestly unsure how relevant it is, but I wanted to make sure folks were aware of it.

1: I can't go into details, but I have seen a case where Postgres checksums led to an investigation that ultimately revealed a memory-related issue. In other words, data was actually getting corrupted while in the filesystem cache. Obviously data could (and likely did) also get corrupted in shared buffers, but the corruption in the FS cache was what prompted the investigation that ultimately found the hardware issue. Fortunately, shared_buffers was small enough to make it more likely that corruption would happen outside of Postgres, where it could be detected.
