Обсуждение: checkpoint occurs very often when vacuum full running
Hi,
Can someone explain the logic behind it ? I know that vacuum full isnt something recommended but I found out that whenever I run vacuum full on my database checkpoint occurs during that time every second ! well I know that VACUUM FULL duplicates the data into new data files and then it deletes the old data files. The writing the vacuum does, is it with the checkpoint process ?
Is there any connection ?
Thanks.
Hi Checkpoint can be occurs due timeout (checkpoint_timeout) or due amount of WAL (max_wal_size). Vacuum full does write all data through WAL and therefore may trigger checkpoint more frequently. regards, Sergei
Hi Checkpoint can be occurs due timeout (checkpoint_timeout) or due amount of WAL (max_wal_size). Vacuum full does write all data through WAL and therefore may trigger checkpoint more frequently. regards, Sergei
First of all thank you for the quick answer. In my case checkpoint happened every one second during the vacuum full so the checkpoint timeout isn't relevant. My guess was that it writes the changes to the wals but I didn't find anything about it in the documentation. Can you share a link that proves it ? I mean basicly the wals should contain the changes, and vacuum full changes the location of the data and not actually the data.
On Thu, Nov 15, 2018, 9:10 PM Sergei Kornilov <sk@zsrv.org wrote:
Hi
Checkpoint can be occurs due timeout (checkpoint_timeout) or due amount of WAL (max_wal_size).
Vacuum full does write all data through WAL and therefore may trigger checkpoint more frequently.
regards, Sergei
First of all thank you for the quick answer. In my case checkpoint happened every one second during the vacuum full so the checkpoint timeout isn't relevant. My guess was that it writes the changes to the wals but I didn't find anything about it in the documentation. Can you share a link that proves it ? I mean basicly the wals should contain the changes, and vacuum full changes the location of the data and not actually the data.
On Thu, Nov 15, 2018, 9:10 PM Sergei Kornilov <sk@zsrv.org wrote:
Hi
Checkpoint can be occurs due timeout (checkpoint_timeout) or due amount of WAL (max_wal_size).
Vacuum full does write all data through WAL and therefore may trigger checkpoint more frequently.
regards, Sergei
Hi > I mean basicly the wals should contain the changes, and vacuum full changes the location of the data and not actually thedata. Row location is data. For example, index lookup relies on TID (tuple id, hidden ctid column) - physical row address in datafile. Postgresql WAL - it is about physical changes in datafiles (block level), not logical. Just moving one row to another placewithout logical changes means: mark row deleted in old place, write to new place and update every index which containsthis row. And vacuum full does not change location, it create copy in different datafile. Then it rebuild every index because TID wasobviously changed. Then vacuum full drop old datafiles. Full size of new datafile and indexes should be written to WAL,because all of this is changes and must be reliable written (and then can be replayed on replicas). > but I didn't find anything about it in the documentation hmm, i can not found something exact in documentation about it.. It's my knowledge about postgresql internals. You can read this article: https://www.depesz.com/2011/07/14/write-ahead-log-understanding-postgresql-conf-checkpoint_segments-checkpoint_timeout-checkpoint_warning/ Itsabout WAL logic. All IO operations use pages, and difference between pages written to WAL. For example, full_page_writes setting ( https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FULL-PAGE-WRITES) say about pages too. > writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. If you want change few bytes in page - the whole page (8kb typical) will be written to WAL during first change of this pageafter checkpoint. regards, Sergei
Hi > I mean basicly the wals should contain the changes, and vacuum full changes the location of the data and not actually thedata. Row location is data. For example, index lookup relies on TID (tuple id, hidden ctid column) - physical row address in datafile. Postgresql WAL - it is about physical changes in datafiles (block level), not logical. Just moving one row to another placewithout logical changes means: mark row deleted in old place, write to new place and update every index which containsthis row. And vacuum full does not change location, it create copy in different datafile. Then it rebuild every index because TID wasobviously changed. Then vacuum full drop old datafiles. Full size of new datafile and indexes should be written to WAL,because all of this is changes and must be reliable written (and then can be replayed on replicas). > but I didn't find anything about it in the documentation hmm, i can not found something exact in documentation about it.. It's my knowledge about postgresql internals. You can read this article: https://www.depesz.com/2011/07/14/write-ahead-log-understanding-postgresql-conf-checkpoint_segments-checkpoint_timeout-checkpoint_warning/ Itsabout WAL logic. All IO operations use pages, and difference between pages written to WAL. For example, full_page_writes setting ( https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FULL-PAGE-WRITES) say about pages too. > writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. If you want change few bytes in page - the whole page (8kb typical) will be written to WAL during first change of this pageafter checkpoint. regards, Sergei
Mariel Cherkassky wrote: > First of all thank you for the quick answer. In my case checkpoint happened > every one second during the vacuum full so the checkpoint timeout isn't relevant. > My guess was that it writes the changes to the wals but I didn't find anything > about it in the documentation. Can you share a link that proves it ? > I mean basicly the wals should contain the changes, and vacuum full changes > the location of the data and not actually the data. VACUUM (FULL) completely rewrites all the tables and indexes, so the complete database will go into the WAL (these data changes have to be replayed in case of a crash!). WAL contains the physical and not the logical changes, and the physical data *are* modified. You should let autovacuum do the job instead of running VACUUM (FULL), unless your whole database is bloated beyond tolerance. That will cause less WAL activity and also won't disrupt normal database operation. If you really need that VACUUM (FULL), you can increase "max_wal_size" to get fewer checkpoints. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
Hi Laurenz.
thank you for the explanation but I still got a few questions about this subject :
1.By physical location of the data do you mean the location on disk of the objects ? I mean i thought that the wal files containing the logical changes , for example If I run an update then the wal file will contain the update in some format. Can you explain then what exactly it contains and what you meant by physical ?
2.So If the vacuum full stopped in some brutal way (kill -9 or the disk run of space) does all the new data files that are created deleted or left on the storage as orphan files ?
Thanks, Mariel.
בתאריך יום ה׳, 15 בנוב׳ 2018 ב-22:32 מאת Laurenz Albe <laurenz.albe@cybertec.at>:
Mariel Cherkassky wrote:
> First of all thank you for the quick answer. In my case checkpoint happened
> every one second during the vacuum full so the checkpoint timeout isn't relevant.
> My guess was that it writes the changes to the wals but I didn't find anything
> about it in the documentation. Can you share a link that proves it ?
> I mean basicly the wals should contain the changes, and vacuum full changes
> the location of the data and not actually the data.
VACUUM (FULL) completely rewrites all the tables and indexes, so the complete
database will go into the WAL (these data changes have to be replayed in case
of a crash!). WAL contains the physical and not the logical changes, and the
physical data *are* modified.
You should let autovacuum do the job instead of running VACUUM (FULL), unless
your whole database is bloated beyond tolerance. That will cause less WAL
activity and also won't disrupt normal database operation.
If you really need that VACUUM (FULL), you can increase "max_wal_size" to
get fewer checkpoints.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Mariel Cherkassky wrote: > thank you for the explanation but I still got a few questions about this subject : > 1.By physical location of the data do you mean the location on disk of the objects? > I mean i thought that the wal files containing the logical changes , for example > If I run an update then the wal file will contain the update in some format. > Can you explain then what exactly it contains and what you meant by physical ? It does *not* contain the SQL executed - how'd that work with functions like "random"? Rather, the information is like "replace the 42 bytes from offset 99 of block 12 in file xy with these bytes: ...". > 2.So If the vacuum full stopped in some brutal way (kill -9 or the disk run of space) > does all the new data files that are created deleted or left on the storage as orphan files ? In an out of space scenario, the files would get removed. If you kill -9 the backend they will be left behind. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com