Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
From | Masahiro Ikeda |
---|---|
Subject | Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. |
Date | |
Msg-id | 9f4e19ad-518d-b91a-e500-25a666471c42@oss.nttdata.com |
In reply to | Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested. (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: make the stats collector shutdown without writing the statsfiles if the immediate shutdown is requested.
|
List | pgsql-hackers |
On 2021/03/24 18:36, Fujii Masao wrote:
>
>
> On 2021/03/24 3:51, Andres Freund wrote:
>> Hi,
>>
>> On 2021-03-23 15:50:46 +0900, Fujii Masao wrote:
>>> This fact makes me wonder that if we collect the statistics about WAL writing
>>> from walreceiver as we discussed in other thread, the stats collector should
>>> be invoked at more earlier stage. IIUC walreceiver can be invoked before
>>> PMSIGNAL_BEGIN_HOT_STANDBY is sent.
>>
>> FWIW, in the shared memory stats patch the stats subsystem is
>> initialized early on by the startup process.
>
> This is good news!

Fujii-san, Andres-san,
Thanks for your comments!

I hadn't thought about the startup order. Looking at it from that point of view, I noticed that the current source code has two other concerns.

1. This problem is not limited to the wal receiver.

The problem of a process starting before the stats collector is launched during archive recovery applies not only to the wal receiver but also to the checkpointer and the bgwriter. Before starting redo, the startup process sends the postmaster the "PMSIGNAL_RECOVERY_STARTED" signal to launch the checkpointer and the bgwriter so that they can create restartpoints.

Although the socket for communication between the stats collector and the other processes is created at an earlier stage via pgstat_init(), I agree that starting the stats collector at an earlier stage is a good defensive measure.

BTW, in my environment (Linux, net.core.rmem_default = 212992), the socket can buffer almost 300 WAL stats messages. This means that, as you said, if the redo phase is too long, messages can easily be lost.

2. The stats can be cleared in the redo phase.

The statistics can be reset after the wal receiver, the checkpointer and the wal writer are started in the redo phase. So it's not enough to invoke the stats collector at an earlier stage. We need to fix this as well.

(I hope I am not missing something.)

Thanks to Andres-san's work ([1]), the above problems will be handled in the shared memory stats patch. The first problem will be resolved because the stats are collected in shared memory, so the stats collector process itself becomes unnecessary. The second problem will be resolved by removing the reset code, because the temporary stats file won't be generated, and if the permanent stats file is corrupted, it is simply recreated.

[1] https://github.com/anarazel/postgres/compare/master...shmstat-before-split-2021-03-22

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
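
The socket buffering point in the message above (a datagram socket that nobody reads holds only a limited number of messages, and later ones are dropped) can be checked empirically. The following is a minimal sketch, not PostgreSQL code: the function name `count_buffered_datagrams`, the 64-byte payload, and the number of attempts are assumptions made for illustration, and the payload is not the real WAL stats message. It sends datagrams over loopback to a UDP socket that is not being read, then drains the socket and counts how many were actually queued. The result depends on the kernel's receive buffer setting and on how it accounts for per-datagram overhead, so it will differ between systems.

```python
import socket


def count_buffered_datagrams(payload_size=64, attempts=2000, drain_timeout=0.2):
    """Send datagrams to an unread UDP socket, then count how many were queued.

    Returns (rcvbuf, accepted, received):
      rcvbuf   - receive buffer size reported by the kernel (SO_RCVBUF)
      accepted - datagrams the sender handed to the kernel without error
      received - datagrams that were actually waiting in the receiver's queue
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Bind to an ephemeral loopback port; nobody reads it while we send,
        # which stands in for a collector process that has not started yet.
        receiver.bind(("127.0.0.1", 0))
        rcvbuf = receiver.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

        sender.connect(receiver.getsockname())
        sender.setblocking(False)  # never wait: a full buffer means loss

        payload = b"x" * payload_size  # hypothetical message size
        accepted = 0
        for _ in range(attempts):
            try:
                sender.send(payload)
                accepted += 1
            except BlockingIOError:
                pass  # the sender's own buffer was full; message is lost

        # Drain whatever the kernel kept for the receiver.
        receiver.settimeout(drain_timeout)
        received = 0
        while True:
            try:
                receiver.recv(payload_size)
            except socket.timeout:
                break
            received += 1
        return rcvbuf, accepted, received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    rcvbuf, accepted, received = count_buffered_datagrams()
    print(f"SO_RCVBUF={rcvbuf} accepted={accepted} received={received}")
    print(f"silently dropped: {accepted - received}")
```

Note that the sender normally sees no error for the dropped datagrams; the kernel discards them on the receiving side once the buffer is full, which is why a long period without a reader loses messages without any visible failure.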